[MPlayer-dev-eng] kernel: to do_select(pipe) or not to do

Mon Nov 22 04:25:12 CET 2004

Hi,
thank you for your quick reply.
I am sending this mail both to mplayer-dev maillist and Cc to you.
That's why it contains both your letter. It would be good if we
keep threaded mails in mplayer-dev maillist.

Somehow I expected your explanetion as it follow the path of less
resistance. More comments bellow.

On Sat, 20 Nov 2004 17:18:17 -0800 (PST)
Linus Torvalds <torvalds at osdl.org> wrote:

> Btw, feel free to forward this to the mplayer lists, so that the
> people on that list can learn how nonblocking pipes are supposed to be
> used.
> 
> On Sun, 21 Nov 2004, Ivan Kalvachev wrote:
> > 
> > Create e pipe, and write one (in this case 4) byte in it.
> > If you use select() to see if the another write will block
> > (in other words the pipe is full), you will get report that
> > it will. Actually the pipe is not really full and you can write
> > another symbol. The real block will occure when the pipe is really
> > full that should be about 4096 bytes.
> 
> That's because the pipe _will_ block under certain circumstances,
> because pipes have special magic rules for writes <= PIPE_SIZE, namely
> that they get written atomically. In particular, that means that since
> "select()"  has no way to know _how_ big the write will be, it will
> have to assume that the thing will block due to PIPE_SIZE guarantees.
> 
> The only case that isn't true is when the pipe is empty: that's when
> even a full write is guaranteed to not block.

You are very wrong at this point. A full write WILL block if it writes
an PIPE_SIZE+1 bytes. And it have nothing to do with select()!

> 
> IOW, it's not a bug.
> 
> Also, your program is ridiculous. Nobody should ever use "select()" on
> a blocking pipe anyway, you should makr the file descriptor
> non-blocking, and use select only after getting a EAGAIN on the write.
> In other words, your problem is _entirely_ due to your mis-use of
> basic UNIX file descriptor behaviour.

This code is not mine;) The author may flame you separately;)

> > anyway, make into account that select() documentation states:
> > 
> > "The way that select is usually used is to block while waiting for 
> > a "change  of status"  on one or more of the file descriptors. A
> > "change of status" is when .... ... space  becomes available within
> > the kernel's internal buffers for more to be written to the file
> > descriptor..."
> 
> Exactly. Here is a silly example of how you can use select _properly_.
> Namely put the file descriptors into nonblocking mode, and then always
> do the actual IO unconditionally - and use select only when that IO
> has been shown to not have worked. This way you can do batched IO (ie
> you do lots of writes) and only when it _fails_ do you actually end up
> sleeping.
> 
> NOTE! This is also _exactly_ why select() only returns an OK on write
> when the pipe is empty. Because with this kind of approach, where the
> writer sends "packets" of data, the next write may be a full-sized
> one, and if select returns "you can write" even though a full-sized
> packet wouldn't actually fit in the pipe, then a program like this
> (which is _correctly_ written) would busy-loop, trying to do a big
> write, failing, doing a select, saying "ok, you can write", then going
> back to trying to write, and failing again. Only when the pipe has
> room for PIPE_SIZE bytes can select say for _sure_ that the write will
> succeed.
> 
> 		Linus

No need to send me samples, I have not send you the patch that I commit
few weeks back, and it does indeed use O_NONBLOCK.

But there is an fundamental problem with select() and pipes (not sure
for other types). We don't know how big will be the write! So in case of
1 byte it does too paranoid and unnecessary blocking. In case of big
write it will block.

> -----
> #include <sys/select.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <stdio.h>
> #include <errno.h>
> #include <stdlib.h>
> 
> static void pipe_write(int to)
> {
> 	int i;
> 	fd_set wait;
> 
> 	FD_ZERO(&wait);
> 	for (i = 0; i < 1000000; i++) {
> 		for (;;) {
> 			int ret;
> 			/* POSIX guarantees that this write is atomic */
> 			ret = write(to, &i, sizeof(i));
> 			if (!ret) {
> 				fprintf(stderr, "Out of space?");
> 				exit(1);
> 			}
> 			if (ret > 0)
> 				break;
> 
> 			if (errno != EAGAIN) {
> 				perror("write");
> 				exit(1);
> 			}
> 			printf("write waiting at count %d\n", i);
> 			FD_SET(to, &wait);
> 			select(to+1, NULL, &wait, NULL, NULL);
> 		}
> 	}
> }
> 
> static void pipe_read(int from)
> {
> 	int i = 0;
> 	int array[1000];
> 	fd_set wait;
> 
> 	FD_ZERO(&wait);
> 	for (;;) {
> 		int ret = read(from, &array, sizeof(array));
> 
> 		/* EOF? */
> 		if (!ret)
> 			break;
> 		if (ret > 0) {
> 			i += ret / sizeof(int);
> 			continue;
> 		}
> 		if (errno != EAGAIN) {
> 			perror("read");
> 			exit(1);
> 		}
> 		FD_SET(from, &wait);
> 		select(from+1, &wait, NULL, NULL, NULL);
> 		printf("read waited at count %d\n", i);
> 
> 		/* Simulate some work-related pausing.. */
> 		usleep(100);
> 	}
> 	printf("Got %d integers\n", i);
> }		
> 
> int main(int argc, char **argv)
> {
> 	int fd[2];
> 	pid_t pid;
> 
> 	if (pipe(fd) < 0) {
> 		perror("pipe");
> 		exit(1);
> 	}
> 
> 	/* Make the pipes nonblocking */
> 	fcntl(fd[0], F_SETFL, O_NONBLOCK);
> 	fcntl(fd[1], F_SETFL, O_NONBLOCK);
> 
> 	pid = fork();
> 	if (pid < 0) {
> 		perror("fork");
> 		exit(1);
> 	}
> 
> 	if (!pid) {
> 		/* I'm the child - I'll do reading */
> 		close(fd[1]);
> 		pipe_read(fd[0]);
> 		exit(0);
> 	}
> 
> 	/* I'm the parent - I'll do writing */
> 	close(fd[0]);
> 	pipe_write(fd[1]);
> 	exit(0);
> }

On Sat, 20 Nov 2004 17:25:53 -0800 (PST)
Linus Torvalds <torvalds at osdl.org> wrote:

> 
> 
> On Sat, 20 Nov 2004, Linus Torvalds wrote:
> > 
> > The only case that isn't true is when the pipe is empty: that's when
> > even a full write is guaranteed to not block.
> 
> Side note: we _could_ make the pipe buffers be bigger than PIPE_BUF,
> at which point we could return true whenever the free buffer space is
> bigger or equal to PIPE_BUF. As it is, PIPE_BUF is also the exact size
> of the kernel buffer (that's the traditional behaviour), so the only
> case this is true is when the pipe is empty.
> 
> I've considered doing that a few times, but the fact is that it only
> makes it easier to write broken programs. So even if I ever decide to
> allow having more than PIPE_BUF in flight, the current behaviour is
> the _correct_ one, and mplayer should be fixed if it has problems with
> it.
> 
> 		Linus

Well the current behaviour is not correct. As I explained, if select()
blocks after first write then it should also block at empty pipe. 
This is because bigger than PIPE_BUF write will block, too. 
Probably forbidding usage on pipe at select() would be better as it
will bomb immediately. 
The current behaviour is also very badly documented and predispose to
bad usage.

I think it is reasonable behavior to resize the pipe buffer so it
could contain fully the last write (the one that fills the buffer). Then
select() may block.
I'm not sure if this adds security risk of DoS (memory eating), but if
this is so you can implement the following paranoid workaround: 
Just write in the documentation that if select() says it won't block,
this means that the buffer is less than half empty and could be
written PIPE_BUF/2 bytes without blocking. You could also increase the
size to 8192 so current behavior will be preserved.

No mater if you do fix kernel or not, but select() documentation
should be updated !! !! !! !! !! (I'm sure they will listen to you)

BTW there is another way to set nonblocking pipe, this time using
O_NDELAY, I could find any particular documentation stating the
difference between both methods (probably that on O_NDELAY write()
will return 0 instead of -1 error). I was wondering how portables are
these flags.

Wish You Best
   Ivan Kalvachev
  iive