[NUT-devel] [nut]: r604 - docs/nutissues.txt

Wed Feb 13 19:11:52 CET 2008

On Wed, Feb 13, 2008 at 03:54:44PM +0100, Michael Niedermayer wrote:
> On Tue, Feb 12, 2008 at 08:37:21PM -0500, Rich Felker wrote:
> > On Tue, Feb 12, 2008 at 08:24:21PM +0100, Michael Niedermayer wrote:
> > > > Also keep in mind that
> > > > what I said does NOT only apply to the mmap design but to a nice clean
> > > > read-based design with low copying that probably IS similar to various
> > > > current implementations.
> > > 
> > > I dont see how
> > > It wouldnt affect libavformat, also it wouldnt affect the simple:
> > > x= malloc(size+header)
> > > put header in x
> > > fread into x+header
> > 
> > This design is highly inefficient for high bitrate streams. The
> > efficient implementation is:
> 
> And why exactly is above inefficient? The kernel internally has its
> own cache and prereads an appropriate amount, at least it should.

If it prereads, then there's an extra memcpy from the kernel buffer to
the userspace buffer. However, preread and buffering can be disabled
by posix_fadvise() on systems that fully support it. Then, the only
activity on read() will be a single read from the disk directly to
userspace buffers, or a single memcpy if some other process had
already caused the file to be cached (but never a read into one buffer
followed by a memcpy into another).
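
A minimal sketch of that read path in C, assuming a system that provides
posix_fadvise(). The helper names open_no_readahead and read_full are
hypothetical, and the advice is only a hint: whether it actually avoids
the intermediate kernel buffer depends on the platform.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a file read-only and advise the kernel that
 * access will be random, which asks it not to read ahead.
 * Returns the file descriptor, or -1 if open() fails. */
int open_no_readahead(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    /* offset 0, len 0 means "the whole file". The call is advisory,
     * so a failure here is deliberately ignored. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
    return fd;
}

/* Hypothetical helper: read up to len bytes straight into the caller's
 * buffer, retrying on short reads and EINTR.
 * Returns the number of bytes read (less than len only at EOF),
 * or -1 on error. */
ssize_t read_full(int fd, void *dst, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, (char *)dst + got, len - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break; /* end of file */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

The point of read_full taking the destination pointer is that the demuxer
can hand in the final frame buffer, so no userspace staging buffer is
involved.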

Also, keep in mind not every platform/device will even have a
kernel/user split. Data might directly move from the media reader
device into a buffer that the demuxer has direct access to.

> Duplicating the kernel cache seems rather the inefficient one. Also
> its much more complex ...

Mildly, but keep in mind that you can't _just_ use the kernel cache.
Your example involved using the libc's stdio buffering too; otherwise
the small reads for processing the container data would be incredibly
expensive.
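
For concreteness, the simple design quoted earlier (malloc, put header,
fread) could be written roughly as follows. This is only a sketch:
read_frame_with_header and its parameters are hypothetical names, and the
header bytes are assumed to be supplied by the caller. Note that it goes
through stdio, so any buffering fread() does internally applies here.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate size+header bytes, put the header at the
 * front, then fread the payload directly behind it.
 * Returns a malloc'd buffer of hdr_len + payload_len bytes that the caller
 * must free, or NULL on allocation failure or short read. */
unsigned char *read_frame_with_header(FILE *f,
                                      const unsigned char *hdr,
                                      size_t hdr_len,
                                      size_t payload_len)
{
    size_t total = hdr_len + payload_len;
    /* malloc(0) may return NULL, so always request at least one byte */
    unsigned char *x = malloc(total ? total : 1);
    if (!x)
        return NULL;
    if (hdr_len)
        memcpy(x, hdr, hdr_len);           /* put header in x */
    if (fread(x + hdr_len, 1, payload_len, f) != payload_len) {
        free(x);                           /* short read or error */
        return NULL;
    }
    return x;
}
```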

On the old glibc I used to use, I measured fread performance being
atrociously bad, always going through the stdio buffer and therefore
double-copying. I don't know if it's as bad anymore, but my libc's
implementation outperformed (old) glibc by several times on a
loop-and-read-file benchmark with certain read-chunk sizes and
fread(). I suspect this sort of badness is more the norm than the
exception.
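
A loop-and-read-file test of that kind can be sketched as below. This is
not the original benchmark, just an illustration under assumptions: the
function name fread_loop is hypothetical, and clock() measures CPU time,
so it reflects copying cost rather than time spent waiting on the disk.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical benchmark helper: read the whole file with fread() in
 * pieces of `chunk` bytes and report the CPU time spent in the loop.
 * Returns the total number of bytes read, or -1 on failure. */
long fread_loop(const char *path, size_t chunk, double *cpu_seconds)
{
    FILE *f;
    unsigned char *buf;
    long total = 0;
    size_t n;
    clock_t start;

    if (chunk == 0)
        return -1;
    f = fopen(path, "rb");
    if (!f)
        return -1;
    buf = malloc(chunk);
    if (!buf) {
        fclose(f);
        return -1;
    }

    start = clock();
    while ((n = fread(buf, 1, chunk, f)) > 0)
        total += (long)n;
    *cpu_seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

    free(buf);
    fclose(f);
    return total;
}
```

Running it over the same large file with several chunk sizes (small, odd,
and page-sized) is what would show whether a given libc double-copies.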

> Also do you have some benchmarks, iam curious what the real difference
> is.

I suppose I could construct one. It might also be documented
somewhere in the Austin Group/Open Group notes on why posix_fadvise
and posix_madvise were adopted...

> > > But iam not that much interested in huge frames, a simple
> > > "header elision with frames > 1024 shall be considered an error"
> > > would probably be ok.
> > 
> > That would make me happy. I would even be happy allowing sizes up to
> > 4096 or so if you think it would help.
> 
> I think it would, especially with mpeg headers ... you know 00 00 01 blah
> startcode shit ;)
> Also see svn log, ive just added elision headers, comments welcome ...
> Also time for nut to become a negative overhead container :)

=) =) =)

> > > > For PCM, there's no seeking issue because one always knows how to seek
> > > > to an arbitrary sample in PCM 'frames'. In this case I would just
> > > > recommend having a SHOULD clause that PCM frames SHOULD be same or
> > > > shorter duration than the typical frame durations for other streams in
> > > > the file (as a way of keeping interleaving clean and aiding players
> > > > that don't want to do sample-resolution seeking inside PCM frames).
> > > 
> > > Well ...
> > > I do have a AVI with all audio in a single chunk (something like 10 or 20mb), 
> > > do you want that
> > > .... in nut? mplayers avi demuxer can even seek in that avi :)
> > > 
> > > so honestly i dont think a should requirement alone would be a good idea.
> > 
> > Of course I don't want that in NUT. I suppose we should have a nice
> > strong technical requirement to prevent it. How about just a limit to
> > 4096 samples per frame? If one uses the maximum allowed frame size
> > then, the overhead would be trivial (1 byte per 4096 bytes in the
> > worst case, i.e. 0.025%, and much less with 16bit/stereo/etc.).
> 
> ok, ill add a 4096 sample limit

Sounds good.
And for uncompressed video a frame must be exactly one picture
regardless of size. Is that okay?
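
The overhead figure quoted above for the 4096-sample limit can be checked
with a few lines of C. This sketch assumes 1 byte of per-frame overhead,
as in the quoted worst case; the function name pcm_overhead_percent is
hypothetical.

```c
/* Per-frame overhead as a percentage of the PCM payload size. */
double pcm_overhead_percent(unsigned samples_per_frame,
                            unsigned bytes_per_sample,
                            unsigned channels,
                            unsigned overhead_bytes)
{
    double payload = (double)samples_per_frame * bytes_per_sample * channels;
    return 100.0 * overhead_bytes / payload;
}
```

For 4096 samples of 8-bit mono this gives 100/4096, about 0.0244%, which
matches the roughly 0.025% quoted. For 16-bit stereo the payload is 16384
bytes, so the figure drops to about 0.0061%.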

Rich