
On Wed, Feb 13, 2008 at 03:54:44PM +0100, Michael Niedermayer wrote:
On Tue, Feb 12, 2008 at 08:37:21PM -0500, Rich Felker wrote:
On Tue, Feb 12, 2008 at 08:24:21PM +0100, Michael Niedermayer wrote:
Also keep in mind that what I said applies NOT only to the mmap design but also to a nice, clean read-based design with low copying, which probably IS similar to various current implementations.
I don't see how. It wouldn't affect libavformat, and it also wouldn't affect the simple approach: x = malloc(size + header); put the header in x; fread() into x + header.
This design is highly inefficient for high bitrate streams. The efficient implementation is:
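As a rough sketch of the kind of low-copy read path meant here (illustrative only, not actual libnut or libavformat code, and the helper names are made up): reconstruct the elided header into the front of the destination buffer, then read() the payload straight in behind it, with no stdio layer and no intermediate copy.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* hdr/hdr_size stand for whatever header bytes were elided from the
 * stream; frame_size is the stored payload size. */
uint8_t *read_frame_direct(int fd, const uint8_t *hdr, size_t hdr_size,
                           size_t frame_size)
{
    uint8_t *buf = malloc(hdr_size + frame_size);
    if (!buf)
        return NULL;
    memcpy(buf, hdr, hdr_size);                 /* reconstruct elided header */
    size_t got = 0;
    while (got < frame_size) {                  /* read() may return short */
        ssize_t r = read(fd, buf + hdr_size + got, frame_size - got);
        if (r <= 0) {
            free(buf);
            return NULL;
        }
        got += (size_t)r;
    }
    return buf;                                 /* hdr_size + frame_size bytes */
}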
And why exactly is that simple approach inefficient? The kernel has its own internal cache and prereads an appropriate amount, or at least it should.
If it prereads, then there's an extra memcpy from the kernel buffer to the userspace buffer. However, preread and buffering can be disabled by posix_fadvise() on systems that fully support it. Then, the only activity on read() will be a single read from the disk directly to userspace buffers, or a single memcpy if some other process had already caused the file to be cached (but never a read into one buffer followed by a memcpy into another). Also, keep in mind not every platform/device will even have a kernel/user split. Data might directly move from the media reader device into a buffer that the demuxer has direct access to.
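For reference, a minimal sketch of the posix_fadvise() hints being referred to (whether and how they take effect varies by system; this is just the standard interface):

#define _XOPEN_SOURCE 600
#include <fcntl.h>

/* Hint that access will be random, so the kernel can skip readahead. */
int disable_readahead(int fd)
{
    return posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
}

/* Hint that the data will not be needed again, so cached pages for the
 * whole file can be dropped. */
int drop_cached(int fd)
{
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}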
Duplicating the kernel cache seems rather inefficient itself. Also it's much more complex ...
Mildly, but keep in mind that you can't _just_ use the kernel cache. Your example involved using the libc's stdio buffering too; otherwise the small reads for processing the container data would be incredibly expensive. On the old glibc I used to use, I found fread performance to be atrociously bad: it always went through the stdio buffer and therefore double-copied. I don't know if it's as bad anymore, but my libc's implementation outperformed (old) glibc by several times on a loop-and-read-file benchmark with certain read-chunk sizes and fread(). I suspect this sort of badness is more the norm than the exception.
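To make the contrast concrete, here is a sketch (not any real libc's code) of the kind of fread()-like policy that avoids the double copy: small reads are served from an internal buffer, while reads at least as large as the buffer bypass it and go straight to read(). Error handling is kept minimal.

#include <string.h>
#include <unistd.h>

struct rdbuf {
    int fd;
    unsigned char buf[4096];
    size_t pos, len;                             /* buffered, unconsumed bytes */
};

static ssize_t buf_read(struct rdbuf *b, void *dst, size_t n)
{
    size_t avail = b->len - b->pos;
    if (n <= avail) {                            /* small read: serve from buffer */
        memcpy(dst, b->buf + b->pos, n);
        b->pos += n;
        return (ssize_t)n;
    }
    memcpy(dst, b->buf + b->pos, avail);         /* drain whatever is buffered */
    b->pos = b->len = 0;
    if (n - avail >= sizeof(b->buf)) {           /* large read: bypass the buffer */
        ssize_t r = read(b->fd, (char *)dst + avail, n - avail);
        if (r < 0)
            return avail ? (ssize_t)avail : -1;
        return (ssize_t)(avail + (size_t)r);
    }
    ssize_t r = read(b->fd, b->buf, sizeof(b->buf));   /* refill, then copy */
    if (r <= 0)
        return avail ? (ssize_t)avail : r;
    b->len = (size_t)r;
    size_t take = (n - avail < b->len) ? n - avail : b->len;
    memcpy((char *)dst + avail, b->buf, take);
    b->pos = take;
    return (ssize_t)(avail + take);
}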
Also, do you have some benchmarks? I'm curious what the real difference is.
I suppose I could construct one. It might also be documented somewhere in the Austin Group/Open Group notes on why posix_fadvise and posix_madvise were adopted...
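A trivial sketch of such a benchmark (nothing rigorous: results depend heavily on libc, kernel, chunk size and cache state, and the second pass below runs with a warm page cache, so you would want to drop caches or swap the order for a fair comparison):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(int argc, char **argv)
{
    if (argc < 3)
        return 1;
    size_t chunk = atoi(argv[2]);
    char *buf = malloc(chunk);

    double t0 = now();
    FILE *f = fopen(argv[1], "rb");              /* pass 1: fread() */
    while (fread(buf, 1, chunk, f) == chunk)
        ;
    fclose(f);
    double t1 = now();

    int fd = open(argv[1], O_RDONLY);            /* pass 2: plain read() */
    while (read(fd, buf, chunk) == (ssize_t)chunk)
        ;
    close(fd);
    double t2 = now();

    printf("fread: %.3fs  read: %.3fs  (chunk %zu)\n", t1 - t0, t2 - t1, chunk);
    free(buf);
    return 0;
}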
But I'm not that interested in huge frames; a simple "header elision with frames > 1024 shall be considered an error" rule would probably be OK.
That would make me happy. I would even be happy allowing sizes up to 4096 or so if you think it would help.
I think it would, especially with MPEG headers ... you know, the 00 00 01 blah startcode shit ;) Also see the svn log, I've just added elision headers, comments welcome ... Also, time for NUT to become a negative-overhead container :)
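The idea in (made-up) code, just to show the principle; the field names and the exact syntax are whatever the spec in svn says. A prefix that is stored once in the headers (e.g. the 3-byte MPEG startcode 00 00 01) is stripped from each frame by the muxer and prepended again by the demuxer, so those bytes are paid for once instead of once per frame, which is how per-frame overhead can come out negative.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const uint8_t *data;    /* e.g. {0x00, 0x00, 0x01} */
    size_t         size;
} elision_header;

/* Muxer side: if the frame starts with the stored header, skip those bytes
 * and return how many were elided (the demuxer prepends them again). */
size_t elide_prefix(const elision_header *eh,
                    const uint8_t *frame, size_t frame_size)
{
    if (frame_size >= eh->size && !memcmp(frame, eh->data, eh->size))
        return eh->size;
    return 0;
}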
=) =) =)
For PCM, there's no seeking issue because one always knows how to seek to an arbitrary sample in PCM 'frames'. In this case I would just recommend having a SHOULD clause that PCM frames SHOULD be of the same or shorter duration than the typical frame durations for other streams in the file (as a way of keeping interleaving clean and aiding players that don't want to do sample-resolution seeking inside PCM frames).
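Just to spell out why sample-resolution seeking inside PCM frames is trivial (interleaved integer PCM assumed; the function name is only illustrative):

#include <stddef.h>
#include <stdint.h>

/* Byte offset of a given sample within an uncompressed PCM frame. */
size_t pcm_offset(uint64_t sample_in_frame,
                  unsigned channels, unsigned bytes_per_sample)
{
    return (size_t)(sample_in_frame * channels * bytes_per_sample);
}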
Well ... I do have an AVI with all the audio in a single chunk (something like 10 or 20 MB), do you want that ... in NUT? MPlayer's AVI demuxer can even seek in that AVI :)
So honestly I don't think a SHOULD requirement alone would be a good idea.
Of course I don't want that in NUT. I suppose we should have a nice strong technical requirement to prevent it. How about just a limit of 4096 samples per frame? If one uses the maximum allowed frame size, then the overhead would be trivial (1 byte per 4096 bytes in the worst case, i.e. 0.025%, and much less with 16bit/stereo/etc.).
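Spelling that arithmetic out (taking the 1-byte figure above): the worst case is 8-bit mono, where 4096 samples are 4096 bytes, so one extra byte of frame header is 1/4096 ≈ 0.024%; with 16-bit stereo the same 4096 samples are 16384 bytes, i.e. roughly 0.006%.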
OK, I'll add a 4096-sample limit.
Sounds good. And for uncompressed video, a frame must be exactly one picture regardless of size. Is that okay?

Rich