
On Wed, Feb 13, 2008 at 03:54:44PM +0100, Michael Niedermayer wrote:
On Tue, Feb 12, 2008 at 08:37:21PM -0500, Rich Felker wrote:
On Tue, Feb 12, 2008 at 08:24:21PM +0100, Michael Niedermayer wrote:
Also keep in mind that what I said applies NOT only to the mmap design but also to a nice, clean read-based design with low copying, which probably IS similar to various current implementations.
I don't see how. It wouldn't affect libavformat, and it also wouldn't affect the simple approach: x = malloc(size + header); put the header in x; fread() into x + header.
This design is highly inefficient for high bitrate streams. The efficient implementation is:
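As a rough sketch of the kind of low-copy read path meant here (illustrative only, not actual libnut or libavformat code, and the helper names are made up): reconstruct the elided header into the front of the destination buffer, then read() the payload straight in behind it, with no stdio layer and no intermediate copy.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* hdr/hdr_size stand for whatever header bytes were elided from the
 * stream; frame_size is the stored payload size. */
uint8_t *read_frame_direct(int fd, const uint8_t *hdr, size_t hdr_size,
                           size_t frame_size)
{
    uint8_t *buf = malloc(hdr_size + frame_size);
    if (!buf)
        return NULL;
    memcpy(buf, hdr, hdr_size);                 /* reconstruct elided header */
    size_t got = 0;
    while (got < frame_size) {                  /* read() may return short */
        ssize_t r = read(fd, buf + hdr_size + got, frame_size - got);
        if (r <= 0) {
            free(buf);
            return NULL;
        }
        got += (size_t)r;
    }
    return buf;                                 /* hdr_size + frame_size bytes */
}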
And why exactly is that simple approach inefficient? The kernel has its own internal cache and prereads an appropriate amount, or at least it should.
If it prereads, then there's an extra memcpy from the kernel buffer to the userspace buffer. However, preread and buffering can be disabled by posix_fadvise() on systems that fully support it. Then, the only activity on read() will be a single read from the disk directly to userspace buffers, or a single memcpy if some other process had already caused the file to be cached (but never a read into one buffer followed by a memcpy into another). Also, keep in mind not every platform/device will even have a kernel/user split. Data might directly move from the media reader device into a buffer that the demuxer has direct access to.
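For reference, a minimal sketch of the posix_fadvise() hints being referred to (whether and how they take effect varies by system; this is just the standard interface):

#define _XOPEN_SOURCE 600
#include <fcntl.h>

/* Hint that access will be random, so the kernel can skip readahead. */
int disable_readahead(int fd)
{
    return posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
}

/* Hint that the data will not be needed again, so cached pages for the
 * whole file can be dropped. */
int drop_cached(int fd)
{
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}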
Duplicating the kernel cache seems rather inefficient itself. Also it's much more complex ...
Mildly, but keep in mind that you can't _just_ use the kernel cache. Your example involved using the libc's stdio buffering too; otherwise the small reads for processing the container data would be incredibly expensive. On the old glibc I used to use, I found fread performance to be atrociously bad: it always went through the stdio buffer and therefore double-copied. I don't know if it's as bad anymore, but my libc's implementation outperformed (old) glibc by several times on a loop-and-read-file benchmark with certain read-chunk sizes and fread(). I suspect this sort of badness is more the norm than the exception.
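To make the contrast concrete, here is a sketch (not any real libc's code) of the kind of fread()-like policy that avoids the double copy: small reads are served from an internal buffer, while reads at least as large as the buffer bypass it and go straight to read(). Error handling is kept minimal.

#include <string.h>
#include <unistd.h>

struct rdbuf {
    int fd;
    unsigned char buf[4096];
    size_t pos, len;                             /* buffered, unconsumed bytes */
};

static ssize_t buf_read(struct rdbuf *b, void *dst, size_t n)
{
    size_t avail = b->len - b->pos;
    if (n <= avail) {                            /* small read: serve from buffer */
        memcpy(dst, b->buf + b->pos, n);
        b->pos += n;
        return (ssize_t)n;
    }
    memcpy(dst, b->buf + b->pos, avail);         /* drain whatever is buffered */
    b->pos = b->len = 0;
    if (n - avail >= sizeof(b->buf)) {           /* large read: bypass the buffer */
        ssize_t r = read(b->fd, (char *)dst + avail, n - avail);
        if (r < 0)
            return avail ? (ssize_t)avail : -1;
        return (ssize_t)(avail + (size_t)r);
    }
    ssize_t r = read(b->fd, b->buf, sizeof(b->buf));   /* refill, then copy */
    if (r <= 0)
        return avail ? (ssize_t)avail : r;
    b->len = (size_t)r;
    size_t take = (n - avail < b->len) ? n - avail : b->len;
    memcpy((char *)dst + avail, b->buf, take);
    b->pos = take;
    return (ssize_t)(avail + take);
}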
Also, do you have some benchmarks? I'm curious what the real difference is.
I suppose I could construct one. It might also be documented somewhere in the Austin Group/Open Group notes on why posix_fadvise and posix_madvise were adopted...
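A trivial sketch of such a benchmark (nothing rigorous: results depend heavily on libc, kernel, chunk size and cache state, and the second pass below runs with a warm page cache, so you would want to drop caches or swap the order for a fair comparison):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(int argc, char **argv)
{
    if (argc < 3)
        return 1;
    size_t chunk = atoi(argv[2]);
    char *buf = malloc(chunk);

    double t0 = now();
    FILE *f = fopen(argv[1], "rb");              /* pass 1: fread() */
    while (fread(buf, 1, chunk, f) == chunk)
        ;
    fclose(f);
    double t1 = now();

    int fd = open(argv[1], O_RDONLY);            /* pass 2: plain read() */
    while (read(fd, buf, chunk) == (ssize_t)chunk)
        ;
    close(fd);
    double t2 = now();

    printf("fread: %.3fs  read: %.3fs  (chunk %zu)\n", t1 - t0, t2 - t1, chunk);
    free(buf);
    return 0;
}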
But I'm not that interested in huge frames; a simple "header elision with frames > 1024 shall be considered an error" rule would probably be OK.
That would make me happy. I would even be happy allowing sizes up to 4096 or so if you think it would help.
I think it would, especially with MPEG headers ... you know, the 00 00 01 blah startcode shit ;) Also see the svn log, I've just added elision headers, comments welcome ... Also, time for NUT to become a negative-overhead container :)
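The idea in (made-up) code, just to show the principle; the field names and the exact syntax are whatever the spec in svn says. A prefix that is stored once in the headers (e.g. the 3-byte MPEG startcode 00 00 01) is stripped from each frame by the muxer and prepended again by the demuxer, so those bytes are paid for once instead of once per frame, which is how per-frame overhead can come out negative.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const uint8_t *data;    /* e.g. {0x00, 0x00, 0x01} */
    size_t         size;
} elision_header;

/* Muxer side: if the frame starts with the stored header, skip those bytes
 * and return how many were elided (the demuxer prepends them again). */
size_t elide_prefix(const elision_header *eh,
                    const uint8_t *frame, size_t frame_size)
{
    if (frame_size >= eh->size && !memcmp(frame, eh->data, eh->size))
        return eh->size;
    return 0;
}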
=) =) =)
For PCM, there's no seeking issue because one always knows how to seek to an arbitrary sample in PCM 'frames'. In this case I would just recommend having a SHOULD clause that PCM frames SHOULD be of the same or shorter duration than the typical frame durations for other streams in the file (as a way of keeping interleaving clean and aiding players that don't want to do sample-resolution seeking inside PCM frames).
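Just to spell out why sample-resolution seeking inside PCM frames is trivial (interleaved integer PCM assumed; the function name is only illustrative):

#include <stddef.h>
#include <stdint.h>

/* Byte offset of a given sample within an uncompressed PCM frame. */
size_t pcm_offset(uint64_t sample_in_frame,
                  unsigned channels, unsigned bytes_per_sample)
{
    return (size_t)(sample_in_frame * channels * bytes_per_sample);
}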
Well ... I do have an AVI with all the audio in a single chunk (something like 10 or 20 MB), do you want that ... in NUT? MPlayer's AVI demuxer can even seek in that AVI :)
So honestly I don't think a SHOULD requirement alone would be a good idea.
Of course I don't want that in NUT. I suppose we should have a nice strong technical requirement to prevent it. How about just a limit of 4096 samples per frame? If one uses the maximum allowed frame size, then the overhead would be trivial (1 byte per 4096 bytes in the worst case, i.e. 0.025%, and much less with 16bit/stereo/etc.).
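Spelling that arithmetic out (taking the 1-byte figure above): the worst case is 8-bit mono, where 4096 samples are 4096 bytes, so one extra byte of frame header is 1/4096 ≈ 0.024%; with 16-bit stereo the same 4096 samples are 16384 bytes, i.e. roughly 0.006%.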
OK, I'll add a 4096-sample limit.
Sounds good. And for uncompressed video, a frame must be exactly one picture regardless of size. Is that okay?

Rich