[NUT-devel] CVS: main/DOCS/tech mpcf.txt,1.117,1.118

Michael Niedermayer michaelni at gmx.at
Fri Mar 3 01:12:55 CET 2006


Hi

On Thu, Mar 02, 2006 at 06:11:17PM -0500, Rich Felker wrote:
[...]
> > > my proposed header compression, which has negligible complexity, would reduce
> > > the overhead by ~1% and was rejected based on nonexistent kernel and demuxer
> > > architectures
> > 
> > Scratch kernel; the kernel architecture for it already exists. It's in
> > POSIX and called posix_madvise. There is no demuxer to do zerocopy
> > demuxing, but in the case where decoded frames fit in L2 cache easily,
> > but the compressed frame is very large (i.e. high quality, high
> > bitrate files -- the very ones where performance is a problem)
> > zerocopy will make a significant improvement to performance.
> > Sacrificing this to remove 1% codec overhead in crappy codecs is not a
> > good tradeoff IMO. It would be easier to just make "MN custom MPEG4"
> > codec that doesn't have the wasted bytes to begin with...
> 
> One other thing with this that I forgot to mention: it would be
> possible to support zerocopy for non-"header-compressed" files even if
> header compression were supported. My reason for not wanting to have
> this option was that it forces any demuxer with zerocopy support to
> also have a duplicate demuxing system for the other case. If this can
> be shown not to be a problem (i.e. a trivial way to support both
> without significant additional code or slowdown) I'm not entirely
> opposed to the idea.

here are a few random problems you will have with this zero copy demuxing,
all solvable sure, but it's a lot of work for very questionable gain

* some bitstream readers in lavc have strict alignment requirements, frames
  cannot be aligned with zerocopy
* the vlc decoding of all mpeg and h26x codecs in lavc needs a bunch of
  zero bytes at the end to guarantee error detection before segfaulting
* several (not few) codecs write into the bitstream buffer either to fix
  big-little endian stuff or in at least one case reverse some lame
  obfuscation of a few bytes
* having the bitstream initially not in the L2 cache (i think that would
  be the case if you read by dma/busmastering) will mean that accesses to
  the uncompressed frame and bitstream will be interleaved; today's ram
  is optimized for sequential access, so this makes the already slowest part
  even slower
* and yeah the whole buffer management with zerocopy will be a nightmare
  especially for a generic codec-muxer architecture where codec and muxer
  could run with a delay or on different threads
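
To make the first two points concrete, here is a minimal sketch of the copy
a demuxer ends up doing when the decoder wants an aligned buffer with zero
bytes after the frame. PAD_BYTES, ALIGN_BYTES and copy_frame_padded() are
hypothetical names and values picked for illustration, not lavc API, and the
sketch does not check for size overflow.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical values, chosen for illustration only. */
#define PAD_BYTES   8   /* zero bytes required after the frame data */
#define ALIGN_BYTES 16  /* alignment assumed to be wanted by the bitstream reader */

/*
 * Copy `size` bytes of frame data into a newly allocated buffer that is
 * ALIGN_BYTES-aligned and followed by at least PAD_BYTES zero bytes.
 * Returns NULL on allocation failure; the caller releases it with free().
 */
static uint8_t *copy_frame_padded(const uint8_t *src, size_t size)
{
    /* aligned_alloc (C11) wants the size to be a multiple of the alignment */
    size_t total   = size + PAD_BYTES;
    size_t rounded = (total + ALIGN_BYTES - 1) / ALIGN_BYTES * ALIGN_BYTES;
    uint8_t *dst   = aligned_alloc(ALIGN_BYTES, rounded);

    if (!dst)
        return NULL;
    memcpy(dst, src, size);                /* the copy zerocopy wants to avoid */
    memset(dst + size, 0, rounded - size); /* zero the padding and the round-up slack */
    return dst;
}
```

A frame sitting at an arbitrary offset in a mapped file has neither the
alignment nor the trailing zeros, so with zerocopy either the container has
to provide them or every decoder has to stop depending on them.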

basically my opinion on this is that it's like the video filter architecture:
very strict idealistic goals which may or may not be all achievable at the
same time, but which almost certainly will never be implemented as the code
is too complex and too many things depend on too many other things

[...]
-- 
Michael



