[MPlayer-dev-eng] rethinking nut!

Sun Mar 28 09:27:31 CEST 2004

I got to thinking some more about NUT (formerly known as MPCF, i.e.
MPlayer Container Format), and well, I think we might want to make
some major design changes.

Why? Well, Alex was implementing NUT support in ffmpeg/libavformat,
and one of the main reasons he slacked off was that it was too painful
to do the audio subpacket buffering stuff right with libavformat's
design. Yes, maybe this means lavf sucks...but IMO it also means NUT
is overcomplicated.

So why are audio subpackets (what Matroska calls lacing) bad? Suppose
you're writing a file. Keep in mind, it has to be properly
interleaved! So let's say you want to put 20 audio subpackets in a
packet to save space. You can't write the combined packet to the file
until you've encoded all the subpackets. But by then, 10-20 video
frames might have been captured/encoded too, and they can't be written
until after all 20 audio subpackets are complete (due to interleaving
constraints). This means you have to buffer LOTS of data before
writing it to the file, especially if your video is high-bitrate or
uncompressed.
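
To put rough numbers on that, here is a back-of-the-envelope sketch. All
the figures are assumptions picked for illustration, not anything from the
spec: MP3 audio at 44.1 kHz (1152 samples per frame), 25 fps video, and
uncompressed 720x576 YUV 4:2:0 frames. The function name is made up too.

```python
import math


def video_buffered_while_lacing(subpackets, audio_frame_sec, fps, video_frame_bytes):
    """Estimate how much video piles up while one laced audio packet fills.

    Returns (whole video frames, bytes) produced during the time it takes
    to encode `subpackets` audio frames of `audio_frame_sec` seconds each.
    """
    span_sec = subpackets * audio_frame_sec
    frames = math.floor(span_sec * fps)
    return frames, frames * video_frame_bytes


# Assumed example values (hypothetical, see the text above).
mp3_frame_sec = 1152 / 44100            # one MP3 frame at 44.1 kHz
raw_frame_bytes = 720 * 576 * 3 // 2    # one YUV 4:2:0 frame, 8 bits per sample

frames, nbytes = video_buffered_while_lacing(20, mp3_frame_sec, 25, raw_frame_bytes)
print(frames, "video frames,", nbytes, "bytes held back")
```

Under these assumptions about half a second of video (13 frames, roughly
8 MB when uncompressed) sits in memory before anything can be written.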

Conversely, when decoding you have to buffer the whole audio packet
until all the subpackets inside have been used. This probably doesn't
matter for PCs but it's a pain for hardware players, and it goes
against the simplicity-based design we originally wanted, IMO.

So what now? As Alex found while testing NUT in ffmpeg, file size
overhead sucks without audio subpackets. In fact it was worse than MKV
in some cases! :) So something needs to be done to fix it.

IMO the solution is to greatly cut down the headers. The current spec
calls for:

1 byte of flags
1+ byte forward pointer (vlc)
1+ byte backward pointer (vlc)
1+ byte stream id (vlc)
1+ byte timestamp (vlc)
some other optional stuff

Without subpackets, this means at least 5% overhead for 128kbit mp3
audio, and probably worse for typical vorbis. Very bad. How could we
improve the situation?

1. Store stream id in with flags. It should be a 3-bit field, with the
   special value of 111 meaning there's a separate vlc field in the
   entry afterwards storing the stream id (used only when there are
   more than 7 streams).

2. Make flags a vlc field, with the essentials at the beginning, and
   video-specific stuff later (so we don't waste space on those fields
   for audio/subtitle packets).

3. Make timestamp optional, with a default timestamp-delta in the
   stream header. Optionally have a 2-bit timestamp selector field,
   with 11=full timestamp in its own vlc field, and 00,01,10 being 3
   predefined deltas from the stream header (useful for vorbis audio
   and post-ivtc mixed-fps material :).

4. Eliminate backwards pointers. I know this one will be really
   controversial since they do add robustness for damaged files, but
   IMO a whole vlc field is a lot to spend on each frame. This point
   definitely calls for discussion. Maybe there's a way we could use
   backwards pointers in some packets but not others...?
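
Here is a minimal sketch of how points 1-4 could fit together when writing
a packet header. Lots of this is assumption, not spec: the vlc coding shown
(7 bits per byte, most significant group first, high bit set on every byte
except the last), the bit layout inside the flags value, and all the names
(put_vlc, pack_header, extra_flags) are hypothetical.

```python
STREAM_ID_ESCAPE = 0b111   # point 1: 111 means "real stream id follows as vlc"
TS_FULL = 0b11             # point 3: 11 means "full timestamp follows as vlc"


def put_vlc(value):
    """Encode a non-negative int as an assumed 7-bits-per-byte vlc."""
    if value < 0:
        raise ValueError("vlc values must be non-negative")
    out = [value & 0x7F]                      # last byte: high bit clear
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)     # earlier bytes: high bit set
        value >>= 7
    return bytes(reversed(out))


def pack_header(stream_id, timestamp, prev_timestamp, deltas, forward_ptr,
                extra_flags=0):
    """Build a compact header: flags/stream id, forward pointer, timestamp.

    Assumed layout of the flags value (hypothetical):
        bits 0-2  stream id, or 111 as escape
        bits 3-4  timestamp selector (index into `deltas`, or 11)
        bits 5+   anything else, e.g. video-specific flags (point 2)
    `deltas` holds up to 3 predefined timestamp deltas from the stream header.
    No backward pointer is written (point 4).
    """
    if len(deltas) > 3:
        raise ValueError("at most 3 predefined deltas")
    sid_field = stream_id if stream_id < STREAM_ID_ESCAPE else STREAM_ID_ESCAPE
    delta = timestamp - prev_timestamp
    ts_sel = deltas.index(delta) if delta in deltas else TS_FULL

    out = put_vlc((extra_flags << 5) | (ts_sel << 3) | sid_field)
    if sid_field == STREAM_ID_ESCAPE:
        out += put_vlc(stream_id)
    out += put_vlc(forward_ptr)
    if ts_sel == TS_FULL:
        out += put_vlc(timestamp)
    return out


# Audio packet on stream 1 whose timestamp advances by the predefined delta.
audio = pack_header(1, 2304, 1152, (1152,), forward_ptr=418)
print(len(audio), audio.hex())
```

With this layout an audio packet on a low-numbered stream with a
predictable timestamp spends one byte on flags plus the forward pointer;
the escapes only cost extra in the rare cases.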

If all these changes are made, the new overhead per frame is:

1+ byte flags/stream id (vlc, only 1 byte for audio)
1+ byte forward pointer (vlc)
0+ byte timestamp (vlc, usually omitted for audio)
some other optional stuff

With only 2-3 bytes per packet, we're doing really well. I expect this
is comparable to the average overhead with our old subpacket design,
assuming reasonable numbers of subpackets.
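
For anyone who wants to play with the numbers, here is a small sketch that
counts header bytes for the old and new layouts. It only counts the fields
listed above (none of the "other optional stuff"), and it assumes a vlc
that carries 7 bits of payload per byte. The function names and example
values are made up.

```python
def vlc_len(value):
    """Bytes needed for a non-negative int, assuming 7 payload bits per byte."""
    n = 1
    while value >= 0x80:
        value >>= 7
        n += 1
    return n


def old_header_len(forward_ptr, backward_ptr, stream_id, timestamp):
    """Current layout: 1 flags byte plus four vlc fields."""
    return (1 + vlc_len(forward_ptr) + vlc_len(backward_ptr)
            + vlc_len(stream_id) + vlc_len(timestamp))


def new_header_len(forward_ptr, stream_id, timestamp=None, flags_len=1):
    """Proposed layout: flags/stream id, forward pointer, optional timestamp.

    Pass timestamp=None when a predefined delta is used. Stream ids of 7
    and up need the separate vlc stream id field.
    """
    n = flags_len + vlc_len(forward_ptr)
    if stream_id >= 7:
        n += vlc_len(stream_id)
    if timestamp is not None:
        n += vlc_len(timestamp)
    return n


# Small packet, small pointers: the best case for both layouts.
print(old_header_len(100, 100, 1, 50), new_header_len(100, 1))
```

Once a packet is 128 bytes or larger the forward pointer needs a second
byte under this vlc assumption, which is where the 3 in "2-3 bytes" comes
from.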

Alex, Michael, and others interested in container formats: please post
your thoughts on the matter. But remember: IMO it's better to design
something simple that we'll actually implement, rather than bloatware
specs like Matroska that would require hundreds of KB of code to
use... (which we're too lazy to write!)

Rich