[MPlayer-dev-eng] Nut and Matroska comparison :)))

D Richard Felker III dalias at aerifal.cx
Mon May 3 09:28:12 CEST 2004


I've seen some misleading statements about NUT and MKV lately, so I
thought I should clear some things up and write again about where I
stand on MKV. :))

[NOTE: if you want to take the bait and start flaming, please address
different topics in separate replies to this mail. That way we can at
least keep the flames organized into threads about specific aspects of
NUT and MKV, rather than trying to respond to everything at once.]

My goals in working on NUT (which are not identical, but still
similar, to the overall project goals) are:

* Very low overhead
* Complete pts for all frames (no packing frames together!)
* Simple to implement (de/)muxer
* Support for live streaming/pipes (no seeking/buffering in muxer)
* Efficient O(log n) seeking without index
* Optional compact index for O(1) seeking
* Support for any present or future codec
* Damage to files does not prevent playing the nondamaged frames

On the other hand, Matroska's claimed goals are:

- Streamable over internet (HTTP and RTP)
- Fast seeking in the file
- High error recovery
- Menus (like DVDs have)
- Chapter entries
- Selectable subtitle streams
- Selectable audio streams
- Modularly Extendable

(Naturally NUT also has selectable audio and subtitle streams, but I
consider this so basic that I wouldn't call it a goal/feature.)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

First let's compare overhead. Low overhead is not an explicit goal of
Matroska, but they do have an overhead page at their website, and it
will come up later anyway.

A block in Matroska can contain one or more frames. The theoretical
overhead for Matroska is at least 6 bytes per block, or at least 6+N
if N frames are "laced" into one block. Lacing has the disadvantages
that the timestamp is only known for the first frame, and that muxing
a laced file introduces latency between the time the frames are
written by the muxer and when they can be read by a demuxer.

Overhead in NUT ranges from 1-4 bytes per frame, and any frames
that would be suitable for lacing in Matroska will have 1-byte or at
worst 2-byte headers.

Matroska estimates 2-6% overhead (depending on the amount of lacing)
for 8kbps audio-only files. This is assuming 100 byte compressed
frames, which is roughly three times the average size of actual Vorbis
frames at this bitrate. So I think we can safely say Matroska has
6-18% overhead in this situation. On the other hand, NUT would have
approximately 3% overhead (at 1 byte per frame).
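
The arithmetic behind those percentages can be written out as a small
sketch. The 100-byte frame size and the 2-6% range are the figures
quoted above; the "one third" real frame size is the rough estimate
from this mail, not a measurement, and the function name is made up
for illustration.

```python
def overhead_percent(header_bytes, payload_bytes):
    """Container overhead as a percentage of the payload size."""
    return 100.0 * header_bytes / payload_bytes

assumed_frame = 100               # bytes per frame, as assumed by Matroska's estimate
actual_frame = assumed_frame / 3  # bytes per frame, rough estimate from the text

# 2-6% of a 100-byte frame means 2-6 bytes of container data per frame.
mkv_low_bytes = 0.02 * assumed_frame
mkv_high_bytes = 0.06 * assumed_frame

# The same byte counts weigh three times as much against the smaller frames.
mkv_low = overhead_percent(mkv_low_bytes, actual_frame)
mkv_high = overhead_percent(mkv_high_bytes, actual_frame)
nut = overhead_percent(1, actual_frame)  # 1-byte NUT frame header

print(f"Matroska: {mkv_low:.0f}-{mkv_high:.0f}%  NUT: {nut:.0f}%")
```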

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The discussion of overhead naturally leads into _streamability_, which
is one of Matroska's goals. If you're wasting 10% of the bitrate on
overhead and you only have 8kbps to use, that's a big problem.
Furthermore, one of the main purposes of streaming is for _live_
content (otherwise it's a waste of bandwidth -- people should just
download the file once and keep it!). Lacing adds latency, the exact
thing you don't want with live content. So the only way to get the
latency down with Matroska is to increase the overhead significantly,
which is the last thing you want to do!
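
To put rough numbers on that tradeoff, here is a sketch using the
lower bounds quoted earlier (6 bytes per block, 6+N bytes with N laced
frames). It assumes constant bitrate, so a frame's duration is just
its size in bits divided by the bitrate, and it counts only the time
the first frame of a block has to wait for the rest. The function name
and the 33-byte frame size are illustrative assumptions.

```python
def lacing_tradeoff(frames_per_block, frame_bytes, bitrate_bps):
    """Return (overhead_percent, added_latency_seconds) for one block."""
    n = frames_per_block
    # Lower bounds from the text: 6 bytes unlaced, 6+N with N laced frames.
    header = 6 if n == 1 else 6 + n
    overhead = 100.0 * header / (n * frame_bytes)
    # Constant-bitrate assumption: duration = bits per frame / bits per second.
    frame_duration = frame_bytes * 8 / bitrate_bps
    # The block cannot be written until its last frame exists,
    # so the first frame waits for the other n-1.
    latency = (n - 1) * frame_duration
    return overhead, latency

for n in (1, 2, 4, 8, 16):
    overhead, latency = lacing_tradeoff(n, frame_bytes=33, bitrate_bps=8000)
    print(f"N={n:2d}  overhead={overhead:5.1f}%  added latency={latency * 1000:6.1f} ms")
```

Every step down in overhead is paid for with more latency, and the
zero-latency case (N=1) is the one with the worst overhead.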

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Next, error recovery. The only thing I can see resembling error
recovery in Matroska is checksums. And a checksum can do only one
thing: tell you that an error has occurred. It does not itself help in
recovering after an error. I don't see any explanation in the Matroska
specs of how you're supposed to find the next valid block after
hitting a region of corruption in your file, and in fact, the MKV
demuxer in MPlayer just dies when this happens...

NUT, on the other hand, has startcodes (similar to MPEG) that can be
used to resync after errors. The relative frequency of these
startcodes is up to the muxer, allowing users with high-quality
archival needs to maximize error resilience while not forcing it on
users who want to fit the maximum capacity on a CD and use backups as
their protection against errors. The current draft has some minor
problems -- the delta-predicted timestamps can go haywire after
"resyncing" -- but Michael and I both have ideas for how to deal with
the situation. Through the discussions we've had in other threads,
it's clear that a simple NUT demuxer can always recover from errors at
the very next startcode, and an advanced demuxer could even recover at
the very next frame after the damage.
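
As a minimal sketch of the "simple demuxer" case: after hitting
damage, scan forward for the next startcode and resume there. The
marker bytes below are a made-up placeholder, NOT the real NUT
startcode, and the function name is hypothetical.

```python
STARTCODE = b"NUTSYNC!"  # hypothetical 8-byte marker, not the real NUT startcode

def resync(data, damaged_pos):
    """Return the offset of the first startcode at or after damaged_pos, or -1."""
    return data.find(STARTCODE, damaged_pos)

# Two startcode-delimited chunks with some corruption in between.
stream = STARTCODE + b"frame-a" + b"\xff" * 5 + STARTCODE + b"frame-b"

# Pretend the demuxer noticed the damage while reading at offset 10.
next_good = resync(stream, 10)
print(next_good, stream[next_good + len(STARTCODE):])
```

The more startcodes the muxer writes, the less data is skipped between
the damage and the next resync point.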

While we're on this issue, Ivan has complained/flamed that NUT is not
error resilient enough. I don't know what he wants, short of error
_correction_ codes. These will not be added to the NUT spec, since (a)
even if you can correct for damaged headers, the frames themselves
will probably still be damaged, so it doesn't really help to get them,
and (b) effective use of error correction codes precludes sequential
writes (streaming) without buffering/seeking. Presumably a user who
really wanted such a feature could design an auxiliary file to be kept
along with the NUT containing the error recovery data for the main
file.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Next comes CHJW's favorite thing to criticize NUT about, extensibility
to future codecs. Matroska has an elaborate scheme of representing
dependency between frames, storing fake "virtual" frames in display
order with references to the out-of-order coded frame, etc., and
apparently certain Matroska developers believe this makes Matroska
more extensible than NUT. I believe this is entirely untrue.

NUT's method of dealing with strange frame reordering is simple and to
the point. We first suppose the existence of keyframes. A keyframe is
any frame which does not itself depend on previous frames, and such
that all frames following it do not depend on prior frames either. If
keyframes do not exist, then it's impossible to seek, and thus knowing
dependency information between frames is useless.

We make a second assumption as well, that there is an upper bound on
the number of out-of-order frames before the next frame in display
order. Actually, this distance is always bounded by the maximum
keyframe interval, but in practice it's much smaller. For files with
the MPEG 1/2/4 IPB structure, this number is 1.

Under these two assumptions, it's possible to do two things:

1. Begin decoding at any keyframe.
2. Always know the next timestamp at which a frame must be presented.

These are the only two operations required to play movies with
out-of-order frames, regardless of how insane the ordering is.
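
To illustrate why the bound is enough for operation 2, here is a
sketch (not the mechanism from the NUT draft, just a demonstration):
if at most max_reorder frames can appear ahead of the next frame in
display order, then once more than max_reorder timestamps are pending,
the smallest one must be the next to present. The function name is
made up.

```python
import heapq

def presentation_order(coded_pts, max_reorder):
    """Yield timestamps in display order, given them in coded order."""
    pending = []  # min-heap of timestamps read but not yet presented
    for pts in coded_pts:
        heapq.heappush(pending, pts)
        # With more than max_reorder frames pending, the earliest is final.
        if len(pending) > max_reorder:
            yield heapq.heappop(pending)
    # End of stream: flush whatever is left, earliest first.
    while pending:
        yield heapq.heappop(pending)

# MPEG-style IPB in coded order: I0 P3 B1 B2 P6 B4 B5, reorder bound 1.
print(list(presentation_order([0, 3, 1, 2, 6, 4, 5], max_reorder=1)))
```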

Matroska's system does have some slight advantages, in that you can
know at the demuxer level which frames can be dropped without breaking
future frames, or which future frames will be broken if you drop a
particular frame. This could be useful for framerate-decimation when
streaming video over a network, aka temporal scalability. However, it
also has a significant price in overhead. I believe that if such
functionality is needed, the application will already be so specific
that it's acceptable to use codec-specific code for obtaining the
dependency information.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

What about efficient seeking? NUT specifically requires correct
interleaving, and explicitly disallows timestamps to reset or jump
around in the middle of the file. This allows O(log n) seeking with no
index. I am not aware of any such requirement in Matroska, meaning
that seeking will be O(n) (very slow!) if there is no index.
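
A rough sketch of index-free seeking under that monotonic-timestamp
requirement: bisect on byte position, and at each probe find the first
packet at or after that position and look at its timestamp. The file
is modeled here as two in-memory lists; in a real demuxer the helper
would be a forward scan to the next resync point. All names are
hypothetical.

```python
import bisect

def seek(packet_offsets, packet_pts, file_size, target_pts):
    """Return (index of first packet with pts >= target_pts or None, probes)."""
    probes = 0

    def first_packet_from(pos):
        # Stand-in for "scan forward from pos to the next packet header".
        i = bisect.bisect_left(packet_offsets, pos)
        return i if i < len(packet_offsets) else None

    lo, hi = 0, file_size
    while lo < hi:
        mid = (lo + hi) // 2
        i = first_packet_from(mid)
        probes += 1
        if i is None or packet_pts[i] >= target_pts:
            hi = mid                    # target is at or before this packet
        else:
            lo = packet_offsets[i] + 1  # target is strictly after this packet
    return first_packet_from(lo), probes

# 100000 packets of 50 bytes each, with pts equal to the packet number.
offsets = [i * 50 for i in range(100000)]
pts = list(range(100000))
index, probes = seek(offsets, pts, file_size=5000000, target_pts=73210)
print(index, probes)
```

The number of probes grows with the logarithm of the file size. If
timestamps could reset or jump, the comparison at each probe would
tell you nothing about which half to keep, and you would be back to
scanning.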

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Time units: Matroska defines a single time unit for the whole file,
while NUT has separate time bases for each stream. This allows NUT to
use very efficient storage for timestamps, and to store additional
logical information. For example if you have a timebase of 25/1, the
player knows the frames are timed as PAL video, and timestamps
correspond exactly to frame numbers. This information, which could be
useful when editing a video or retiming it for NTSC telecine, simply
isn't there in Matroska, so you have to guess.
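
A small sketch of the difference, using exact rationals. The
per-stream rates (25/1 and 30000/1001 ticks per second) are the usual
PAL and NTSC values; the file-wide millisecond unit is a hypothetical
stand-in for "one time unit for the whole file", not a claim about any
particular file.

```python
from fractions import Fraction

def pts_to_seconds(pts, ticks_per_second):
    """Exact presentation time for a timestamp in a per-stream time base."""
    return Fraction(pts) / ticks_per_second

pal = Fraction(25, 1)         # 25 ticks per second
ntsc = Fraction(30000, 1001)  # about 29.97 ticks per second

# With a per-stream time base the timestamp is simply the frame number.
for frame in (0, 1, 50):
    print("PAL frame", frame, "at", pts_to_seconds(frame, pal), "s")

# In a file-wide millisecond unit, NTSC frame times are not whole numbers,
# so the stored value has to be rounded and the exact rate must be guessed.
ntsc_frame_ms = pts_to_seconds(1, ntsc) * 1000
print("NTSC frame 1 at", ntsc_frame_ms, "ms, stored as", round(ntsc_frame_ms))
```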

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now some of Matroska's strong points: chapters and menus. NUT has no
spec for either of these yet, but nothing to preclude them either.
I'm actually not quite sure how they work in Matroska, so I can't
comment a lot on whether it's a good design or not, but at least it's
there. Personally I don't like menus so I'm not a good one to ask to
evaluate them. But I do like chapter divisions!

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Finally, a few last arguments^Wflames I've heard that I'd like to
counter:

Flame: "NUT is just efficiency at all costs!!! That's stupid!!!"

Reply: What costs?? NUT has better error resilience than everything
except MPEG-TS, NUT stores as much or more information (full PTS for
each packet, divisions between each packet) than any other format, and
NUT is among the simplest formats to mux or demux. Etc. etc. etc.

Flame: "NUT is stupid because if you want to make a format that's all
about efficiency, I can do it a lot smaller!!"

Reply: Yes, maybe you can. Store one stream after another,
noninterleaved. Store no timestamps, only the length of each frame.
You can even omit the frame lengths if the codec doesn't require
frames to be separated, for O(1) overhead that approaches 0% as the
file gets large!! But guess what?? It will suck. NUT does not suck.
NUT is designed to be the size-optimal GOOD container, not the
smallest possible container.

Flame: "NUT isn't extensible and won't support next-gen codecs!!"

Reply: Um, yeah.. Whatever. This claim is unsubstantiated. Back it up,
or shut up.

Flame: "Having separate time bases for each stream makes A/V sync
difficult!!"

Reply: Is it really _THAT_ hard to compare rational numbers? I don't
think so....
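
For the record, a sketch of what that comparison amounts to: cross
multiply and compare integers, no floating point needed. Rates are
given as (num, den) ticks per second; the 44100 audio rate and the
function name are assumptions for the example.

```python
def compare_timestamps(pts_a, rate_a, pts_b, rate_b):
    """Return -1, 0 or 1 as timestamp a is before, equal to or after b.

    Each rate is (num, den) ticks per second, so time = pts * den / num.
    """
    num_a, den_a = rate_a
    num_b, den_b = rate_b
    # pts_a*den_a/num_a ? pts_b*den_b/num_b, cross-multiplied (num > 0).
    left = pts_a * den_a * num_b
    right = pts_b * den_b * num_a
    return (left > right) - (left < right)

video = (25, 1)     # PAL video, one tick per frame
audio = (44100, 1)  # audio, one tick per sample (assumed rate)

print(compare_timestamps(50, video, 88200, audio))  # both are 2 seconds in
print(compare_timestamps(51, video, 88200, audio))  # video is one frame ahead
```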

Flame: "NUT IS TEH BAD BECUZ I SAY SO!!!111!!11!!!!!1!!"

Reply: ....

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

OK I think that's about it. Happy flaming... :)

Rich






