[MPlayer-dev-eng] Flaming NUT

D Richard Felker III dalias at aerifal.cx
Fri May 7 04:21:12 CEST 2004


On Fri, May 07, 2004 at 03:48:02AM +0300, Ivan Kalvachev wrote:
> Ok, here is resume of all my flames.

And mine:

> 1. startcodes are good for recovering after error. They are not good for check
> (sums) and confirming validity of any data.

Checksums aren't good for confirming validity of data either.

> 2. type0 frames would cause disaster if they are not heavy limited by
> design!

Frame types no longer exist (or rather all frames are "type0").

> 3. Extendability are more important than overhead.

Extensibility and low overhead aren't conflicting goals. Maybe we
haven't thought enough about extensibility though.

> 4. Error resistance could be better.

Yes, we could store the whole file twice in case one copy is corrupt.
Actually we have a name for that already: backups. :)

Seriously, I'm open to suggestions that actually improve error
recovery without destroying the other goals of NUT. But checksums do
not help!

> 5. Seeking should not be done by searching startcodes. (without index)

How else do you do it? There's only one other way, linear search...

> 6. Buffering is actually needed for streaming. Find out why vbv_buffer
> exist in MPEG.

No it's not. If you're going to claim it is, explain.

> > On Tue, May 04, 2004 at 01:25:19AM +0300, Ivan Kalvachev wrote:
> > >
> > > Hello,
> > > Maybe you have noticed that I am not including myself in nut discussions
> anymore.
> > > Why?
> > > I looked at older (20030906) version on nut and I found that I like it
> > > more, as it is nearly what I have proposed. From this fact I think that
> > > nut is heading in a wrong direction. Michael is great optimizer,
> > > but I will repeat again (and again) that reducing overhead is not our
> > > primary goal!
> >
> > This is not about overhead, altho the old design SUCKED and had
> > considerably worse overhead than most other containers, unless you
> > used subpackets. But subpackets preclude live streaming, and live
> > low-bitrate streaming is where overhead matters most!!
> 
> Hmm, as usual you use the word "suck" without any reason.

IMO that paragraph explained and justified "suck" plenty well...

> I just love the way you use word "overhead",
> proving yourself wrong in the very same sentence.
> 
> Let me repeat it.
> "This is not about _overhead_, altho the old design SUCKED and had
> considerably worse _overhead_ ..."
> 
> I don't think I could convince you about anything as long as
> your primary target and compartment criteria is overhead.
> 
> In future I will ignore all your arguments that contain the
> word overhead.

The old design had subpackets because it was clear that the overhead
would be _insane_ (worse than avi, perhaps? :) without using them.
Thus, it was unusable for live streaming because you had to make a
choice between high latency and huge overhead, neither of which is
suitable for the task. Is it clear now?

> > If you notice, all the design changes we have made recently in NUT
> > fulfill MULTIPLE goals, which include:
> >
> > - allowing zero-latency muxing (streaming)
> 
> This is pointless. Completely useless.
> I'm sure we could discuss on it farther.

No it's not. Suppose your voip-phone wants to use vorbis audio, with
nut as the container. Do you want lagged conversations??

> > - reducing overhead
> 
> OVERHEAD, OVEARHEAD, go over my head...

This was ONE of SEVERAL goals.

> > - improving error resilience
> 
> So far, I haven't sow anything that improve error resistance.

Maybe current error resilience is comparable to the original spec,
which you love so much and which I really dislike. I'm talking about
changes since the second spec. The recent changes that improve error
resilience are:

- Removing delta-predicted sizes and timestamps.
- Adding max_distance for startcodes.
- Adding short startcodes (not in the spec yet, I know...)

> All changes improve overhead and decrease error resistance.
> Short synccodes are not part of the drafts yet. (atm of writing
> original flame)

The current draft says type2 (long) startcode has to come every 16k.
That's very often... Adding short startcodes lets us make this
interval larger, to reduce overhead even more while keeping error
recovery the same or better (better if we use a shorter interval like
4k).
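To make the resync idea concrete, here's a rough sketch of scanning for a startcode after damage (the 64-bit value below is made up for illustration, it's not the actual NUT startcode, and a real demuxer would give up after max_distance bytes):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit startcode value -- the real values are in the spec. */
#define DEMO_STARTCODE 0x4E55544D41494E00ULL

/* Scan a buffer byte-by-byte for a 64-bit big-endian startcode using a
 * rolling window.  Returns the offset of the first match, or -1 if no
 * complete startcode fits in the buffer. */
static long find_startcode(const uint8_t *buf, size_t len)
{
    uint64_t window = 0;
    for (size_t i = 0; i < len; i++) {
        window = (window << 8) | buf[i];
        if (i >= 7 && window == DEMO_STARTCODE)
            return (long)(i - 7);
    }
    return -1;
}
```

Note this costs one compare per byte and no seeks, which is why the scan itself is never the bottleneck.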

> > > I found something interesting too. The old draft have more
> > > requirements for error resistance! Cheaters.)
> >
> > Yes, it had one more requirement: the ability to identify damaged
> > frame _contents_ in order to re-download. We decided this is stupid
> > because it belongs at the protocol level (ed2k, bittorrent, rtsp, tcp,
> > whatever).
> 
> Do you know the SEP syndrome? It is very popular in (post) socialist
> countries with heavy bureaucracy. It mean "Somebody Else Problem".
> You simple try to move the problem, not to solve it. The fun comes when
> the one that you have moved the problem to, comes with SEP in turn...

Hmm, I guess "cat" suffers from SEP syndrome too since it doesn't let
you make regex replacements? And "sed" suffers from SEP because it
doesn't load files from over the network by itself? And "emacs"
suffers from SEP because.....oh wait, never mind! :)))

> OGG people did that. They throw away all problems. I see that Michael
> is very happy because of this :P

Hm?

> The checksum are to make checks. I am bashing you for at least one byte
> or optional checksum that guarantee that the packet_header is not damaged.
> But this break your HIGHEST TOP NUMBER 1 priority.

The walk-the-packets test is much more reliable than a one-byte
checksum.
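In case it's unclear, "walk the packets" just means following the size fields from one startcode and checking that you land exactly on the next one. A toy version (the packet layout is simplified here to a plain list of sizes; not the actual NUT coding):

```c
#include <stddef.h>

/* Walk a chain of packet sizes starting right after a known-good
 * startcode.  If the chain lands exactly on the next startcode (span
 * bytes away), every size field in between was almost certainly
 * intact; a single corrupted size makes the walk over- or undershoot. */
static int walk_packets(const size_t *sizes, size_t n, size_t span)
{
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        pos += sizes[i];
        if (pos > span)
            return 0;   /* overshot the next startcode: damage detected */
    }
    return pos == span; /* 1 = chain is consistent */
}
```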

> The very same argument could be used for all extensions!
> Here you remove the requirement for extendibility.
> What was the right term of that - demagogy?

Looks like Michael put it back. :)
Although maybe we could make a little more effort to make sure it's
efficient.

> > > After all, the main function of container is to keep the frame data
> > > and all other meta data that is required for it.
> > >
> > > Just turn back and see what beautiful hack you have done for DTS support.
> >
> > Actually the dts "hack" is only there so we can formally specify what
> > proper interleaving means in the spec. In practice, a demuxer/decoder
> > never needs to know dts. Instead it just uses "delay" as the decoding
> > latency, and thus demuxes "delay" packets ahead at all times, which
> > will give the same effect.
> >
> > So, our "beautiful hack" is really just a clever way of (formally)
> > writing down what everyone has known for years.
> 
> Saying that decoder never need to know decode timestamp is somehow ...
> flowed ;)

Decoding does not need to be timed, just _sequenced_. As long as it's
in the right sequence and every frame is decoded before its
_presentation_ time stamp, everything is fine.

> DTS could be just frame number. Why it should be in the same timebase
> as the presentation time?

It shouldn't. Again, it's just a sequential ordering, not an actual
time. Once the PTS numbers are converted to DTS, they have meaning
_only_ in their ordering relation to one another, not as timestamps.

> Just to serve a rule without real value?
> Also, the current scheme "may" lead to problems with variable
> framerate formats.

Would you care to explain? I think you misunderstand the spec. DTS=x
does _NOT_ mean decode the frame at time x, so perhaps DTS is a
misnomer. Instead it should perhaps be called DSN (decode sequence
number).

IMO Michael's algorithms for "DTS" only serve two purposes:

1. To determine the next presentation timestamp in an out-of-order
   movie (but this can be done more easily just by demuxing "delay"
   frames ahead).

2. To specify the rule for interleaving (monotonicity applies only to
   frames with PTS=DTS).

If you still think there's a problem, please explain it.
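To illustrate point 1, here's a toy reorder buffer showing why demuxing "delay" frames ahead is all a player needs; no explicit DTS appears anywhere (the fixed-size buffer and plain int timestamps are simplifications on my part):

```c
#include <stddef.h>

/* Reorder frames from decode order to presentation order using a
 * buffer of (delay + 1) slots.  "pts" holds presentation timestamps in
 * the order frames come out of the demuxer; "out" receives them in
 * presentation order.  Sketch only: assumes delay + 1 <= 16. */
static void present_in_order(const int *pts, size_t n, int delay, int *out)
{
    int buf[16];
    size_t buffered = 0, emitted = 0;

    for (size_t i = 0; i < n; i++) {
        buf[buffered++] = pts[i];
        if (buffered > (size_t)delay) {
            /* emit the buffered frame with the smallest PTS */
            size_t min = 0;
            for (size_t j = 1; j < buffered; j++)
                if (buf[j] < buf[min]) min = j;
            out[emitted++] = buf[min];
            buf[min] = buf[--buffered];
        }
    }
    while (buffered > 0) {    /* flush remaining frames at EOF */
        size_t min = 0;
        for (size_t j = 1; j < buffered; j++)
            if (buf[j] < buf[min]) min = j;
        out[emitted++] = buf[min];
        buf[min] = buf[--buffered];
    }
}
```

With one B-frame of reordering (delay=1) and decode-order PTS like 0,2,1,4,3, this emits 0,1,2,3,4, which is exactly what a real player does without ever computing a DTS.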

> > > 2. I wanted more checksums (in frame headers too). Rich explained me
> > > that they are not necessary as backward/forward pointers are the
> > > biggest part of the header and breaking one of them is easy to be
> > > found. Few days later Michael removed the backward pointers.
> > > Of course nobody want to spend  precious bytes for checksums.
> >
> > Checksums ARE NOT NECESSARY to identify damage. Startcode checking
> > does a much better job of the same thing, with less computational
> 
> That's the most stupid thing I've heard from skilled developer.
> Startcodes prove ONLY that THEY are not damaged. Nothing else.

They don't even do that. It might not be a startcode, but random
corrupt data that just happens to contain a startcode...

> Even one simple XOR of all bytes is better than nothing. Or lower
> meaning part of the bytes. Both ways were used at Apple II times,
> so they ARE fast.(XOR was faster, but it may be slower on crappy P4)

...but checksums don't prove that any data is valid either. They just
prove that a certain collection of bytes happen to XOR to a particular
number.

Now, _GIVEN_ that the startcode/checksum is valid, a startcode doesn't
tell you anything about the other data being valid, while a checksum
can tell you that it "might be valid" or "definitely is not valid".
But you're not given that the checksum is valid. For all you know, it
could be random junk.
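A one-byte XOR "checksum" makes this painfully obvious: any corruption that cancels itself out passes. Quick demonstration (toy code, obviously not the NUT format):

```c
#include <stddef.h>
#include <stdint.h>

/* One-byte XOR checksum of the Apple II variety.  It cannot detect any
 * corruption whose flipped bits cancel out across the buffer, e.g. the
 * same bit flipped in two different bytes. */
static uint8_t xor_checksum(const uint8_t *buf, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum ^= buf[i];
    return sum;
}
```

Two bytes corrupted by the same bit pattern and the "checksum" is perfectly happy.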

> > overhead (I sure as hell don't want to be wasting cpu cycles on
> > checksums when I'm trying to get my k6 to decode 720x480 video with
> > he-aac audio.....).
> 
> try to avoid EMMS, it is slow as hell (about 300 cycles).

I'm told it's 1 cycle, and I believe Michael checked it. At least for
K6-3...might be slow for the original K6. Anyway, mplayer uses femms
on AMD. BTW isn't this highly off-topic?

> > > 3. Seeking ATM is something horrible. The place of index is not
> > > determined by any way. It may come to the confusing situation to read
> > > the half (or whole) file until index is found:O
> >
> > There is no good spec for the index right now, so your complaint is
> > about the incompleteness of NUT, not a flaw in NUT. Rest assured we
> > WILL specify the index properly, but not necessarily before the "1.0"
> > NUT spec. Seeking in NUT is very efficient without an index.
> >
> > > Seeking without index is very similar to the mpeg stream seeking.
> >
> > Nope. Correct seeking in mpeg is O(n) because timestamps are entirely
> > random.
> 
> LOL. I hope you can prove that. Otherwise it is very stupid flame.

Pop in a DVD. Timestamps are likely to reset at chapter boundaries.

Cat two mpeg files together. The resulting file is considered "valid"
but the timestamp resets in the middle. (On the other hand, such a NUT
file is not valid. Even if the spec didn't explicitly forbid the bad
timestamps, stream numbers and framecodes are likely not to be the
same, so it won't be playable. :)

Because of this, you _cannot_ seek with binary search. If you try,
you're likely to seek to the wrong "time X" since there can be two or
more occurrences of "time X".

> > In practice mpeg players implement incorrect bitrate-based
> > seeking, which usually seeks by the wrong amount (thus everyone
> > complaining about -ss with mpeg files...).
> 
> This bitrate base seeking is actually attempt for O(1). If MPlayer
> want correct seeking then it need to do "binary search".

Binary search isn't possible in MPEG, see above.

> > Seeking in NUT with no index is O(log n). You always seek to the exact
> > time you want.
> 
> I'm already sick of this O(log n). IT IS FALSE. IT IS NOT CORRECT.
> It is larger.
> Why? Because you need to find frame_header first. You jump in a middle
> of the frame data. Then you need to start reading forward until you find
> startcode. This takes time. So it depends on data_size and type0 max size.

The _only_ factor that has any weight in seeking time is access to the
media, and in particular, the number of times you have to seek on the
media. Even searching the computer's entire memory space for a
startcode takes less time than seeking on a cdrom... :)

If you want the full statement, seeking is O(max_distance * log n),
but max_distance is independent of n, and will be small. So for our
purposes, it can be seen as a small constant.
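For the skeptical, here's the binary search sketched with the file modeled as one monotone timestamp per max_distance-sized chunk; the probe count is the number of media seeks, and it stays logarithmic:

```c
#include <stddef.h>

/* Sketch of index-less NUT seeking: binary-search over byte offsets,
 * resyncing to the nearest startcode's timestamp at each probe.  The
 * file is modeled as an array of monotonically increasing timestamps
 * (one per chunk).  Returns the index of the first chunk whose
 * timestamp >= target; *probes counts the media seeks performed. */
static size_t seek_nut(const int *ts, size_t n, int target, size_t *probes)
{
    size_t lo = 0, hi = n;
    *probes = 0;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        ++*probes;              /* one seek + short scan on the media */
        if (ts[mid] < target)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```

This only works because NUT timestamps are monotone; as explained above, MPEG's resetting timestamps break exactly this invariant.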

> > > With the tiny detail that nothing guarantee that these startcodes are
> > > unique, just very improbable.
> >
> > Guess what? Nothing guarantees that a "packet" with the right checksum
> > is valid either. It's just incredibly improbable to get a bad packet
> > with a good checksum. The probability of a false positive for your 32
> > bit checksum is 1/2^32. The probability of false positive for NUT
> > startcodes is 1/2^64.
> 
> You miss something. Startcodes are always one and same. What would you do
> if M$ try to sabotage the NUT, by making the very same startcodes
> part of their next codec? Sue them? Change nut?

Do you _really_ think anyone using NUT is going to encode with
proprietary MS codecs??? Actually it would be interesting though. MS
could probably be sued under antitrust law for using its monopoly
power to intentionally create incompatibility with competitors'
products. In any case, see below for why the same problem can occur
with checksums, etc.

> > > Do you remember the nut it nut scenario.
> >
> > Nut is not allowed to contain other containers (including itself),
> > only raw codec frames.
> >
> > > It could make nut demuxer
> > > completely nuts on seeking, dumping errors that don't really exist.
> > > Ya,ya even ogg don't do it anymore.
> >
> > You can come up with pathologies that break any system of sync/error
> > recovery.
> 
> But I am not talking about breaking error recovery. I talk of
> breaking the _NORMAL_ seeking.

I said both sync and error recovery. You ALWAYS must have some sort of
codes to sync onto if you're going to perform binary-search seeking.
This can be startcodes, or forward/backward pointers, or checksums, or
anything similar.

If you refuse to use such codes to sync, then your only alternative is
linear search, walking the whole file. This is O(n) and is completely
unusable for long-distance seeks.

> > > 4. I am serious about NUT in MPEG-TS. Really.
> > > You are trying to reduce the overhead of NUT, with words that it is
> > > good for very low bitrate streams. But these streams are good only for
> > > streaming. And NUT is not suited for streaming. No packet priority
> > > no retransmition etc...
> >
> > This belongs in the PROTOCOL, not the file format. They are two
> > entirely separate layers.
> 
> SEP

No, it's called proper unix philosophy.

> > > 5. The current level of error resistance ability is somehow lower
> > > than MPEG-(P)ES. Why? If there is broken frame_header of type0 frame
> > > then we will loose all frames until next type2. This may mean up to
> > > 0.5 seconds (15 frames) or even more (>16kb) data.
> >
> > Nonsense. See below for explanations.
> >
> > > Just for compartment with mpeg elementary stream you may resync on next
> > > slice in the same picture!
> >
> > You can do this just fine with nut. If the damage is only inside the
> > frame (and not in the nut headers), then the codec does its own
> 
> I SAID FRAME_HEADER!!! I'm going to use Ascii Art next time, if you
> miss the point (again;)
> 
> In your scenario frame_header of type0 is ONE (1) byte. If it is broken
> you loose all frames until next synccode.

I have explained several ways around this before, as long as there are
not multiple points of damage between startcodes.

> According to the standard (now) synccode should be used at least every
> 0.5 seconds or max_type0_size (16kb) data. As max_type_0_size is now
> optional it WILL lead to disaster.

It's not optional. Try cvs update... max_type_0_size doesn't even
exist anymore since "type0" is deprecated terminology (there are no
frame types).

> > resyncing with error concealment, and you'll never notice any problem.
> > Also, if you know your nut file contains (for example) only mp3 audio
> > and mpeg-4 video, you can use mpeg headers to resync. BUT THIS IS
> > CODEC-SPECIFIC, and so is the resyncing you're talking about in
> > MPEG-ES. It does not work in general.
> 
> Isn't that what I am saying?

Yes. You're free to do it with NUT, but it's not an acceptable general
solution because it's codec-specific.

> Then why I need nut? It don't even have total time with it:(

MPEG doesn't have total time either...

> Oh, yee. But at the time you have found that something is wrong your
> whole image would be full of flashing blocks and lavc would print
> 2 tome encyclopedia of errors.

What do you expect from a corrupt file? Yes, this is a possibility if
you don't use good error resilience settings. Again, your choice. But
I would rm the file if it's corrupt anyway (or redownload the damaged
parts).

> Guess what? Next thing you will do when (if) nut become popular is for
> these idiots that set too big max_type0_sizes and break the last
> chance of recovery. Not to mention that they have to actually hack the
> muxer source to tweak it as I like it.

The idiots you talk about are the ones releasing "backups" of movies
on their release-groups channels, bittorrent, p2p networks, ftps, etc.
They HAVE NO TOLERANCE for corrupt files. Every file has external
checksums (or in bt/p2p case, handled by the network), and files that
are corrupt are NOT TO BE TRADED. So if you're worried about getting
corrupt NUT files from these types of people, STOP USING SHIT LIKE
KAZAA AND GNUTELLA and get a real p2p network.

On the other hand, if you're worried about losing your family home
videos to corruption, configure the muxer properly for error
resilience!

It's not that hard to see that different situations have different
requirements for error resilience. We will NOT force everyone to waste
tons of space to meet your requirements, since their requirements
might be very different.

> As you said it is trade off. And you value overhead too much.

I value choice.

> > On the other hand, if you're copying a DVD to share on P2P ;), or
> > streaming video (where the PROTOCOL ALREADY HAS ERROR DETECTION AND
> RETRANSMISSION), then you can use very few startcodes, since you have
> > other ways of repairing the data if it's damaged.
> 
> Looks like you haver had dealed with floppy disk, or cd-roms, or dvd
> burners. What would you do if this is the only copy that exist?
> Relaying on pirate networks is not something I would do.

You have several choices:

- Make backups. Always the best.
- If the movie is your own, mux with high error resilience.
- If the movie is from a "pirate network", you're free to remux it
  once you get it. This will make it impossible to use the p2p network
  to repair errors in the file, but you don't seem to want to do that
  anyway.

> > > > We need an real stress test, for worst scenario.
> >
> > Stress test is 8kbit/sec vorbis. I demonstrated that the base overhead
> > for this case is 1 byte-per-packet. You can't get lower than that. Add
> > whatever level of error resilience you like on top, but IMO 8kbit/sec
> > is normally for streaming where the PROTOCOL ALREADY RECOVERS ERRORS!
> 
> Oh yee. That's what I mean. All test scenarios I see so far show nut
> in superior view. All test are made to demonstrate how frame_header==1 byte
> performs so great.

Because it's the critical case...

>There is no real tests with real movie. How about
> storing DVD into NUT? With multiangel, multiple langueges, probably with
> 5.1 vorbis? Something that would actually make nut spend bytes. Don't forget
> that nut don't have size limitations. The largest the file the bigger overhead.

Roughly speaking (this is very rough since bytes are discrete and
we're talking about very few) the overhead of NUT in bytes per frame
is O(log encoded_framesize*num_streams). In particular, this means
that the overhead _percentage_ decreases as the bitrates (and thus
frame sizes) increase, except right near the discontinuities where
size increases by a byte.

In your example, assuming 1200kbit/sec video, a typical video frame
would have 3 bytes overhead, compared to about 6k encoded frame size.
This comes out to 0.05%. If the 5.1 vorbis is 320 kbit/sec, each audio
frame would have 2-3 bytes overhead, for something like 0.8%.
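The arithmetic, for anyone who wants to check it (the 25 fps figure is an assumption on my part, not something the spec fixes):

```c
/* Per-frame overhead as a percentage of encoded frame size.
 * bitrate_bps and fps are assumed example figures, not NUT constants. */
static double overhead_pct(double bitrate_bps, double fps,
                           double per_frame_bytes)
{
    /* 1200 kbit/s at 25 fps -> 1200000/8/25 = 6000 bytes per frame */
    double frame_bytes = bitrate_bps / 8.0 / fps;
    return 100.0 * per_frame_bytes / frame_bytes;
}
```

At 1200 kbit/s and 25 fps that's 3/6000 = 0.05%, matching the figure above.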

> BTW I know that there are speech compressions that work well with
> 1-2kbits, but I don't know an general audio compression algorithm
> that could produce something acceptable at that bitrate.

Hmm, well with my prior voip-phone example this is interesting...

> > Making container better than avi is not big deal.
> >
> > Tell that to the OGG people... :)) They still can't seem to do it.
> Microsoft DID. ASF/WMV is the superior of avi.

ASF/WMV is very bad. The overhead is almost as bad as AVI (even with
no index!), and variable-framerate support is botched (just like Matroska's)
with low-precision approximate timestamps.

Rich




