[MPlayer-dev-eng] Cleaning your nuts

Thu Apr 22 04:11:16 CEST 2004

OK, there's been some discussion/complaints/objections regarding some
decisions in the nut spec, and imo we need to figure out what the issues
are and fix them so we can move on and finish it.

Here are Ivan's complaints as I understand them:
- error recovery sucks now
- framecode is obfuscated nonsense

In principle I agree with both of these. But all of Ivan's proposals
to fix them have been bad and contradicted what I see as the main
goals of nut:
- correct & strictly ordered timestamps
- read/write without seeking (including live streaming)
- minimal filesize
- simple implementation

That's in order of importance to me (the first two have to be first
because if you give up on them, filesize overhead can easily be an
arbitrarily small constant).

I would like to reconcile error recovery with these goals, but quite
frankly it seems very difficult to do. Personally I'm inclined to
think that error recovery does not belong in the file, but in the
transport/storage medium. At the same time, however, it would be nice
to be able to play incomplete downloads or recover movies from heavily
damaged media.

My challenge to us is to come up with a system that allows at least a
minimal level of recovery/resync (without waiting for next syncpoint)
while not increasing the filesize/overhead beyond that of any other
container format.

As of Michael's latest draft, there are only two types of frames:
type2, with startcode to identify them and full vlc timestamps &
datasize, and type0, with no startcode and predicted timestamp & size.
Neither has backward-pointers anymore, so the only way to recover
after an error is to search for a startcode. This means that all type0
frames until the next type2 frame will be lost. Unless we use lots of
type2 frames (which bloats the file size), that means losing a big
chunk of data.
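
To make the cost concrete, here is a minimal sketch of startcode-only
resync. The 4-byte STARTCODE value and the frame layout are made up for
illustration (they are not taken from the nut draft), and the sketch
ignores the possibility of payload bytes emulating a startcode.

```python
# Hypothetical startcode; the real value is whatever the nut spec defines.
STARTCODE = b"NUT2"

def resync(buf, damage_end):
    """Offset of the first startcode at or after damage_end, or -1 if none."""
    return buf.find(STARTCODE, damage_end)

def lost_bytes(buf, damage_start, damage_end):
    """Bytes that cannot be demuxed when resync relies on startcodes only."""
    nxt = resync(buf, damage_end)
    if nxt < 0:
        nxt = len(buf)  # no startcode left: everything to EOF is lost
    return nxt - damage_start

# One type2 frame (startcode + payload), two type0 frames (payload only),
# then the next type2 frame.
stream = STARTCODE + b"a" * 10 + b"b" * 10 + b"c" * 10 + STARTCODE + b"d" * 10
```

With 4 bytes of damage at offsets 6-9, lost_bytes(stream, 6, 10) reports 28
bytes lost: both undamaged type0 frames go down with the damaged frame.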

Ivan has pointed out that the bidirectional pointers are better in
some cases, since you can walk backwards from the next startcode to
get data after the point of corruption. But this doesn't seem to work
well with header prediction/compression. Also, checking that
forward/backward pointers match isn't necessarily so easy, since a
common form of corruption is a uniform byte value over the damaged
region. And this sort of recovery requires seeking quite a few times,
rather than just continuing to read forward.

Finally, I have some proposals of my own. I'm not sure if they're good
as-is, but I want to discuss them anyway.

1. Require perfect interleaving.

This means if packet1 comes before packet2, packet1's timestamp is
less than or equal to that of packet2. Presumably we have an exception
for out-of-order frames, allowing them to be stored anywhere between
the surrounding two frames in decode-order.

Rationale: Demuxing AVI is hell because idiots make files with broken
interleaving or no interleaving. We should stop this before it starts
by strictly specifying the interleaving, and the only natural choice
is monotone ordering.

Caveats: Before writing any packet, the muxer must know that no other
stream will want to write a packet before it. Either the muxer can
buffer one packet in each stream, or the calling app can just call the
muxer in the proper order.
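
A rough sketch of the "buffer one packet in each stream" option, assuming
packets are (timestamp, stream_id, payload) tuples and each stream is
already ordered by timestamp. The tuple layout and function names are
hypothetical, and the out-of-order exception above is not handled.

```python
import heapq

def interleave(*streams):
    """Merge per-stream packet iterators into one monotone sequence.

    heapq.merge holds exactly one pending packet per stream, which is the
    buffering the muxer needs to know nothing earlier is still coming.
    """
    return heapq.merge(*streams, key=lambda p: p[0])

def is_monotone(packets):
    """True if every packet's timestamp is >= that of the one before it."""
    ts = [p[0] for p in packets]
    return all(a <= b for a, b in zip(ts, ts[1:]))

video = [(0, 0, b"v0"), (40, 0, b"v1"), (80, 0, b"v2")]
audio = [(0, 1, b"a0"), (26, 1, b"a1"), (52, 1, b"a2"), (78, 1, b"a3")]
merged = list(interleave(video, audio))
```

A demuxer could run the same is_monotone check to reject files with broken
interleaving.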

2. Get rid of framecode and replace it with something non-obfuscated. 

Only if we can avoid increasing overhead, of course. My original
proposal was to replace the flags byte (which then became framecode)
with a bitfield containing streamid and pts delta predictor, and maybe
some flags too. Each field could be optionally vlc-coded after the
bitfield byte too, in case of overflow. Leave the size (forward
pointer) coded in a traditional way (without prediction) so that it
can't be ruined by past corrupt frames, allowing a slightly better
degree of recovery.

Rationale: Meets the simplicity goal and perhaps improves error
recovery.

Caveats: Might increase file size too much.

3. Store size (forward pointer) with bias.

For audio streams, we may have very small frames. Somewhere around 128
bytes. 128 is a magic number, because beyond 127 we need two bytes for
vlc. Michael's solution is to use predictors and just code lsb of
size, which is probably a good idea. But another idea I had (worth
thinking about) is to store in the stream header a "base size" for
each stream (minimum likely size of a packet) and have the forward
pointer be relative to that. This way we could extend the range of a
1byte vlc up to 150 or 200 or something. A bit in the flags could
indicate "absolute size" for rare packets that are smaller than the
base size.
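
As a quick check of the arithmetic, the sketch below counts vlc bytes with
and without a base size. It assumes a vlc carrying 7 payload bits per byte
(consistent with the 127 limit above); the function names and the base
value of 73 are made up for illustration.

```python
def vlc_len(value):
    """Bytes needed for a vlc that carries 7 payload bits per byte."""
    n = 1
    while value > 127:
        value >>= 7
        n += 1
    return n

def coded_size_len(size, base=0):
    """Bytes needed for the forward pointer when stored relative to base.

    Packets smaller than base fall back to the absolute size, which the
    "absolute size" flag bit would signal.
    """
    if size < base:
        return vlc_len(size)
    return vlc_len(size - base)
```

With base=73, a 200-byte packet still fits in a 1-byte vlc
(200 - 73 = 127), whereas without the bias it needs two bytes.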

Of course, it might be more efficient to just use several bits of the
flags/bitfield byte for lsb of the size. For example 2 bits for stream
id, 2 bits for pts predictor, and 3 or 4 bits for size lsb. Together
with the 7 bits of a 1-byte vlc, that would allow packets up to 1023
bytes (or 2047) with just one byte spent for size.
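
One possible layout for such a byte, as a sketch only: bit positions, the
unused top bit, and the vlc byte order are assumptions, not anything from
the draft. The low 3 bits of the size live in the bitfield byte and the
rest follows as a vlc.

```python
def encode_vlc(value):
    """7 bits per byte, high bit set on every byte except the last."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))

def decode_vlc(buf, pos=0):
    """Return (value, position after the vlc)."""
    value = 0
    while True:
        b = buf[pos]
        pos += 1
        value = (value << 7) | (b & 0x7F)
        if not b & 0x80:
            return value, pos

def pack_header(stream_id, pts_pred, size):
    """Bits 6-5: stream id, bits 4-3: pts predictor, bits 2-0: size lsb."""
    assert 0 <= stream_id < 4 and 0 <= pts_pred < 4
    first = (stream_id << 5) | (pts_pred << 3) | (size & 0x07)
    return bytes([first]) + encode_vlc(size >> 3)

def unpack_header(buf):
    """Return (stream_id, pts_pred, size, header length in bytes)."""
    first = buf[0]
    rest, pos = decode_vlc(buf, 1)
    return (first >> 5) & 3, (first >> 3) & 3, (rest << 3) | (first & 7), pos
```

Under this layout a 1023-byte packet gets a 2-byte header (bitfield plus
one vlc byte), and a 1024-byte packet is the first to need a third byte.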

At the very least, we should make the size coding a little more
efficient if we don't want to use predictors. For example, 0-byte
packet is never possible, and neither is 1-byte. So we could always
add 2 to the coded size, allowing a range of 2-1025 instead of 0-1023.

4. Use bitstreams rather than bytestreams for the packet headers.

This is just a random thought Ivan and I had, but maybe we could make
headers more efficient this way. Basically we want to move away from
the "minimal supercompressed header" as much as possible (although I
don't want to sacrifice size!) so that some sort of error recovery is
possible.

Rich