[MPlayer-dev-eng] more NUT questions

Sat Apr 17 03:34:07 CEST 2004

Michael Niedermayer said:
> Hi
>
> > On Wednesday 14 April 2004 21:31, Ivan Kalvachev wrote:
> >
> > THAT'S THE POINT. We have 256 frame_codes (hmm less than 256),
> > so we can code 256 lsb combinations.
> > But frame_code is used also for flags(128 possible)
> no, there are just 24 legal combinations of the flags
> packet_type(P)  msb_size(D)  pts(TTT)  keyframe(K)
>       0              X          XXX         X         20 cases
>       1              X          101         X          4 cases
>
hmm. Now I see why we need frame_code! flags are not compressed
optimally ;)
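
The count of 24 can be checked by enumerating the table above. This is only a
sketch: the table says there are 20 cases for packet_type 0 but does not list
which five pts modes are legal, so TTT = 0..4 below is an assumption, and the
function name is made up for illustration.

```python
from itertools import product

# Assumption: for packet_type 0 the legal pts modes are TTT = 0..4.
# The quoted table only states that there are 20 such cases.
LEGAL_TTT = {0: range(5), 1: [0b101]}

def legal_flag_combinations():
    """Yield (packet_type, msb_size, pts_mode, keyframe) tuples."""
    for p, d, k in product((0, 1), repeat=3):
        for ttt in LEGAL_TTT[p]:
            yield (p, d, ttt, k)

combos = list(legal_flag_combinations())
print(len(combos))  # 24
```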

> > I cannot understand how we synthesize these codes before we start
> > encoding (they cannot be written at the end just like the index) and
> > how we are sure that they will be enough.
> basic logic, if we can encode every possible frame, then they are enough,
> and we just need to use 4 frame_codes of the 255 to ensure this, the
> remaining 251 can be used to store the most likely packets, how does the
> muxer know which are likely? its the problem of the muxer, but its really
> not difficult,
If I write the muxer, it is my problem. If I don't know anything about the
nature of the incoming data, I cannot make the right assumptions.
If I cannot make the right assumptions, I won't produce an optimal stream.
If I cannot produce an optimal stream on the first try, I don't need this
compression scheme!

> just think about it, many combinations simply dont exist, non-keyframes in
> audio streams? full timestamps will be rare, keyframes in video streams are
> rare too, ...
If nut is used for capture, it will be used with the ffv1, huffyuv and ljpeg
codecs, which use only keyframes. I'm not aware of any audio codecs
that use compensation, but it should be possible.

> > the rest as msb (mul=2). I will send full timestamps and stream_id. But
> > in this case I will lose all advantages as frame_code is turned into
> > flags.
> even with this intentionally badly chosen frame_code table ur statement is
> false, because if u do store all flag combinations, this also means u can
> always use the timestamp prediction or the lsb timestamp, so u would very
> rarely need to store the full timestamp, i guess in practice this table
> would result in about 4-5 bytes per frame, 1 frame_code, 1 stream_id,
> 2 frame_size
I haven't said anything against flags :) Actually I like the timestamp
compensation.
So I think we could tweak it a little more, so that it handles B-frame
timestamps better (returning from a "future" frame).

> > Maybe i should store all packets in the memory before start writing nut?
> > Or write nut once and then "shrink" it?
> well u could certainly do this and it might result in a 0.001% smaller file,
> but its not really intended to be done that way
This is the only way one could make the Right(tm) nut file.
And if this percentage is all we gain from frame_code, then I prefer to
remove it.

> >
> > Sorry, I prefer simpler format than smaller format.
> it IS simple, maybe the spec doesnt explain it well, and if u have any
> suggestions to improve the format besides just saying its bad, we will
> certainly consider any suggestions u have
Of course I have, but I still don't like them enough;)

> u also should keep in mind that the frame_code stuff just adds optional
> complexity to the muxer, its extremely simple for the demuxer and even the
> muxer can choose a simple static frame_code table, i guess we should add an
> example frame_code table to the spec so a muxer author could copy&paste it
> instead of thinking about how to generate an optimal one depending upon the
> number of streams and other information which may be available to the muxer
We may even add default frame_code definitions (as the last 24 frame_codes).

>
> > I wonder how this funky masking formula come into your mind?
> it just chooses the closest timestamp relative to the last timestamp which
> matches the given lsb, if u have a less funky one just tell me, ill replace
> it
I mean the formula that is used when TTT==100.
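
For reference, the rule Michael describes (pick the timestamp closest to the
last one that matches the given lsb) can be sketched as below. This is not the
TTT==100 formula from the spec. The function name, the parameter names and the
tie-breaking direction are assumptions made for the example.

```python
def closest_timestamp(last_ts, lsb, lsb_bits):
    """Return the timestamp nearest to last_ts whose low lsb_bits equal lsb."""
    mod = 1 << lsb_bits
    # Forward distance from last_ts to the next value with matching low bits.
    delta = (lsb - last_ts) % mod
    # If going backwards is shorter, do that instead.
    # Assumption: an exact tie is resolved towards the past.
    if delta >= mod // 2:
        delta -= mod
    return last_ts + delta

print(closest_timestamp(1000, 1003 % 256, 8))  # 1003
print(closest_timestamp(1000, 998 % 256, 8))   # 998
```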

> > Won't it have same effect if we use median/average_frame_time?
> huh? what?
This is what I have in mind:
    timestamp+=average_duration + s();
B-frame recovery would look something like:
    timestamp= previous_previous_timestamp + 2*average_duration;
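
A small sketch of how these two rules would behave on a stream stored in
coded order (I P B P B). The function names are made up, and `correction`
stands in for the signed value that s() would read from the stream.

```python
def predict(previous_ts, average_duration, correction=0):
    """timestamp += average_duration + s(), with s() passed in as correction."""
    return previous_ts + average_duration + correction

def recover_after_b_frame(previous_previous_ts, average_duration):
    """Timestamp of the frame that follows a B-frame."""
    return previous_previous_ts + 2 * average_duration

avg = 1
i0 = 0
p2 = predict(i0, avg, correction=+1)   # P-frame jumps ahead: 2
b1 = predict(p2, avg, correction=-2)   # B-frame steps back: 1
p4 = recover_after_b_frame(p2, avg)    # no correction needed: 4
print(i0, p2, b1, p4)  # 0 2 1 4
```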

> > Don't forget that the size/pointers/ are vlb, so we may not be able to
> > fix them even with seeking (at least I don't know a system function for
> > inserting bytes):
> > Hmm, now I see why you define stuffing vlb code (0x80).
> yes, exactly thats why 0x80 is there :)
I don't like it at all. How about setting a limit of 16k on the _header_?
Then we can synthesize the full header in a buffer and write it at once.
Note that I exclude the data part (frames, indexes, strings) and the pointers
themselves.

I also think that we should write the
maximum_size_of_pointer_supported_by_the_muxer_that_created_this_nut_stream.
It would give a nice check of when v() should end (e.g. we may get an enormous
number if all bytes in the (damaged) file have the high bit set, i.e. are
0x80 or above).
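
To make the two points concrete, here is a sketch of a 7-bits-per-byte v()
coder where leading 0x80 bytes act as stuffing, plus a reader that gives up
after a maximum length. I am assuming the usual layout (most significant group
first, high bit set on every byte except the last); the `min_len` and `max_len`
parameters are hypothetical and not part of any spec.

```python
def put_v(value, min_len=1):
    """Encode value; pad with leading 0x80 stuffing up to min_len bytes."""
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    while len(groups) < min_len:
        groups.append(0)          # becomes a 0x80 stuffing byte
    groups.reverse()
    return bytes([g | 0x80 for g in groups[:-1]] + [groups[-1]])

def get_v(data, pos=0, max_len=9):
    """Decode one value; return (value, next_pos) or raise on a runaway v()."""
    value = 0
    for n in range(max_len):
        if pos + n >= len(data):
            raise ValueError("truncated v()")
        byte = data[pos + n]
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos + n + 1
    raise ValueError("v() longer than max_len bytes, stream is probably damaged")

print(put_v(300).hex())             # 822c
print(put_v(300, min_len=4).hex())  # 8080822c
print(get_v(put_v(300, min_len=4))) # (300, 4)
```

With stuffing the muxer can reserve a fixed width and patch the value in place
later; with the length limit a demuxer stops early on a run of bytes that all
have the high bit set.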

I think that we should create one more packet, something like
mpeg's sequence_start, written at the very beginning of the file.
The startcode should be a human readable string (something like MIME?)
"NUT\nMultimedia container\n\0",0xRA,0xND,0x0M,0xDA,0xTA followed by this
new max_muxer_pointer parameter. This is required as main_header
already starts with v() != 0.
We may even write the maximum vlc size used in the file.
This way a 64 bit implementation of nut won't try to read
a nut file that has 128 bit pointers.

> > But by "guessing" I think that you mean to examine all parameters
> > before writing. Well you can easily compute and crc with it:P
> no, its not easy, its quite complicated, for size we would just need a
> size= get_length(a) + get_length(b) + ...;
> i leave the code for generating the checksum to u ;)
The complexity of get_length() is very close to that of put_v().
If we use a buffer it would simplify all calculations.
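
A sketch of both approaches side by side, so they can be compared. The
get_length() here mirrors the loop in put_v(), and the buffered variant gets
the size from the buffer and runs a checksum over it. zlib.crc32 is only a
stand-in checksum for the example, and build_header() and its layout are
hypothetical.

```python
import zlib

def get_length(value):
    """Number of bytes v() needs for value (7 payload bits per byte)."""
    n = 1
    while value >= 0x80:
        value >>= 7
        n += 1
    return n

def put_v(buf, value):
    """Append value as 7-bit groups, high bit set on all but the last byte."""
    for shift in range(7 * (get_length(value) - 1), 0, -7):
        buf.append(0x80 | ((value >> shift) & 0x7F))
    buf.append(value & 0x7F)

def build_header(fields):
    """Buffer the fields first, then emit size, body and checksum."""
    body = bytearray()
    for field in fields:
        put_v(body, field)
    out = bytearray()
    put_v(out, len(body))                        # size comes from the buffer
    out += body
    out += zlib.crc32(body).to_bytes(4, "big")   # stand-in checksum
    return bytes(out)

fields = [1, 300, 16384]
print(sum(get_length(f) for f in fields))  # 6
print(len(build_header(fields)))           # 11
```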

> >
> > > > 3. In stream_header fixed_fps is useless,
> > >
> > > its usefull for transcoding to container formats which only support
> > > fixed fps, and its usefull for error detection
> >
> > error detection? HOW do you detect error with this?
> if u demux a stream and find a different timestamp delta its a error
> > Transcoding _TO_ other formats? Why somebody would do that?
> why not? are we microsoft? so we should make it as difficult as
> possible ...
My bad English. The right question is `Why would somebody want to do that?`

> > Moreover, it is INTEGER! How would you code 23.976? This alone is enough
> > to crap everything.
> its a flag, 0 -> variable fps, 1-> fixed fps
Oh, I have missed that. Sorry.

> > > > maybe average (or median) would
> > > > be more sane (it may even be in timestamp units) e.g.
> > > > time_base=1001/24000, average_frame_time=1; => fps=23.976
> >
> > What's wrong with this?
> u didnt explain for what it would be usefull
In timestamp prediction.
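
The arithmetic from my example, worked out with exact fractions. The function
name is made up; the numbers are the ones quoted above.

```python
from fractions import Fraction

def average_fps(time_base, average_frame_time):
    """Frames per second when one frame lasts average_frame_time ticks."""
    return 1 / (time_base * average_frame_time)

fps = average_fps(Fraction(1001, 24000), 1)
print(fps, float(fps))  # 24000/1001, roughly 23.976
```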

> > > > 5. Is there some protection against start code appearing in the data
> > > > stream?
> > >
> > > no, u will find one approximately every 4 exa byte in a random data
> > > stream
> >
> > Well don't forget that we don't work with random streams. They are quite
> > ordered. I mean that the probability is a little bit higher (only a bit),
> > because the entropy levels are similar (in other words, startcodes don't
> > have repeating symbols and zeroes, and good encoders should not produce
> > repeating symbols and lots of zeroes).
> our startcodes ARE random they dont contain repeating zeros
Oh, bad wording:(
Anyway, this discussion is stupid, so let's end it.

> >
> > > > 8. Hmm, something even more fishy. We have frame_type2_startcode
> > > > if we have frame_type=2. But frame_type=2 is indicated by
> > > > (flags[frame_code]&1)==1. Yeh, frame_code is written after the
> > > > startcode!
> > >
> > > i dont see the problem here, if theres a frame_type2 start code its a
> > > type 2
> > > frame if theres no such startcode, its not a type 2 frame
> >
> > It's kind of a recursive definition. It's confusing.
> its not recursive
I mean that the specs say that if it is frame_type2 then there is a code.
The frame_type explanation says that if there is a code then it is frame_type2.
So what's first? Chicken or egg?
Just fix the wording.

> > I guess it is legacy of the time when there have been type 4.
> wtf?!
Remove the flag&1 from the frame_type definition.
(and the strange 0xa0 codes too. BTW how did you make them?)

You may move the flag to the requirements if it is needed.

> > The bad thing is that we MUST check for startcode.
> we must read the next byte to identify the frame type after the last frame,
> thats all, theres no startcode search, u misunderstand the format if u
> believe that there is
> all startcodes start with 'N', and 'N' is disallowed as frame_code, so if
> the next byte is N we know its a type 2 frame or a repeated main or stream
> header, if its not 'N' we know its a type 0 or 1 frame, and looking that
> byte up in the frame_code table will tell us if its type 0 or 1
Well, that's the way it should be explained in the specs.
If you rework the startcodes as I request in the next answer, it may
even come naturally ;)
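
Something like this is what I would like the spec to spell out. It is a sketch
of the dispatch Michael describes; the frame_code table below is a made-up
placeholder (even codes type 0, odd codes type 1), not a table from the spec.

```python
STARTCODE_BYTE = ord("N")

# Hypothetical table: frame_code -> frame type (0 or 1). 'N' is not a
# legal frame_code, so it has no entry.
FRAME_CODE_TABLE = {code: code & 1 for code in range(256)
                    if code != STARTCODE_BYTE}

def classify(next_byte, table=FRAME_CODE_TABLE):
    """Decide what follows from the single byte after the last frame."""
    if next_byte == STARTCODE_BYTE:
        return "startcode: type 2 frame or repeated main/stream header"
    return "type %d frame" % table[next_byte]

print(classify(ord("N")))
print(classify(0x10))  # type 0 frame
print(classify(0x11))  # type 1 frame
```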

> > 10. What's the point of using _BOTH_ MSB and LSB ordering? Just to make
> > sure that demuxer have conversion for both?
> > Unify them.
> i dont understand what u mean, please elaborate
f(x), u(x): we don't need both. Better to give startcodes as byte
sequences, so we may call nut a bytestream.

> > 11. There is no way to check is frame_type 0 broken. If there is small
> > broken part in frame_type_0 beginning we will get wrong frame_code.
> > We will read some values. We will seek to some position.
> > And so on, until we jump out of stream, calculate negative or
> > forbidden value.
> yes, its always the case, u parse format foobar until u see a illegal value
I thought that error resistance was one of the main goals. We need something
to catch possible errors (read: spend a few more bytes).
One naked byte sitting in the middle of nowhere: this is the weakest spot
in the chain. After one broken frame_type0, we will lose all frames of
type 0 after it, even if they are not broken.

> > 12. Yet another question. The forward/backward pointers are relative.
> > It is said that forward pointer is the size of current packet and that
> > backward pointer is the size of the previous packet.
> > The problem is that if next packet is frame_type 0 it won't have any
> > pointers.
> > This breaks backward seeking.
> well, try ffmpeg/ffplay/mplayer, they can seek, and ffmpeg doesnt write an
> index
Then they do something else, but not what is written in the specification.
Look below.

> > e.g.
> >
> > We don't have index, and we are about 5/6 position in 4GB file (will be
> > very slow to start from beginning)
> > We are in main_header. Seek backward backward_ptr bytes. We read
> > frame_type_0.
> this is not allowed, the backward pointer MUST point to the last packet
> header, type 0 frame dont have a packet header
THIS EXPLANATION IS NOT CORRECT ACCORDING TO THE SPECIFICATION!!!
Read it carefully;)
Packet and frame mean one and the same. If there is a difference, it is not
explained in the specification.
The specification explicitly says that the backward pointer is the size of
the previous packet. Not the size of the previous packet and all
frames_type_0 after it!
Same for the forward pointer.

Hey, I hope that I haven't missed some frame reordering, have I?

> > Frame_type_0 don't have startcode, backward_pointer and forward pointer.
> > We can only calculate the data_size, but we already know it.
> >
> > Solution 1.
> > Seek backward until startcode is found. IMHO startcodes should not be
> > used in perfectly valid (not damaged) streams.
> no, its not possible to avoid a startcode search if theres no index,
> think of a slow cdrom or network, following the pointer chain in either
> direction in a large file without an index takes too long its O(n) vs.
> O(log n) seeking, we could make indexes mandatory but its not possible
> for realtime streams, so its not a solution either
How about distributed indexes? Local indexes? Partial indexes?
We could even make indexes for realtime streams.
(/me has no idea what he is talking about ;)
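
To see what the O(log n) startcode search looks like, here is a toy sketch. It
bisects over byte offsets, resyncs on the next startcode after each probe and
compares the timestamp found there. The stream layout (4-byte marker, 4-byte
timestamp, payload) and the marker value are invented for the example and have
nothing to do with the real nut startcodes.

```python
MARKER = b"N\xaa\x55K"  # hypothetical startcode, not a real nut one

def make_stream(timestamps, payload_len=50):
    """Build a toy stream: MARKER + 32-bit timestamp + zero payload."""
    out = bytearray()
    for ts in timestamps:
        out += MARKER + ts.to_bytes(4, "big") + bytes(payload_len)
    return bytes(out)

def seek(data, target):
    """Offset of the last packet with timestamp <= target, or None."""
    lo, hi = 0, len(data)
    best = None
    while lo < hi:
        mid = (lo + hi) // 2
        pos = data.find(MARKER, mid)      # resync on the next startcode
        if pos < 0 or pos >= hi:
            hi = mid                      # no packet starts in [mid, hi)
            continue
        ts = int.from_bytes(data[pos + 4:pos + 8], "big")
        if ts <= target:
            best = pos
            lo = pos + 1
        else:
            hi = mid
    return best

data = make_stream(range(0, 1000, 10))    # 100 packets of 58 bytes
print(seek(data, 505))  # 2900, the packet with timestamp 500
```

Each probe at least halves the remaining byte range, so the number of probes
grows with the logarithm of the file size, assuming timestamps increase
monotonically through the file.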

Just a rough idea.
Make index_packet a fixed size (16k). Write as many entries as possible.
index_packet must have additional (fixed size) pointers to the previous/next
index_packet. The sequence_start packet may point directly to the 1st index.
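
Roughly what I mean, as a sketch. Everything about the layout is invented for
illustration: fixed 64-bit prev/next offsets, a 32-bit entry count, and entries
made of a 64-bit timestamp and a 64-bit file offset.

```python
import struct

INDEX_SIZE = 16 * 1024
HEADER = struct.Struct(">QQI")  # prev_offset, next_offset, entry_count
ENTRY = struct.Struct(">QQ")    # timestamp, file offset
CAPACITY = (INDEX_SIZE - HEADER.size) // ENTRY.size

def build_index_packet(entries, prev_offset, next_offset):
    """Pack entries into one fixed-size index packet, zero padded."""
    if len(entries) > CAPACITY:
        raise ValueError("too many entries for one index packet")
    body = HEADER.pack(prev_offset, next_offset, len(entries))
    body += b"".join(ENTRY.pack(ts, off) for ts, off in entries)
    return body.ljust(INDEX_SIZE, b"\0")

def parse_index_packet(packet):
    """Return (prev_offset, next_offset, entries) from a packed index."""
    prev_offset, next_offset, count = HEADER.unpack_from(packet, 0)
    entries = [ENTRY.unpack_from(packet, HEADER.size + i * ENTRY.size)
               for i in range(count)]
    return prev_offset, next_offset, entries

packet = build_index_packet([(0, 100), (40, 5000)], 0, 3 * INDEX_SIZE)
print(len(packet), CAPACITY)  # 16384 1022
```

Because the packet size and the pointers are fixed width, the muxer can seek
back and fill in the next pointer without any stuffing.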

> > Solution 2.
> > Write forward/backward pointers so they point to the next packet that
> > have such pointers. This will require to buffer of all frames (type 0/1)
> wrong, type 1 have a packet header and a backward pointer, only type 0
> must be buffered, and again, u complain but u dont suggest an alternative,
> its easy to say buffering sucks, but if u dont ever buffer anything u
> cannot store any forward pointers, so u would always have to search for
> the next startcode, but thats just what u didnt like above
I have never said that something sucks. I may not like something, and usually
there is a reason for this, including misunderstanding.

> > and write them at once. We cannot seek back to write the size, as size
> > may change from one byte to two, or from 2 to 3. If we use stuffing, we
> > will spend more bytes for something useless when we make hell-a-lot-of
> > tricks to save few bytes per packet.
> hmm, its extremely simple, there is a limit of 16k byte max between type1/2
> frames, so u never need to buffer more then 16k, if the next frame is
> bigger its written as type 1or2 and the buffer is flushed, a realtime
> muxer which need very low delay could choose to not write type 0 packets
> at all, the length of the forward pointer is also guaranteed to fit in
> 2 bytes unless the packet itself is larger then 16k
I complain about this stuffing. I don't have anything against buffers.
From the answer to the nut question in the other thread I understand that #2
is the method you are using (well, only for frame_type0).
I had missed these limits. But I think that we may put them to good use.
E.g. we may limit the size of the headers and use this buffer for storing them.
> > Solution 4.
> > GOP. All frame headers at one place. Kinda of #2.
> > Even better, it could be some kind of distributed index.
> > small indexes all over connected with forward/backward pointers.
> > Hmm sound familiar, maybe I have seen it before?
I think that this is the right solution:
Headers of frame_type0 frames should be packed together with the last
packet that has a packet_header. This way they would be protected by the crc
of the same packet. Anyway, if the packet is broken, demuxing the following
frame_type0 frames usually won't be possible.

And yes, there MUST be a crc. We may not calculate the crc for the whole
packet, but only for the header(s). This may allow a smaller crc (1, 2 bytes).
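
A sketch of a header-only check with a 2-byte crc. I use the 16-bit CCITT crc
that Python ships as binascii.crc_hqx purely as a convenient example; which
polynomial nut should use is a separate question, and protect()/check() are
made-up names.

```python
import binascii

def protect(header):
    """Append a 2-byte crc computed over the header bytes only."""
    crc = binascii.crc_hqx(header, 0)
    return header + crc.to_bytes(2, "big")

def check(protected):
    """True if the trailing 2-byte crc matches the header before it."""
    header, stored = protected[:-2], protected[-2:]
    return binascii.crc_hqx(header, 0) == int.from_bytes(stored, "big")

good = protect(b"\x01\x82\x2c\x05")
bad = bytes([good[0] ^ 0x40]) + good[1:]   # flip one bit in the header
print(check(good), check(bad))  # True False
```

The frame data itself is not covered, so the cost stays at two bytes per
protected header no matter how large the frames are.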

> > Sorry, but I don't like nut. It tries to be smaller at all cost. And it
> > pays too much. It trade simplicity. It trade stability. And then try to
> > compensate with huge startcodes.
> well, IMHO u shouldnt say such things before u understand the format at all
> simplicity is something subjective so its difficult to argue about,
> stability is easier, i suggest u damage a nut file and try to play it and
> seek in it, compare it against other formats, and then judge its
> stability, i attached the program i used for such tests
If I haven't understood the specification at first look,
then it is not simple.

Best Regards
   Ivan Kalvachev
  iive

 p.s.
Don't try to contaminate me with GPL code!;) I want to implement it in
BSD license and sell it to Microsoft, Real, Apple and few others;)

Seriously, I suggest releasing the nut implementation under a dual license.
One of course is GPL, the other should be something like the Java license:
 you can look at the code, you can compile the code, you can link the code,
 you can run the code, you can distribute the code,
 but you cannot modify it.

Also it may be fun to patent nut once it is finished. This would give us
full control over suppressing clones under different licenses.
The only problem is money. (GPL will be ok, if we give free permission
to all GPL-ers.)

I would also recommend that the matroska team patent their format. Microsoft
already patented some XML & binary stuff, maybe you have prior art?