
Hi,

Here are some things I noticed:

On Tuesday 25 September 2007 10:54, Luca Barbato wrote:
[..]
1.1. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
This document refers to the following definitions
definitions: [..]
frame Minimal unit of information that can be decoded completely, it is usually holds a full frame video frame, a group of audio samples or a subtitle line.
it [-is] usually holds

Also, I know what a frame is, but the description is a bit confusing. It says a frame is a minimal unit that can be decoded completely, but below it says you can only start decoding at a keyframe. So a given frame can only be decoded completely if it is either a keyframe or you have already decoded the previous frames (in the dts sense) on which it depends.
Keyframe A keyframe is a frame from which you can start decoding. The nth frame is a keyframe if and only if frames n, n+1, ... in presentation order (that are all frames with a pts >= frame[n].pts) can be decoded successfully without reference to frames prior n in storage order (that are all frames with a dts < frame[n].dts). If no such frames exist (for example due to using overlapped transforms like the MDCT in an audio codec), then the definition shall be extended by dropping n out of the set of frames which must be decodable, if this is still insufficient then n+1 shall be dropped, and so on until there is a keyframe. Every frame which is marked as a keyframe MUST be a keyframe according to the definition above, a muxer MUST mark every frame it knows is a keyframe as such, a muxer SHOULD NOT analyze future frames to determine the keyframe status of the current frame but instead just set the frame as non-keyframe.
IMHO the last comma should be replaced by 'and' (i.e. so it says: A, B and C) or the sentence could be split up in three sentences.
1.2. Syntax Convetions
Since NUT heavily uses variable length fields, the simplest way to describe it is using a pseudocode approach instead of graphical bitfield descriptions.
The syntax uses datatypes, tagnames and C-like constructs.
1.2.1. Datatypes
f(n) n fixed bits in bigendian order
big-endian
u(n) Unsigned value encoded in n bits MSB-first
v Unsigned variable length value.
    value=0
    do{
        more_data    u(1)
        data         u(7)
        value= 128*value + data
    }while(more_data)
I'd prefer spaces between do and { and } and while, but I suppose that's a matter of personal taste. Same for value = instead of value=.
Figure 1: Variable Length Unsigned Value
Values can be encoded using the following logic: the data is in network order, every byte has the most significant bit used as flag and the following 7 used to store the value. The first N bit
bits
are to be taken, where N is number of bits representing the value modulo 7, and stored in the first byte. If there are more bits, the flag bit is set to 1 and the subsequent 7bit are stored in the
7 bits
following byte, if there are remaining bits set the flag to 1 and the same procedure is repeated. The ending byte has the flag bit set to 0.
I find this description a bit confusing, e.g. it's not clear when you talk about the input value and when about the output bytes. [..]
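For what it's worth, here is how I read the intended encoding, as a small Python sketch rather than proposed spec text. The names encode_v and decode_v are mine and do not come from the draft. decode_v follows the loop in Figure 1; encode_v is my assumption of the inverse (7-bit groups, most significant group first, flag bit set on every byte except the last).

```python
def encode_v(value):
    """Encode a non-negative integer as 7-bit groups, most significant first."""
    if value < 0:
        raise ValueError("v can only hold unsigned values")
    groups = [value & 0x7F]                    # last byte: flag bit stays 0
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)   # earlier bytes: flag bit set to 1
        value >>= 7
    return bytes(reversed(groups))             # emit the most significant group first


def decode_v(data, pos=0):
    """Decode one v starting at data[pos]; return (value, next position)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = 128 * value + (byte & 0x7F)    # same update as in Figure 1
        if not byte & 0x80:                    # more_data == 0: this was the last byte
            return value, pos
```

With this reading, 300 (binary 100101100, 9 bits) becomes the two bytes 0x82 0x2C: the top 2 bits (9 modulo 7) go into the first byte with the flag set, and the remaining 7 bits go into the last byte with the flag clear. If that is what the paragraph means, saying so with one worked example like this would make it much easier to follow.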
Strings and binary data can be encoded basically writing the byte count as a Variable Length Unsigned Value and the the string. The
and then the
strings MUST be encoded in utf-8
t Variable length binary data (or utf-8 string).
Wrong description: judging by Figure 4, t is a variable length timestamp, not binary data.
    tmp    v
    id= tmp % time_base_count
    value= (tmp / time_base_count) * time_base[id]
Figure 4: Variable Length Timestamp
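To check my understanding of Figure 4, here is a small Python sketch of the decoding. It assumes tmp has already been read as a v, that the division is an integer division as in C, and that time_base is a list of rationals; using fractions.Fraction for that, and the name decode_timestamp, are my own choices and not something the draft states.

```python
from fractions import Fraction


def decode_timestamp(tmp, time_base):
    """Split a decoded v into (time base id, timestamp scaled by time_base[id])."""
    time_base_count = len(time_base)
    tb_id = tmp % time_base_count        # which time base this timestamp uses
    ticks = tmp // time_base_count       # integer division, as in C
    return tb_id, ticks * time_base[tb_id]
```

For example, with the two time bases 1/25 and 1/1000, tmp = 7 gives id = 1 and a value of 3/1000.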
--Ivo