[MPlayer-dev-eng] NUT (informal) proposal, based on discussions

Oded Shimon ods15 at ods15.dyndns.org
Fri Jan 20 08:53:18 CET 2006


On Wed, Jan 18, 2006 at 11:32:47PM -0500, Rich Felker wrote:
> On Wed, Jan 18, 2006 at 11:23:07PM +0100, Michael Niedermayer wrote:
> > Hi
> > 
> > On Tue, Jan 17, 2006 at 05:59:23PM -0500, Rich Felker wrote:
> > [...]
> > > Comments?
> > 
> > after thinking about this again a little i think i am strongly in favor
> > of a single pts and single ptr per syncpoint and nothing more complicated
> > 
> > with index: optimal seeking
> > without index: seeking to a point which has a keyframe in every stream
> > prior to the user specified timestamp
> > 
> > so even without an index you can do exact seeking, and normal use of
> > video+audio+subtitle files will be fine, only the case where you
> > want very quick and exact seeking to a subset of streams will not
> > work as well as with the more complicated cases but i really think
> > that simplicity and overhead of the format is more important than
> > that
> 
> ok, with all of our ideas to fix the problem failing, i've talked to
> oded and i'm willing to give up per-stream back_ptr in syncpoints,
> provided we have a provable algorithm to do exact seeking in any
> stream with O(back_ptr) linear search. unless we're mistaken and
> missing a stupid case, i think this can be done without any problem.
> implementations are of course free to skip any further steps and just
> use the back_ptr as-is.
> 
> one of us will send an updated proposal soon.

Here it is..

step-4:
  add EOR (end_of_relevance)
  add coded stream flags
  rearrange the header a little bit - the actual changes are that tmp_mul
  and tmp_pts now come after tmp_stream (in that order), and every field
  has a default value.
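
A rough sketch of how the new field order and defaults in the frame_code
table could be applied, assuming the values that follow tmp_fields have
already been read from the bitstream into an array. FrameCodeRun and
apply_defaults() are made-up names for illustration, not part of the spec.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical holder for one run of the frame_code table; the members
 * mirror the tmp_* fields of the main header in the step-4 draft. */
typedef struct {
    uint64_t sflag, stream, mul, size, res, count;
    int64_t  pts;               /* tmp_pts is the only signed ('s') field */
} FrameCodeRun;

/* 'coded' holds the values that followed tmp_fields, in coded order, and
 * 'fields' is tmp_fields. Every field that was not coded gets its default. */
static FrameCodeRun apply_defaults(const int64_t *coded, unsigned fields)
{
    FrameCodeRun r;

    assert(fields == 0 || coded != NULL);

    r.sflag  = fields > 0 ? (uint64_t)coded[0] : 0;
    r.stream = fields > 1 ? (uint64_t)coded[1] : 0;
    r.mul    = fields > 2 ? (uint64_t)coded[2] : 1;
    r.pts    = fields > 3 ? coded[3]           : 0;
    r.size   = fields > 4 ? (uint64_t)coded[4] : 0;
    r.res    = fields > 5 ? (uint64_t)coded[5] : 0;
    /* count defaults to tmp_mul - tmp_size, as before step-4 */
    r.count  = fields > 6 ? (uint64_t)coded[6] : r.mul - r.size;
    return r;
}
```

With tmp_fields=0 this gives tmp_mul=1 and count=1, so a run that codes
nothing but tmp_flag describes exactly one frame_code.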

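step-5:
  change goals slightly (index is now <100kb per hour, and index
  repetition is dropped from the goals)
  max_index_distance removed
  index changed to a combination of syncpoint index and pts for keyframes,
  using ideas by Michael and myself
  syncpoint still has a single back_ptr and pts, however back_ptr is
  changed - it points to the closest syncpoint that has, for every stream,
  a keyframe with pts <= the syncpoint's global_key_pts (the most correct
  keyframe, not the most recent).
  syncpoint timestamp is max(last_dts)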
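
The syncpoint arithmetic from step-5 is small enough to write down. This
is only a sketch: the function names are invented here, and syncpoint_ts()
assumes the per-stream dts values were already converted to one common
timebase.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* back_ptr = back_ptr_div8 * 8 + 7, as in the step-5 draft. */
static uint64_t decode_back_ptr(uint64_t back_ptr_div8)
{
    return back_ptr_div8 * 8 + 7;
}

/* coded_pts carries both the stream and global_key_pts:
 *   stream         = coded_pts % stream_count
 *   global_key_pts = coded_pts / stream_count */
static void split_coded_pts(uint64_t coded_pts, uint64_t stream_count,
                            uint64_t *stream, uint64_t *global_key_pts)
{
    assert(stream_count > 0);
    *stream         = coded_pts % stream_count;
    *global_key_pts = coded_pts / stream_count;
}

/* Syncpoint timestamp: the highest dts seen so far over all streams.
 * Assumes n >= 1 and that last_dts[] is in one common timebase. */
static int64_t syncpoint_ts(const int64_t *last_dts, size_t n)
{
    int64_t max;
    size_t i;

    assert(n >= 1);
    max = last_dts[0];
    for (i = 1; i < n; i++)
        if (last_dts[i] > max)
            max = last_dts[i];
    return max;
}
```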

cosmetic:
  move things around to IMO more logical positions.



what's left:
  info streams, how do they deal with EOR, back_ptr, etc.
  index repetition (at least the option for it)


I see in my old notes something about "fourcc for PCM", as in, PCM audio
can't really be stored in NUT easily because you need byte order etc., and
there is no defined fourcc for PCM that carries that info, but I guess
that's not really a worry for the spec...


I want to test the index spec (<100kb might be a lie, but I doubt it), and
commit step-4, step-5 and cosmetic... Is anyone against?

BTW, does anyone have a good idea for an efficient but simple data
structure for the decoded index? The naive way requires
streams*syncpoints*sizeof(uint64_t) memory, which is too much (5 MB of
RAM?). Every other idea I can think of complicates simply using the values
or changing them. (I still want the ability to dynamically build the index
while playing.)
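
One possible direction, purely as a sketch and not something that has been
tested against real files: keep a growable array per stream that only
holds the syncpoints where that stream actually has a keyframe. KeyEntry,
StreamIndex and the functions below are hypothetical names. Note that this
only saves memory for streams with sparse keyframes; a stream with a
keyframe at every syncpoint ends up larger than the naive table, and
appending only works while syncpoints are discovered in order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One entry per syncpoint region in which the stream has a keyframe. */
typedef struct { uint32_t syncpoint; int64_t pts; } KeyEntry;
typedef struct { KeyEntry *e; size_t len, cap; } StreamIndex;

/* Size of the naive streams x syncpoints table, for comparison. */
static size_t naive_index_bytes(size_t streams, size_t syncpoints)
{
    return streams * syncpoints * sizeof(uint64_t);
}

/* Append an entry; entries must arrive in increasing syncpoint order.
 * Returns 0 on success, -1 if memory could not be allocated. */
static int stream_index_add(StreamIndex *s, uint32_t syncpoint, int64_t pts)
{
    assert(s != NULL);
    if (s->len == s->cap) {
        size_t cap = s->cap ? s->cap * 2 : 16;
        KeyEntry *p = realloc(s->e, cap * sizeof *p);
        if (!p)
            return -1;
        s->e = p;
        s->cap = cap;
    }
    s->e[s->len].syncpoint = syncpoint;
    s->e[s->len].pts = pts;
    s->len++;
    return 0;
}

/* Binary search for the last entry at or before syncpoint 'sp'.
 * Returns NULL if the stream has no keyframe entry that early. */
static const KeyEntry *stream_index_find(const StreamIndex *s, uint32_t sp)
{
    size_t lo = 0, hi = s->len;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (s->e[mid].syncpoint <= sp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo ? &s->e[lo - 1] : NULL;
}
```

A StreamIndex starts out zero-initialized (StreamIndex s = {0};) and its
array is released with free(s.e).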

BTW2, with linear interpolation, for a 700 MB file, a seek averages 4-10
underlying seeks in the binary search.
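
For illustration, this is roughly what a search with linear interpolation
looks like. It is a sketch over an in-memory array of syncpoints, which
stands in for the file: in a real demuxer each probe would be a seek plus
a scan for the next syncpoint. Syncpoint and interp_find() are names made
up for this example, and the array is assumed to be sorted by pts.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

typedef struct { uint64_t pos; int64_t pts; } Syncpoint;

/* Returns the index of the last syncpoint with pts <= target, or -1 if
 * there is none. *probes counts how many entries had to be inspected
 * inside the loop, i.e. the "underlying seeks". */
static long interp_find(const Syncpoint *sp, size_t n, int64_t target,
                        unsigned *probes)
{
    size_t lo, hi;

    assert(probes != NULL);
    *probes = 0;
    if (n == 0 || target < sp[0].pts)
        return -1;
    lo = 0;
    hi = n - 1;
    if (sp[hi].pts <= target)
        return (long)hi;

    /* Invariant: sp[lo].pts <= target < sp[hi].pts */
    while (hi - lo > 1) {
        /* Guess the position by interpolating between the two bounds
         * instead of always halving the interval. */
        double frac = (double)(target - sp[lo].pts) /
                      (double)(sp[hi].pts - sp[lo].pts);
        size_t mid = lo + (size_t)(frac * (double)(hi - lo));

        /* Clamp so that every probe shrinks the interval. */
        if (mid <= lo)
            mid = lo + 1;
        if (mid >= hi)
            mid = hi - 1;

        (*probes)++;
        if (sp[mid].pts <= target)
            lo = mid;
        else
            hi = mid;
    }
    return (long)lo;
}
```

The interpolation only pays off when pts grows roughly linearly with file
position; with a very uneven bitrate it can degrade towards a linear scan,
so falling back to plain bisection after a few probes may be worthwhile.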

- ods15
-------------- next part --------------
--- mpcf.3.txt	2006-01-20 09:26:15.000000000 +0200
+++ mpcf.4.txt	2006-01-20 09:29:08.000000000 +0200
@@ -1,5 +1,5 @@
 ========================================
-NUT Open Container Format DRAFT 20060105
+NUT Open Container Format DRAFT 20060120
 ========================================
 
 
@@ -134,20 +134,26 @@
     for(i=0; i<256; ){
         tmp_flag                        v
         tmp_fields                      v
-        if(tmp_fields>0) tmp_pts        s
-        if(tmp_fields>1) tmp_mul        v
-        if(tmp_fields>2) tmp_stream     v
-        if(tmp_fields>3) tmp_size       v
+        if(tmp_fields>0) tmp_sflag      v
+        else tmp_sflag=0
+        if(tmp_fields>1) tmp_stream     v
+        else tmp_stream=0
+        if(tmp_fields>2) tmp_mul        v
+        else tmp_mul=1
+        if(tmp_fields>3) tmp_pts        s
+        else tmp_pts=0
+        if(tmp_fields>4) tmp_size       v
         else tmp_size=0
-        if(tmp_fields>4) tmp_res        v
+        if(tmp_fields>5) tmp_res        v
         else tmp_res=0
-        if(tmp_fields>5) count          v
+        if(tmp_fields>6) count          v
         else count= tmp_mul - tmp_size
-        for(j=6; j<tmp_fields; j++){
+        for(j=7; j<tmp_fields; j++){
             tmp_reserved[i]             v
         }
         for(j=0; j<count && i<256; j++, i++){
             flags[i]= tmp_flag;
+            stream_flags[i]= tmp_sflag;
             stream_id_plus1[i]= tmp_stream;
             data_size_mul[i]= tmp_mul;
             data_size_lsb[i]= tmp_size + j;
@@ -208,6 +214,9 @@
     if(flags[frame_code]&1){
         data_size_msb                   v
     }
+    if(flags[frame_code]&2){
+        coded_stream_flags              v
+    }
     for(i=0; i<reserved_count[frame_code]; i++)
         reserved                        v
     data
@@ -302,6 +311,11 @@
     one keyframe for each stream lies between the syncpoint to which
     real_back_ptr points, and the current syncpoint.
 
+    A stream where EOR is set is to be ignored for back_ptr.
+
+    Note: back_ptr can be zero if there is only a single relavent stream
+    and has a keyframe immediately following the syncpoint.
+
 global_key_pts
     After a syncpoint, last_pts of each stream is to be set to:
     last_pts[i] = convert_ts(global_key_pts, timebase[stream], timebase[i])
@@ -417,17 +431,27 @@
     different from the first byte of any startcode
 
 flags[frame_code]
-    first of the flags from MSB to LSB are called KD
-    if D is 1 then data_size_msb is coded, otherwise data_size_msb is 0
-    K is the keyframe_type
-        0 -> no keyframe,
-        1 -> keyframe,
-    flags=4 can be used to mark illegal frame_code bytes
-    frame_code=78 must have flags=4
-    Note: frames MUST NOT depend(1) upon frames prior to the last
-          frame_startcode
-    Important: depend(1) means dependency on the container level (NUT) not
-    dependency on the codec level
+    Bit  Name             Description
+      1  data_size_msb    if set, data_size_msb is at frame header,
+                          otherwise data_size_msb is 0
+      2  more_flags       if set, stream control flags are at frame header.
+      4  invalid          if set, frame_code is invalid.
+
+    frame_code=78 ('N') MUST have flags=4
+
+stream_flags
+    stream_flags is "stream_flags[frame_code] ^ coded_stream_flags"
+
+    Bit  Name               Description
+      1  is_key             if set, frame is keyframe
+      2  end_of_relevance   if set, stream has no relevance on
+                            presentation. (EOR)
+
+    EOR frames MUST be zero-length and MUST be flagged as keyframes.
+    All streams SHOULD end with EOR, where the pts of the EOR indicates the
+    end presentation time of the final frame.
+    An EOR set stream is unset by the first content frames.
+    When an EOR is unset, dts_cache of the stream is reset to -1.
 
 stream_id_plus1[frame_code]
     must be <250
@@ -476,7 +500,8 @@
     this buffer is initalized with decode_delay -1 elements
 
     Pts of all frames in all streams MUST be bigger or equal to dts of all
-    previous frames in all streams, compared in common timebase.
+    previous frames in all streams, compared in common timebase. (EOR
+    frames are NOT exempt from this rule)
 
 width/height
     MUST be set to the coded width/height
-------------- next part --------------
--- mpcf.4.txt	2006-01-20 09:29:08.000000000 +0200
+++ mpcf.final.txt	2006-01-20 09:36:25.000000000 +0200
@@ -21,13 +21,13 @@
 
 Compact
     ~0.2% overhead, for normal bitrates
-    index is <10kb per hour (1 keyframe every 3sec)
+    index is <100kb per hour
     a usual header for a file is about 100 bytes (audio + video headers together)
     a packet header is about ~1-5 bytes
 
 Error resistant
     seeking / playback without an index
-    headers & index can be repeated
+    headers can be repeated
     damaged files can be played back with minimal data loss and fast
     resync times
 
@@ -130,7 +130,6 @@
     version                             v
     stream_count                        v
     max_distance                        v
-    max_index_distance                  v
     for(i=0; i<256; ){
         tmp_flag                        v
         tmp_fields                      v
@@ -224,12 +223,41 @@
 index:
     index_startcode                     f(64)
     packet header
-    stream_id                           v
     max_pts                             v
-    index_length                        v
-    for(i=0; i<index_length; i++){
-        index_pts                       v
-        index_position                  v
+    syncpoints                          v
+    for(i=0; i<syncpoints; i++){
+        syncpoint_pos_div8              v
+    }
+    for(i=0; i<stream_count; i++){
+        for(j=0; j<syncpoints; ){
+            x                           v
+            type= x & 1
+            x>>=1
+            if(type){
+                flag= x & 1
+                x>>=1
+                while(x--)
+                    has_keyframe[j++][i]=flag
+                has_keyframe[j++][i]=!flag;
+            }else{
+                while(x != 1){
+                    has_keyframe[j++][i]=x&1;
+                    x>>=1;
+                }
+            }
+        }
+        M                               v
+        for(j=0; j<syncpoints; j++){
+            if (!has_keyframe[j][i]) continue
+            if (repeat) repeat--
+            else {
+                A                       v
+                if (A > M) last_diff = A - M
+                else repeat = A
+            }
+            last_pts += last_diff
+            keyframe_pts[j][i] = last_pts
+        }
     }
     reserved_bytes
     checksum                            u(32)
@@ -264,7 +292,7 @@
     coded_pts                           v
     stream = coded_pts % stream_count
     global_key_pts = coded_pts/stream_count
-    back_ptr                            v
+    back_ptr_div8                       v
 
             Complete definition:
 
@@ -290,9 +318,7 @@
         }
     }
     if (next_code == index_startcode){
-        while(!eof){
-            index
-        }
+        index
         index_ptr                       u(64)
     }
 
@@ -304,22 +330,22 @@
     size of the packet data (exactly the distance from the first byte
     after the forward_ptr to the first byte of the next packet)
 
-back_ptr
-    real_back_ptr = back_ptr * 8 + 7
-    real_back_ptr must point to a position such that a syncpoint
-    startcode begins within the next 8 bytes, and such that at least
-    one keyframe for each stream lies between the syncpoint to which
-    real_back_ptr points, and the current syncpoint.
+back_ptr_div8
+    back_ptr = back_ptr_div8 * 8 + 7
+    back_ptr must point to a position within 8 bytes of a syncpoint
+    startcode. This syncpoint MUST be the closest syncpoint such that at
+    least one keyframe with a pts lower or equal to the original syncpoint's
+    global_key_pts for all streams lies between it and the current syncpoint.
 
     A stream where EOR is set is to be ignored for back_ptr.
 
-    Note: back_ptr can be zero if there is only a single relavent stream
-    and has a keyframe immediately following the syncpoint.
-
 global_key_pts
     After a syncpoint, last_pts of each stream is to be set to:
     last_pts[i] = convert_ts(global_key_pts, timebase[stream], timebase[i])
 
+    global_key_pts MUST be the highest dts across all streams for all
+    frames before the syncpoint.
+
 file_id_string
     "nut/multimedia container\0"
 
@@ -358,13 +384,6 @@
     good reason to set it higher, otherwise reasonable error recovery will
     be impossible
 
-max_index_distance
-    max distance of keyframes which are represented in the index, the
-    distance between consecutive entries A and B may only be larger if
-    there are no keyframes within this stream between A and B
-    SHOULD be set to <=32768 or at least <=65536 unless there is a very
-    good reason to set it higher
-
 stream_id
     Stream identifier
     stream_id MUST be < stream_count
@@ -528,23 +547,27 @@
     forward_ptr until last byte before the checksum).
 
 max_pts
-    The highest pts in the stream.
-
-index_pts
-    value of the pts of a keyframe relative to the last keyframe
-    stored in this index
-
-index_position
-    position in bytes of the first byte of a keyframe, relative to the
-    last keyframe stored in this index
-    there MUST be no keyframe with the same stream_id as this index between
-    two consecutive index entries if they are more than max_index_distance
-    apart
+    s = max_pts % stream_count
+    pts = max_pts / stream_count
+    The highest pts in the entire file in the timebase of stream 's' .
+
+syncpoint_pos_div8
+    offset from beginning of file to up to 7 bytes before the syncpoint
+    referred to in this index entry. Relative to position of last
+    syncpoint.
+
+has_keyframe
+    indicates whether this stream has a keyframe between this syncpoint and
+    the last syncpoint.
+
+keyframe_pts
+    The pts of the first keyframe for this stream in the region between the
+    2 syncpoints, in the stream's timebase.
 
 index_ptr
-    Length in bytes from the first byte of the first index startcode
-    to the first byte of the index_ptr. If there is no index, index_ptr
-    MUST NOT be written.
+    Length in bytes from the first byte of the index startcode to the first
+    byte of the index_ptr. If there is no index, index_ptr MUST NOT be
+    written.
 
 id
     the ID of the type/name pair, so it is more compact
-------------- next part --------------
--- mpcf.final.txt	2006-01-20 09:36:25.000000000 +0200
+++ mpcf.cosmetic.txt	2006-01-20 09:35:56.000000000 +0200
@@ -326,26 +326,6 @@
 Tag description:
 ----------------
 
-forward_ptr
-    size of the packet data (exactly the distance from the first byte
-    after the forward_ptr to the first byte of the next packet)
-
-back_ptr_div8
-    back_ptr = back_ptr_div8 * 8 + 7
-    back_ptr must point to a position within 8 bytes of a syncpoint
-    startcode. This syncpoint MUST be the closest syncpoint such that at
-    least one keyframe with a pts lower or equal to the original syncpoint's
-    global_key_pts for all streams lies between it and the current syncpoint.
-
-    A stream where EOR is set is to be ignored for back_ptr.
-
-global_key_pts
-    After a syncpoint, last_pts of each stream is to be set to:
-    last_pts[i] = convert_ts(global_key_pts, timebase[stream], timebase[i])
-
-    global_key_pts MUST be the highest dts across all streams for all
-    frames before the syncpoint.
-
 file_id_string
     "nut/multimedia container\0"
 
@@ -361,10 +341,6 @@
 syncpoint_startcode
     0xE4ADEECA4569ULL + (((uint64_t)('N'<<8) + 'K')<<48)
 
-    syncpoint_startcodes SHOULD be placed immediately before a keyframe if the
-    previous frame of the same stream was a non-keyframe, unless such
-    non-keyframe - keyframe transitions are very frequent
-
 index_startcode
     0xDD672F23E64EULL + (((uint64_t)('N'<<8) + 'X')<<48)
 
@@ -374,12 +350,20 @@
 version
     NUT version. The current value is 2.
 
+forward_ptr
+    size of the packet data (exactly the distance from the first byte
+    after the forward_ptr to the first byte of the next packet)
+
 max_distance
     max distance of syncpoints, the distance may only be larger if
     there is no more than a single frame between the two syncpoints. This can
     be used by the demuxer to detect damaged frame headers if the damage
     results in too long of a chain
 
+    syncpoints SHOULD be placed immediately before a keyframe if the
+    previous frame of the same stream was a non-keyframe, unless such
+    non-keyframe - keyframe transitions are very frequent
+
     SHOULD be set to <=32768 or at least <=65536 unless there is a very
     good reason to set it higher, otherwise reasonable error recovery will
     be impossible
@@ -546,6 +530,22 @@
     including the checksum itself (from first byte after the
     forward_ptr until last byte before the checksum).
 
+back_ptr_div8
+    back_ptr = back_ptr_div8 * 8 + 7
+    back_ptr must point to a position within 8 bytes of a syncpoint
+    startcode. This syncpoint MUST be the closest syncpoint such that at
+    least one keyframe with a pts lower or equal to the original syncpoint's
+    global_key_pts for all streams lies between it and the current syncpoint.
+
+    A stream where EOR is set is to be ignored for back_ptr.
+
+global_key_pts
+    After a syncpoint, last_pts of each stream is to be set to:
+    last_pts[i] = convert_ts(global_key_pts, timebase[stream], timebase[i])
+
+    global_key_pts MUST be the highest dts across all streams for all
+    frames before the syncpoint.
+
 max_pts
     s = max_pts % stream_count
     pts = max_pts / stream_count

