[FFmpeg-devel] [PATCH] AVCHD/H.264 parser: determination of frame type, question about timestamps

Mon Jan 26 01:16:47 CET 2009

On Mon, Jan 26, 2009 at 12:13:13AM +0100, Ivan Schreter wrote:
> Michael Niedermayer wrote:
> > On Fri, Jan 23, 2009 at 10:46:39PM +0100, Ivan Schreter wrote:
> >   
> >>>> +    while(buf<buf_end){
> >>>> +        buf= ff_find_start_code(buf, buf_end, &state);
> >>>> +        init_get_bits(&h->s.gb, buf, 8*(buf_end - buf));
> >>>> +        switch(state & 0x1F){
> >>>> [...]
> >>>> +        case NAL_IDR_SLICE:
> >>>> +        case NAL_SLICE:
> >>>> +            get_ue_golomb(&h->s.gb);
> >>>> +            s->pict_type= golomb_to_pict_type[get_ue_golomb(&h->s.gb) % 5];
> >>>>     
> >>>>         
> >>> IIRC that should be get_ue_golomb_31() also its missing some check against
> >>> negative values
> >>>       
> >> I don't believe so. First, slice_type is defined in H.264 standard in 
> >> 7.3.3 (Slice header syntax) being ue(v), so not limited to 31. If I 
> >>     
> > this reasoning is flawed, to proof something is allowed you need to find
> > an explicit statement that it is allowed or proof the non existence of a
> > statement making it invalid.
> > Showing that a single line of text at some point doesnt outright contradict
> > it is not enough.
> >   
> Answer see below.
> >> understand golomb.h correctly, this corresponds to get_ue_golomb(). Of 
> >> course, since there are just 5 slice types, this value should be small, 
> >> but some broken encoder could write something else in there. Do we want 
> >> to live with it (and save just a single if per picture)? Further, it is 
> >> unsigned, so it can't be negative. And modulo 5 makes it always being in 
> >> the range 0..4, as in the array.
> >>     
> > -1 % 5
> >   
> You contradict yourself.

no

> On the one side you say get_ue_golomb_31() must 
> be enough (yes, it is, since legal values for slice_type are 0..9), OTOH 
> you want to check it for non-negative (_although_ legal values are in 
> range 0..9, i.e., non-negative).

yes
we cannot just set pict_type to some random value, this could have
unpredictable effects, people can feed all kinds of trash into the
decoder, it must not give them a root shell ;)

> 
> BTW, what I also find funny, get_ue_golomb() functions return int 
> instead of unsigned int, although internal buffer is unsigned int. 
> Similarly, set_ue_golomb() functions take int instead of unsigned int as 
> argument. Maybe this should be corrected as well (and save a few checks 
> and warnings in the process).

maybe
someone has to review all ue_golomb() uses then though to make
sure this has no sideeffects.

> 
> But this is peanuts compared to the really important stuff below...
> > [...]
> >   
> >> I believe the only way is to use recovery point SEI message as 
> >> documented in H.264 spec Chapter D.2.7 and count down recovery_frame_cnt 
> >> frames before issuing a "key frame" in order for seeking to work 
> >>     
> >
> > no, the keyframe flag must be set at the point of the SEI not after it
> > you also should set convergence_duration
> > Still even if we do just part of this its already a step forward, important
> > is that we move toward the correct solution instead of adding random hacks.
> >   
> That's the question. If we set the key flag at SEI recovery point, how 
> do you want to seek to proper place in an application using libavformat 
> and not knowing anything about convergence_duration (at the moment all 
> applications I checked)? As long as any application doesn't handle 
> convergence_duration, it will be broken with H.264.

Keyframes have their semantics like
a frame from which on the decoder can correctly decode all further frames
with no reference to previous frames.

These semantics no longer hold in light of convergence_duration!=0, no
matter which way you twist it, it will not work without some apps needing
to be changed slightly.

[...]

> For instance, imagine a video editing application (like kdenlive) needs 
> to seek to frame X. There is a SEI recovery point at X-3 with 
> convergence_duration == 5. How can the application now get to frame X? 
> It asks to seek to X, which will position the stream at key frame at 
> position X-3.

no it will not do this it will position the stream farther to the left
to ensure that X is after the convergence point when the application
asks for <=X if it asks just for something randomly around X it might
end up at X-3 but then that was what it asked for

[...]

> 
> With new seeking API you proposed, handling of convergence duration 
> (internal or by application) could be selected per flag.
> 
> BTW, I opened a new thread, since this thread is a little off-topic now. 
> Please answer in the new thread, if possible.
> 
> > [...]
> >> which is IMHO broken. Of course, we could communicate with it by setting 
> >> pict_type to FF_I_TYPE for keyframes only (IDR frames and frames after 
> >> recovery point), for other frames containing I- and P-slices to 
> >> FF_P_TYPE and for B-frames to FF_B_TYPE. But I don't like it much. Any 
> >> idea, how to do it correctly without the need to touch other codecs?
> >>     
> >
> > pict_type from the parse context should likely be split in pict_type and
> > keyframe
> >   
> Actually, we already have a flag field on AVPacket coming from the 
> parser, but compute_pkt_fields() doesn't believe it and resets it based 
> on pict_type from parse context instead.

the parser works with char* buffers not AVPackets

> 
> >> Further, I found out that av_read_frame() sometimes reads only one field 
> >> of a frame (especially in interlaced mpegts AVCHD files), which confuses 
> >> the rest of ffmpeg. I suppose, this is a bug in h264_parse(), which 
> >> returns a single field instead of whole frame (when a frame is coded as 
> >> two fields in two pictures), but I didn't find a way yet, how to address 
> >> the problem. Any idea?
> >>     
> > keep in mind that AFIAK h264 allows unpaired fields that is
> > 1 frame, 1 field, 1 frame
> > I have no samples that actually does this so we might get away with ignoring
> > it but if we choose to not support it we should have some argument why
> > it would be too complex ..
> >   
> Umm... You are referring to D.2.2 (picture timing semantics). 

no, i was not refering to that

[...]

> 
> Of course, we still have the problem of frame doubling/tripling and 
> having 3 fields per picture, eventually with one of them repeated 
> (pic_struct codes 5-8).

no we do not have a problem with this, we do not and never will duplicate
anything we just export the information and the app can do with it what it
wants, thats also exactly what we do in mpeg2

> 
> BTW, the reply would probably again make sense in the new thread.
> 

> FYI, I started adding some additional constants to the h264 code for 
> other SEI messages and for parsing SEI recovery point as the beginning, 
> as this will be needed in all cases.

good, and the patch for it?

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

He who knows, does not speak. He who speaks, does not know. -- Lao Tsu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090126/b602bb7c/attachment.pgp>