[FFmpeg-devel] [PATCH] H.264/AVCHD interlaced fixes
Laurent Aimar
fenrir
Sun Feb 1 11:50:42 CET 2009
Hi,
On Sat, Jan 31, 2009, Ivan Schreter wrote:
> Laurent Aimar wrote:
> > On Sat, Jan 31, 2009, Ivan Schreter wrote:
> >
> >> To support key frames, the SEI recovery point message is decoded and the
> >> recovery frame count stored in the context. This is then used to set the
> >> key frame flag in the decoding process.
> >>
> > You are misusing the SEI recovery point semantics.
> > D.2.7 of ITU-T H.264 says:
> > [...]
> > So, a frame count >= 0 does not mean that the frame is a key frame BUT that
> >
> Yes and no. We already had this discussion with Michael, and in the end
> I agreed with him that "key frames" in the sense of ffmpeg are the frames
> where we can restart decoding safely, i.e., the frames having an SEI
> recovery frame count.
>
> > if you reset the decoder, start decoding at the picture with the SEI, and
> > throw away the first N decoded frames (output in presentation order), then
> > from then on you have acceptable frames for display.
> >
> Of course, the user needs to decode at least recovery_frame_cnt frames in
> order to get pictures of acceptable quality.
Not really drop the first N decoded frames, but drop the first N frames output
by the decoder (which is not the same thing once frame reordering is involved).
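A minimal sketch of that skip logic, with hypothetical names (this is not the
actual FFmpeg code): after seeking to a picture carrying a recovery point SEI,
count down recovery_frame_cnt on the decoder's output side.

typedef struct SeekRecovery {
    int frames_to_drop; /* recovery_frame_cnt from the SEI at the seek target */
} SeekRecovery;

/* Call once per frame returned by the decoder, i.e. after reordering,
 * in presentation order. Returns 1 when the frame may be displayed. */
static int frame_displayable(SeekRecovery *r)
{
    if (r->frames_to_drop > 0) {
        r->frames_to_drop--;
        return 0; /* still converging towards the recovery point */
    }
    return 1;
}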
> There is even a variable prepared for this in AVPacket
> (convergence_duration), which is supposed to address this (so the user
> knows how much to decode).
>
> > For example, with a simple GOP structure using standard I/P frames, you
> > can have:
> > Type:               I P P P I
> > recovery_frame_cnt: 0 3 2 1 0
> >
> You are mistaken here. The SEI recovery point message is generally NOT
> present.
Well, you are hoping for that. It may well be true for streams in the wild,
but the standard does not guarantee it.
> The reasoning behind having the SEI recovery point is different: in
> H.264, a P or B frame does not necessarily refer only to frames starting
> with the last I frame. It can also refer to frames _before_ the start of
> the current GOP, i.e., to older I/P frames. Well, actually, the term
> "frame" is incorrect here. H.264 uses the term "slice", which can
> represent anything from a few macroblocks through a field up to a whole
> frame. Each slice can be I/P/B/SI/SP, and not all slices in a frame have
> to be of the same type.
>
> Let's take a simple example: I(0) B(-2) B(-1) P(3) B(1) B(2) P(6) B(4)
> B(5) I(9) B(7) B(8) P(12) B(10) B(11) ...
> Now, let's assume an object displayed in P(3) got hidden while
> displaying I(9) and reappeared in frame B(10). The encoder can either
> encode the object anew, or it can simply let B(10) refer to P(3).
> However, P(3) is before I(9), so restarting from I(9) would break
> display of B(10).
>
> To address this problem, an SEI recovery frame count is associated with
> I(9), telling the decoder it has to decode at least recovery_frame_cnt
> (whatever its value) frames before the effects of B(10) disappear.
>
> The exact_match flag specifies whether the match is going to be exact or
> approximate. With an approximate match, the picture is still acceptable
> for display, but not 1:1 identical to the result of decoding from the
> start, as it would be with an exact-match SEI. For instance, if a
> sequence contained only P frames, the picture decoded from any starting
> point would after a while start looking like the original picture. For
> such a purpose one could use an approximate-match SEI recovery point. I
> haven't seen such a sample yet, though.
>
> BTW, in the samples I have, an SEI recovery point with exact match and
> recovery_frame_cnt = 0 is present for the I frames in each GOP, since the
> files I have do not refer to frames before the current I frame. Therefore
> I wrote a first version which will work correctly at least with
> recovery_frame_cnt == 0.
>
> > I think that the only safe case is when recovery_frame_cnt is 0 and
> > exact_match_flag is true.
> >
> This is the case in my samples.
>
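For reference, the recovery point SEI payload itself is just four fields
(following the syntax tables of the spec); a minimal parsing sketch, where
BitReader, get_ue() and get_bits() are hypothetical stand-ins for whatever
bitstream reader is at hand:

typedef struct BitReader BitReader;      /* hypothetical opaque bit reader */
unsigned get_ue(BitReader *br);          /* ue(v): unsigned Exp-Golomb code */
unsigned get_bits(BitReader *br, int n); /* u(n): n raw bits */

typedef struct RecoveryPoint {
    unsigned recovery_frame_cnt;       /* ue(v) */
    unsigned exact_match_flag;         /* u(1)  */
    unsigned broken_link_flag;         /* u(1)  */
    unsigned changing_slice_group_idc; /* u(2)  */
} RecoveryPoint;

static void parse_recovery_point(BitReader *br, RecoveryPoint *rp)
{
    rp->recovery_frame_cnt       = get_ue(br);
    rp->exact_match_flag         = get_bits(br, 1);
    rp->broken_link_flag         = get_bits(br, 1);
    rp->changing_slice_group_idc = get_bits(br, 2);
}

/* The only unconditionally safe "key frame" case discussed above: */
static int is_safe_key_frame(const RecoveryPoint *rp)
{
    return rp->recovery_frame_cnt == 0 && rp->exact_match_flag;
}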
> >> In the parser, it is used to communicate a (synthetic)
> >> picture type to libavformat's av_read_frame() routine, to correctly set
> >> the key frame flag and compute missing PTS/DTS timestamps.
> >>
> > Missing PTS/DTS can only be correctly recreated if the h264 parser
> > implements a complete DPB buffer handler.
> > I/P/B in h264 just specify the tools available, and not at all the frame
> > order (unlike in mpeg2 and mpeg4 part 2).
> > For example, you can use B frames instead of P frames without changing
> > the order of decoding and presentation, the B frames simply using past
> > references.
> >
> Uhm, again, those are not "frames". For instance, the "I-frame" of an
> interlaced H.264 video can be composed of one "I-slice" in the first
> field picture and one "P-slice" in the second field, which refers to the
> first field. This is also the case in AVCHD samples from recent
> camcorders.
Sorry, it was a shortcut. I should have said "in the case of a stream for which
every picture is coded as a frame using only one type of slice per picture, the
type being X"...
In this mail, every time I speak about an X frame, it is defined as above.
But my example stands: in a stream like the one I described, the picture type
does not have any relation to the timestamps.
> I'm not claiming all cases are handled. I just want to help support
> AVCHD camcorders finally.
>
> As for the timestamps, I and P "frames" must declare both PTS and DTS in
> an H.222.0 stream. B "frames" don't have to (although in my sample files
> they do).
I don't think so. The only mandatory things are (from memory):
- a PTS at least every 700 ms;
- if a DTS is written, the PTS must be written too;
- if a PTS is written but not the DTS, then dts == pts.
So for a GOP of about 500 ms, you could have a pts/dts only on the I/key frame
(a small sketch of the last rule follows).
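A minimal sketch of that inference, with made-up names, and keeping in mind
that the rules above are quoted from memory:

#include <stdint.h>

#define NO_TS INT64_MIN /* placeholder "timestamp absent" value */

typedef struct PesTimestamps {
    int64_t pts, dts; /* NO_TS when the PES header does not code them */
} PesTimestamps;

static void infer_dts(PesTimestamps *ts)
{
    /* A DTS without a PTS is not allowed; a PTS without a DTS
     * implies dts == pts. */
    if (ts->pts != NO_TS && ts->dts == NO_TS)
        ts->dts = ts->pts;
}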
> Correct computation of PTS/DTS is already handled in libavformat.
For example, with a classic H.264 B-pyramid, I am not sure the parser does it,
but I have not checked.
> >> To support interlaced mode, it was necessary to collate two field
> >> pictures into one buffer to fulfill the av_read_frame() contract, i.e.,
> >> reading one whole frame per call.
> >>
> > This will limit you to supporting only a subset of H.264. Nothing
> > prevents an H.264 stream from first encoding two top fields and then the
> > two bottoms. (I am not sure I have ever seen such a stream.)
> >
> Yes. With or without my patch, it nevertheless wouldn't work. It can be
> added in the future by reordering the fields by frame number in the
> parser (I want to eventually implement pairing based on frame number, as
> I already wrote). But I cannot really imagine someone would produce
> such a brain-damaged stream...
Well, for example you could encode every top field as a P picture and every
bottom field as a B picture. I do not think that is brain-damaged, and such a
stream will not have consecutive top and bottom fields.
Again, I do not think such a stream exists in the wild; it was just an example.
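Pairing by frame number, as mentioned above, would cover even that case; a
hypothetical sketch (the Field struct and its members are made up for
illustration):

#include <stdbool.h>

typedef struct Field {
    int  frame_num;    /* frame_num from the slice header */
    bool bottom_field; /* bottom_field_flag from the slice header */
} Field;

/* Two field pictures belong to the same frame iff they share frame_num
 * and have opposite parity, regardless of their order in the stream. */
static bool fields_pair(const Field *a, const Field *b)
{
    return a->frame_num == b->frame_num &&
           a->bottom_field != b->bottom_field;
}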
> >> There is one open point, though: although it seems that top field
> >> pictures always precede the matching bottom field pictures, this is not
> >> fixed by the standard. The current implementation relies on this.
> >>
> > This cannot work correctly; bottom-field-first videos are common.
> >
> Give me a sample. Note: only interlaced videos coded bottom-field-first as
> field pictures (not whole frames) won't work (currently they don't work
> anyway, so what). Videos coded as frame pictures or non-interlaced videos
> will work correctly.
I will try to find one that I can share. It exists only for SD video (as HD
is always(?) top field first).
Regards,
--
fenrir