[FFmpeg-devel] [PATCH 1/3] avcodec: add a parser flag to enable keyframe tagging heuristics

Sun Jul 18 06:31:36 EEST 2021

James Almer:
> On 7/17/2021 10:23 PM, Andreas Rheinhardt wrote:
>> James Almer:
>>> On 7/15/2021 5:23 PM, Michael Niedermayer wrote:
>>>>
>>>> the concept of a keyframe is a point at which decoding can begin
>>>> there really are at least 3 points
>>>>
>>>> the point at which packets begin to be input into the decoder
>>>>
>>>> the point at which the decoder is able to return some decoded
>>>> data which closely resembles the encoder input
>>>>
>>>> and the point at which the decoder output matches 1:1 the output
>>>> of a decoder starting from frame 0
>>>
>>> All parsers save for h264 are currently only tagging packets containing
>>> a coded bitstream that, when decoded, fully resets the decoding state
>>> and depends on no previously parsed data or state, which is what (most)
>>> muxers expect. This approach makes the h264 parser do the same by
>>> default (in line with the decoder), to ensure some muxers don't wrongly
>>> mark certain packets as sync samples, while letting others remain
>>> liberal about it.
>>>
>> That is not true: The HEVC parser marks packets that may have leading
>> RASL pictures as keyframes; such frames are not sync samples according to
>> my version of ISO/IEC 14496-15. (Furthermore, for parsers that don't set
>> key_frame the recommended fallback is to check pict_type for
>> AV_PICTURE_TYPE_I (parse_packet() in libavformat/utils.c does this); if
>> one follows this, then MPEG-2 I-frames will be marked as keyframes, even
>> when they are not sync samples in ISOBMFF if there is an open GOP.)
>>
>> It seems to be mostly followed that random access points are keyframes
>> even if they are not IDR frames/even if there is an open GOP. In fact,
>> the AV1 parser (which does not set it for delayed random access points
>> (AV1's equivalent of open GOP)) seems to be an exception.
>>
>> And your claim (also contained in the commit message) that this brings
>> the parser in line with the decoder is wrong, too: output_frame() in
>> h264dec.c sets key_frame depending on sei_recovery_frame_cnt.
> 
> True, missed that. But for example in the case of intra_refresh.h264 it
> would not trigger, whereas the parser does tag it.
> 
Yeah, we don't handle intra refresh well at all; and I am pretty sure
that quite a lot of people would not consider the packets containing the
recovery point SEI message a keyframe if recovery_frame_cnt is > 0. As
has been said in the earlier thread, there is no single correct
definition for keyframe.
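
To make the ambiguity concrete, here is a minimal sketch of three plausible
tagging policies for an H.264 access unit. Everything in it (AccessUnitInfo,
KeyframePolicy, is_keyframe() and the policy names) is hypothetical and made
up for illustration; it is not FFmpeg API and does not claim to mirror what
the parser or h264dec.c actually do.

```c
#include <stdbool.h>

/* Hypothetical, simplified summary of one H.264 access unit.
 * These names are illustrative only, not FFmpeg structures. */
typedef struct AccessUnitInfo {
    bool is_idr;              /* access unit contains an IDR picture */
    bool has_recovery_point;  /* access unit carries a recovery point SEI */
    int  recovery_frame_cnt;  /* only meaningful if has_recovery_point is set */
} AccessUnitInfo;

/* Three possible answers to "is this a keyframe?" (assumed names). */
typedef enum KeyframePolicy {
    KEYFRAME_STRICT,          /* only IDR: decoding state is fully reset */
    KEYFRAME_EXACT_RECOVERY,  /* IDR, or recovery point with recovery_frame_cnt == 0 */
    KEYFRAME_LIBERAL          /* IDR, or any recovery point SEI at all */
} KeyframePolicy;

static bool is_keyframe(const AccessUnitInfo *au, KeyframePolicy policy)
{
    /* An IDR picture qualifies under every policy. */
    if (au->is_idr)
        return true;

    /* Without a recovery point SEI there is nothing left to consider. */
    if (!au->has_recovery_point)
        return false;

    switch (policy) {
    case KEYFRAME_STRICT:
        return false;
    case KEYFRAME_EXACT_RECOVERY:
        return au->recovery_frame_cnt == 0;
    case KEYFRAME_LIBERAL:
        return true;
    }
    return false;
}
```

With this sketch, an intra refresh access unit that carries a recovery point
SEI with recovery_frame_cnt > 0 is a keyframe only under KEYFRAME_LIBERAL,
which is exactly the case people disagree about.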

- Andreas