[FFmpeg-devel] [PATCH 2/2] avformat/movenc: parse h264 packets to build Sync Sample and Recovery Point tables

Sat Jul 31 20:51:16 EEST 2021

On 7/31/2021 2:46 PM, Michael Niedermayer wrote:
> On Sat, Jul 31, 2021 at 01:10:27PM -0300, James Almer wrote:
>> On 7/31/2021 6:41 AM, Michael Niedermayer wrote:
>>> On Fri, Jul 30, 2021 at 10:31:53AM -0300, James Almer wrote:
>>>> On 7/30/2021 8:33 AM, Michael Niedermayer wrote:
>>>>> On Thu, Jul 29, 2021 at 03:09:03PM -0300, James Almer wrote:
>>>>>> On 7/29/2021 2:58 PM, Michael Niedermayer wrote:
>>>>>>> On Tue, Jul 27, 2021 at 10:08:20AM -0300, James Almer wrote:
>>>>>>>> Since we can't blindly trust the keyframe flag in packets and assume its
>>>>>>>> contents are a valid Sync Sample, do some basic bitstream parsing to build the
>>>>>>>> Sync Sample table in addition to a Random Access Recovery Point table.
>>>>>>>>
>>>>>>>> Suggested-by: ffmpeg at fb.com
>>>>>>>> Signed-off-by: James Almer <jamrial at gmail.com>
>>>>>>>> ---
>>>>>>>>      libavformat/movenc.c         | 125 +++++++++++++++++++++++++++++++++--
>>>>>>>>      libavformat/movenc.h         |   1 +
>>>>>>>>      tests/ref/lavf-fate/h264.mp4 |   6 +-
>>>>>>>>      3 files changed, 123 insertions(+), 9 deletions(-)
>>>>>>>
>>>>>
>>>>> A.
>>>>>>> Will this allow a user to mark a subset of keyframes as random
>>>>>>> access points ?
>>>>>>> A user may want seeking to preferably happen to a scene start and not a
>>>>>>> frame before
>>>>>>>
>>>>>
>>>>> B.
>>>>>>> Will this allow a user to mark a damaged keyframe as such ?
>>>>>>> A frame might in fact have been a keyframe and maybe the only in the file
>>>>>>> but maybe was damaged so a parser might fail to detect it.
>>>>>>
>>>>>> This code will ensure all packets with an IDR picture will be listed in the
>>>>>> Sync Sample table, and all packets with a Recovery Point SEI message will be
>>>>>> listed in the Random Access Recovery Point table.
>>>>>> Whatever is signaled in packets (like the keyframe flag) is ignored to
>>>>>> prevent creating non-spec compliant output.
>>>>>
>>>>> The file in case B will be non compliant, it is damaged data, it cannot
>>>>> be muxed in a compliant way. If we support muxing damaged data then
>>>>> parsing that data as a way to unconditionally override the user is
>>>>> problematic.
>>>>> If you try to interpret damaged data the result is undefined, the spec
>>>>> does not tell you how to interpret it and 2 parsers can disagree
>>>>> if a damaged frame is a keyframe.
>>>>
>>>> If you don't mark a packet as a Sync Sample, even if it contains damaged
>>>> data, the file is still compliant. The point here is to avoid filling the
>>>> Sync Sample table with bogus entries.
>>>
>>> IMHO, damaged data is not always a compliant video stream, otherwise in the most extreem
>>> example /dev/random would have to be a compliant h264 stream.
>>>
>>>
>>>>
>>>>>
>>>>> Lets just as 2 examples consider multiple slices some being parsed as IDR
>>>>> some as non IDR, so what is that ? or what if some packet reodering or
>>>>> duplication causes parts of a IDR and non IDR frame to be mixed.
>>>>> where is the IDR frame in that case ? 2 parsers can disagree
>>>>
>>>> The AVCodecParserContext will tag a packet with an IDR slice as keyframe,
>>>> and then disregard anything else it may find. I made the code i wrote here
>>>> do the same.
>>>
>>> If thats the case, we could take the example of a frame lets say it has
>>> 100 slices, and in one of these 1 byte is changed so it becomes a IDR slice
>>> That is not really a keyframe or sync symple yet it will
>>> be marked as one IIUC.
>>
>> Unless it's the first slice NALU out of those 100 in the AU, it will not be
>> marked as a keyframe by the AVCodecParserContext, nor as a Sync Sample by
>> this code.
>>
>>> Now to continue with this example, which way will it work better
>>> X. With this marked not a keyframe or
>>> Y. With this marked as keyframe as the parser detects ?
>>
>> It should not be marked as a Sync Sample.
>>
>>>
> 
>>> Its not containing a single valid IDR slice just 1 of the non IDR slices
>>> have been damaged and are misparsed. That will not work well as a sync sample
>>> yet as a normal sample the 1% of damage might not be noticed at all as its copied
>>> from the last frame
>>
>> How about i look at every slice NALU in the AU and ensure there's only IDR
>> slices before marking it as a Sync Sample, instead of only looking at the
>> very first slice and decide based on it?
>> Same thing could be done in the AVCodecParserContext, for that matter.
> 
> Then you get a IDR frame wrong if it has a single damaged slice where its
> parsed as non IDR.

Same situation: It does not get added to the Sync Sample table.

> 
> What you could do and i do not suggest this actually, just hypothetically
> is take the majority >50% being IDR as probably IDR. That will probably
> require extra care when there are 2 fields or frames in one packet
> 
> The problem is if you unconditionally overreide the user you really need
> to be perfect and this is not easy. More so as being done by default it
> too should be low overhead not slowing things down
> 
> 
>>
>>>
>>>
>>>
>>>>
>>>>> still a decoder can with some luck start decoding from such a
>>>>> point i would guess. But does the user want this marked as a keyframe ?
>>>>> maybe yes, maybe no. I think taking the users only way to set the keyframe
>>>>> flag away is a step backward
>>>>
>>>> Look at MPEG2 parsing and others in this container. It's the same scenario.
>>>> Did any of these concerns show up back then?
>>>> What I'm doing here for h264 is the same thing we did for those, to ensure
>>>> we don't create non spec compliant files because of damaged or mistagged
>>>> input, or a careless user.
>>>>
>>>> Letting the user mark whatever they want as a Sync Sample on a container
>>>> where it's explicit what should and should not be such is not good practice.
>>>> If you want that, you can use Matroska and other formats where keyframe
>>>> tagging is apparently not strict.
>>>>
>>>> Now, i want to point out that I wrote this parsing code here because my
>>>> previous attempt at stopping the AVCodecParserContext from being too liberal
>>>> marking things as keyframes was rejected *precisely* because it would let
>>>> the user create non compliant output in mp4. And now you don't want this one
>>>> either because you want to let the user create non complaint output.
>>>> I'm running in circles here and it's getting tiresome.
>>>
>>> I see the problem and i agree with you that it is a problem that needs
>>> a solution. Its very bad if the users sets bad flags and it results in
>>> non-compliant files. What my concern is, is that theres no way for the
>>> user to override this when its the parsing code thats wrong and previously
>>> the user could override this.
>>> It may be very rare that the parser is wrong and a user would bother to
>>> override it, i do not know, then again some people are quite crazy with
>>> the level of perfection they try to achieve in their videos
>>
>> The only reason the user may have wanted to override the packet's keyframe
>> flag was precisely because the AVCodecParserContext was tagging way too many
>> packets as such. This would no longer be necessary after this change, and
>> you can trust the muxer to do the right thing.
>>
> 
>> I can however accept a compromise here, and take into account the packet's
>> keyframe flag if and only if invalid or unexpected data is found in the
>> bitstream. If the parsing succeeds, then whatever the packet metadata may
>> signal should be outright ignored.
> 
> how do you detect invalid or unexpected data ?

Any parsing error for which we already abort, and if we find a mix of 
IDR and non IDR slices within a given AU. Instead of deciding to not add 
them to the Sync Sample table like i mentioned above, we look at the 
packet's keyframe flag and use it as last resort.

> 
> 
>>
>>>
>>> Theres also the 2nd situation where all the information the parser
>>> extracts is already known to the lib user and going over the whole
>>> bitstream is wasting cpu cycles. This can be a situation where for
>>> example all input files where just a moment ago created by FFmpeg
>>> and we assume these would then be as compliant as redoing the parsing
>>> would make them
>>
>> Other information, like Recovery Point SEI messages, can't be properly
>> signaled by encoders within packet metadata. Muxer can and should, for now,
>> look at the bitstream for this purpose.
> 
> yes
> 
> thx
> 
> [...]
> 
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".
>