[FFmpeg-devel] [PATCH 2/2] avformat/movenc: parse h264 packets to build Sync Sample and Recovery Point tables

Sun Aug 1 00:12:13 EEST 2021

On 7/31/2021 5:47 PM, Michael Niedermayer wrote:
> On Sat, Jul 31, 2021 at 02:51:16PM -0300, James Almer wrote:
>> On 7/31/2021 2:46 PM, Michael Niedermayer wrote:
>>> On Sat, Jul 31, 2021 at 01:10:27PM -0300, James Almer wrote:
>>>> On 7/31/2021 6:41 AM, Michael Niedermayer wrote:
>>>>> On Fri, Jul 30, 2021 at 10:31:53AM -0300, James Almer wrote:
>>>>>> On 7/30/2021 8:33 AM, Michael Niedermayer wrote:
>>>>>>> On Thu, Jul 29, 2021 at 03:09:03PM -0300, James Almer wrote:
>>>>>>>> On 7/29/2021 2:58 PM, Michael Niedermayer wrote:
>>>>>>>>> On Tue, Jul 27, 2021 at 10:08:20AM -0300, James Almer wrote:
>>>>>>>>>> Since we can't blindly trust the keyframe flag in packets and assume its
>>>>>>>>>> contents are a valid Sync Sample, do some basic bitstream parsing to build the
>>>>>>>>>> Sync Sample table in addition to a Random Access Recovery Point table.
>>>>>>>>>>
>>>>>>>>>> Suggested-by: ffmpeg at fb.com
>>>>>>>>>> Signed-off-by: James Almer <jamrial at gmail.com>
>>>>>>>>>> ---
>>>>>>>>>>       libavformat/movenc.c         | 125 +++++++++++++++++++++++++++++++++--
>>>>>>>>>>       libavformat/movenc.h         |   1 +
>>>>>>>>>>       tests/ref/lavf-fate/h264.mp4 |   6 +-
>>>>>>>>>>       3 files changed, 123 insertions(+), 9 deletions(-)
>>>>>>>>>
>>>>>>>
>>>>>>> A.
>>>>>>>>> Will this allow a user to mark a subset of keyframes as random
>>>>>>>>> access points ?
>>>>>>>>> A user may want seeking to preferably happen to a scene start and not a
>>>>>>>>> frame before
>>>>>>>>>
>>>>>>>
>>>>>>> B.
>>>>>>>>> Will this allow a user to mark a damaged keyframe as such ?
>>>>>>>>> A frame might in fact have been a keyframe and maybe the only in the file
>>>>>>>>> but maybe was damaged so a parser might fail to detect it.
>>>>>>>>
>>>>>>>> This code will ensure all packets with an IDR picture will be listed in the
>>>>>>>> Sync Sample table, and all packets with a Recovery Point SEI message will be
>>>>>>>> listed in the Random Access Recovery Point table.
>>>>>>>> Whatever is signaled in packets (like the keyframe flag) is ignored to
>>>>>>>> prevent creating non-spec compliant output.
>>>>>>>
>>>>>>> The file in case B will be non compliant, it is damaged data, it cannot
>>>>>>> be muxed in a compliant way. If we support muxing damaged data then
>>>>>>> parsing that data as a way to unconditionally override the user is
>>>>>>> problematic.
>>>>>>> If you try to interpret damaged data the result is undefined, the spec
>>>>>>> does not tell you how to interpret it and 2 parsers can disagree
>>>>>>> if a damaged frame is a keyframe.
>>>>>>
>>>>>> If you don't mark a packet as a Sync Sample, even if it contains damaged
>>>>>> data, the file is still compliant. The point here is to avoid filling the
>>>>>> Sync Sample table with bogus entries.
>>>>>
>>>>> IMHO, damaged data is not always a compliant video stream, otherwise in the most extreem
>>>>> example /dev/random would have to be a compliant h264 stream.
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Lets just as 2 examples consider multiple slices some being parsed as IDR
>>>>>>> some as non IDR, so what is that ? or what if some packet reodering or
>>>>>>> duplication causes parts of a IDR and non IDR frame to be mixed.
>>>>>>> where is the IDR frame in that case ? 2 parsers can disagree
>>>>>>
>>>>>> The AVCodecParserContext will tag a packet with an IDR slice as keyframe,
>>>>>> and then disregard anything else it may find. I made the code i wrote here
>>>>>> do the same.
>>>>>
>>>>> If thats the case, we could take the example of a frame lets say it has
>>>>> 100 slices, and in one of these 1 byte is changed so it becomes a IDR slice
>>>>> That is not really a keyframe or sync symple yet it will
>>>>> be marked as one IIUC.
>>>>
>>>> Unless it's the first slice NALU out of those 100 in the AU, it will not be
>>>> marked as a keyframe by the AVCodecParserContext, nor as a Sync Sample by
>>>> this code.
>>>>
>>>>> Now to continue with this example, which way will it work better
>>>>> X. With this marked not a keyframe or
>>>>> Y. With this marked as keyframe as the parser detects ?
>>>>
>>>> It should not be marked as a Sync Sample.
>>>>
>>>>>
>>>
>>>>> Its not containing a single valid IDR slice just 1 of the non IDR slices
>>>>> have been damaged and are misparsed. That will not work well as a sync sample
>>>>> yet as a normal sample the 1% of damage might not be noticed at all as its copied
>>>>> from the last frame
>>>>
>>>> How about i look at every slice NALU in the AU and ensure there's only IDR
>>>> slices before marking it as a Sync Sample, instead of only looking at the
>>>> very first slice and decide based on it?
>>>> Same thing could be done in the AVCodecParserContext, for that matter.
>>>
>>> Then you get a IDR frame wrong if it has a single damaged slice where its
>>> parsed as non IDR.
>>
>> Same situation: It does not get added to the Sync Sample table.
> 
> and then you cannot seek to the begin of the file anymore

We're trying to create spec complaint files as much as possible, not 
invalid files that some software can then read better. If the first 
sample is not tagged as Sync because it didn't meet the requirements, 
then it's how it's meant to be. Remember that files with no Sync Samples 
at all are still valid.

But i already agreed to let the user override the choice made by this 
code as a last resort, so no point arguing about it.

> 
> 
> 
>>
>>>
>>> What you could do and i do not suggest this actually, just hypothetically
>>> is take the majority >50% being IDR as probably IDR. That will probably
>>> require extra care when there are 2 fields or frames in one packet
>>>
>>> The problem is if you unconditionally overreide the user you really need
>>> to be perfect and this is not easy. More so as being done by default it
>>> too should be low overhead not slowing things down
>>>
>>>
>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>> still a decoder can with some luck start decoding from such a
>>>>>>> point i would guess. But does the user want this marked as a keyframe ?
>>>>>>> maybe yes, maybe no. I think taking the users only way to set the keyframe
>>>>>>> flag away is a step backward
>>>>>>
>>>>>> Look at MPEG2 parsing and others in this container. It's the same scenario.
>>>>>> Did any of these concerns show up back then?
>>>>>> What I'm doing here for h264 is the same thing we did for those, to ensure
>>>>>> we don't create non spec compliant files because of damaged or mistagged
>>>>>> input, or a careless user.
>>>>>>
>>>>>> Letting the user mark whatever they want as a Sync Sample on a container
>>>>>> where it's explicit what should and should not be such is not good practice.
>>>>>> If you want that, you can use Matroska and other formats where keyframe
>>>>>> tagging is apparently not strict.
>>>>>>
>>>>>> Now, i want to point out that I wrote this parsing code here because my
>>>>>> previous attempt at stopping the AVCodecParserContext from being too liberal
>>>>>> marking things as keyframes was rejected *precisely* because it would let
>>>>>> the user create non compliant output in mp4. And now you don't want this one
>>>>>> either because you want to let the user create non complaint output.
>>>>>> I'm running in circles here and it's getting tiresome.
>>>>>
>>>>> I see the problem and i agree with you that it is a problem that needs
>>>>> a solution. Its very bad if the users sets bad flags and it results in
>>>>> non-compliant files. What my concern is, is that theres no way for the
>>>>> user to override this when its the parsing code thats wrong and previously
>>>>> the user could override this.
>>>>> It may be very rare that the parser is wrong and a user would bother to
>>>>> override it, i do not know, then again some people are quite crazy with
>>>>> the level of perfection they try to achieve in their videos
>>>>
>>>> The only reason the user may have wanted to override the packet's keyframe
>>>> flag was precisely because the AVCodecParserContext was tagging way too many
>>>> packets as such. This would no longer be necessary after this change, and
>>>> you can trust the muxer to do the right thing.
>>>>
>>>
>>>> I can however accept a compromise here, and take into account the packet's
>>>> keyframe flag if and only if invalid or unexpected data is found in the
>>>> bitstream. If the parsing succeeds, then whatever the packet metadata may
>>>> signal should be outright ignored.
>>>
>>> how do you detect invalid or unexpected data ?
>>
>> Any parsing error for which we already abort, and if we find a mix of IDR
>> and non IDR slices within a given AU. Instead of deciding to not add them to
>> the Sync Sample table like i mentioned above, we look at the packet's
>> keyframe flag and use it as last resort.
> 
> Is there any other reason to parse the whole AU ?
> if not then this would require O(n) instead of O(1) parsing and be slower
> If you do this, i would suggest that
> if the keyframe flag is set to stop parsing on the first IDR slice, similarly
> if the keyframe flag is not set to stop parsing on the first non IDR slice
> else all slices are the same type and you would override the user

Alright, I'll look into this.

> 
> With this you only need to parse O(1) when the flag matches a undamaged AU
> and often also for a damaged AU.
> the O(n) case would be reserved for the oddball cases
> 
> That said, i still think theres value in having the parsing code be that
> a parser, bsf or other API not cramed into the muxer but run prior in a
> more modular way in the future

Yes, i agree and mentioned as much.

> 
> Thanks
> 
> [...]
> 
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".
>