[FFmpeg-devel] [PATCH 2/2] avformat/movenc: parse h264 packets to build Sync Sample and Recovery Point tables

Michael Niedermayer michael at niedermayer.cc
Sat Jul 31 20:46:19 EEST 2021


On Sat, Jul 31, 2021 at 01:10:27PM -0300, James Almer wrote:
> On 7/31/2021 6:41 AM, Michael Niedermayer wrote:
> > On Fri, Jul 30, 2021 at 10:31:53AM -0300, James Almer wrote:
> > > On 7/30/2021 8:33 AM, Michael Niedermayer wrote:
> > > > On Thu, Jul 29, 2021 at 03:09:03PM -0300, James Almer wrote:
> > > > > On 7/29/2021 2:58 PM, Michael Niedermayer wrote:
> > > > > > On Tue, Jul 27, 2021 at 10:08:20AM -0300, James Almer wrote:
> > > > > > > Since we can't blindly trust the keyframe flag in packets and assume its
> > > > > > > contents are a valid Sync Sample, do some basic bitstream parsing to build the
> > > > > > > Sync Sample table in addition to a Random Access Recovery Point table.
> > > > > > > 
> > > > > > > Suggested-by: ffmpeg at fb.com
> > > > > > > Signed-off-by: James Almer <jamrial at gmail.com>
> > > > > > > ---
> > > > > > >     libavformat/movenc.c         | 125 +++++++++++++++++++++++++++++++++--
> > > > > > >     libavformat/movenc.h         |   1 +
> > > > > > >     tests/ref/lavf-fate/h264.mp4 |   6 +-
> > > > > > >     3 files changed, 123 insertions(+), 9 deletions(-)
> > > > > > 
> > > > 
> > > > A.
> > > > > > Will this allow a user to mark a subset of keyframes as random
> > > > > > access points ?
> > > > > > A user may want seeking to preferably happen to a scene start and not a
> > > > > > frame before
> > > > > > 
> > > > 
> > > > B.
> > > > > > Will this allow a user to mark a damaged keyframe as such ?
> > > > > > A frame might in fact have been a keyframe and maybe the only in the file
> > > > > > but maybe was damaged so a parser might fail to detect it.
> > > > > 
> > > > > This code will ensure all packets with an IDR picture will be listed in the
> > > > > Sync Sample table, and all packets with a Recovery Point SEI message will be
> > > > > listed in the Random Access Recovery Point table.
> > > > > Whatever is signaled in packets (like the keyframe flag) is ignored to
> > > > > prevent creating non-spec compliant output.
> > > > 
> > > > The file in case B will be non compliant, it is damaged data, it cannot
> > > > be muxed in a compliant way. If we support muxing damaged data then
> > > > parsing that data as a way to unconditionally override the user is
> > > > problematic.
> > > > If you try to interpret damaged data the result is undefined, the spec
> > > > does not tell you how to interpret it and 2 parsers can disagree
> > > > if a damaged frame is a keyframe.
> > > 
> > > If you don't mark a packet as a Sync Sample, even if it contains damaged
> > > data, the file is still compliant. The point here is to avoid filling the
> > > Sync Sample table with bogus entries.
> > 
> > IMHO, damaged data is not always a compliant video stream, otherwise in the most extreem
> > example /dev/random would have to be a compliant h264 stream.
> > 
> > 
> > > 
> > > > 
> > > > Lets just as 2 examples consider multiple slices some being parsed as IDR
> > > > some as non IDR, so what is that ? or what if some packet reodering or
> > > > duplication causes parts of a IDR and non IDR frame to be mixed.
> > > > where is the IDR frame in that case ? 2 parsers can disagree
> > > 
> > > The AVCodecParserContext will tag a packet with an IDR slice as keyframe,
> > > and then disregard anything else it may find. I made the code i wrote here
> > > do the same.
> > 
> > If thats the case, we could take the example of a frame lets say it has
> > 100 slices, and in one of these 1 byte is changed so it becomes a IDR slice
> > That is not really a keyframe or sync symple yet it will
> > be marked as one IIUC.
> 
> Unless it's the first slice NALU out of those 100 in the AU, it will not be
> marked as a keyframe by the AVCodecParserContext, nor as a Sync Sample by
> this code.
>
> > Now to continue with this example, which way will it work better
> > X. With this marked not a keyframe or
> > Y. With this marked as keyframe as the parser detects ?
> 
> It should not be marked as a Sync Sample.
> 
> > 

> > Its not containing a single valid IDR slice just 1 of the non IDR slices
> > have been damaged and are misparsed. That will not work well as a sync sample
> > yet as a normal sample the 1% of damage might not be noticed at all as its copied
> > from the last frame
> 
> How about i look at every slice NALU in the AU and ensure there's only IDR
> slices before marking it as a Sync Sample, instead of only looking at the
> very first slice and decide based on it?
> Same thing could be done in the AVCodecParserContext, for that matter.

Then you get a IDR frame wrong if it has a single damaged slice where its
parsed as non IDR.

What you could do and i do not suggest this actually, just hypothetically
is take the majority >50% being IDR as probably IDR. That will probably 
require extra care when there are 2 fields or frames in one packet

The problem is if you unconditionally overreide the user you really need
to be perfect and this is not easy. More so as being done by default it 
too should be low overhead not slowing things down


> 
> > 
> > 
> > 
> > > 
> > > > still a decoder can with some luck start decoding from such a
> > > > point i would guess. But does the user want this marked as a keyframe ?
> > > > maybe yes, maybe no. I think taking the users only way to set the keyframe
> > > > flag away is a step backward
> > > 
> > > Look at MPEG2 parsing and others in this container. It's the same scenario.
> > > Did any of these concerns show up back then?
> > > What I'm doing here for h264 is the same thing we did for those, to ensure
> > > we don't create non spec compliant files because of damaged or mistagged
> > > input, or a careless user.
> > > 
> > > Letting the user mark whatever they want as a Sync Sample on a container
> > > where it's explicit what should and should not be such is not good practice.
> > > If you want that, you can use Matroska and other formats where keyframe
> > > tagging is apparently not strict.
> > > 
> > > Now, i want to point out that I wrote this parsing code here because my
> > > previous attempt at stopping the AVCodecParserContext from being too liberal
> > > marking things as keyframes was rejected *precisely* because it would let
> > > the user create non compliant output in mp4. And now you don't want this one
> > > either because you want to let the user create non complaint output.
> > > I'm running in circles here and it's getting tiresome.
> > 
> > I see the problem and i agree with you that it is a problem that needs
> > a solution. Its very bad if the users sets bad flags and it results in
> > non-compliant files. What my concern is, is that theres no way for the
> > user to override this when its the parsing code thats wrong and previously
> > the user could override this.
> > It may be very rare that the parser is wrong and a user would bother to
> > override it, i do not know, then again some people are quite crazy with
> > the level of perfection they try to achieve in their videos
> 
> The only reason the user may have wanted to override the packet's keyframe
> flag was precisely because the AVCodecParserContext was tagging way too many
> packets as such. This would no longer be necessary after this change, and
> you can trust the muxer to do the right thing.
> 

> I can however accept a compromise here, and take into account the packet's
> keyframe flag if and only if invalid or unexpected data is found in the
> bitstream. If the parsing succeeds, then whatever the packet metadata may
> signal should be outright ignored.

how do you detect invalid or unexpected data ?


> 
> > 
> > Theres also the 2nd situation where all the information the parser
> > extracts is already known to the lib user and going over the whole
> > bitstream is wasting cpu cycles. This can be a situation where for
> > example all input files where just a moment ago created by FFmpeg
> > and we assume these would then be as compliant as redoing the parsing
> > would make them
> 
> Other information, like Recovery Point SEI messages, can't be properly
> signaled by encoders within packet metadata. Muxer can and should, for now,
> look at the bitstream for this purpose.

yes

thx

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Does the universe only have a finite lifespan? No, its going to go on
forever, its just that you wont like living in it. -- Hiranya Peiri
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20210731/d583fcef/attachment.sig>


More information about the ffmpeg-devel mailing list