[FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame for subtitle handling

Soft Works softworkz at hotmail.com
Sat Dec 11 23:31:48 EET 2021



> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of Michael
> Niedermayer
> Sent: Saturday, December 11, 2021 6:21 PM
> To: FFmpeg development discussions and patches <ffmpeg-devel at ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame
> for subtitle handling
> 
> On Fri, Dec 10, 2021 at 03:02:32PM +0000, Soft Works wrote:
> >
> >
> > > -----Original Message-----
> > > From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of Daniel
> > > Cantarín
> > > Sent: Thursday, December 9, 2021 10:33 PM
> > > To: ffmpeg-devel at ffmpeg.org
> > > Subject: Re: [FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare
> > > AVFrame for subtitle handling
> > >
> This sounds a bit like you expect that the majority of cases to not
> change ? iam asking because
> most cases i tried do change with the part of the patchset which
> cleanly applies. In fact about half of the changes are the failure i already
> posted previously. I think you said its an issue elsewhere. Still that needs
> to be fixed before this patchset can be used as a
> "instant replacement in production scenarios"

Just for context: textsub2video, overlaytextsubs, graphicsubs2video,
overlaygraphicsubs and subscale have been running in different graph setups
(sw-only, hw-only, mixed) with a multitude of filter combinations, color
formats, source codecs, etc. on about 5k beta installations for several
weeks without severe issues.

> Also if you want more testcases which fail the same way as the previously
> posted one or testcases which produce different output, just say so

The more we can find and fix now rather than later, the better.

> That said, i too would love to see some cleaner timebase handling
> in a new subtitle API.
> Maybe a switch cleaning this up could be done in a more segregated
> way so it doesnt put existing functionality at risk while still
> allowing the future API not to have 2 timebases where one should suffice

It's not really about having two different time bases. It's about having
a _fixed_ timebase for subtitle_pts (AV_TIME_BASE_Q), whereas the frame's
timebase can change while it travels from filter to filter.
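To make this concrete, here is a minimal, self-contained sketch. It is not
the actual patch code: the SubFrame struct and the rescale_q() helper are
simplified stand-ins for AVFrame and av_rescale_q(). The point it shows is
that a frame's pts gets rescaled whenever the link timebase changes, while
subtitle_pts stays untouched in its fixed microsecond timebase.

```c
#include <assert.h>
#include <stdint.h>

typedef struct Rational { int num, den; } Rational;

/* Simplified stand-in for av_rescale_q() (no overflow handling). */
static int64_t rescale_q(int64_t v, Rational from, Rational to)
{
    return v * from.num * to.den / ((int64_t)from.den * to.num);
}

typedef struct SubFrame {
    int64_t pts;          /* in the current link's timebase           */
    int64_t subtitle_pts; /* always in 1/1000000, i.e. AV_TIME_BASE_Q */
} SubFrame;

/* What crossing a filter link would do when the timebase changes. */
static void cross_link(SubFrame *f, Rational in_tb, Rational out_tb)
{
    f->pts = rescale_q(f->pts, in_tb, out_tb);
    /* subtitle_pts is deliberately left untouched */
}
```

A frame entering a link with timebase 1/90000 and leaving with 1/1000 gets
its pts rescaled accordingly; the event's start time never moves.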


But let's go back one step for a moment. I have a feeling that I haven't
explained very well how this whole thing works. So let me
take another approach:

One of the important points to understand is that, in the case of subtitles,
the AVFrame IS NOT the subtitle event. The subtitle event is actually
a different and separate entity. For subtitles, the AVFrame is just
a carrier - like a taxi. We put the subtitle event into the taxi and
the taxi drives it through the filtergraph. Sometimes the taxi drives
slowly, sometimes fast, sometimes into another time zone... you get it.

The taxi analogy has its limits, though. Let's think of a "chair lift"
instead. A chair lift runs continuously at the same speed, and that speed
needs to be retained. The guests we have are subtitle events (AVSubtitle
in the old terminology), but they don't arrive at regular intervals. We
never know when the next one will arrive. So, when an AVSubtitle arrives,
we put it into a chair and start making copies of that AVSubtitle to fill
every chair with a copy until the next AVSubtitle arrives - and so on.

The chairs are obviously AVFrames. They need to be numbered monotonically
increasing - that's the frame's pts. Without increasing numbering, the
transport would get stuck. We are filling the chairs with copies
of the most recent subtitle event, so an AVSubtitle could be repeated,
for example, 5 times. It's always the exact same AVSubtitle event
sitting in those 5 chairs. The subtitle event always has the same start
time (subtitle_pts), but each frame has a different pts.

For that reason, those two separate pts fields are required.
(and this was just one example, there are more and different ones)
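The chair-lift scenario above can be sketched in a few lines. Again, this
is a hypothetical illustration, not patch code: SubEvent and SubFrame are
made-up stand-ins, and fill_chairs() plays the role of whatever produces
the heartbeat frames. Every chair gets its own increasing pts, but all
copies carry the same event with the same subtitle_pts.

```c
#include <assert.h>
#include <stdint.h>

typedef struct SubEvent {
    int64_t     subtitle_pts; /* event start time in microseconds */
    const char *text;
} SubEvent;

typedef struct SubFrame {
    int64_t  pts;   /* transport numbering, monotonically increasing */
    SubEvent event; /* copy of the most recent subtitle event        */
} SubFrame;

/* Fill `n` chairs (frames) with copies of the most recent event. */
static void fill_chairs(SubFrame *out, int n, int64_t start_pts,
                        const SubEvent *ev)
{
    for (int i = 0; i < n; i++) {
        out[i].pts   = start_pts + i; /* each chair gets its own number */
        out[i].event = *ev;           /* same event in every chair      */
    }
}
```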



Now back to the reason why subtitle_pts should have a fixed AV_TIME_BASE_Q
timebase, different from the pts timebase.

Considering the relation between AVFrame and subtitle event as laid out
above, it should be apparent that there's no guarantee of any particular
relation between the subtitle_pts and the pts of the frame that
carries it. Such a relation _can_ exist, but doesn't have to.
It can easily happen that the frame pts is just increased by 1
on subsequent frames. The time_base may change from filter to filter
and may be geared toward the transport of the subtitle events, which
might have nothing to do with the subtitle display time at all.

Also, subtitle events are sometimes duplicated. If we converted
the subtitle_pts to the time_base that is negotiated between two filters,
it could happen that multiple copies of a single subtitle event end up
with different subtitle_pts values.
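A small numeric example shows why. This is again a hypothetical sketch
with a simplified round-to-nearest stand-in for av_rescale_q(): rescaling
a microsecond start time into a coarse link timebase such as 1/25 rounds
the value, so it can no longer be recovered exactly on the way back.

```c
#include <assert.h>
#include <stdint.h>

typedef struct Rational { int num, den; } Rational;

/* Simplified av_rescale_q() stand-in, rounding to nearest. */
static int64_t rescale_q(int64_t v, Rational from, Rational to)
{
    int64_t num = v * from.num * to.den;
    int64_t den = (int64_t)from.den * to.num;
    return (num + den / 2) / den;
}
```

Starting from subtitle_pts = 48641667 microseconds, converting into a
1/25 timebase and back yields 48640000: the original value is lost, and
copies of the same event traveling through different timebase chains
could disagree about when the event starts.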


Besides that, there are practical considerations: the subtitle_pts
is almost nowhere needed in any time_base other than AV_TIME_BASE_Q.

All decoders, encoders and filters expect it that way.
Conversion would need to happen all over the place:
every filter would need to take care of rescaling the subtitle_pts
value (whenever the time_base differs between input and output).

All this would be done only to establish a value that would
never be used in that form and would always need to be converted back
at every place in the code that uses it.

It really doesn't make sense. It doesn't simplify anything (quite the
opposite), and it wouldn't make the API easier to understand either.
Having a fixed timebase makes it even clearer that subtitle_pts
isn't just another property of the frame - it's a property
of the subtitle event.


Kind regards,
softworkz


