[FFmpeg-devel] Politics

Wed Dec 22 15:29:04 EET 2021

> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of Michael
> Niedermayer
> Sent: Monday, December 20, 2021 4:24 PM
> To: FFmpeg development discussions and patches <ffmpeg-devel at ffmpeg.org>
> Subject: Re: [FFmpeg-devel] Politics
> 
> I am not sure the direction from which you approach this is going to
> increase the chances this patch has.
> 
> All stream types in libavformat/codec are timebase based, that was
> done because its exact (for some definition of exact at least)
> 
> I think you should argue why this is the best way forward not why its
> not too bad
> 
> also in a few places where a fixed timebase is used ffmpeg uses
> AV_TIME_BASE_Q which is micro not milli seconds. That suddenly
> allows exactly addressing individual frames and audio samples.
> And it should be easy to change to it from ms, it's just a *1000
> it would weaken the precision argument

For the final chapter of this story, let us return to the original
subject, which I would summarize as follows:

"Even though the whole world is fine using millisecond precision 
for subtitle display, I think I know better and therefore insist 
on having a higher precision and/or flexible timebase for subtitle 
timings, otherwise I won't accept the patchset"

I have been waiting for a while to answer, expecting somebody 
might come to realize himself how useless this whole idea actually
is, but I guess it's time to reveal:

Let's look at the concern first: The concern is that with
a subtitle precision of milliseconds (let's say milli, even
though we actually have microseconds), it would not be possible
to make sure that a subtitle event is shown exactly at a
specific video frame.

The claim is: This could be achieved by having a high precision
(and/or a custom time base) for subtitle timings, because
this would allow subtitle start times that exactly match the
display time of the frame at which a subtitle should first
be displayed.

For a moment let's put aside the argument about subtitle format
precision. Let's assume we'd have a subtitle format that allows
such precisions and maybe even custom time bases and let's assume
a player that can handle this.

Now we look at the player and a situation where the player needs
to display frame N. At this point, a range of different things
can happen, mostly specific to the implementation of the player:
Whether it reads a frame's time value or infers it from the 
frame rate or which time base a player is using internally, 
just to name a few examples.
And then, at a totally different place in the implementation of
the player (could be custom, or a library like libass), the player
needs to determine whether a subtitle (and which one) needs to be
displayed over frame N.
Here, we have the frame time, which has undergone a number
of calculations and we have a subtitle event with our super-
precise subtitle start time. The player converts that to its 
internal time base, and then..

..how does the player determine whether the subtitle event
should be shown on frame N?

Does it check like: frame.pts == subtitle.pts? No, it doesn't!

It does something like: 

frame.pts > subtitle.pts && frame.pts < subtitle.end_pts

..because it also needs to display all subtitle events that are 
already visible.
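
To illustrate the range test above, here is a minimal sketch.
The function name, the 25 fps frame rate and the event times are
assumptions chosen for illustration only; they are not taken
from any particular player or from libass.

```python
from fractions import Fraction

def is_visible(frame_pts, sub_start, sub_end):
    """Range test from the text: strictly after the start time
    and strictly before the end time."""
    return sub_start < frame_pts < sub_end

# Exact frame times in seconds at an assumed 25 fps.
frame_times = [Fraction(n, 25) for n in range(6)]

# Hypothetical subtitle event from 0.100 s to 0.180 s.
sub_start = Fraction(100, 1000)
sub_end = Fraction(180, 1000)

# Indices of all frames over which the event is displayed.
visible = [n for n, t in enumerate(frame_times)
           if is_visible(t, sub_start, sub_end)]
print(visible)
```

Note that the test never asks whether the two time values are
equal; it only asks on which side of the start and end time the
frame time falls.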

Let's look again at the proposal: to use high precision subtitle
timings which would allow us to have subtitle start times that 
are as close as possible (or even equal) to the video frame 
time.

Now what a surprise: having frame-equal subtitle start times
wouldn't make it _more_ clear at which video frame the subtitle
should be shown - it would make it more and more _unclear_ and
non-deterministic whether it should be shown at that frame or
the next!
Ultimately, the final presentation would depend on client
implementation details and rounding errors, which is the
opposite of consistent, reliable or predictable.

The closer and more precise a subtitle start time would be set
to a frame's display time, the higher the chance that it
would be shown at the wrong time (due to the >/< tests that
clients need to perform).
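
A small sketch of this effect, under assumptions of my own: a
29.97 fps stream (30000/1001), a player that works in whole
milliseconds internally, and a subtitle start time that is
exactly equal to the display time of frame N. The only thing
that varies is how the frame time and the subtitle time get
rounded on their separate code paths. These "players" are
hypothetical; they do not describe any real implementation.

```python
import math
from fractions import Fraction

# Exact frame duration in milliseconds at 30000/1001 fps.
FRAME_DURATION_MS = Fraction(1001 * 1000, 30000)

def shown_on_frame(n, round_frame, round_sub):
    """Subtitle start equals the exact time of frame n. Both values
    are rounded to whole ms on separate paths, then compared with
    the strict '>' test."""
    exact_ms = n * FRAME_DURATION_MS
    frame_ms = round_frame(exact_ms)
    sub_ms = round_sub(exact_ms)
    return frame_ms > sub_ms

# Try every combination of rounding modes for frames 1 and 2.
outcomes = {
    (rf.__name__, rs.__name__, n): shown_on_frame(n, rf, rs)
    for rf in (math.floor, math.ceil, round)
    for rs in (math.floor, math.ceil, round)
    for n in (1, 2)
}

for key, shown in sorted(outcomes.items()):
    print(key, shown)
```

With these assumptions the answer changes with the rounding
combination, and for some combinations it even changes from one
frame number to the next.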

Everybody who creates computer animations and wants to
achieve a sudden change from one frame to another knows
that this change needs to be authored in a way that it
happens at the temporal midpoint between two frames (to
make it safe against slight adjustments, rescaling, etc.)

And the same goes for subtitle authoring: When you want 
to make sure that a subtitle is shown at frame N+1 but 
not at frame N, then you set the subtitle start time 
to a time half-way between N and N+1.

And this doesn't require high-precision timing values
nor custom time bases and it's also stable against 
calculations and rounding errors that might occur
during processing.
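
As a rough check of the half-way rule, the following sketch puts
the start time at the midpoint between frame N and N+1, stores it
in whole milliseconds, and then looks for the first frame that
passes the '>' test under different rounding modes for the frame
time. The 29.97 fps rate, the helper names and the millisecond
internal time base are assumptions for illustration.

```python
import math
from fractions import Fraction

# Exact frame duration in milliseconds at 30000/1001 fps.
FRAME_DURATION_MS = Fraction(1001 * 1000, 30000)
ROUNDINGS = (math.floor, math.ceil, round)

def midpoint_start_ms(n):
    """Subtitle start in whole ms, half-way between frame n and n+1."""
    return round((n + Fraction(1, 2)) * FRAME_DURATION_MS)

def first_frame_shown(sub_start_ms, round_frame, max_frames=1000):
    """Index of the first frame whose rounded time is past the start."""
    for k in range(max_frames):
        if round_frame(k * FRAME_DURATION_MS) > sub_start_ms:
            return k
    return None

# For every rounding mode, the event should first appear at n + 1.
stable = all(
    first_frame_shown(midpoint_start_ms(n), rf) == n + 1
    for n in range(100)
    for rf in ROUNDINGS
)
print(stable)
```

The margin on either side of the midpoint is about 16 ms here,
far more than the error of at most 1 ms that rounding to
milliseconds can introduce.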

My wish for the future would be criticism that isn't 
based on mind-farts or unrealistic hypothetical cases,
but actual problems, ideally accompanied by an example 
to demonstrate.

Kind regards,
softworkz