[FFmpeg-devel] [PATCH v5 00/25] Subtitle Filtering 2022

Sun Jul 24 21:38:37 EEST 2022

> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of
> Nicolas George
> Sent: Sunday, July 24, 2022 5:10 PM
> To: FFmpeg development discussions and patches <ffmpeg-
> devel at ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH v5 00/25] Subtitle Filtering 2022
> 
> I hesitated a long time before replying, but all considering, at this
> point expressing what is weighing on my heart cannot make things
> worse.
> 
> 
> Michael Niedermayer (12022-07-03):
> > What is the timeline for the audio+video merge ?
> 
> I cannot give a timeline because I do not work on FFmpeg on a
> schedule,
> I work on FFmpeg in my free time, for fun. And lately, working on
> FFmpeg
> has been really un-fun. Recently, my contributions have been met with
> indifference at best (see when I proposed a XML parser six months
> ago:
> nobody cared), outright hostility at worst. Under these
> circumstances,
> whenever I am considering working on something FFmpeg-related, I
> usually
> find something else more fun to do.
> 
> I do not recognize the project I started contributing to more than
> fifteen years ago. I do not even recognize the project that boasted
> the
> clever optimization framework that made FFVP9 possible, it has become
> increasingly hostile to trying new and more efficient ways of doing
> things in favor of a corporate never-take-risks style of coding. I am
> more and more often considering giving up and cutting my losses.
> 
> > IIUC this would resolve this deadlock (with extra work adapting the
> patchset
> > so it would be work for SW adapting it and it would be work for you
> finishing
> > the merge)
> > Also can others help nicolas moving his work forward
> >
> > What i suggest is to pick a time and then try to finish the merge
> before.
> > If it succeeds this patchset needs updating and can move forward
> without
> > this main objection
> > OTOH if the time is not hit, we agree that the objection can no
> longer be
> > used
> 
> My answer as maintainer of the framework of libavfilter is: no.
> 
> Of course, maintainers are not dictators. The members of the project
> can
> collectively decide otherwise. But I have to warn you about the
> consequences.
> 
> First, the issue about negotiation is not he only severe flaw in this
> patch series. 

Negotiation hasn't been implemented for audio+video yet. Neither 
does that patchset do it for audio+video+subtitles.
It is out of scope of this patchset. It can be done later or never,
not everybody is a fan of doing so, as comments have shown.
Clearly, this is in no way a showstopping reason as you had
conceded yourself recently.

> I can immediately quote another one: for text
> subtitles,
> the approach of this proposal to synchronization is to feed
> everything
> to libass as it comes and see what comes out. It will work on easy
> cases, when the subtitles are interleaved with the video or come from
> a separate file. 

- or come from decoded closed captions
- or come from graphic subtitles converted with the graphicsub2text
  filter
- or come from the subfeed filter after fixing durations
- or come from the subfeed filter ensuring a regular repetition
  (heartbeat)

Subtitle events don't need to come in linear order. Multiple
events can have identical start times, subtitle events can
overlap. 

The overlaytextsubs filter is meant to be a direct replacement 
for the existing subtitles filter, which performs additional 
opening, parsing and decoding of the source file in parallel, 
and avoiding that was one of the primary objectives I had for 
starting development.
That's why it was very important for me to preserve the exact 
same behavior as the existing subtitles filter exposes.

Other approaches for implementation are surely possible as well. 
Traian, who did the text2graphicsub filter, had initially an 
implementation that handled the timing manually instead of letting 
libass do it, but it turned out that this can quickly become a 
really complex task, especially when overlapping events or 
animations are part of the game, so it came down to feeding 
everything to libass in the end, like the overlaytextsubs 
filter and the subtitles filters do.

The nice thing about having subtitle filtering is that there
is no fixed functionality involved where you can argue about 
right or wrong: anyone is free to contribute another filter 
which is pursuing a different approach. I would welcome that 
and there may be cases where an alternative method could be
advantageous, but it surely won't be superior in general.

> But as soon as a filter will, for example, adjust the
> timing to make the subtitles more early, it will just not work. Of
> course, it was not tested because this patch series does not even
> offer the feature to adjust the time of subtitles, which is frankly
> ridiculous, it is one of the most obvious thing people might want to
> do.

The only reason why there is no timing adjustment filter is that
I didn't need one. It is really easy to implement such a filter.
The abilities of such a filter are limited by the individual
circumstances, though:
You can always delay subtitle presentation, but the amount of 
time that you can move them ahead is limited by the situation.
It depends on the amount of time they are muxed ahead in the 
source. This can range from zero (for example when originating
from closed captions) to a few seconds (e.g. with ocr-ed 
DVB subs).
In the latter case it's easy to subtract one or two seconds
to show subtitles earlier - in the former case, there is no
room for this - unless you would let the video frames queue
up at the overlaytextsubs filter to synchronize with the 
subtitle frames. 
You are absolutely right here: the overlaytextsubs filter doesn't 
do that, it doesn't use framesync. 
Besides the reason to produce equal results to the existing subtitles 
filter (which framesync would interfere with), there's another 
consideration which kept me from using framesync: 

The benefit would be small and very limited.

Let's look at an example: assuming we have a 4k 30fps video onto
which we want to overlay text subtitles. The text subtitles
are muxed "just-in-time" in the source and we want the subtitles
to be shown 3 seconds earlier.
In order to make this possible, it would be required to queue up 
3s * 30 fps = 90 frames at the overlay filter. And this is not 
the muxing queue, it's inside a filtergraph where we have 
uncompressed frames.
Assuming 4 bytes per pixel for simplicity: 4 * 3840 * 2160 bytes
is roughly 32 MB per frame. For 90 frames, this makes about 2.8 GB
of memory for a 3 second delay. 
It is not unusual that subtitles are off by even larger time
spans (e.g. video has cut off intro but sub timings assume intro
to be present).
Even with sufficient RAM - as soon as hardware acceleration
is being used, you really wouldn't want to use precious GPU
memory for subtitle timing adjustments.
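
For reference, the arithmetic above can be reproduced with a few
lines of Python. This is only a back-of-the-envelope sketch: the
function name and the flat 4 bytes per pixel are assumptions for
illustration, and real frame buffers add padding and alignment.

```python
def queue_memory_bytes(width, height, bytes_per_pixel, fps, shift_seconds):
    """Memory needed to hold uncompressed frames for `shift_seconds`.

    Hypothetical helper: models the video frames an overlay filter would
    have to queue so that subtitles can be shown `shift_seconds` earlier.
    """
    frames = round(fps * shift_seconds)          # 30 fps * 3 s = 90 frames
    frame_bytes = width * height * bytes_per_pixel
    return frames * frame_bytes


per_frame = 3840 * 2160 * 4                      # one 4k frame at 4 bytes/pixel
total = queue_memory_bytes(3840, 2160, 4, fps=30, shift_seconds=3)

print(f"per frame: {per_frame / 2**20:.1f} MiB")  # about 31.6 MiB
print(f"90 frames: {total / 2**30:.2f} GiB")      # about 2.78 GiB
```

The cost grows linearly with the shift, so a 10 second correction
would need more than three times as much.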

Eventually this brought me to the conclusion that this isn't 
a suitable approach for adjusting subtitle offsets except for 
small corrections. 
It's easy to add such a filter to change timings and it's very
well possible to implement an overlay filter with a different 
behavior - I would be sincerely interested to see which 
alternative solution you would come up with for this.
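
To illustrate the limit described above, here is a minimal Python
model. It is not FFmpeg code; SubtitleEvent, its fields and
shift_event are hypothetical names. It assumes that no video frames
are queued, so an event can never be shown before the video position
at which it reaches the filter.

```python
from dataclasses import dataclass


@dataclass
class SubtitleEvent:
    start_ms: int      # intended presentation start
    duration_ms: int   # intended display duration
    arrival_ms: int    # video position when the event reaches the filter
                       # (hypothetical field, models how far ahead it is muxed)


def shift_event(ev: SubtitleEvent, offset_ms: int) -> SubtitleEvent:
    """Shift an event by offset_ms (negative means earlier).

    Delaying always works. Moving earlier is clamped to arrival_ms,
    because without queued video the frames before that point are gone.
    """
    wanted_start = ev.start_ms + offset_ms
    wanted_end = wanted_start + ev.duration_ms
    effective_start = max(wanted_start, ev.arrival_ms)
    effective_duration = max(0, wanted_end - effective_start)
    return SubtitleEvent(effective_start, effective_duration, ev.arrival_ms)


# OCR-ed DVB subs, muxed 2 s ahead: a 1 s advance fits completely.
dvb = SubtitleEvent(start_ms=10_000, duration_ms=2_000, arrival_ms=8_000)
print(shift_event(dvb, -1_000))

# Closed captions, arriving just in time: a 3 s advance leaves nothing to show.
cc = SubtitleEvent(start_ms=10_000, duration_ms=2_000, arrival_ms=10_000)
print(shift_event(cc, -3_000))
```

In this model the DVB event keeps its full duration, while the
closed caption event ends up with a duration of zero.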

> this patch series does not even
> offer the feature to adjust the time of subtitles, which is frankly
> ridiculous, it is one of the most obvious thing people might want to
> do.

How did you come to assume that it wouldn't be possible to adjust 
the timing of subtitles at all? 

It's just that there is a much better and more universal way than
using a filter: the typical approach used for audio/video sync
correction (-itsoffset), which can even be combined with the
overlay filter:

ffmpeg -i subtitlevideo.mkv -itsoffset -3 -i subtitlevideo.mkv -filter_complex "[1:s]subfeed[sub1];[0:v][sub1]overlaytextsubs[fout]" -map [fout] -map 0:a -sn -y out.mkv

> Note that I did not have to perform a full review of the patch series
> to
> find this flaw. I have been preparing to implement subtitles
> filtering
> for years now, I know which aspects are tricky and hard to implement
> properly. I only had to check precisely how it was done. And it turns
> out it was not done at all.

Then it's very weird that it's all working: dozens of examples for 
new functionality and all existing functionality is preserved.

> I suspect that if I were to do a full review, I would find a few
> other
> flaws. But the author has made painfully clear that they did not
> respect
> my expertise in this area, 

You kept talking about your expertise and the lack of mine, without
ever talking about any technical matters.

This time, you mentioned technical issues and you see me answering
in a very detailed way. We have seen how it cannot work and now you
see how it could work. If you stick to technical matters without 
dropping the typical discrediting subtext, it could all go well.

Thanks,
softworkz