[FFmpeg-devel] Status and Plans for Subtitle Filters

Sat Feb 22 14:51:13 EET 2020

> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of
> 
> On Sat, Feb 22, 2020 at 10:59:46AM +0000, Soft Works wrote:
> [...]
> > Reading through the discussion around your patch was discouraging,
> > even destructive in some parts. I understand why you felt alone with
> > that and I wonder why nobody else chimed in. I mean, sometimes there
> > are extensive discussions about some of the least important video
> > formats in the world, while subtitles are a pretty fundamental thing...
> 
> I think the main reason is that subtitles are a different beast in the
> multimedia world, and most people intuitively understand this is not fun
> work at all. It's much more comfortable to work with audio and video since
> the framework design revolves around them.
> 
> > On the other hand - playing devil's advocate: Why even handle a
> > subtitle media type in filtergraphs?
> >
> 
> It's not only about lavfi: the whole framework works with AVFrame. If you
> use something else, you'll have to duplicate most of the APIs to handle
> subtitles as well. In the past, audio was actually separated, and unification
> with video was a relief. Going another path for subtitles is going to be
> extremely invasive, verbose, and annoying to maintain on API change.
> 
> > Would there be any filters at all that would operate on subtitles?
> > (other than rendering to a video surface)
> 
> Sure. A few ideas that come to my mind:
> 
> - rasterization (text subtitles to bitmap subtitles)
> - ocr (bitmap subtitles to text)
> - all kinds of text processing (possibly piped to some external tools)
> - censoring bad words
> - inserting "watermark" text
> - timing processing: trimming, shifting, scaling of time
> - lorem ipsum or similar "source" filter (equivalent to our video test
>   pattern sources) for testing purposes
> - audio to text for auto captioning
> - text to audio for audio synthesis
> - concat multiple subtitle files (think of multiple episodes merged into
>   one, where you want to do the same for the subtitles)
> - merge/overlap multiple subtitle tracks (think of multi-language
>   subtitles)
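
To make the "timing processing" item above concrete: a minimal C sketch of shift, scale and trim on subtitle events. It assumes a hypothetical SubEvent struct with millisecond timestamps; this is NOT an FFmpeg type or API, and a real filter would operate on whatever frame representation subtitles end up with.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subtitle event (not an FFmpeg type); times in milliseconds. */
typedef struct SubEvent {
    int64_t start_ms;
    int64_t end_ms;
    const char *text;
} SubEvent;

/* Scale then shift one timestamp: t' = t * num / den + shift_ms.
 * Integer division truncates toward zero; a negative shift can
 * produce negative timestamps, which the caller must handle. */
static int64_t retime(int64_t t, int64_t num, int64_t den, int64_t shift_ms)
{
    return t * num / den + shift_ms;
}

/* Apply the same scale/shift to the start and end of every event. */
static void retime_events(SubEvent *ev, size_t n,
                          int64_t num, int64_t den, int64_t shift_ms)
{
    for (size_t i = 0; i < n; i++) {
        ev[i].start_ms = retime(ev[i].start_ms, num, den, shift_ms);
        ev[i].end_ms   = retime(ev[i].end_ms, num, den, shift_ms);
    }
}

/* Trim to the window [ts, te): drop events fully outside it, clip the
 * ones that overlap its edges, and rebase so that ts becomes 0.
 * Kept events are compacted in place; returns the number kept. */
static size_t trim_events(SubEvent *ev, size_t n, int64_t ts, int64_t te)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        SubEvent e = ev[i];
        if (e.end_ms <= ts || e.start_ms >= te)
            continue; /* entirely outside the window */
        if (e.start_ms < ts)
            e.start_ms = ts;
        if (e.end_ms > te)
            e.end_ms = te;
        e.start_ms -= ts;
        e.end_ms   -= ts;
        ev[kept++] = e;
    }
    return kept;
}
```

The concat case reduces to the same operation: shift the second file's events by the duration of the first part (num = den = 1, shift_ms = that duration) and append them.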

I knew there would be reasonable ones, except maybe the text-to-speech
idea. I suppose you'd need to be a masochist to watch a full movie with
synthesized speech ;-)

> [...]
> > But if the primary purpose of having subtitles in filtergraphs is to
> > eventually convert them to bitmaps, and given that this is extremely
> > difficult and controversial to implement, plus that there seems to be
> > only moderate support for it from other developers, could an easier
> > and more pragmatic solution be to simply convert the subtitles to
> > images before they enter the filtergraph?
> 
> That means it's likely to be available only within the command line tool and
> not the API, unless you design a separate "libavsubtitle" (discussed several
> times in the past), but at some point you'll need many interfaces with the
> usual demuxing-decoding-encoding-muxing pipeline.

You're right, I was focused on the CLI, and primarily on the huge discrepancy
in the amount of work required.

While the predominant model of ffmpeg development (patch, trial and error
until it gets accepted) seems to have proven quite successful, I'm
wondering whether in this case it would be a better strategy to agree
on a plan before anybody spends more time on this?

softworkz