[FFmpeg-devel] [RFC] Talk about subtitles

Thu Nov 24 02:57:48 CET 2011

On Wed, Nov 23, 2011 at 01:56:45PM +0100, Nicolas George wrote:
> On duodi 2 Frimaire, year CCXX, Clément Bœsch wrote:
> > [TL;DR: I want to improve/rewrite real subtitle support in FFmpeg and need
> > some hints.]
> 
> My opinion on this:
> 
> In the long run, this belongs in libavfilter, with a filtergraph fragment
> that looks like this:
> 
>  text sub  +----------+ bitmap sub  +---------+  video with alpha
> ---------->| txt2bmap |------------>| sub2vid |------.
>            +----------+             +---------+       \    +---------+
>                                                        `-->|         |
>     video                                                  | overlay |->
> ---------------------------------------------------------->|         |
>                                                            +---------+
> 

Won't this lead to complicated usage? In most people's minds, it's just
"softsub" vs "hardsub"; I don't think they want to deal with a complicated
filtergraph.

> Or possibly a single sub_overlay filter merging sub2vid with overlay; since
> subtitles are only small rectangles, that may be more efficient.
> 

One filter combining every part you mentioned, yes. Though I'm not sure
about having the core of the engine in libavfilter: we still need the
demux/decode/encode/mux code, so how would you integrate this into libavfilter?

> (In the very long run, I believe libavfilter should handle even the decoders
> and encoders, and possibly the demuxers and muxers, but that is another
> story.)
> 
> Before we get there, we need:
> 
> - support for subtitles in libavfilter;
> 
> - support for complex filtergraphs in the command line tools, more
>   efficiently and less awkwardly than with movie/amovie/smovie.
> 
> This is not for tomorrow, but this will eventually come.
> 

Mmh, don't you think that simply improving the subtitles API and using it
in the filter would be a simpler and perfectly fine way to deal with it? I
may be completely missing your point, though.

> In the short run, the above features can be hardcoded into the ffmpeg
> command line tool. Fortunately, all code written here can be later reused
> almost as is for the corresponding filter. And in fact, most of the code is
> probably already in Stefano's proposal for vf_ass.
> 
> In practice, that could look like this:
> 
> - -hardsub option similar to map to tell ffmpeg that it needs to overlay the
>   subtitles stream #S.s onto the video stream #V.v.
> 

-hardsub could be used as an alias: if hardsub is specified, let's insert
a subtitles video filter for transparency. Not sure how easily we can do
this.

> - In transcode_subtitles, if the stream is used in hardcoded sub, keep the
>   decoded packet around instead of avsubtitle_free()ing it.
> 
> - In do_video_out, just before the call to avcodec_encode_video, call
>   avcodec_overlay_subtitle(big_picture, current_sub).
> 
> - For text subtitles, some kind of avcodec_render_subtitle function, probably
>   based on libass (but an internal rudimentary implementation may be
>   useful), called by avcodec_overlay_subtitle if necessary.
> 

If we insert a filter (and at first the user would just specify it manually),
we won't need that, AFAIU. Also note that it will be easier to integrate
into ffplay, for instance (hell yeah, subtitles in ffplay!).
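
Whichever place it ends up in (ffmpeg.c or a filter), the overlay step itself
is small. A minimal sketch in Python, only to illustrate the idea: the names
overlay_subtitle and blend, and the list-of-tuples frame layout, are made up
for this example and are not FFmpeg API. It copies the whole frame, as
suggested for the temporary version, and blends one RGBA subtitle rectangle
onto it:

```python
def blend(fg, bg, alpha):
    """Mix one 8-bit channel: alpha=255 keeps fg, alpha=0 keeps bg."""
    return (fg * alpha + bg * (255 - alpha) + 127) // 255


def overlay_subtitle(frame, rect, x0, y0):
    """Return a copy of `frame` with the RGBA rectangle `rect` blended at (x0, y0).

    frame: list of rows, each a list of (r, g, b) tuples (hypothetical layout)
    rect:  list of rows, each a list of (r, g, b, a) tuples, a in 0..255
    """
    out = [row[:] for row in frame]  # always copy the whole frame, no optimization
    height, width = len(frame), len(frame[0])
    for dy, rect_row in enumerate(rect):
        y = y0 + dy
        if not 0 <= y < height:
            continue  # clip rows that fall outside the frame
        for dx, (r, g, b, a) in enumerate(rect_row):
            x = x0 + dx
            if not 0 <= x < width:
                continue  # clip columns that fall outside the frame
            bg_r, bg_g, bg_b = out[y][x]
            out[y][x] = (blend(r, bg_r, a), blend(g, bg_g, a), blend(b, bg_b, a))
    return out
```

Since a subtitle usually covers only a small rectangle, the inner loops touch
few pixels; the full-frame copy is the part a real implementation would want
to avoid later.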

> As a side note, since everything would be temporary (until proper support is
> in lavfi), we can skip optimizations. For example, avcodec_overlay_subtitle
> can always copy the whole frame, and later we can rely on lavfi's
> permissions framework to take care of that.
> 
> 
> Concerning the various markups for text subtitles, there are usually two
> options in such a case:
> 
> - We choose or define a universal format that can represent reliably all
>   other markups, so that the "markup X -> universal markup -> markup X"
>   round trip is always lossless. And then we always use it.
> 
> - We flag all text subtitle data with the used markup, we define a few
>   conversion functions and use them as needed, possibly using some kind of
>   "shortest path in a graph" algorithm when direct conversion is not
>   implemented.
> 
> Working with a universal markup is easier, and for text subtitles,
> efficiency is not really a concern. Unfortunately, I am not sure that the
> universal markup can actually exist. With the correct API, converting in all
> directions may be relatively painless.
> 
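
To make the second option concrete, here is a rough sketch of the "shortest
path in a graph" idea in Python. Everything in it is an assumption for
illustration: the CONVERTERS table, the function names, and the toy
converters (which only handle italics) are not existing FFmpeg code. A
breadth-first search finds the chain with the fewest conversions:

```python
from collections import deque


def microdvd_to_srt(text):
    # Toy converter: only handles a leading {y:i} italic marker.
    if text.startswith("{y:i}"):
        return "<i>" + text[len("{y:i}"):] + "</i>"
    return text


def srt_to_ass(text):
    # Toy converter: only handles italics.
    return text.replace("<i>", "{\\i1}").replace("</i>", "{\\i0}")


def ass_to_srt(text):
    # Toy converter: only handles italics.
    return text.replace("{\\i1}", "<i>").replace("{\\i0}", "</i>")


# Hypothetical table of direct conversions: (source markup, target markup).
CONVERTERS = {
    ("microdvd", "srt"): microdvd_to_srt,
    ("srt", "ass"): srt_to_ass,
    ("ass", "srt"): ass_to_srt,
}


def find_path(src, dst, converters=CONVERTERS):
    """Breadth-first search: shortest chain of markups from src to dst, or None."""
    if src == dst:
        return [src]
    previous = {src: None}
    queue = deque([src])
    while queue:
        current = queue.popleft()
        for a, b in converters:
            if a != current or b in previous:
                continue
            previous[b] = current
            if b == dst:
                path = [b]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(b)
    return None


def convert(text, src, dst, converters=CONVERTERS):
    """Apply the direct converters along the shortest path from src to dst."""
    path = find_path(src, dst, converters)
    if path is None:
        raise ValueError("no conversion path from %s to %s" % (src, dst))
    for a, b in zip(path, path[1:]):
        text = converters[(a, b)](text)
    return text
```

With this table, a MicroDVD line reaches ASS through SRT without a direct
microdvd-to-ass converter being written. Each hop may lose information, which
is the usual argument for keeping paths short or for a universal markup.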

I believe ASS is a fine universal markup. Since we can render (almost?)
everything (with some pain sometimes), it should be fine.

And if not, the codec could output a bitmap with its own rendering and use
the common ASS text field for a "verbatim" text version with its markup
stripped.
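
As an illustration of what such a stripped "verbatim" version could look
like, here is a small Python sketch. The function name ass_to_plain is made
up for this example; it removes ASS override blocks ({...}) and handles the
\N, \n and \h escapes, and it deliberately ignores drawing commands
(\p) and anything else a real implementation would have to deal with:

```python
import re

# An ASS override block is everything between '{' and the next '}'.
_OVERRIDE_BLOCK = re.compile(r"\{[^}]*\}")


def ass_to_plain(text):
    """Return the text of one ASS dialogue line with its markup stripped."""
    text = _OVERRIDE_BLOCK.sub("", text)  # drop {\i1}, {\b1\c&H...&}, etc.
    text = text.replace("\\N", "\n")      # hard line break
    text = text.replace("\\n", " ")       # soft break, treated as a space here
    text = text.replace("\\h", " ")       # hard space
    return text
```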

> Note that apart from the markup, the encoding is also a problem.
> 

Argh. I almost forgot this one...

[...]

-- 
Clément B.