[FFmpeg-devel] Internal handling of subtitles in ffmpeg

Thu Jan 1 18:16:07 CET 2009

On Thu, Jan 01, 2009 at 04:19:49PM +0100, Michael Niedermayer wrote:
> Let me summarize what i remember from your standpoint, please correct
> me if i misremember something
> 1. decoders should output bitmaps 
> 2. bitstream filters should convert between X->ASS and ASS->X

Actually, I think bitstream filters, at least as they are currently done,
are horrible for usability.
I was just thinking in terms of a "quick solution that looks like a
sensible template for a future good solution".

> My suggestion
> 1. decoders output vector based AVSubtitleRects containing ASS or bitmaps
> 1b. encoders take vector based AVSubtitleRects containing ASS or bitmaps
> 2.  A renderer can convert ASS AVSubtitleRects to bitmaps
> 
> You say "this feels like a horribly complex way to pass around of strings
> without much of an advantage", can you please elaborate on this?

The "horribly complex way" is passing around AVSubtitleRects with text and
coordinates.
I think I'd already be mostly okay if the char * argument was in
AVSubtitle and not AVSubtitleRects (because I do not even remotely see
a rectangular position as an inherent property of some text - though
actually that would be true also of any non-trivial bitmap subtitle
format if such a thing existed).

> The concrete problems i see with your design are
> 1. The current architecture is demuxer->decoder->encoder->muxer
>    considering that your decoders return bitmaps its no longer possible to
>    encode these to text, thus breaking the "demuxer->decoder->encoder->muxer"

This assumes that you want to treat subtitles "exactly" like video/audio
which is somewhat questionable (lossless vs. lossy etc.).
Also I cannot see it working well with your approach either, because
an encoder, once it has agreed on a format (pixfmt, size), will deal with
all inputs, whereas most subtitle encoders will handle only text or only
bitmaps, and you seem not to want to distinguish between text and bitmap
"a priori".
So what you would have to do would be decoder -> (possibly text<->bitmap
transformer) -> encoder.
Of course that would be comparable if you consider that "transformer"
analogous to swscale, but then text-only and bitmap-only subtitle
formats are as much "the same" as an RGB32 and a YV12 frame (with the
difference of supporting mixed formats).
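
To make the "decoder -> (transformer) -> encoder" idea a bit more concrete,
here is a minimal sketch of how the transformer step could be chosen. All
names (SubKind, SubStep, sub_plan_step) are hypothetical and nothing here is
existing libavcodec API; it only assumes that a decoded subtitle can say
which kinds it contains and that an encoder can say which kinds it accepts.

```c
#include <assert.h>

/* Hypothetical names throughout; none of this is existing libavcodec API. */
enum SubKind {
    SUB_KIND_TEXT   = 1 << 0,
    SUB_KIND_BITMAP = 1 << 1,
};

enum SubStep {
    SUB_STEP_PASSTHROUGH,     /* encoder accepts everything the decoder produced */
    SUB_STEP_TEXT_TO_BITMAP,  /* text must be rendered, e.g. by an ASS renderer */
    SUB_STEP_BITMAP_TO_TEXT,  /* bitmaps must become text (OCR-like, lossy) */
};

/* Decide which transformer, if any, sits between decoder and encoder.
 * decoded:  bitmask of SubKind values present in the decoded subtitle.
 * accepted: bitmask of SubKind values the encoder can take. */
static enum SubStep sub_plan_step(unsigned decoded, unsigned accepted)
{
    const unsigned all = SUB_KIND_TEXT | SUB_KIND_BITMAP;
    unsigned missing;

    /* Both masks must be non-empty and contain only known kinds. */
    assert(decoded != 0 && (decoded & ~all) == 0);
    assert(accepted != 0 && (accepted & ~all) == 0);

    /* Kinds the decoder produced that the encoder cannot take. */
    missing = decoded & ~accepted;
    if (missing == 0)
        return SUB_STEP_PASSTHROUGH;
    /* The encoder accepts at least one kind, so exactly one is missing. */
    if (missing == SUB_KIND_TEXT)
        return SUB_STEP_TEXT_TO_BITMAP;
    return SUB_STEP_BITMAP_TO_TEXT;
}
```

The point of the sketch is only that the text/bitmap distinction has to be
known before the encoder is picked, much like a pixfmt is known before
swscale is inserted.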

> 2. How should mixed bitmap and text formats be represented?
>    Your suggestion requires a bitstream filter to convert to ASS and then from
>    ASS, but does ASS support bitmaps in every pixel format we would need, 
>    besides how to put this in the char * ?

I think ASS does not support bitmaps at all, only the next version with
some other name IIRC. But I'd expect it would also support rotating
bitmaps with some explicitly specified scaling algorithm and position
relative to the border of the screen and crazy stuff like that, which
leads to my original question for text subtitles "how to put this in
AVSubtitleRects".
And my answer is: AVSubtitleRects is fundamentally designed to only work
for trivial subtitle formats due to assuming you split the subtitle in
rectangular areas in a way that makes sense.

> 3. Does ASS support every way text can be positioned by other formats?
>    I mean if we convert from X to Y the text should stay at the same
>    spot on the screen given Y can represent it.

No idea, I was only claiming that AVSubtitleRects is orders of magnitude
worse.

> Also in the light of "horribly complex", does it not feel horribly complex
> to require every ASS->X bitstream filter to be able to extract things like
> position, i mean in my suggestion these would be stored in an easily accessible
> struct doing the extraction just at one spot.

And they would be wrong for any "non-trivial" text subtitle.

> and general case here means
> text -> text while not losing effects when the destination supports the
>     effects
> text -> bitmaps (not a single 95% transparent screen sized bitmap)
> bitmaps -> display (with bitmaps not being colorspace converted twice)
> text+bitmaps -> text+bitmaps

Well, I just think you'd have to extend this to have at least these
"basic" subtitle types:
"DATA blob" (ASS with bitmap support extensions?, not possible to correctly
represent as AVSubtitleRects, thus not using them - alternatively
giving up on a common representation format for anything so advanced)
"trivial" bitmap only (using AVSubtitleRects)
"trivial" text only (using AVSubtitleRects)
"trivial" bitmap+text (using AVSubtitleRects)

possibly requiring that every encoder supports one of the "trivial"
formats.
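
A rough sketch of how those four types could be told apart, assuming a
subtitle carries either an opaque data blob or a list of rects. The structs
and names below (SubClass, SketchRect, SketchSubtitle, sub_classify) are
made up for illustration and are not the real AVSubtitle/AVSubtitleRect
layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the real AVSubtitle API. */
enum SubClass {
    SUB_CLASS_BLOB,            /* opaque data, e.g. full ASS; rects not used */
    SUB_CLASS_TRIVIAL_BITMAP,
    SUB_CLASS_TRIVIAL_TEXT,
    SUB_CLASS_TRIVIAL_MIXED,
};

struct SketchRect {
    int x, y, w, h;
    const char *text;             /* non-NULL for a text rect */
    const unsigned char *bitmap;  /* non-NULL for a bitmap rect */
};

struct SketchSubtitle {
    const void *blob;             /* non-NULL for the "DATA blob" type */
    size_t blob_size;
    const struct SketchRect *rects;
    size_t nb_rects;
};

/* Classify a subtitle into one of the four "basic" types.
 * A subtitle with no blob and no rects is reported as trivial bitmap;
 * that choice is arbitrary and only made to keep the sketch total. */
static enum SubClass sub_classify(const struct SketchSubtitle *s)
{
    int has_text = 0, has_bitmap = 0;

    assert(s != NULL);
    if (s->blob)
        return SUB_CLASS_BLOB;
    for (size_t i = 0; i < s->nb_rects; i++) {
        if (s->rects[i].text)
            has_text = 1;
        if (s->rects[i].bitmap)
            has_bitmap = 1;
    }
    if (has_text && has_bitmap)
        return SUB_CLASS_TRIVIAL_MIXED;
    if (has_text)
        return SUB_CLASS_TRIVIAL_TEXT;
    return SUB_CLASS_TRIVIAL_BITMAP;
}
```

An encoder could then declare which of the "trivial" classes it supports,
and anything classified as a blob would bypass the rect-based path entirely.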

Greetings,
Reimar Döffinger