[FFmpeg-devel] [RFC] Talk about subtitles

Thu Nov 24 02:24:13 CET 2011

On Tue, Nov 22, 2011 at 08:52:14PM +0100, Reimar Döffinger wrote:
> On Tue, Nov 22, 2011 at 07:56:41PM +0100, Clément Bœsch wrote:
> >  - ASS/SSA: AFAIK we internally use a slightly modified representation of ASS to
> >    store all the different subtitle tracks. While ASS is now used to render
> >    almost everything, it might not be the perfect approach because it can't
> >    store all the information of every subtitle format (such as pts precision).
> >    See next points for various related headaches.
> 
> I am quite sure we use (or did use?) the MKV representation of ASS that
> does not include the pts, but pts is instead stored separately.
> However, we are limited to 1 ms precision there anyway.
> 

Well, AFAIU, the muxed version of ASS in Matroska (the only one I'm aware
of) does not store the start and end pts in the dialogue text (the pts are
stored in the container), so the mkv demuxer converts the subtitle text,
which has no timing information, back into an ASS dialogue line, writing
the start/end timestamps derived from the packet pts into the text (see
matroska_fix_ass_packet() in lavf/matroskadec.c).

I'm still not sure if this is actually the best way to deal with that…
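To make that conversion concrete, here is a rough C sketch of rebuilding a Dialogue line from a Matroska ASS block. It is not the actual matroska_fix_ass_packet() code; the helper names are made up, and it assumes the block payload is "ReadOrder,Layer,Style,Name,MarginL,MarginR,MarginV,Effect,Text" with pts and duration in milliseconds (the default Matroska timecode scale). The ms to centisecond truncation is exactly the kind of precision loss mentioned above.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a millisecond timestamp as ASS "H:MM:SS.cc". ASS only has
 * centisecond precision, so the last millisecond digit is truncated. */
static void ass_format_time(char *buf, size_t size, int64_t ms)
{
    int64_t cs = ms / 10;
    snprintf(buf, size, "%d:%02d:%02d.%02d",
             (int)(cs / 360000), (int)(cs / 6000 % 60),
             (int)(cs / 100 % 60), (int)(cs % 100));
}

/* Hypothetical helper: rebuild "Dialogue: Layer,Start,End,Style,...,Text"
 * from a Matroska ASS block payload plus container pts/duration (ms).
 * Returns 0 on success, -1 if the payload is malformed or the output
 * buffer is too small. */
static int mkv_ass_block_to_dialogue(char *out, size_t size, const char *block,
                                     int64_t pts_ms, int64_t duration_ms)
{
    char start[32], end[32];
    const char *layer, *rest;
    int len;

    layer = strchr(block, ',');      /* skip the ReadOrder field */
    if (!layer)
        return -1;
    layer++;
    rest = strchr(layer, ',');       /* ",Style,Name,..." follows Layer */
    if (!rest)
        return -1;

    ass_format_time(start, sizeof(start), pts_ms);
    ass_format_time(end, sizeof(end), pts_ms + duration_ms);
    len = snprintf(out, size, "Dialogue: %.*s,%s,%s%s",
                   (int)(rest - layer), layer, start, end, rest);
    return (len < 0 || (size_t)len >= size) ? -1 : 0;
}
```

For example, the block "5,0,Default,,0,0,0,,Hello" with pts 1500 ms and duration 1500 ms becomes "Dialogue: 0,0:00:01.50,0:00:03.00,Default,,0,0,0,,Hello".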

> >  - Jacosub: this is somewhat an ancestor of ASS/SSA, and it's also a crazy
> >    format. If we do things properly, we might want to support it (so mplayer
> >    could for instance use FFmpeg internals to replace its own old subtitles
> >    support, which includes partial jacosub support). Note that this format
> >    supports *including* external *images*. This is also something we need
> >    to care about in the internal struct. I already wrote a demuxer a while ago,
> >    and it could be used.
> 
> To my knowledge the structs already allow combining text, ass and image
> rects, since each rect has a separate type.
> You just need to be able to "normalize" them. E.g. all-bitmap for
> rendering on video, if you want to be crazy all-text (even from bitmaps)
> for some other uses...
> 

Yes, we always need a "normalized" bitmap we can blit (if that's what you
meant); I think Nicolas mentioned that (I'll reply to everyone in order).
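A minimal sketch of that "normalize to all-bitmap" step, assuming a heavily reduced stand-in for AVSubtitleRect. The enum mirrors libavcodec's enum AVSubtitleType (SUBTITLE_BITMAP, SUBTITLE_TEXT, SUBTITLE_ASS), but the struct, the callback type and stub_render() are hypothetical; a real renderer would wrap libass.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of libavcodec's enum AVSubtitleType. */
enum sub_type { SUB_BITMAP, SUB_TEXT, SUB_ASS };

/* Hypothetical, heavily reduced stand-in for AVSubtitleRect. */
struct sub_rect {
    enum sub_type type;
    const char *text;            /* used by SUB_TEXT / SUB_ASS */
    const unsigned char *pixels; /* used by SUB_BITMAP */
    int w, h;
};

/* Renderer callback: turns a text/ASS rect into a bitmap rect.
 * Returns 0 on success, negative on error. */
typedef int (*render_cb)(const struct sub_rect *in, struct sub_rect *out,
                         void *opaque);

/* Normalize a mixed list of rects to "all bitmap" so the result can be
 * blitted on video. Bitmap rects are passed through untouched. */
static int normalize_to_bitmap(struct sub_rect *rects, size_t n,
                               render_cb render, void *opaque)
{
    for (size_t i = 0; i < n; i++) {
        struct sub_rect bmp;
        int ret;

        if (rects[i].type == SUB_BITMAP)
            continue;
        ret = render(&rects[i], &bmp, opaque);
        if (ret < 0)
            return ret;
        rects[i] = bmp;
    }
    return 0;
}

/* Stand-in renderer for illustration only: produces an empty bitmap rect
 * (width = text length) instead of really rasterizing the text. */
static int stub_render(const struct sub_rect *in, struct sub_rect *out,
                       void *opaque)
{
    (void)opaque;
    out->type   = SUB_BITMAP;
    out->text   = NULL;
    out->pixels = NULL;
    out->w      = in->text ? (int)strlen(in->text) : 0;
    out->h      = 1;
    return 0;
}
```

Since each rect carries its own type, mixed subtitles need no special casing: only the non-bitmap rects go through the renderer.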

> >  - SAMI subtitles: this is HTML-only-valid-with-iexplorer5 crap, with CSS and
> >    such. Converting them to ASS might be difficult but possible. Though, parsing
> >    HTML is a pain: since it's not the only XML-like subtitle format, we might
> >    end up with subtitle support depending on some random HTML/XML library...
> 
> I don't really see the bother. We could add an extra type, but just
> throw away (most) of the markup. Haven't seen many people complain about
> it. Doesn't hurt to give people a reason to avoid inventing yet another
> format in the case of subtitles.
> 
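A minimal sketch of "throw away (most of) the markup" for SAMI, assuming the only thing worth keeping is the line break: every <...> tag is dropped and <br> becomes a newline. The function name is hypothetical, and it deliberately does not decode entities such as &nbsp; or interpret any CSS.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst (of capacity size), dropping every <...> tag and
 * turning <br>, <br/> or <br ...> into '\n'. An unterminated tag drops
 * the rest of the input. Returns the length of the result. */
static size_t sami_strip_markup(char *dst, size_t size, const char *src)
{
    size_t n = 0;

    if (!size)
        return 0;
    while (*src && n + 1 < size) {
        if (*src == '<') {
            const char *end = strchr(src, '>');
            if (!end)
                break;
            if (tolower((unsigned char)src[1]) == 'b' &&
                tolower((unsigned char)src[2]) == 'r' &&
                (src[3] == '>' || src[3] == ' ' || src[3] == '/'))
                dst[n++] = '\n';
            src = end + 1;
        } else {
            dst[n++] = *src++;
        }
    }
    dst[n] = '\0';
    return n;
}
```

For example, "<P Class=ENUSCC>Hello<br>world</P>" becomes "Hello\nworld".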

I'm not sure there is much benefit in having two engines like MPlayer
does: one that strips most of the markup (because the old OSD is not able
to render anything complicated) and one using libass for fancy layouts. I
did mention an extra "purified" text version of the subtitle (which could
go alongside the normalized bitmap version in the AVSubtitle), but I think
the simple form should be a "bonus", not a requirement.
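As a rough idea of what producing that "purified" text from ASS could involve, here is a sketch that drops {...} override blocks, maps \h to a space, and treats both \N and \n as line breaks. The last point is a simplification: in ASS, \n is a soft break whose behavior depends on the script's wrap style. The function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Copy an ASS dialogue text into dst (of capacity size), removing {...}
 * override blocks, turning \N and \n into '\n' and \h into ' '.
 * Returns the length of the result. */
static size_t ass_strip_overrides(char *dst, size_t size, const char *src)
{
    size_t n = 0;

    if (!size)
        return 0;
    while (*src && n + 1 < size) {
        if (*src == '{') {
            const char *end = strchr(src, '}');
            if (!end)
                break;                /* unterminated block: drop the rest */
            src = end + 1;
        } else if (src[0] == '\\' && (src[1] == 'N' || src[1] == 'n')) {
            dst[n++] = '\n';
            src += 2;
        } else if (src[0] == '\\' && src[1] == 'h') {
            dst[n++] = ' ';
            src += 2;
        } else {
            dst[n++] = *src++;
        }
    }
    dst[n] = '\0';
    return n;
}
```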

> >  - Closed caption and teletext: I don't know them enough, but we also need to
> >    support them. If anyone can comment on this...
> 
> CC is mostly just plain text. I think it has some scrolling features and
> stuff, but nobody I know really cares about it.
> The biggest problem really is that it nowadays often is stored in MPEG
> userdata which _must_ be reordered along with the video frames before
> being able to process it.
> Haven't yet found a sane way of handling it in FFmpeg.
> You can't really cram teletext fully into the subtitle framework since it is
> (kind of) interactive, needs a page cache etc.
> 
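The reordering problem is essentially the same one a decoder solves for B-frames. A minimal sketch, assuming a fixed, hypothetical reorder depth (in lavc the closest existing hint is probably AVCodecContext.has_b_frames): CC payloads are pushed in decode order and released in pts order once more than the depth is buffered. All names here are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reorder depth: how many frames decode order can run
 * ahead of presentation order (e.g. 2 for IBBP-style GOPs). */
#define CC_REORDER_DEPTH 2

struct cc_packet {
    int64_t pts;                /* pts of the frame carrying the CC data */
    const unsigned char *data;  /* CC bytes taken from the MPEG user data */
    size_t size;
};

struct cc_reorder {
    struct cc_packet buf[CC_REORDER_DEPTH + 1];
    int count;
};

/* Move the buffered packet with the smallest pts into *out.
 * Returns 1 if a packet was output, 0 if the buffer was empty. */
static int cc_reorder_pop(struct cc_reorder *r, struct cc_packet *out)
{
    int i, min = 0;

    if (!r->count)
        return 0;
    for (i = 1; i < r->count; i++)
        if (r->buf[i].pts < r->buf[min].pts)
            min = i;
    *out = r->buf[min];
    r->buf[min] = r->buf[--r->count];  /* order inside buf is irrelevant */
    return 1;
}

/* Feed CC data in decode order. Once more than CC_REORDER_DEPTH packets
 * are buffered, the earliest one in presentation order is output.
 * Returns 1 if *out was filled, 0 otherwise. */
static int cc_reorder_push(struct cc_reorder *r, const struct cc_packet *in,
                           struct cc_packet *out)
{
    r->buf[r->count++] = *in;
    if (r->count > CC_REORDER_DEPTH)
        return cc_reorder_pop(r, out);
    return 0;
}
```

At end of stream, cc_reorder_pop() is called until it returns 0 to flush the remaining packets. With decode order pts 0, 3, 1, 2, 6, 4, 5 and a depth of 2, the output comes out as 0 through 6.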

OK, thanks.

> > This is all I can think of ATM. So if we want to support this properly, I can
> > only think of this: libavsubtitles (with a dependency to libass to render a lot
> > of crazy markup in subtitles).
> > 
> > This library would be used to demux/decode/etc. (for conversion purposes too) and
> > render the subtitles. And more importantly, we would be able to use it to burn
> > them through the libavfilter with something like vf_subtitles.
> 
> I don't see why you would need or even want to have demux/decode in a
> separate lib instead of where it is now.
> You will want something to convert between the different AVRect types,
> but whether that justifies a separate lib, especially if it's mostly
> going to be a libass wrapper seems questionable to me.
> 

I just thought the way it is now is a bit confusing, and having most of
the subtitle code in one place could help make things clearer. Of course,
having the main engine in a dedicated lavf (and lavc?) file would also do
the trick, I guess.

> > There is ATM a pending patch (vf_ass) that can work around the situation for most
> > people: a lot of software allows converting from srt to ass, so the srt to ass
> > conversion code won't need to be duplicated yet again for the filter.
> 
> Why and how would SRT even get to the filter? Subtitle AVRect to my
> knowledge doesn't even support/allow SRT.
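For reference, the timing part of an SRT to ASS conversion is small. A minimal sketch, assuming the usual "HH:MM:SS,mmm --> HH:MM:SS,mmm" timing line; the helper names are hypothetical, and styling tags (<i>, <b>, <font>) are not handled here. Note again the truncation from milliseconds to ASS centiseconds.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format milliseconds as ASS "H:MM:SS.cc" (truncating to centiseconds). */
static void ass_format_time(char *buf, size_t size, int64_t ms)
{
    int64_t cs = ms / 10;
    snprintf(buf, size, "%d:%02d:%02d.%02d",
             (int)(cs / 360000), (int)(cs / 6000 % 60),
             (int)(cs / 100 % 60), (int)(cs % 100));
}

/* Parse an SRT timestamp "HH:MM:SS,mmm" into milliseconds. */
static int srt_parse_time(const char *s, int64_t *ms)
{
    int h, m, sec, frac;

    if (sscanf(s, "%d:%d:%d,%d", &h, &m, &sec, &frac) != 4)
        return -1;
    *ms = ((int64_t)h * 3600 + m * 60 + sec) * 1000 + frac;
    return 0;
}

/* Hypothetical helper: convert an SRT timing line such as
 * "00:00:01,500 --> 00:00:03,999" into ASS start/end strings.
 * Returns 0 on success, -1 if the line doesn't parse. */
static int srt_timing_to_ass(const char *line, char *start, char *end,
                             size_t size)
{
    const char *arrow = strstr(line, "-->");
    int64_t t0, t1;

    if (!arrow || srt_parse_time(line, &t0) < 0 ||
        srt_parse_time(arrow + 3, &t1) < 0)
        return -1;
    ass_format_time(start, size, t0);
    ass_format_time(end, size, t1);
    return 0;
}
```

For example, "00:00:01,500 --> 00:00:03,999" yields "0:00:01.50" and "0:00:03.99".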

There are IMO two scenarios:

 - softsub: that's what we have now; we use -i/-scodec, and we are able to
   packetize a few types of subtitles and mux them in the appropriate
   format.

 - hardsub: I would imagine something like -vf subtitles=f=inputsub.srt:...,
   similar to the proposed vf_ass. That filter would decode any kind of
   subtitles (SRT included) and burn them in with the appropriate settings.
   To decode the subtitles and get the bitmap, it would use libass, but
   through the API.
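For the hardsub case, the burning step itself is a plain alpha blend of the images libass returns. libass's ass_render_frame() yields a linked list of ASS_Image (w, h, stride, bitmap, color, dst_x, dst_y, next), where bitmap is an 8-bit coverage mask and color is RGBA with an inverted alpha byte (0 = opaque). The sketch below mirrors those fields in a local struct to stay self-contained, and assumes a packed RGB24 frame; it is not vf_ass code.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the ASS_Image fields needed for blending. */
struct sub_image {
    int w, h, stride;
    const unsigned char *bitmap;   /* 8-bit coverage mask */
    uint32_t color;                /* 0xRRGGBBAA, AA = 0 means opaque */
    int dst_x, dst_y;
    const struct sub_image *next;
};

/* Blend a list of subtitle images onto a packed RGB24 frame of
 * fw x fh pixels, clipping anything that falls outside the frame. */
static void blend_rgb24(uint8_t *frame, int fw, int fh, ptrdiff_t linesize,
                        const struct sub_image *img)
{
    for (; img; img = img->next) {
        unsigned r = img->color >> 24;
        unsigned g = (img->color >> 16) & 0xFF;
        unsigned b = (img->color >> 8) & 0xFF;
        unsigned opacity = 255 - (img->color & 0xFF);

        for (int y = 0; y < img->h; y++) {
            int fy = img->dst_y + y;
            if (fy < 0 || fy >= fh)
                continue;
            for (int x = 0; x < img->w; x++) {
                int fx = img->dst_x + x;
                uint8_t *p;
                unsigned k;

                if (fx < 0 || fx >= fw)
                    continue;
                /* effective alpha = coverage * color opacity */
                k = img->bitmap[y * img->stride + x] * opacity / 255;
                p = frame + fy * linesize + fx * 3;
                p[0] = (k * r + (255 - k) * p[0]) / 255;
                p[1] = (k * g + (255 - k) * p[1]) / 255;
                p[2] = (k * b + (255 - k) * p[2]) / 255;
            }
        }
    }
}
```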

In order to write the filter, we need an API that can decode almost any
subtitle format and produce the bitmap (and that also deals with fonts, an
issue I forgot to mention earlier). How would that filter work? I guess it
could call the avformat/avcodec open routines, decode the subtitles just
like ffmpeg.c does, and then update the frame whenever the pts match.
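The "update the frame when the pts match" part boils down to selecting the events visible at each video frame's time. A minimal sketch, assuming the decoded events have already been converted to absolute start/end times in ms on the video's clock (with AVSubtitle these would come from pts plus start_display_time/end_display_time); the struct and function are hypothetical.

```c
#include <stdint.h>

/* Hypothetical decoded subtitle event, times in ms on the video clock. */
struct sub_event {
    int64_t start_ms, end_ms;
    const char *text;
};

/* Store the indices of events visible at t_ms (start_ms <= t_ms < end_ms)
 * into active[]. Returns how many were stored (at most max). */
static int active_events(const struct sub_event *ev, int n, int64_t t_ms,
                         int *active, int max)
{
    int count = 0;

    for (int i = 0; i < n && count < max; i++)
        if (ev[i].start_ms <= t_ms && t_ms < ev[i].end_ms)
            active[count++] = i;
    return count;
}
```

A filter would typically re-render only when the set of active events changes and otherwise reuse the previous bitmap.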

I have a bad feeling that I'm misunderstanding too many things and not
actually answering your questions because of the confusion; please correct
me if you think I'm missing something.

Thank you for your quick feedback BTW,

-- 
Clément B.