[FFmpeg-devel] Format of decoded subtitles (was: matroska: Identify S_TEXT/UTF-8 tracks as SRT and not TEXT.)

Thu May 24 16:35:34 CEST 2012

On Thu, May 24, 2012 at 04:15:41PM +0200, Clément Bœsch wrote:
> On Thu, May 24, 2012 at 02:19:02PM +0200, Nicolas George wrote:
> > On quartidi 4 Prairial, year CCXX, Clément Bœsch wrote:
> > > Most players use ASS rendering for all subtitles (after converting the
> > > original subtitle markup into ASS), which is BTW what we do in our text
> > > subtitle decoders (SubRip, MicroDVD, and JACOsub). ASS rendering is
> > > expected by most people even for these formats.
> > > 
> > > ASS also handles almost every "useful" markup (of course I have a bunch
> > > of exceptions in mind) at the moment. If a new subtitle format is meant to
> > > replace ASS, it will likely keep some kind of backward compatibility with
> > > it (otherwise it will be a pain for almost every current decoder/player),
> > > so moving our internal formats to this new one should not be much of a
> > > problem.
> > > 
> > > I'm not sure what you mean by handling the markup syntax the same way we
> > > handle pixel/sample formats.
> > 
> > What I meant was this: in AVFrame, the decoded video is in arrays of
> > integers, but there is a pix_fmt field that says if these arrays are YUV420P
> > or RGBA. If we have one and want the other, there is libswscale to do the
> > conversion; sometimes it is lossless, sometimes it is not.
> > 
> > For decoded text subtitles, there would be a markup_syntax field with values
> > like SUB_MARKUP_ASS or SUB_MARKUP_HTML. And an API to convert, losslessly or
> > not, from one markup to another.
> > 
> > Of course, if we have a perfect round-trip MARKUP_X -> MARKUP_Y -> MARKUP_X
> > (this can happen even if Y has features that X does not have, as they will
> > not be used in a Y converted from X; OTOH, if Y is case-sensitive and X is
> > not, we may lose the case information, which may be considered acceptable),
> > then MARKUP_X is useless and we can always convert to and from Y.
> > 
> > (This is not true for video: we cannot convert everything to 32 bits per
> > component because of performance issues.)
> > 
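
To make the pix_fmt analogy concrete, here is a rough sketch of what a tagged
text subtitle and a converter could look like. Only the SUB_MARKUP_ASS /
SUB_MARKUP_HTML names and the markup_syntax field come from the mail above;
struct SubText, sub_markup_convert() and its return convention are made up
for illustration and are not an existing libavcodec API. It only knows bold
and italic, ignores escaping, and drops any other tag while reporting the
loss.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical markup tags, named after the SUB_MARKUP_* values suggested above. */
enum SubMarkup { SUB_MARKUP_ASS, SUB_MARKUP_HTML };

/* Hypothetical decoded text subtitle: the text plus a tag saying how to read
 * it, in the same way a decoded frame carries a pix_fmt. */
struct SubText {
    enum SubMarkup markup_syntax;
    char text[256];
};

/* The only tags this sketch understands, in both syntaxes. */
struct TagPair { const char *html; const char *ass; };
static const struct TagPair tag_map[] = {
    { "<b>", "{\\b1}" }, { "</b>", "{\\b0}" },
    { "<i>", "{\\i1}" }, { "</i>", "{\\i0}" },
};

/* Convert src into the markup `to`, writing the result to dst (dst != src).
 * Returns 0 if lossless, 1 if unknown tags were dropped, -1 if dst is too small. */
int sub_markup_convert(struct SubText *dst, const struct SubText *src,
                       enum SubMarkup to)
{
    assert(dst && src);
    const int from_html = src->markup_syntax == SUB_MARKUP_HTML;
    const char open  = from_html ? '<' : '{';
    const char close = from_html ? '>' : '}';
    const char *in = src->text;
    size_t out = 0;
    int lossy = 0;

    if (src->markup_syntax == to) {   /* nothing to convert */
        *dst = *src;
        return 0;
    }
    dst->markup_syntax = to;
    while (*in) {
        const char *rep = NULL;
        size_t in_len = 1, rep_len;

        if (*in == open) {
            for (size_t i = 0; i < sizeof(tag_map) / sizeof(tag_map[0]); i++) {
                const char *from = from_html ? tag_map[i].html : tag_map[i].ass;
                if (!strncmp(in, from, strlen(from))) {
                    rep    = from_html ? tag_map[i].ass : tag_map[i].html;
                    in_len = strlen(from);
                    break;
                }
            }
            if (!rep) {
                /* Unknown tag: drop it and remember the conversion was lossy. */
                const char *end = strchr(in, close);
                if (end) {
                    in = end + 1;
                    lossy = 1;
                    continue;
                }
                /* No closing delimiter: fall through and copy it literally. */
            }
        }
        if (rep) {
            rep_len = strlen(rep);
        } else {
            rep = in;                 /* plain character, copied as-is */
            rep_len = 1;
        }
        if (out + rep_len >= sizeof(dst->text)) {
            dst->text[out] = '\0';
            return -1;
        }
        memcpy(dst->text + out, rep, rep_len);
        out += rep_len;
        in  += in_len;
    }
    dst->text[out] = '\0';
    return lossy;
}
```

With this, "<i>Hello</i>" round-trips through ASS unchanged, while
"<table>x</table>" comes out as plain "x" with a return value of 1.
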
> 
> So if I understand correctly, you are proposing a model with a libsubconvert
> doing any kind of markup conversion, instead of the current model where the
> decoder is "encoding" the event in ASS, bitmap or text?
> 
> I don't think we really need to change this; I don't see the direct
> benefit.
> 
> > If, as you say, the ASS markup can express all the features of any other
> > known markup, then we can adopt ASS as a universal markup syntax, and
> > expect all subtitle codecs to encode/decode that markup.
> > 
> 
> It should, for text-based subtitles. At least for the "useful" markup. But
> I admit ASS has some annoying limitations, especially with some particular
> subtitle features:
> 
>  - the first one I have in mind is that there is no text representation
>    for the "lasts until the next subtitle" feature. Example: MicroDVD (and
>    SAMI, which I'm working on ATM) has features like this:
> 
>    {500}{600}this is printed starting at frame 500 and last until frame 600
>    {1234}{}this starts being displayed at frame 1234...
>    {1400}{}...and will be "replaced" by this text until the end.
> 
>    We can express this in the AVPacket (pkt.duration = -1 for example),
>    but when encoding the ASS event, it's not possible to write 00:01:02:03
>    -1:-1:-1:-1 for instance. So we need to work around this.
> 
>  - One random limitation against SAMI: this insane HTML-based format
>    (actually not HTML at all, but fully CSS2 compliant...) has two
>    subtitle placeholders. Basically it's two subtitles in one (one to
>    print the speaker's name, and one for what's being said), relying on
>    various presentation markup expectations which ASS can't honor (I don't
>    want to try converting <table> into ASS markup, for example).
> 
>  - Another crazy one, but of limited usefulness: the <img> tag in SAMI
>    (yes...) or even in JACOsub.
> 
>  - The last one is the precision limitation we already talked about (tb
>    1/100 for ASS, and 1/1000 for formats like SRT).

Maybe: pick the most complete format and extend it to become a superset of
everything you want to support.

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

When you are offended at any man's fault, turn to yourself and study your
own failings. Then you will forget your anger. -- Epictetus