[FFmpeg-devel] Format of decoded text subtitles

Wed Aug 1 21:23:39 CEST 2012

On Wed, Aug 01, 2012 at 08:54:24PM +0200, Nicolas George wrote:
> Hi.
> 
> Currently, when a text subtitle is decoded, it is usually re-transformed to
> ASS, which gives this:
> 
> sub->num_rects = 1; /* or more */
> sub->rects[0]->ass =
>   "Dialogue: 0,0:00:01.00,0:00:03.50,Default,,0,0,0,,Hello {\i1}World{\i0}!\n";
> 
> This is wrong in several ways:
> 
> First, it has timestamps in text format, which breaks any kind of trimming
> or scaling.
> 
> Second, it has a lot of clutter that non-ASS decoders and encoders must deal
> with.
> 
> Third, the ASS clutter actually depends on external information (the Format
> header of the Events section).
> 
> Here is what I propose to fix this:
> 
> 1. Deprecate the AVSubtitleRect.ass field; for compatibility reasons, we may
>    put code in lavc to resynthesize it for some time, but that is all. For
>    the same reason, the text field could contain the text completely
>    stripped of markup.
> 
> 2. Add a rich_text field instead. The text in this rich_text field has only
>    local styling information, such as an italic span. The markup needs to be
>    simple and to nest properly, so just ASS is out of the question, but
>    slightly modified ASS is possible.
> 

Are you sure it needs to be nested? The current decoders flatten nested
markup into "flat" ASS markup; I think it's best to keep that logic.
At the very least, it will make converting the decoders simpler, and it
will limit the ASS markup surprises.
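
For illustration, the flattening done on the decoder side can be sketched
roughly as below. This is a minimal sketch, not the actual decoder code:
flatten_markup() is a hypothetical helper, and it only knows the <i>, <b>
and <u> tags, which it rewrites as the matching ASS override tags.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an FFmpeg API: rewrites <i>, <b>, <u> tags
 * (and their closing forms) as flat ASS override tags. Anything else,
 * including unknown tags, is copied verbatim.
 * Returns 0 on success, -1 if out is too small (out stays NUL-terminated). */
static int flatten_markup(const char *in, char *out, size_t out_size)
{
    size_t pos = 0;

    assert(in && out && out_size > 0);
    while (*in) {
        int closing = in[0] == '<' && in[1] == '/';
        const char *name = in + 1 + closing;

        if (in[0] == '<' &&
            (*name == 'i' || *name == 'b' || *name == 'u') &&
            name[1] == '>') {
            /* "{\x1}" or "{\x0}" takes 5 bytes, plus room for the NUL */
            if (pos + 6 > out_size) {
                out[pos] = '\0';
                return -1;
            }
            out[pos++] = '{';
            out[pos++] = '\\';
            out[pos++] = *name;
            out[pos++] = closing ? '0' : '1';
            out[pos++] = '}';
            in = name + 2;
        } else {
            /* one literal byte, plus room for the NUL */
            if (pos + 2 > out_size) {
                out[pos] = '\0';
                return -1;
            }
            out[pos++] = *in++;
        }
    }
    out[pos] = '\0';
    return 0;
}
```

With it, "Hello <i>World</i>!" becomes "Hello {\i1}World{\i0}!", and nested
tags simply come out as consecutive overrides, with no nesting left for
the consumer to track.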

We need to be *very* cautious when messing with the ASS event. I'd
suggest you take a close look at libass/ass_{parse.c,types.h} before
planning anything. Think of karaoke events at least.

> 3. For global styling, things become a bit hairy.
> 
>   typedef struct AVSubtitleStyle {
>       const char *name;
>       AVDictionary *tags; /* or another structure */
>   } AVSubtitleStyle;
> 
>   enum AVSubtitleStyleLevel {
>       AV_SUBTITLE_STYLE_RECT,
>       AV_SUBTITLE_STYLE_EVENT,
>       AV_SUBTITLE_STYLE_GROUP,
>       AV_SUBTITLE_STYLE_FILE,
>       AV_SUBTITLE_STYLE_HARDCODED,
>       AV_SUBTITLE_STYLE_NUMBER,
>   };
> 
>   typedef struct AVSubtitleRect {
>       ...
>       const char *rich_text;
>       AVSubtitleStyle *style[AV_SUBTITLE_STYLE_NUMBER];
>   } AVSubtitleRect;
> 
>   Then, considering the example ASS line I quoted above, we would have:
> 
>   rect->style[AV_SUBTITLE_STYLE_EVENT] = &{
>       .name = NULL,
>       .tags = {
> 	  { "layer", "0" },
> 	  { "marginl", "0" }, /* maybe omitted */
> 	  { "marginr", "0" },
> 	  { "marginv", "0" },
>       },
>   };
>   rect->style[AV_SUBTITLE_STYLE_GROUP] = &{
>       .name = "Default",
>       .tags = {
> 	  { "fontname", "DejaVu Serif" },
> 	  { "fontsize", "22" },
> 	  /* etc */
>       },
>   };
>   rect->style[AV_SUBTITLE_STYLE_FILE] = &{
>       .name = NULL,
>       .tags = {
> 	  { "playresx", "640" },
> 	  { "playresy", "360" },
> 	  { "collisions", "normal" },
> 	  /* etc */
>       },
>   };
> 
>   I believe it is enough to abstractly represent the ASS subtleties, and
>   probably most other subtitle systems can fit in that too.
> 

Looks sane.
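
One way to read the level array is as a lookup chain. The sketch below
assumes that the most specific level defining a tag wins (RECT first,
HARDCODED last); that ordering is an assumption, the proposal does not
state it. StyleTag is a plain key/value stand-in for AVDictionary, and all
names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for AVDictionary so the sketch has no lavu dependency. */
typedef struct StyleTag {
    const char *key;
    const char *value;
} StyleTag;

typedef struct SubtitleStyle {
    const char *name;      /* may be NULL */
    const StyleTag *tags;  /* terminated by a { NULL, NULL } entry */
} SubtitleStyle;

enum SubtitleStyleLevel {
    STYLE_RECT,
    STYLE_EVENT,
    STYLE_GROUP,
    STYLE_FILE,
    STYLE_HARDCODED,
    STYLE_NUMBER,
};

/* Assumed rule: scan from RECT down to HARDCODED and return the value
 * from the first level that defines the key; NULL if no level has it. */
static const char *style_lookup(const SubtitleStyle *const *style,
                                const char *key)
{
    int level;
    const StyleTag *t;

    assert(style && key);
    for (level = 0; level < STYLE_NUMBER; level++) {
        if (!style[level] || !style[level]->tags)
            continue;  /* level not set for this rect */
        for (t = style[level]->tags; t->key; t++)
            if (!strcmp(t->key, key))
                return t->value;
    }
    return NULL;
}
```

An encoder for a simpler format could then ask only for the few tags it
understands and ignore the rest.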

>   Problem: an encoder like ASS needs to be presented the FILE and all the
>   GROUP level styles at the start of encoding.
> 

How is that really different from the current state?

> This is not a completely finalized proposal, but I believe it is a good
> start.
> 

Why not, I don't mind, it's a good thing in the long term.

[...]

-- 
Clément B.