[FFmpeg-devel] Format of decoded text subtitles
Clément Bœsch
ubitux at gmail.com
Wed Aug 1 21:23:39 CEST 2012
On Wed, Aug 01, 2012 at 08:54:24PM +0200, Nicolas George wrote:
> Hi.
>
> Currently, when a text subtitle is decoded, it is usually re-transformed to
> ASS, which gives that:
>
> sub->num_rects = 1; /* or more */
> sub->rects[0]->ass =
> "Dialogue: 0,0:00:01.00,0:00:03.50,Default,,0,0,0,,Hello {\i1}World{\i0}!\n";
>
> This is wrong in several ways:
>
> First, it has timestamps in text format, which breaks any kind of trimming
> or scaling.
>
> Second, it has a lot of clutter that non-ASS decoders and encoders must deal
> with.
>
> Third, the ASS clutter actually depends on external information (the Format
> header of the Events section).
>
> Here is what I propose to fix this:
>
> 1. Deprecate the AVSubtitleRect.ass field; for compatibility reasons, we may
> put code in lavc to resynthesize it for some time, but that is all. For
> the same reason, the text field could contain the text completely
> stripped of markup.
>
> 2. Add a rich_text field instead. The text in this rich_text field has only
> local styling information, such as an italic span. The markup needs to be
> simple and to nest properly, so just ASS is out of the question, but
> slightly modified ASS is possible.
>
Are you sure it needs to be nested? The current decoders flatten nested
markup when they emit ASS, and I think it's best to keep that logic. It
will at least make converting the decoders simpler, and it will limit the
ASS markup surprises.
We need to be *very* cautious when messing with the ASS event. I'd
suggest you have a close look at libass/ass_{parse.c,types.h} before
planning anything. Think of Karaoke events at least.
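To make the "flat" point concrete, here is a minimal sketch of that kind of conversion. It is not the code of any actual decoder: flatten_to_ass() is a hypothetical helper, and it assumes the input uses only HTML-like <i>, <b> and <u> tags (anything else is copied through unchanged).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not an FFmpeg API: rewrite <i>, <b>, <u> and their
 * closing forms as ASS override tags ({\i1} ... {\i0}). The output is
 * truncated (and still NUL-terminated) if it does not fit in out_size. */
static void flatten_to_ass(const char *in, char *out, size_t out_size)
{
    size_t o = 0;

    assert(in && out && out_size > 0);
    while (*in && o + 1 < out_size) {
        int closing = in[0] == '<' && in[1] == '/';
        const char *t = in + 1 + closing;   /* candidate tag letter */

        if (in[0] == '<' && *t && strchr("ibu", *t) && t[1] == '>') {
            /* Opening tag switches the style on (1), closing tag off (0). */
            int n = snprintf(out + o, out_size - o, "{\\%c%d}", *t, !closing);
            if (n < 0 || (size_t)n >= out_size - o)
                break;                      /* no room for the whole tag */
            o += (size_t)n;
            in = t + 2;                     /* skip the letter and '>' */
        } else {
            out[o++] = *in++;               /* plain text, copied as-is */
        }
    }
    out[o] = '\0';
}
```

Called on "Hello <i><b>World</b></i>!", this yields "Hello {\i1}{\b1}World{\b0}{\i0}!". The override tags are plain on/off switches with no nesting state, so a construct like <i>a<i>b</i>c</i> would lose the italics on "c"; that is the kind of surprise to keep in mind.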
> 3. For global styling, things become a bit hairy.
>
> typedef struct AVSubtitleStyle {
> const char *name;
> AVDictionary *tags; /* or another structure */
> } AVSubtitleStyle;
>
> enum AVSubtitleStyleLevel {
> AV_SUBTITLE_STYLE_RECT,
> AV_SUBTITLE_STYLE_EVENT,
> AV_SUBTITLE_STYLE_GROUP,
> AV_SUBTITLE_STYLE_FILE,
> AV_SUBTITLE_STYLE_HARDCODED,
> AV_SUBTITLE_STYLE_NUMBER,
> };
>
> typedef struct AVSubtitleRect {
> ...
> const char *rich_text;
> AVSubtitleStyle *style[AV_SUBTITLE_STYLE_NUMBER];
> } AVSubtitleRect;
>
> Then, considering the example ASS line I quoted above, we would have:
>
> rect->style[AV_SUBTITLE_STYLE_EVENT] = &{
> .name = NULL,
> .tags = {
> { "layer", "0" },
> { "marginl", "0" }, /* maybe omitted */
> { "marginr", "0" },
> { "marginv", "0" },
> },
> };
> rect->style[AV_SUBTITLE_STYLE_GROUP] = &{
> .name = "Default",
> .tags = {
> { "fontname", "DejaVu Serif" },
> { "fontsize", "22" },
> /* etc */
> },
> };
> rect->style[AV_SUBTITLE_STYLE_FILE] = &{
> .name = NULL,
> .tags = {
> { "playresx", "640" },
> { "playresy", "360" },
> { "collisions", "normal" },
> /* etc */
> },
> };
>
> I believe it is enough to represent the ASS subtleties abstractly, and
> probably most other subtitle systems can fit in that too.
>
Looks sane.
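For what it's worth, here is a rough sketch of how a consumer might resolve a tag across those levels. Everything in it is an assumption on my side: the proposal does not say how levels take precedence, so I assume the most specific level (RECT) wins and FILE/HARDCODED act as fallbacks; StyleTag is a simplified stand-in for the AVDictionary, and the names (SubtitleStyle, style_lookup, STYLE_*) are made up so it compiles without lavc.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the AVDictionary of the proposal:
 * an array of key/value pairs terminated by a NULL key. */
typedef struct StyleTag {
    const char *key, *value;
} StyleTag;

typedef struct SubtitleStyle {
    const char *name;       /* e.g. "Default", or NULL */
    const StyleTag *tags;
} SubtitleStyle;

/* Same order as the proposed AVSubtitleStyleLevel enum. */
enum SubtitleStyleLevel {
    STYLE_RECT, STYLE_EVENT, STYLE_GROUP, STYLE_FILE, STYLE_HARDCODED,
    STYLE_NUMBER
};

/* Assumed precedence: walk from the most specific level to the least
 * specific one and return the first match, or NULL if no level has it. */
static const char *style_lookup(const SubtitleStyle *const style[STYLE_NUMBER],
                                const char *key)
{
    assert(style && key);
    for (int level = 0; level < STYLE_NUMBER; level++) {
        const StyleTag *t = style[level] ? style[level]->tags : NULL;
        for (; t && t->key; t++)
            if (!strcmp(t->key, key))
                return t->value;
    }
    return NULL;
}
```

With the example above, looking up "fontsize" would stop at the GROUP level and looking up "playresx" would fall through to the FILE level. An ASS encoder would not need this kind of merging, since it can write each level to its own section, but a decoder-side consumer that just wants "the font size" might.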
> Problem: an encoder like ASS needs to be presented the FILE and all the
> GROUP level styles at the start of encoding.
>
How is that really different from the current state?
> This is not a completely finalized proposal, but I believe it is a good
> start.
>
Why not, I don't mind, it's a good thing in the long term.
[...]
--
Clément B.