[FFmpeg-devel] [RFC] AVSubtitles rework

Tue Sep 11 19:45:31 CEST 2012

On Mon, Sep 03, 2012 at 09:40:25PM +0200, Clément Bœsch wrote:
> On Mon, Sep 03, 2012 at 08:01:04PM +0200, Nicolas George wrote:
> > On octidi 18 Fructidor, year CCXX, Clément Bœsch wrote:
> > > I'm not very fond of introducing a new structure for a few reasons:
> > >  - having an AVSubtitle2 will require maintaining both paths for longer,
> > >    and the problem is already hard to deal with even when starting from
> > >    scratch
> > >  - if we do that, it will require duplicating the current public API for a
> > >    while, which sounds kind of a pain
> > 
> > All that is true, but that is the burden of compatibility. If we do not
> > introduce a new structure, all programs that currently allocate AVSubtitle
> > themselves will break if dynamically linked with a more recent lavc.
> > 
> > >  - I don't think the current AVSubtitle API is really used apart from
> > >    MPlayer, but I may be wrong
> > 
> > A Google search for avcodec_decode_subtitle2 shows VLC, XBMC, and a few
> > small projects.
> > 
> 
> TL;DR: follow up and extend brainstorming after VDD/subtitles talks
> 
> Mmh OK. Well then should we introduce an experimental AVSubtitle2 directly
> into libavutil to ease the integration with libavfilter later on?
> 
> If we are to start a new structure, we should design it properly from the
> start: a subtitle structure able to store the two types of subtitles we
> already discussed:
> 
>  == bitmap subtitles ==
> 
> For the bitmap stuff I don't have a strong opinion on how it should be done.
> IIRC, we agreed that the current AVSubtitle structure was mostly fine
> (since AVSubtitle was originally designed for this kind of subtitles) except
> that it is missing the pixel format information, and we were wondering
> where to put that info (in each AVSubtitle2->rects or at the root of the
> AVSubtitle2 structure).
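
A minimal sketch of the two placement options for the pixel format, to make
the trade-off concrete. Every name here (SketchSubtitle2, SketchSubRect,
SketchPixFmt, sketch_rect_pix_fmt) is hypothetical and not part of lavc; the
sketch assumes a per-rect value that falls back to a root default when unset,
which is only one possible way to combine both ideas.

```c
#include <stdint.h>

/* Hypothetical stand-in for a real pixel format enum. */
enum SketchPixFmt {
    SKETCH_PIX_FMT_NONE = -1,   /* "not set": inherit from the root */
    SKETCH_PIX_FMT_PAL8,
    SKETCH_PIX_FMT_RGBA
};

/* Hypothetical bitmap rectangle. */
typedef struct SketchSubRect {
    int x, y, w, h;
    enum SketchPixFmt pix_fmt;  /* option A: format stored in each rect */
    uint8_t *data[4];
    int linesize[4];
} SketchSubRect;

/* Hypothetical root structure. */
typedef struct SketchSubtitle2 {
    enum SketchPixFmt pix_fmt;  /* option B: one format at the root */
    unsigned nb_rects;
    SketchSubRect **rects;
} SketchSubtitle2;

/* Effective format of rect i: the rect's own value if set, else the root's.
 * Returns SKETCH_PIX_FMT_NONE for an invalid index. */
static enum SketchPixFmt sketch_rect_pix_fmt(const SketchSubtitle2 *sub,
                                             unsigned i)
{
    const SketchSubRect *r;

    if (!sub || i >= sub->nb_rects || !sub->rects[i])
        return SKETCH_PIX_FMT_NONE;
    r = sub->rects[i];
    return r->pix_fmt != SKETCH_PIX_FMT_NONE ? r->pix_fmt : sub->pix_fmt;
}
```

With this layout a decoder that emits a single format only fills the root
field, while a decoder mixing formats overrides it per rect.
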
> 
>  == styled events for text based subtitles ==
> 
> For the styled text events, each AVSubtitle2 would hold, instead of an
> AVSubtitle->rects[N]->ass string, N exploitable AVSubtitleEvent (or maybe
> only one?). This is what the subtitles decoders would output (in a decode2
> callback for example, depending on how we keep compat with AVSubtitle) and
> what the users would exploit (by reading that AST to use it in their
> rendering engine/converter/etc, or simply pass it along to our encoders
> and muxers). Additionally, we may want to provide a "TEXT" encoder to
> provide a raw text version (stripping all markup) for simple rendering
> engines.
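
To illustrate what "stripping all markup" could mean for ASS input, here is a
rough sketch. The function name sketch_ass_to_text is hypothetical, and the
rules are simplified assumptions: everything between '{' and '}' is dropped,
\N becomes a newline, and \n and \h become a space. An unterminated '{'
swallows the rest of the line, which a real implementation may want to handle
differently.

```c
#include <stddef.h>

/* Hypothetical helper: copy src into dst without ASS override blocks.
 * Output is truncated to fit dst_size and always NUL-terminated.
 * Returns the number of bytes written, excluding the NUL. */
static size_t sketch_ass_to_text(char *dst, size_t dst_size, const char *src)
{
    size_t n = 0;
    int in_tag = 0;

    if (!dst || !dst_size)
        return 0;
    for (; src && *src; src++) {
        char c = *src;

        if (in_tag) {                 /* inside {...}: skip until '}' */
            if (c == '}')
                in_tag = 0;
            continue;
        }
        if (c == '{') {               /* start of an override block */
            in_tag = 1;
            continue;
        }
        if (c == '\\' && (src[1] == 'N' || src[1] == 'n' || src[1] == 'h')) {
            c = src[1] == 'N' ? '\n' : ' ';  /* hard break vs. space */
            src++;                    /* consume the escape letter */
        }
        if (n + 1 < dst_size)         /* keep room for the final NUL */
            dst[n++] = c;
    }
    dst[n] = '\0';
    return n;
}
```

A renderer based on freetype or similar would then only have to lay out plain
lines of text.
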
> 
> So, here is a suggestion of the classic workflow:
> 
>                                                      /* common transmuxing/coding path */
> DEMUXER -> [AVPacket] -> DECODER -> [AVSubtitle2] -> ENCODER -> [AVPacket] -> MUXER
>                                           |
>                                           |
>                         /* lavfi/hardsub or video player path */
>                                           |
>                                          / \
>                                         /   \
>        custom rendering                /     \
>        engine using the  <--------- text?  bitmap?
>       AVSubtitle2->events            /         \
>            structure                /           \
>                             libass to render?   bitmap overlay
>                                  /     \
>                            yes  /       \ no
>                                /         \
>                      ENCODER:assenc   ENCODER:textenc          (<== both lavc encoders)
>                              /             \
>    AVPacket->data is an ASS /               \
>    payload (no timing)     /                 \ AVPacket->data is raw text
>  (need to mux for timings)/                   \
>                          /                     \
>                  libass:parse&render    freetype/mplayer-osd/etc
> 
> 
> At least, that's how I would see the usage from a user perspective.
> 
> Now if we agree on such a model, we need to focus on how to store the
> events & styles. Basically, each AVSubtitle2 must make the following
> available as an AST:
> 
>  - an accessible header with all the global styles (such as an external
>    .css for WebVTT, the event styles in the ASS header, palettes with some
>    formats, etc.); maybe that one would belong in the AVCodecContext
>  - one (or more?) events with links to style structures: either in the
>    global header, or associated with that specific event. BTW, this
>    "styles" info must be able to carry various information such as
>    karaoke or ruby stuff (WebVTT supports that,
>    https://en.wikipedia.org/wiki/Ruby_character)
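
The two items above could be sketched roughly as follows. This is only an
illustration under stated assumptions: all names (SketchStyle, SketchHeader,
SketchEvent, sketch_resolve_style) are hypothetical, the style fields are
arbitrary examples, karaoke and ruby information is left out, and the rule
that an event-specific style wins over a global one is a guess, not something
that has been agreed on.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical style description; the fields are examples only. */
typedef struct SketchStyle {
    const char *name;
    const char *font;
    int bold, italic;
    uint32_t primary_color;          /* e.g. 0xRRGGBBAA */
} SketchStyle;

/* Hypothetical global header holding the shared styles. */
typedef struct SketchHeader {
    const SketchStyle *styles;
    size_t nb_styles;
} SketchHeader;

/* Hypothetical event: links to a style either by name or directly. */
typedef struct SketchEvent {
    int64_t start_ms, duration_ms;
    const char *style_name;          /* link into the header, or NULL */
    const SketchStyle *local_style;  /* event-specific style, or NULL */
    const char *text;
} SketchEvent;

/* Return the style that applies to ev: its own style if present, else
 * the header style with a matching name, else NULL. */
static const SketchStyle *sketch_resolve_style(const SketchHeader *hdr,
                                               const SketchEvent *ev)
{
    size_t i;

    if (!ev)
        return NULL;
    if (ev->local_style)
        return ev->local_style;
    if (hdr && ev->style_name)
        for (i = 0; i < hdr->nb_styles; i++)
            if (!strcmp(hdr->styles[i].name, ev->style_name))
                return &hdr->styles[i];
    return NULL;
}
```

A renderer or encoder walking the events would call the resolver once per
event and never need to know where the style was stored.
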
> 
> We still need to agree on how to store that (and Nicolas already proposed
> something related), but I'd like to check first whether everyone agrees
> with such a model. Then we can move on to the API for text
> styling.
> 

Any comment?

When I'm done with the 3 projects I'm working on right now, I will likely
start this work.

-- 
Clément B.