[FFmpeg-devel] Subtitles for GSoC

Gerion Entrup gerion.entrup.ffdev at flump.de
Thu Mar 10 18:12:57 CET 2016


On Tuesday, 8 March 2016 20:42:39 CET Clément Bœsch wrote:
> On Tue, Mar 08, 2016 at 06:21:12PM +0100, Gerion Entrup wrote:
> > Hello,
> > 
> 
> Hi,
> 
> > my own ideas seem not to be suitable for GSoC, so I looked at the ideas page again,
> > because I have a strong interest in doing something for FFmpeg this summer.
> > 
> > The project that I find most interesting, subtitle support, is unfortunately an
> > unmentored one. Is someone willing to mentor it?
> > 
> 
> I added this task for a previous OPW (and maybe GSoC, I can't remember). I'm
> unfortunately not available for mentoring (it takes too much time, energy and
> responsibility). I can, however, provide standard help as a developer.
> 
> The main issue with this task is that it involves API redesign, which is
> often not a good idea for a GSoC task.
> 
> That said, a bunch of core limitations have been solved in the past, so
> it's becoming comfortable to work on top of the current stack.
> 
> I'm summarizing the current state at the end of this mail, which may be
> useful for any potential mentor and, possibly, a student.
> 
> > On the ideas page, the subtitle format mentioned for the qualification task is Spruce.
> > It seems it is already supported, so I would try to implement the core part of USF
> > instead. I know it is not widely used, but it is very powerful, with features similar to
> > SSA, if I understand it correctly. Do you think that is suitable?
> > 
> 
> Spruce was indeed added during the last OPW as a qualification task. USF is
> more painful, but basic support could indeed be a potential qualification
> task. You might be able to figure something out by playing with the
> ff_smil_* functions for the demuxing part.
> 
> So basically you would have to:
> 
> - write a USF demuxer which extracts the timing and text (with its markup)
>   of every event and puts them into AVPackets
> 
> - introduce a USF codec ID and write a decoder that transforms the
>   XML-like markup into ASS markup (see below)
I've implemented such a demuxer and decoder, based on the SAMI code (see my
other mail). But XML parsing with the builtin tools is a real pain and hard to
extend later.

If the GSoC project comes off, please let me change this, and maybe the SAMI
code, to code based on an XML library. Header parsing should be doable then as
well.
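
For reference, the demuxer half follows the same pattern as the existing text
subtitle demuxers (compare libavformat/srtdec.c): extract one event at a time
and queue it as a packet. A simplified sketch of that shape -- USFContext and
next_usf_event() are placeholders for the actual XML handling, and the
internal queue API details may differ:

    #include "libavutil/bprint.h"
    #include "avformat.h"
    #include "internal.h"
    #include "subtitles.h"

    typedef struct USFContext {              /* hypothetical demuxer context */
        FFDemuxSubtitlesQueue q;
    } USFContext;

    /* placeholder for the actual XML parsing: fills 'event' with one
     * <subtitle> element and returns its timing in milliseconds */
    static int next_usf_event(AVIOContext *pb, AVBPrint *event,
                              int64_t *start, int64_t *duration);

    static int usf_read_header(AVFormatContext *s)
    {
        USFContext *usf = s->priv_data;
        int64_t start, duration;
        AVBPrint event;
        AVStream *st = avformat_new_stream(s, NULL);

        if (!st)
            return AVERROR(ENOMEM);
        avpriv_set_pts_info(st, 64, 1, 1000);    /* timestamps in ms */
        st->codec->codec_type = AVMEDIA_TYPE_SUBTITLE;
        st->codec->codec_id   = AV_CODEC_ID_USF; /* would need to be added */

        av_bprint_init(&event, 0, AV_BPRINT_SIZE_UNLIMITED);
        while (next_usf_event(s->pb, &event, &start, &duration) >= 0) {
            AVPacket *sub = ff_subtitles_queue_insert(&usf->q, event.str,
                                                      event.len, 0);
            if (!sub)
                return AVERROR(ENOMEM);
            sub->pts      = start;               /* time base is 1/1000 */
            sub->duration = duration;
        }
        av_bprint_finalize(&event, NULL);
        ff_subtitles_queue_finalize(&usf->q);
        return 0;
    }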

> 
> Again, I'm not a mentor, so you need confirmation from someone else.
> 
> > And then another question: you mentioned libavfilter integration as the ultimate goal.
> > If I get it right, ATM no rendering is implemented, and libavfilter would allow
> > (automatic) rendering from SSA to e.g. dvdsub. Would the rendering itself be part of
> > the project (because it is very extensive, I think)?
> > 
> 
> So, yeah, currently the subtitles are decoded into an AVSubtitle structure,
> which holds one or several AVSubtitleRects (AVSubtitle.rects[N]).
> 
> For graphic subtitles, each rectangle contains a paletted buffer and its
> position, size, ...
> 
> For text subtitles, the ass field contains the text in ASS markup: indeed,
> we consider ASS markup to be the best (or least bad) superset, supporting
> almost every style that any other subtitle format has, so it's used as the
> "decoded" form for all text subtitles. For example, the SubRip decoder
> (SubRip being the "codec", or markup, you find in SRT files) will transform
> "<i>foo</i>" into "{\i1}foo{\i0}".
> 
> So far so good.  Unfortunately, this is not sufficient, because the
> AVSubtitle* structs are old and not convenient for several reasons:
> 
> - they are allocated on the stack by the users, so we can't extend them
>   (add fields) without breaking the ABI (= angry users).
> 
> - they are defined in libavcodec, and we do not want libavfilter to
>   depend on libavcodec for a core feature (we have a few filters
>   depending on it, but that's optional). As such, libavutil, which
>   already contains AVFrame, is a much better place for this.
> 
> - the graphic subtitles are kind of limited (palette only; they can't hold
>   YUV or RGB32 pixel formats, for instance)
> 
> - the handling of timing is inconsistent: pts is in AV_TIME_BASE units,
>   while the start/end display times are relative to it and in milliseconds
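
(To put concrete numbers on that last point: an event displayed from 00:01:00
to 00:01:05 mixes two clocks in the same struct, roughly as below.)

    #include "libavcodec/avcodec.h"    /* AVSubtitle, AV_TIME_BASE */

    /* an event displayed from 00:01:00 to 00:01:05: */
    static void fill_example(AVSubtitle *sub)
    {
        sub->pts                = 60 * (int64_t)AV_TIME_BASE; /* absolute, in us */
        sub->start_display_time = 0;    /* relative to pts, in ms */
        sub->end_display_time   = 5000; /* relative to pts, in ms */
    }

    /* computing the absolute end time therefore has to mix both units: */
    static int64_t absolute_end(const AVSubtitle *sub)
    {
        return sub->pts + sub->end_display_time * (AV_TIME_BASE / 1000);
    }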
> 
> When these issues are sorted out, we can finally work on the integration
> within libavfilter, which is yet another topic where other developers
> might want to comment. Typically, I'm not sure what the state of dealing
> with the sparseness of subtitle streams is. Nicolas may know :)
> 
> Anyway, there are multiple ways of dealing with the previously mentioned
> issues.
> 
> The first one is to create an AVSubtitle2 or something in libavutil,
> copying most of the current AVSubtitle layout but making sure the user
> allocates it with av_subtitle_alloc() or whatever, so we can add fields
> and extend it (mostly) at will.
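
(That would mirror the AVFrame allocation pattern, i.e. something like the
following; purely illustrative, since neither the struct nor the functions
exist yet.)

    /* hypothetical API, modeled on av_frame_alloc()/av_frame_free(): */
    AVSubtitle2 *sub = av_subtitle_alloc();
    if (!sub)
        return AVERROR(ENOMEM);
    /* ... decode into it ... */
    av_subtitle_free(&sub);

    /* since only libavutil ever sizes the struct, new fields can be
     * appended later without breaking the ABI */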
> 
> The second one, which I've been wondering about these days, is to try to
> hold the subtitle data in the existing AVFrame structure. We would, for
> example, have frame->extended_data[N] (currently used by audio frames to
> hold the channels) point to instances of a newly defined rectangle
> structure. Having the subtitles in AVFrame might greatly simplify the
> future integration within libavfilter, since audio and video frames are
> already supported there. This needs careful thinking, but it might be
> doable.
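
(A rough picture of what that could look like; AVSubtitleArea and the area
count are made-up names, nothing of this is designed yet.)

    #include "libavutil/frame.h"
    #include "libavutil/pixfmt.h"

    /* hypothetical rectangle structure, to be defined in libavutil: */
    typedef struct AVSubtitleArea {
        int x, y, w, h;
        enum AVPixelFormat fmt;   /* no longer restricted to a palette */
        uint8_t *data[4];
        int linesize[4];
        char *ass;                /* ASS markup, for text subtitles */
    } AVSubtitleArea;

    static void handle_subtitle_frame(const AVFrame *frame, int nb_areas)
    {
        int i;

        for (i = 0; i < nb_areas; i++) {
            /* extended_data[] reused the way audio frames reuse it for
             * channels */
            AVSubtitleArea *area = (AVSubtitleArea *)frame->extended_data[i];
            /* ... filter or render *area ... */
        }
    }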
> 
> But again, these are ideas which need to be discussed and experimented
> with. I don't know if it's a good idea for a GSoC, and I don't know who
> would be up for mentoring.
> 
> It's nice to finally see some interest in this topic, though.
> 
> > regards,
> 
> Regards,
> 
> > Gerion
> 
> 



