[FFmpeg-devel] [PATCH] WIP: subtitles in AVFrame

Fri Nov 11 13:48:37 EET 2016

On Fri, 11 Nov 2016 12:45:03 +0100
Nicolas George <george at nsup.org> wrote:

> On septidi 17 Brumaire, year CCXXV, Clement Boesch wrote:
> > I didn't. Duration is somehow broken currently for some reason. I did
> > nothing for sparseness: the reason I added basic support in lavfi is
> > because it was much simpler to handle at ffmpeg.c level, so it's currently
> > just a passthrough mechanism.  
> 
> A decision will need to be made pretty soon, though, or it will not help
> anything much. Sparseness will make things break badly OOM-killer-style
> if someone tries to use subtitles coming from the same multiplexed file
> as the corresponding video. Duration affects the visible API, so a
> decision on it is more or less final. Here are the results of my current
> thoughts:
> 
> For sparseness, the solution is to use heartbeat frames: if you have a
> subtitle event at 10s and the next one at 50s, emit a frame with no
> payload and just a timestamp at 10.1, 10.2, ... 49.1.

Sounds like a bad idea.

> 
> Whoever is supposed to emit the frame can be decided later. The simplest
> idea is to connect sbuffersrc to a master vbuffersrc: when a frame is
> added on the master, consider generating heartbeat frames on the slaves.
> 
> The code needs to be ready immediately to handle the heartbeat frames.
> At least have a way of expressing them: maybe data[0] == NULL? And the
> code needs to not segfault on them.
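
A minimal sketch of the heartbeat idea quoted above, written in Python rather than against the real lavfi API. The names `SubFrame` and `with_heartbeats` are hypothetical, and the fixed 100 ms interval is an assumption; in the proposal the heartbeats would be driven by frames arriving on a master video source. A `None` payload stands in for the suggested `data[0] == NULL` marker.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class SubFrame:
    pts_ms: int             # timestamp in milliseconds
    payload: Optional[str]  # None marks a heartbeat frame (no subtitle data)

    @property
    def is_heartbeat(self) -> bool:
        return self.payload is None


def with_heartbeats(events: Iterable[SubFrame],
                    interval_ms: int = 100) -> Iterator[SubFrame]:
    """Yield the events, filling each gap with payload-less heartbeat frames."""
    prev = None
    for ev in events:
        if prev is not None:
            t = prev.pts_ms + interval_ms
            while t < ev.pts_ms:
                yield SubFrame(t, None)  # timestamp only, nothing to render
                t += interval_ms
        yield ev
        prev = ev
```

Any consumer of such frames would have to check `is_heartbeat` before touching the payload, which is the "must not segfault" requirement in model form.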
> 
> The duration issue is more tricky, because there are so many cases. Here
> is a scheme I think should work:
> 
> Each subtitle screen decodes into two subtitle frames: one to show it,
> one to clear it. The clear frame needs to reference the corresponding
> start frame, to allow for overlap.
> 
>   For example, the following ASS dialogue:
> 
>   Dialogue: 0,0:00:10.00,0:00:15.00,,,,,,,Long dialogue line.
>   Dialogue: 0,0:00:12.00,0:00:13.00,,,,,,,Short.
> 
>   would decode into:
> 
>   pts=10 id=42 text="Long dialogue line."
>   pts=12 id=43 text="Short."
>   pts=13 id=43 clear
>   pts=15 id=42 clear
> 
> When the duration is entirely reliable (text files read all at once and
> correctly processed), the decoder generates both frames immediately and
> keeps the clear frame in a reorder buffer.
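
A rough Python model of the reliable-duration case quoted above, under the assumption that events arrive sorted by start time. `SubFrame` and `decode_reliable` are hypothetical names, the starting id of 42 is taken from the quoted example only, and a heap plays the role of the reorder buffer. This is a sketch of the scheme, not of any existing decoder.

```python
import heapq
from dataclasses import dataclass, field
from typing import Optional


@dataclass(order=True)
class SubFrame:
    pts: float
    seq: int                                # tie-breaker for equal timestamps
    id: int = field(compare=False)
    text: Optional[str] = field(compare=False, default=None)  # None means "clear"


def decode_reliable(dialogues, first_id=42):
    """dialogues: (start, end, text) tuples sorted by start, durations trusted."""
    pending = []  # reorder buffer holding clear frames, ordered by pts
    out = []
    seq = 0
    for n, (start, end, text) in enumerate(dialogues):
        # Release clear frames that are due no later than this start frame.
        while pending and pending[0].pts <= start:
            out.append(heapq.heappop(pending))
        out.append(SubFrame(start, seq, first_id + n, text))
        # The clear frame carries the id of the start frame it refers to.
        heapq.heappush(pending, SubFrame(end, seq + 1, first_id + n))
        seq += 2
    while pending:  # end of stream: flush whatever is left
        out.append(heapq.heappop(pending))
    return out
```

Fed the two ASS dialogue lines from the example, the model yields the start frames at 10 and 12 followed by the clear frames at 13 and 15, each clear frame carrying the id of its start frame.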
> 
> When the duration is not entirely reliable, the decoder should generate
> the clear frame when it gets the corresponding packet (either the clear
> packet or the next start packet). If the duration is known but not
> reliable (dvdsub, dvbsub), the decoder should use it as a cap when
> waiting for the actual end.
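
A sketch of the unreliable-duration case quoted above, again as a Python model with hypothetical names (`CappedDecoder`, `send_packet`, `flush`). It assumes at most one subtitle is on screen at a time, as with bitmap formats where the next packet replaces the previous one. The stored duration is used only as a cap, and `flush` stands in for the heartbeat flush API mentioned in the proposal.

```python
from typing import List, Optional, Tuple

Frame = Tuple[float, int, Optional[str]]  # (pts, id, text); text=None means "clear"


class CappedDecoder:
    """Hypothetical decoder model for streams whose durations are only a hint."""

    def __init__(self) -> None:
        self.next_id = 0
        self.active: Optional[Tuple[int, Optional[float]]] = None  # (id, deadline)

    def flush(self, now: float) -> List[Frame]:
        """Heartbeat flush: emit the pending clear frame once its cap has passed."""
        if self.active is not None:
            sub_id, deadline = self.active
            if deadline is not None and deadline <= now:
                self.active = None
                return [(deadline, sub_id, None)]
        return []

    def send_packet(self, pts: float, text: Optional[str],
                    duration: Optional[float] = None) -> List[Frame]:
        """Feed a start packet (text set) or an explicit clear packet (text=None)."""
        out = self.flush(pts)        # the cap may expire before this packet
        if self.active is not None:  # otherwise this packet ends the subtitle
            out.append((pts, self.active[0], None))
            self.active = None
        if text is not None:
            sub_id = self.next_id
            self.next_id += 1
            deadline = pts + duration if duration is not None else None
            out.append((pts, sub_id, text))
            self.active = (sub_id, deadline)
        return out
```

In this model a subtitle ends at whichever comes first: the next packet or the capped deadline. A start packet without a duration stays on screen until another packet arrives.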
> 
> The decoder needs some kind of heartbeat flush API to get the pending
> clear frames. We may want an option to disable internal reordering and
> get clear frames immediately.
> 
> When the duration is known but not reliable, we may set some kind of
> "expiration timestamp" on the start frame, but I am not sure it is
> necessary.
> 
> Whether the duration is reliable or not is a property of both the codec
> and the format. For example, mkvmerge does not de-overlap events when
> muxing SRT into Matroska, therefore the duration is not known. On the
> other hand, when lavf reads directly from a SRT file, it can de-overlap
> easily. I suppose it means AVCodecParameters needs an extra field.
> 
> > I didn't like having multiple fields for text based data. If we want to
> > decode in another form, we can still add an option to print out verbatim
> > text instead of ASS markup.  
> 
> I think we are not talking about the same thing. A long time ago, we
> considered replacing the ASS markup with a simple text field with
> styling in a separate, non-text, structure. Did you discard that idea?
> 
> For CSS-based subtitles, a richer data structure would make it slightly
> less hard to preserve the structure of the styling information.
> 
> > Yeah I guess I'll need to write a filter to blend in the final patchset
> > submission.  
> 
> Or just a sub->video filter.
> 
> Regards,
>