[FFmpeg-devel] [PATCH] WIP: subtitles in AVFrame
george at nsup.org
Fri Nov 11 13:45:03 EET 2016
On septidi 17 brumaire, an CCXXV, Clement Boesch wrote:
> I didn't. Duration is somehow broken currently for some reason. I did
> nothing for sparseness: the reason I added basic support in lavfi is
> because it was much simpler to handle at ffmpeg.c level, so it's currently
> just a passthrough mechanism.
A decision will need to be made pretty soon, though, or it will not help
anything much. Sparseness will make things break badly, OOM-killer-style,
if someone tries to use subtitles coming from the same multiplexed file
as the corresponding video. Duration affects the visible API, so a
decision on it is more or less final. Here are the results of my current
thinking.
For sparseness, the solution is to use heartbeat frames: if you have a
subtitle event at 10s and the next one at 50s, emit a frame with no
payload and just a timestamp at 10.1, 10.2, ... 49.9.
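To make the idea concrete, here is a minimal sketch of the heartbeat timestamp generation in plain C (function and parameter names are illustrative, not a lavfi API proposal):

```c
#include <stddef.h>

/* Fill `out` with heartbeat timestamps (here in milliseconds) between
 * two subtitle events: one every `step` ms, strictly after `start` and
 * strictly before `next`.  Returns the number of timestamps written.
 * Illustrative sketch only; not an existing lavfi interface. */
size_t gen_heartbeats(long start, long next, long step,
                      long *out, size_t max)
{
    size_t n = 0;
    for (long t = start + step; t < next && n < max; t += step)
        out[n++] = t;
    return n;
}
```

With start = 10 s, next = 50 s and a 100 ms step, this yields heartbeats at 10.1, 10.2, ..., 49.9, i.e. 399 frames with no payload.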
Whoever is supposed to emit the frame can be decided later. The simplest
idea is to connect sbuffersrc to a master vbuffersrc: when a frame is
added on the master, consider generating heartbeat frames on the slaves.
The code needs to be ready immediately to handle the heartbeat frames.
At least have a way of expressing them: maybe data == NULL? And the
code needs to not segfault on them.
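A minimal sketch of the data == NULL convention, using a stand-in struct (the field names are an assumption, not the actual lavu AVFrame layout):

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal stand-in for the subtitle AVFrame under discussion; the
 * real field names are an assumption, not the actual lavu API. */
typedef struct SubFrame {
    int64_t  pts;
    uint8_t *data;   /* NULL => heartbeat: no payload, timestamp only */
} SubFrame;

/* Consumers must check this before touching the payload, so that
 * heartbeat frames cannot cause a segfault. */
static inline int sub_is_heartbeat(const SubFrame *f)
{
    return f->data == NULL;
}
```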
The duration issue is more tricky, because there are so many cases. Here
is a scheme I think should work:
Each subtitle screen decodes into two subtitles frames: one to show it,
one to clear it. The clear frame needs to reference the corresponding
start frame, to allow for overlap.
For example, the following ASS dialogue lines:
Dialogue: 0,0:00:10.00,0:00:15.00,,,,,,,Long dialogue line.
Dialogue: 0,0:00:12.00,0:00:13.00,,,,,,,Short.
would decode into:
pts=10 id=42 text="Long dialogue line."
pts=12 id=43 text="Short."
pts=13 id=43 clear
pts=15 id=42 clear
When the duration is entirely reliable (text files read all at once and
correctly processed), the decoder generates both frames immediately and
keeps the clear frame in a reorder buffer.
When the duration is not entirely reliable, the decoder should generate
the clear frame when it gets the corresponding packet (either the clear
packet or the next start packet). If the duration is known but not
reliable (dvdsub, dvbsub), the decoder should use it as a cap when
waiting for the actual end.
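The cap logic for the dvdsub/dvbsub case amounts to taking whichever comes first, the actual end event or start + declared duration. A sketch, with hypothetical names:

```c
#include <stdint.h>

/* When a duration is present but untrusted, the effective clear time
 * is the earlier of the actual end event and the declared duration
 * used as a cap.  Pass INT64_MAX as actual_end while it is unseen.
 * Sketch only; not an existing lavc function. */
static int64_t effective_clear_pts(int64_t start, int64_t declared_dur,
                                   int64_t actual_end)
{
    int64_t cap = start + declared_dur;
    return actual_end < cap ? actual_end : cap;
}
```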
The decoder needs some kind of heartbeat flush API to get the pending
clear frames. We may want an option to disable internal reordering and
get clear frames immediately.
When the duration is known but not reliable, we may set some kind of
"expiration timestamp" on the start frame, but I am not sure it is
worth the extra complexity.
Whether the duration is reliable or not is a property of both the codec
and the format. For example, mkvmerge does not de-overlap events when
muxing SRT into Matroska, therefore the duration is not known. On the
other hand, when lavf reads directly from a SRT file, it can de-overlap
easily. I suppose it means AVCodecParameters needs an extra field.
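Such an extra field could be as simple as a three-state flag, something like this (hypothetical; no such field exists in AVCodecParameters today):

```c
/* Hypothetical per-stream flag telling the decoder how much to trust
 * packet durations.  Not an existing lavc field. */
enum SubDurationReliability {
    SUB_DURATION_RELIABLE,   /* e.g. lavf reading an SRT file directly */
    SUB_DURATION_UNTRUSTED,  /* e.g. SRT muxed into Matroska by mkvmerge */
    SUB_DURATION_UNKNOWN,    /* no duration information at all */
};
```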
> I didn't like having multiple fields for text based data. If we want to
> decode in another form, we can still add an option to print out verbatim
> text instead of ASS markup.
I think we are not talking about the same thing. A long time ago, we
considered replacing the ASS markup with a simple text field with
styling in a separate, non-text, structure. Did you discard that idea?
For CSS-based subtitles, a richer data structure would make it slightly
less hard to preserve the structure of the styling information.
> Yeah I guess I'll need to write a filter to blend in the final patchset
Or just a sub->video filter.