[FFmpeg-devel] How do we deal with muxing 3GPP Timed Text subtitles?

Sun Jul 8 06:38:52 CEST 2012

Hi all,

As you know, I'm currently working on an encoder for 3GPP
timed text. The actual encoder itself is straightforward;
the much harder question is muxing, and there are two
challenges.

1) As mp4 doesn't store duration, the end of a subtitle is
represented by an empty subtitle, unless it is intended to
remain visible until the next real subtitle appears.

2) A fully conforming file always has a sample at pts==0. For
subtitles, which hardly ever have anything at pts==0, this
means there's usually an empty subtitle at pts==0.

This means that we do not have a 1:1 relationship between
samples passed into the decoder and samples written to the
file by the muxer. Somewhere in the middle, we need to generate
the empty subtitles we need to mux correctly.
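
To illustrate the mismatch, here is a minimal sketch of the mapping
when the whole subtitle stream is known up front. It is an
illustration only, not code from my tree: the (pts, duration, text)
tuple layout, the use of "" for an empty subtitle, the trailing
empty sample after the last cue, and the name cues_to_samples are
all assumptions.

```python
EMPTY = ""  # assumed stand-in for an empty timed text sample


def cues_to_samples(cues):
    """Map (pts, duration, text) cues, sorted by pts and assumed
    non-overlapping, to the (pts, text) samples the muxer would write."""
    samples = []
    # Challenge 2: the track needs a sample at pts == 0.
    if cues and cues[0][0] != 0:
        samples.append((0, EMPTY))
    for i, (pts, duration, text) in enumerate(cues):
        samples.append((pts, text))
        end = pts + duration
        next_pts = cues[i + 1][0] if i + 1 < len(cues) else None
        # Challenge 1: close the cue with an empty sample unless the
        # next cue starts exactly when this one ends.
        if next_pts is None or next_pts > end:
            samples.append((end, EMPTY))
    return samples


cues = [(5000, 2000, "Hello"), (7000, 1500, "World"), (10000, 1000, "Bye")]
samples = cues_to_samples(cues)
```

Here three input cues become six output samples: one empty sample at
pts 0, no gap marker between "Hello" and "World" because they are
back to back, and an empty sample after each of "World" and "Bye".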

The current code I have in my tree for this is sub-optimal, and
it does the following:

1) When the first sample is seen, and pts != 0, write an empty
subtitle at pts == 0 before the first real subtitle. This sucks
because it could mean the first subtitle sample is many minutes
into the stream, which will lead to buffering problems on playback.

2) After a real sample is written, write another empty sample to
mark the duration. To try and avoid non-monotonically increasing
pts, the pts of the empty sample is (pts + duration - 10). This
is shitty, and doesn't always work due to rounding. Ideally, we would
know if the next subtitle is set to appear immediately after the
current subtitle and then decide whether to generate an empty
subtitle or not.
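
A small sketch of how the (pts + duration - 10) trick can break. The
units are assumptions chosen for illustration: pts in milliseconds
and an output timescale of 30 ticks per second. The helper names
end_marker_pts, rescale and strictly_increasing are hypothetical.

```python
def end_marker_pts(pts, duration, margin=10):
    """Place the empty sample slightly before the cue ends."""
    return pts + duration - margin


def rescale(pts, src_hz, dst_hz):
    """Convert a timestamp between timescales, rounding to nearest tick."""
    return (pts * dst_hz + src_hz // 2) // src_hz


def strictly_increasing(values):
    return all(a < b for a, b in zip(values, values[1:]))


# Each list is: cue pts, empty-sample pts, next cue pts.

# A 2000 ms cue followed immediately by the next cue: fine so far.
long_cue = [1000, end_marker_pts(1000, 2000), 3000]

# A cue shorter than the margin: the empty sample lands before the cue.
short_cue = [1000, end_marker_pts(1000, 8), 3000]

# The first case again, after rescaling to a coarser timescale: the
# empty sample and the next cue round to the same tick.
rescaled = [rescale(p, 1000, 30) for p in long_cue]
```

With these numbers, long_cue is monotonic, short_cue is not, and
rescaled has the empty sample colliding with the following cue.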

In both cases, what we really want to know is information about
the next sample, but how do we get it? In general, it is not
possible to fully extract the subtitle stream, as the source
may not be seekable, but I feel this is the only available option.
To add to the inconvenience, I think this has to be a demux-side
behaviour, as I can't see a way that the muxer can transparently
buffer the whole subtitle stream if it can't control the order or
interleave of incoming packets.
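
For comparison, a sketch of the smallest amount of buffering that
would answer the "do I need an empty sample?" question: hold back
exactly one cue until the next one (or end of stream) arrives. This
is only a thought experiment under the same assumed (pts, duration,
text) layout, with hypothetical names, and it does not fix the
interleave problem: each cue is still written late, by however long
it takes the next cue to show up.

```python
def samples_with_lookahead(cues):
    """Yield (pts, text) samples from an iterator of (pts, duration, text)
    cues, delaying each cue until the start of the next one is known."""
    pending = None
    for cue in cues:
        if pending is None:
            # First cue seen: the track still needs a sample at pts == 0.
            if cue[0] != 0:
                yield (0, "")
        else:
            yield from _flush(pending, cue[0])
        pending = cue
    if pending is not None:
        # End of stream: nothing follows, so always close the last cue.
        yield from _flush(pending, None)


def _flush(cue, next_pts):
    pts, duration, text = cue
    yield (pts, text)
    end = pts + duration
    # Only emit the empty sample if there is a gap before the next cue.
    if next_pts is None or next_pts > end:
        yield (end, "")


stream = iter([(5000, 2000, "Hello"), (7000, 1500, "World")])
out = list(samples_with_lookahead(stream))
```

The input is consumed as an iterator, so nothing here requires the
source to be seekable; the cost is the one-cue delay on output.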

Existing tools that support muxing timed text all work by fully
extracting the source subtitles before attempting to mux the
output file.

I'd love to hear any thoughts or ideas you have about this.

Thanks,

--phil
