[FFmpeg-devel] [PATCH] matroska: Identify S_TEXT/UTF-8 tracks as SRT and not TEXT.

Nicolas George nicolas.george at normalesup.org
Tue May 22 16:59:49 CEST 2012


On Tridi 3 Prairial, year CCXX, Philip Langdale wrote:
> 1) What is the "subtitle decoder?"

Before that, we have to ask: What is a decoded subtitle?

We know what a decoded video frame is: array(s) of pixel values. We know
what a decoded audio frame is: array(s) of PCM sample values.

But we do not have consensus on what decoded data is supposed to look like
for text-based subtitle formats. Surely it consists mostly of text. But what
about markup?

Once there is a clear policy about that, we can start worrying about what
the role of the demuxer is and what the role of the decoder is.

My opinion:

* Decoded text subtitles should be in a structure where the time and
  duration are stored as numbers, and the text in a string without the
  timing information.

* The text should always be in Unicode (probably UTF-8, but using ints
  instead of chars is also a reasonable option (wchar_t is not)).

* For the markup, two options:

  - Choose / invent a universal markup syntax and always convert to/from it.

  - Handle markup syntax more or less the same way we handle pixel or sample
    formats.

  Note that subtitles are much less performance-relevant than video frames,
  so the systematic conversion is not a problem. On the other hand, there
  are so many features in the various subtitle formats that maintaining a
  universal syntax would probably be problematic. So I guess I am rather in
  favour of the second solution.

* Regarding the role of the demuxer, it looks like for SRT and ASS,
  Matroska/mkvmerge does what I believe is best: parse the timestamps and
  store them as part of the packet structure, and leave the rest of the text
  for the packet payload. Other demuxers should do the same.
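
The decoded structure described in the points above could look roughly like
the following C sketch. Every name in it (SubMarkup, DecodedSubtitle,
sub_end_ms, sub_needs_conversion) is hypothetical and only illustrates the
idea of tagging the markup syntax the way frames are tagged with a pixel or
sample format; none of it is existing FFmpeg API. Times are assumed to be in
milliseconds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical markup tags, playing the role that pixel or sample
 * formats play for video and audio frames. */
enum SubMarkup {
    SUB_MARKUP_NONE, /* plain text, no markup */
    SUB_MARKUP_SRT,  /* HTML-like tags such as <i> */
    SUB_MARKUP_ASS   /* ASS override codes */
};

/* Hypothetical decoded text subtitle: timing stored as numbers,
 * text stored as UTF-8 without any timing information. */
typedef struct DecodedSubtitle {
    int64_t start_ms;      /* assumed unit: milliseconds */
    int64_t duration_ms;
    enum SubMarkup markup; /* which markup syntax `text` uses */
    const char *text;      /* UTF-8 */
} DecodedSubtitle;

/* End time of the event, derived from start and duration. */
static int64_t sub_end_ms(const DecodedSubtitle *sub)
{
    assert(sub);
    return sub->start_ms + sub->duration_ms;
}

/* Returns 1 if a consumer that only understands `wanted` would have to
 * convert the markup first, 0 if the text can be used as-is. */
static int sub_needs_conversion(const DecodedSubtitle *sub,
                                enum SubMarkup wanted)
{
    assert(sub);
    return sub->markup != wanted;
}
```

With such a tag, a consumer converts only when the markup it receives is not
the one it wants, much like a pixel format conversion is inserted only when
the formats differ.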

For example, consider the following SRT file:

1
00:00:01,000 --> 00:00:02,000
Hello <i>World</i>.

The demuxer should output (AVPacket){ .pts = 1000, .duration = 1000, .data =
"Hello <i>World</i>." }.

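As a rough illustration of that split, here is a minimal C sketch that turns
one SRT cue into a timestamp, a duration and a text payload. SubPacket and
srt_cue_to_packet are hypothetical names standing in for the three AVPacket
fields used above; this is not how any existing FFmpeg demuxer is written.
It assumes millisecond units, a single cue per call, and "\n" line endings,
and it does not strip trailing line breaks from the payload.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the AVPacket fields used in the example;
 * times are in milliseconds. */
typedef struct SubPacket {
    int64_t pts;
    int64_t duration;
    const char *data; /* points into the caller's cue buffer */
} SubPacket;

/* Parse one SRT cue ("index\nstart --> end\ntext") into pkt.
 * Returns 0 on success, -1 if the cue is malformed. */
static int srt_cue_to_packet(const char *cue, SubPacket *pkt)
{
    int h1, m1, s1, ms1, h2, m2, s2, ms2;
    int64_t start, end;
    const char *p;

    assert(cue && pkt);

    /* Skip the cue index line. */
    p = strchr(cue, '\n');
    if (!p)
        return -1;
    p++;

    /* Parse the timing line; all eight fields must be present. */
    if (sscanf(p, "%d:%d:%d,%d --> %d:%d:%d,%d",
               &h1, &m1, &s1, &ms1, &h2, &m2, &s2, &ms2) != 8)
        return -1;

    start = ((h1 * 60LL + m1) * 60 + s1) * 1000 + ms1;
    end   = ((h2 * 60LL + m2) * 60 + s2) * 1000 + ms2;
    if (end < start)
        return -1;

    /* Everything after the timing line is the payload, markup included. */
    p = strchr(p, '\n');
    if (!p)
        return -1;

    pkt->pts      = start;
    pkt->duration = end - start;
    pkt->data     = p + 1;
    return 0;
}
```

Calling srt_cue_to_packet() on the cue above fills in pts = 1000 and
duration = 1000, and leaves data pointing at the text line with its <i> tags
untouched, so the markup question stays with whatever consumes the packet.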

Regards,

-- 
  Nicolas George