[MPlayer-dev-eng] [PATCH] libass: fix parsing of tracks extracted from containers

Ivan Kalvachev ikalvachev at gmail.com
Mon Sep 15 15:52:17 CEST 2008


On 9/14/08, Uoti Urpala <uoti.urpala at pp1.inet.fi> wrote:
> On Sun, 2008-09-14 at 14:49 +0300, Ivan Kalvachev wrote:
>> So could somebody give nice explanation with _examples_ of both
>> methods and the differences associated in them.
>
> Currently Matroska is the only container that supports ASS subtitles.
> The bitstream contents of each subtitle packet are <ReadOrder, ASS
> subtitle contents>. ReadOrder is the line number where this subtitle was
> stored in a .ass file; this allows reconstructing a file similar to the
> original from the muxed version and has also been useful in MPlayer to
> identify which subtitles have already been seen and which are repeated
> after a seek back. Information about subtitle start and stop times is
> stored at the container level.
>
> The subtitle format contains two timed changes to subtitle content in
> each packet: the time when the subtitle appears, and the time when it
> disappears. This causes a problem for containers that only support a
> single timestamp per packet. To mux the subtitles into such a container
> you need to either split each packet in two or use a hack moving part of
> the timing information inside the bitstream of the packet. Aurelien
> wanted to pick such a format. He insisted on using an existing format
> instead of designing a new one; however as no one is yet using ASS with
> containers that would need such a format a suitable one did not exist.
> So instead he made up a packet format based on the spec for lines in
> a .ass file. This was not a format meant for muxing and is not suitable
> for it.
>
> Aurelien's format has <absolute start time, absolute stop time, ASS
> subtitle contents>. This duplicates the start time which practically all
> containers CAN store. Using absolute timestamps inside the bitstream is
> a bad choice as it means all timing changes require modifying the
> bitstream. If you need to move part of the timing information there then
> the best choice is to store the duration from start to end instead. This
> allows changing the timing without bitstream modification as long as
> duration does not change. Aurelien's format also lacks support for
> ReadOrder so you cannot mux SSA/ASS from Matroska into his format
> without losing information.
>
> I think the best format for containers that only support a single time
> value would be <duration, ReadOrder, ASS subtitle contents>. This avoids
> absolute timestamps and can be easily converted to/from the Matroska
> format (which is used by all existing content) by removing/adding the
> "duration, " prefix.
>
> Another question is what format to use internally in lavf and in MPlayer
> (assuming you want to convert all ASS packets to a standard format).
> Aurelien wants to use his format because in theory it could be muxed
> into more containers unmodified. I think the Matroska format is a better
> choice. Moving the timing information inside the bitstream is a
> workaround for container limitations, and even if such limitations are
> common that doesn't mean you should limit your program in the same way.
> Also his justification relies on the unproven assumption that multiple
> containers would define ASS storage to work the way he prefers; at the
> moment no container does that.
>
> The code changes Aurelien wanted in MPlayer also had the additional
> breakage that they moved all timing information inside the codec
> bitstream, completely ignoring what was stored at the demuxer level.

Your explanation still lacks concrete examples;)

I googled a little, and I guess this link gives a detailed explanation: http://www.matroska.org/technical/specs/subtitles/ssa.html

The first dialogue .ass line looks like:
"Dialogue: Marked=0,0:02:40.65,0:02:41.79,Wolf main,Cher,0000,0000,0000,,Et les enregistrements de ses ondes delta ?"

Becomes something like:
{
  Block's timecode: 00:02:40.650
  BlockDuration: 00:00:01.140
  "1,,Wolf main,Cher,0000,0000,0000,,Et les enregistrements de ses ondes delta ?"
}
where the leading "1" is the ReadOrder.
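
To make the transformation above concrete, here is a minimal sketch of it in Python. The names Block, parse_ass_time and dialogue_to_block are made up for illustration and are not libass, lavf or MPlayer API. It assumes times are kept in milliseconds and that an SSA "Marked=0" field becomes an empty Layer field, as in the example.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical stand-in for a Matroska block holding one subtitle."""
    timecode_ms: int   # start time, stored at the container level
    duration_ms: int   # BlockDuration, stored at the container level
    payload: str       # "ReadOrder,Layer,Style,..." with the times removed


def parse_ass_time(text: str) -> int:
    """Convert an ASS 'H:MM:SS.cc' timestamp to milliseconds."""
    hours, minutes, seconds = text.split(":")
    secs, centis = seconds.split(".")
    total_secs = (int(hours) * 60 + int(minutes)) * 60 + int(secs)
    return total_secs * 1000 + int(centis) * 10


def dialogue_to_block(line: str, read_order: int) -> Block:
    """Move Start/End out of a Dialogue line and prefix the ReadOrder."""
    prefix = "Dialogue: "
    if not line.startswith(prefix):
        raise ValueError("not a Dialogue line")
    # Split off only the first three fields so commas in the text survive.
    first, start, end, rest = line[len(prefix):].split(",", 3)
    # Assumption: SSA "Marked=..." carries no layer, so Layer is left empty.
    layer = "" if first.startswith("Marked=") else first
    start_ms = parse_ass_time(start)
    end_ms = parse_ass_time(end)
    return Block(start_ms, end_ms - start_ms, f"{read_order},{layer},{rest}")


line = ("Dialogue: Marked=0,0:02:40.65,0:02:41.79,Wolf main,Cher,"
        "0000,0000,0000,,Et les enregistrements de ses ondes delta ?")
block = dialogue_to_block(line, read_order=1)
print(block.timecode_ms, block.duration_ms)
print(block.payload)
```

For the example line this gives a start of 160650 ms (00:02:40.650), a duration of 1140 ms, and the payload shown above.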

So storing .ass in mkv uses a special format. The line is modified: the times are removed and placed in mkv-specific elements, and the field placement seems to be stricter (i.e. no free-text tags). ReadOrder is not exactly the line number in the .ass file, because the (common) header of the .ass file is stored in the CodecPrivate element of the mkv.

So here is what I don't understand:

1. Is "Block's timecode" the only PTS that this packet has?

2. Why do we want the demuxer to mess with the string (aka data, aka payload) instead of just passing it on? In other words, why do we throw away useful data?

3. If "Block's timecode" is stored in AVPacket.pts and "Block's duration" is stored in AVPacket.convergence_duration, doesn't that mean we have everything we need for muxing?

Let's say that we want to use some container that only stores packets and no additional info (not even pts). Then the avformat muxer for that format is free to do it its own way, for example:
a) prefix the payload with "{0,0:02:40.65,0:02:41.79}\0", and let the corresponding demuxer parse it on reading (i.e. create its own format).
b) just dump the numbers as int64_t.
c) recreate the original .ass line.
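
Options (b) and (c) can be sketched as follows. This is only an illustration in Python: the function names, the little-endian byte order and the millisecond time unit are my assumptions, not an existing muxer format.

```python
import struct


def format_ass_time(ms: int) -> str:
    """Format milliseconds as ASS 'H:MM:SS.cc' (centisecond precision)."""
    centis = ms // 10  # truncates: anything finer than 10 ms is lost
    secs, centis = divmod(centis, 100)
    minutes, secs = divmod(secs, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}.{centis:02d}"


def pack_binary(pts_ms: int, duration_ms: int, payload: str) -> bytes:
    """Option (b): two int64 values followed by the untouched payload."""
    return struct.pack("<qq", pts_ms, duration_ms) + payload.encode("utf-8")


def unpack_binary(data: bytes):
    """Inverse of pack_binary, as the matching demuxer would do it."""
    pts_ms, duration_ms = struct.unpack_from("<qq", data)
    return pts_ms, duration_ms, data[16:].decode("utf-8")


def rebuild_dialogue(pts_ms: int, duration_ms: int, payload: str) -> str:
    """Option (c): recreate a .ass Dialogue line; ReadOrder is dropped."""
    _read_order, layer, rest = payload.split(",", 2)
    # Assumption: an empty Layer field came from an SSA "Marked=0" field.
    first = layer if layer else "Marked=0"
    start = format_ass_time(pts_ms)
    end = format_ass_time(pts_ms + duration_ms)
    return f"Dialogue: {first},{start},{end},{rest}"


payload = "1,,Wolf main,Cher,0000,0000,0000,,Et les enregistrements de ses ondes delta ?"
print(unpack_binary(pack_binary(160650, 1140, payload)))
print(rebuild_dialogue(160650, 1140, payload))
```

Note that (b) keeps the payload byte for byte, while (c) goes through centisecond text timestamps, so a pts that is not a multiple of 10 ms does not survive the round trip.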


What are the benefits:
Zero changes and processing in mkv.
Per (de)muxer decision for optimal storage format.
No forward and backward conversion through new artificial formats.

In short, I'm all for the KISS principle. I am strongly against using artificial internal formats: they create additional processing in the demuxer, the muxer and the players, and they could introduce rounding errors.


About the ReadOrder. It both creates and solves problems:

1. SRT subtitles have a ReadOrder (the sequence number) as part of the original subtitle. Ironically, it is removed when they are stored in mkv.

2. When editing video, it has to be renumbered, which means we have to parse the strings. This is mandatory if we add more subtitles (or concatenate).

3. When seeking, AVPacket.pos could be used as the indicator instead of ReadOrder; this means we would have to store .pos for all already parsed and buffered packets.

4. It would have been best if ReadOrder had been implemented as a separate XML/EBML element and, correspondingly, stored in AVPacket.

Point 4 raises another problem: what if we have more than one subtitle line for the same time, e.g. when two people are speaking at the same time and the subtitles mark their text with different styles (colors)?

In that case we can have different packets, all with the same pts and duration but with different payloads. This of course makes the bitstream larger, and I won't be surprised if somebody has already used mkv to store several subtitle lines with different ReadOrder values in the same packet,
e.g.
{
  Block's timecode: 00:02:40.650
  BlockDuration: 00:00:01.140
  "1,,Titre_episode,Cher,0000,0000,0000,,TITRE.
   2,,Wolf main,Cher,0000,0000,0000,,Et les enregistrements de ses ondes delta ?"
}

Now, if we want to fix that, we can make the demuxer output a separate packet for each line. However, if we strip out the ReadOrder and use the real AVPacket.pos, all these packets will have the same .pos and will be ignored as duplicates. Giving them different .pos values would fix that problem but would break seeking to the subtitle packet.

In short, we need to preserve ReadOrder in one form or another. How about using AVPacket.dts? It is a form of dts after all ;)
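
A rough sketch of that idea in Python: split a multi-line packet into one packet per line, carry the ReadOrder in the dts field, and use it to skip repeats after a seek back. SubPacket is a hypothetical stand-in for AVPacket, and the multi-line payload is the speculative case described above, not something I know to exist in real files.

```python
from dataclasses import dataclass


@dataclass
class SubPacket:
    """Hypothetical stand-in for AVPacket; only the fields discussed here."""
    pts: int
    duration: int
    dts: int       # carries the ReadOrder (the proposal above)
    payload: str   # line with the ReadOrder prefix stripped


def split_packet(pts: int, duration: int, payload: str) -> list:
    """Emit one packet per subtitle line, moving ReadOrder into dts."""
    packets = []
    for line in payload.splitlines():
        line = line.strip()
        if not line:
            continue
        read_order, rest = line.split(",", 1)
        packets.append(SubPacket(pts, duration, int(read_order), rest))
    return packets


def drop_seen(packets: list, seen: set) -> list:
    """Keep only packets whose ReadOrder has not been seen; update 'seen'."""
    fresh = []
    for pkt in packets:
        if pkt.dts not in seen:
            seen.add(pkt.dts)
            fresh.append(pkt)
    return fresh


payload = ("1,,Titre_episode,Cher,0000,0000,0000,,TITRE.\n"
           "   2,,Wolf main,Cher,0000,0000,0000,,Et les enregistrements de ses ondes delta ?")
seen = set()
first_pass = drop_seen(split_packet(160650, 1140, payload), seen)
after_seek = drop_seen(split_packet(160650, 1140, payload), seen)
print(len(first_pass), len(after_seek))
```

Both packets share the same pts and duration, but they are still told apart by dts, so the second pass after the seek yields nothing new.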



Summary:

No new artificial text formats!

1. Return one packet per line, containing pts, duration, dts, and the payload with the ReadOrder removed. Modify mplayer/libass to pass and parse this format.

or
2. Return one full packet, containing pts, duration and the unmodified payload, which holds one or more lines, each starting with its ReadOrder.

or
3. Return one packet containing a recreated payload in the form of .ass lines, and use .pos to distinguish already parsed packets.

Leave it to the (de)muxer to decide how to store all the info in its own format.


