[MPlayer-dev-eng] ASS/SSA discussions

Wed Sep 24 01:03:38 CEST 2008

On Tue, 2008-09-23 at 23:16 +0200, Michael Niedermayer wrote:
> On Sun, Sep 21, 2008 at 01:06:00AM +0300, Ivan Kalvachev wrote:

> besides #2, the ass demuxer in libavcodec will neither parse nor set the
> display_duration. It's the ass AVParser that might; it would be nasty
> code duplication if demuxers would do it ...

Do you intend to actually implement a .ass demuxer?

Would the demuxer set _anything_ at all in the packets then?
display_duration is at the same level as start time information. Would
you omit both or for some reason parse one but not the other?

> It seems you missed my past comments ...
> What ffmpeg is heading toward is
> * the demuxers return subtitle packets like any other packet
> * the subtitle decoder decodes these packets to a common subtitle structure
>   (AVSubtitle) containing utf-8 text, timestamps/durations, positions,
>   effects, bitmaps, font references, ...
> * A common subtitle renderer renders these so they can be displayed or
>   a subtitle encoder encodes them to a possibly different format again.

What do you mean by "heading toward"? Is someone going to actually
implement this? Who? I've seen no indication of such work being done.

> Now this is not so much different from video and audio
> the decoder converts a codec specific bitstream into a common and simple
> representation (a bitmap or a bunch of PCM samples).
> 
> Within this framework, subtitles are trivially editable, not only the

They won't be trivially editable, at least not if you want to store the
result in an existing format.

There is no "simple representation" for all SSA/ASS effects other than
naming the specific effect. Audio codecs can be decoded to PCM in some
sample format and most video codecs can be decoded to bitmaps, but
subtitles are more like vector graphics. There is no simple format that
could accurately represent every input.
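
A minimal Python sketch of this point, assuming a deliberately
simplified view of ASS override tags (the helper name split_effects and
both regular expressions are illustrative, not taken from libass or
FFmpeg). It shows that once the override blocks are stripped to get
"plain text", the only way to keep the effects is to carry them along by
name:

```python
import re

# One ASS override block, e.g. {\fad(200,300)\pos(320,240)}.
OVERRIDE_BLOCK = re.compile(r"\{([^}]*)\}")
# One tag inside a block: backslash, letters, optional "(args)".
# Simplified on purpose; real ASS tag syntax has more cases.
TAG = re.compile(r"\\([A-Za-z]+)(\([^)]*\))?")


def split_effects(text):
    """Return (plain_text, tag_names) for the Text field of an ASS event."""
    tags = []
    for block in OVERRIDE_BLOCK.findall(text):
        tags.extend(name for name, _args in TAG.findall(block))
    plain = OVERRIDE_BLOCK.sub("", text)
    return plain, tags


event_text = r"{\fad(200,300)\move(100,400,500,400)}Hello, {\i1}world{\i0}!"
plain, tags = split_effects(event_text)
print(plain)
print(tags)
```

The plain text alone no longer says that the line fades in and moves
across the screen; a "common" structure would have to store something
equivalent to the tag list to stay lossless.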

> You wouldn't want to apply video filters to packets prior to the video
> decoder, so why do you base your arguments on changing things prior to
> the subtitle decoder?

Your generic "decoded subtitle" format is all talk and vaporware so far,
and there is reason to doubt it will ever appear. You shouldn't base
arguments on the assumption that such a format will appear.

> > Summary:
> > Demuxer must remove fields that it has parsed and stored in AVPacket structure.
> 
> see above
> 
> 
> > Anything else would lead to increasing of code duplication and special handling.
> 
> This is certainly not true, as it is not done currently by any (de)muxer
> and doing it would add very significant and complex code to every
> demuxer. And yes, I am speaking about the general case here, not just ass; if you
> mean just ass, then I honestly do not understand why it should be a special
> case.

The reasons why SSA/ASS subtitles are different from a standard video
codec have already been explained in the thread (a couple of times). But
I'll try once again:

Some video codecs have their own timestamps. But those are NOT used for
timing when muxed in a normal container. If Matroska packets had
start/stop fields, completely ignored, whose value would never make any
difference for any use (container-level start/duration would always be
used) that would be comparable to video codecs with ignored timestamps.
Such values would not be parsed or placed in AVPacket fields either.
However, you're talking about a format that would be used to move some of
the _semantically significant_ timing information inside the bitstream
because of containers that cannot represent it at the natural container
level. Keeping such fields in the packets is real duplication of data
(or if you want them to be completely unused in the internal packets
then you must at least rewrite the packets in every muxer to make sure
the contents aren't used).

The specific format you propose is not suitable for muxing because it
contains absolute timestamps in the bitstream. Again, this is different
from the timestamps in video codecs because in this case the timestamps
would really be _used_.
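
A rough Python sketch of the alternative, where the Start and End
fields are parsed once and moved to the container level so the payload
carries no timing at all. It assumes the default ASS v4+ event field
order (Layer, Start, End, Style, Name, MarginL, MarginR, MarginV,
Effect, Text); the function names and the dictionary keys are
hypothetical and do not correspond to any existing API:

```python
def parse_ass_time(stamp):
    """Convert an ASS timestamp 'H:MM:SS.cc' (centiseconds) to milliseconds."""
    hours, minutes, seconds = stamp.split(":")
    secs, centis = seconds.split(".")
    whole = (int(hours) * 60 + int(minutes)) * 60 + int(secs)
    return whole * 1000 + int(centis) * 10


def split_timing(line):
    """Split a 'Dialogue:' line into container-level timing and a timing-free payload."""
    body = line[len("Dialogue:"):].strip()
    # Text is the last field and may contain commas, so split at most 9 times.
    layer, start, end, style, name, ml, mr, mv, effect, text = body.split(",", 9)
    start_ms = parse_ass_time(start)
    end_ms = parse_ass_time(end)
    payload = ",".join([layer, style, name, ml, mr, mv, effect, text])
    return {"pts_ms": start_ms, "duration_ms": end_ms - start_ms, "payload": payload}


packet = split_timing("Dialogue: 0,0:01:02.50,0:01:05.00,Default,,0,0,0,,Hello, world")
print(packet)
```

With this split, anything that changes timing (an offset, a cut) only
touches pts_ms and duration_ms and never has to rewrite the payload.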

The specific format you propose also lacks a way to express the
ReadOrder information, so you'd need another way to convey that (or
applications would be unable to access all the original information in a
Matroska file and remuxing would lose information).
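
As a small illustration of why ReadOrder matters, here is a Python
sketch assuming the Matroska storage form for ASS events, in which each
block payload starts with a ReadOrder field followed by Layer, Style,
Name, MarginL, MarginR, MarginV, Effect and Text. The function name
restore_file_order and the sample payloads are made up for the example:

```python
def restore_file_order(payloads):
    """Sort Matroska-style ASS payloads back into original script order."""
    def read_order(payload):
        # ReadOrder is the first comma-separated field of the payload.
        return int(payload.split(",", 1)[0])
    return sorted(payloads, key=read_order)


# Blocks as they might come out of the container, in timestamp order,
# which here differs from the order of the lines in the original script.
muxed = [
    "1,0,Default,,0,0,0,,Second line in the script, displayed first",
    "0,0,Default,,0,0,0,,First line in the script, displayed later",
]
restored = restore_file_order(muxed)
for payload in restored:
    print(payload)
```

Without that leading field, a remuxer has only the timestamp order to go
on and cannot reproduce the original line order.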