[MPlayer-dev-eng] ASS/SSA discussions

Sun Oct 26 14:26:03 CET 2008

On Sun, 2008-10-12 at 23:14 +0200, Michael Niedermayer wrote:
> On Fri, Sep 26, 2008 at 04:42:38AM +0300, Uoti Urpala wrote:
> > On Fri, 2008-09-26 at 01:17 +0200, Michael Niedermayer wrote:
> > > On Wed, Sep 24, 2008 at 02:03:38AM +0300, Uoti Urpala wrote:
> > > > On Tue, 2008-09-23 at 23:16 +0200, Michael Niedermayer wrote:

> > > > > Now this is not so much different from video and audio
> > > > > the decoder converts a codec specific bitstream into a common and simple
> > > > > representation (a bitmap or a bunch of PCM samples).
> > > > > 
> > > > > Within this framework, subtitles are trivially editable, not only the
> > > > 
> > > > They won't be trivially editable at least if you want to store the
> > > > result in an existing format.
> > > > 
> > > > There is no "simple representation" for all SSA/ASS effects other than
> > > > naming the specific effect. Audio codecs can be decoded to PCM in some
> > > > sample format and most video codecs can be decoded to bitmaps, but
> > > > subtitles are more like vector graphics. There is no simple format that
> > > > could accurately represent every input.
> > > 
> > > I am not interested in what you would prefer cannot be done or did not exist.
> > 
> > What I stated were facts, not opinions or preferences. Do you claim that
> > some of those facts were false, or are you saying that you are not
> > interested in what the facts are? (Your recent behavior in this thread
> > does give that impression.)
> 
> Yes, I do claim some of the things you present as "facts" are false.
> For example:
> "They won't be trivially editable at least if you want to store the
>  result in an existing format."
> 
> It's surely possible to
> mkv -> ass -> some easy editable struct -> edit the struct -> ass -> mkv
> It's like
> mpeg-ps -> mpeg2-es -> raw -> filter -> mpeg2-es -> mpeg-ps
> or
> mkv -> mpeg4 -> raw -> filter -> raw -> mpeg4 -> mkv

I already explained above why that analogy is false. Bitmap-based
formats can be reduced to "raw" bitmaps, but there is no "raw" vector
graphics format.

> Also, you present irrelevant things as if their truth (if they are true) would
> be in some way a problem.
> For example:
> "There is no "simple representation" for all SSA/ASS effects other than
>  naming the specific effect."
> 
> It's not as if "naming the specific effect" would be a problem or a bad choice.

You end up with the union of all the formats you support and you need to
separately consider the conversion of every effect to every other format
which does not support it exactly.

> or
> "There is no simple format that
>  could accurately represent every input."
> 
> The truth of this depends on what one considers "simple" and "every input".
> This statement is one of those useless generic things that, taken literally,
> means nothing but, read quickly, sounds like something big.

It's the essential difference from bitmap formats that can be reduced to
a simple raw format. If you think it "means nothing" the problem is in
your understanding.

> What matters is which system is better, not whether one of the systems fails to
> achieve the unachievable. With such "arguments" one can make anything look bad,
> like:
>  there is no airplane that can get you to every spot

The way you described your desired system it was trying to "achieve the
unachievable". I'm saying that your designs are unrealistic. If you
claim to be making an airplane that can take you to every spot instantly
then it's worth pointing out that won't work.

> > > I'll not try to repeat the same answer again though; you can look it up in the
> > > thread.
> > 
> > You have not given any answers that would show the issues do not apply.
> > 
> > 1) The reason why SSA/ASS differ from usual video packets (two timed
> > events per packet).
> > You have not given any "answer" that would contradict this. Your only
> > attempts have been stupid excuses interpreting interlaced frame decoding
> > as "timed events".
> 
> As I've said, I do not have much interest in this whole thread; that's also
> why I make no attempt to reply quicker.
> And I surely have answered this, but I have the feeling that you just do not
> know how mpeg-ps works.
> mpeg-ps does not have timestamps for each frame and it does not even store
> each frame in a separate container packet. So from a container's point of
> view there surely can be several (even hundreds) of frames in a
> single packet; their timestamps are only known from the durations in their
> headers.

And this is relevant how? Would you store video this way in any of the
containers you'd use SSA with?

> > 2) Your proposed format is unsuitable for muxing because it uses
> > absolute timestamps.
> 
> Your bike is unsuitable because you painted it pink.
> 
> 
> > Earlier in the thread you did seem to understand that this is a serious
> > flaw.
> 
> > However your only "answer" was to propose reinterpreting the
> > meaning of the fields so that duration would be stored by setting the
> > start and stop to some values duration apart, instead of using the
> > normal semantics of those fields.
> 
> That's the second time you put your words in my mouth and then argue against
> the result. Can you maybe just continue the discussion without me? It does
> not seem to matter that much what I reply anyway...
> 
> 
> > This would be incompatible with
> > anything using the normal semantics and so strictly inferior to storing
> > just a duration field (which would equally differ from lines in .ass
> > files, but would make sense as a way of storing duration). 
> 
> > Then you said
> > the "start" field would always have to exactly match the container pts
> > to maintain some level of compatibility with .ass line semantics. This
> > in turn has obvious problems with duplication of data, requiring extra
> > work to rewrite packets after any changes (and you said you wanted to
> > avoid extra parsing?),
> 
> I thought we agreed that updating the timestamps is needed either way when
> they change...

No, we didn't. Not only that, I already clarified this at least once.
There is never a need to rewrite packets after changing their timeline
position if they're stored in a sane format that doesn't use absolute
timestamps.
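
To illustrate the difference, here is a minimal sketch in Python. The packet
layouts and all names (RelativePacket, AbsolutePacket, shift_relative,
shift_absolute) are hypothetical and chosen only for illustration; they are
not the actual Matroska, NUT or libavformat structures. The "absolute"
payload uses a made-up "start_ms,end_ms,text" layout standing in for a line
that carries its own start and end times.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RelativePacket:
    """Hypothetical packet: timing lives only in the container fields."""
    pts_ms: int
    duration_ms: int
    payload: str  # event text with no timing inside


@dataclass(frozen=True)
class AbsolutePacket:
    """Hypothetical packet: payload repeats absolute start/end times."""
    pts_ms: int
    payload: str  # made-up layout: "start_ms,end_ms,text"


def shift_relative(pkt: RelativePacket, offset_ms: int) -> RelativePacket:
    # Moving the packet on the timeline touches only the container timestamp.
    return replace(pkt, pts_ms=pkt.pts_ms + offset_ms)


def shift_absolute(pkt: AbsolutePacket, offset_ms: int) -> AbsolutePacket:
    # The payload must be parsed and rewritten, or it no longer matches
    # the container timestamp.
    start, end, text = pkt.payload.split(",", 2)
    new_payload = f"{int(start) + offset_ms},{int(end) + offset_ms},{text}"
    return replace(pkt, pts_ms=pkt.pts_ms + offset_ms, payload=new_payload)
```

In the first case a remuxer can move packets without looking inside them; in
the second, every tool that changes the timeline has to understand the
payload format as well.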

What I suppose you're again referring to is changing timing in a way
which alters packet duration, which works correctly without packet
changes most of the time but has exceptions for certain effects.

> > consistency of player behavior when the values do
> > not match (handling this robustly would require more parsing), 
> 
> When the values do not match, the file is broken. True, two players tend not to
> guarantee to play broken files identically to each other.

Saying "file is broken" doesn't mean you get to shift the blame for all
problems to others when you're advocating a design which makes such
brokenness more likely.

> > and it's
> > also completely inconsistent with the way video codecs are treated.
> 
> I am talking to a brick wall, it seems...

Do you want to require that video codec-level time stamps must match the
container ones in NUT or the file is considered "broken"? Didn't think
so.

> > 3) Your proposed format cannot be used to represent tracks from Matroska
> > without losing information.
> > You haven't given anything that could be called an answer. The only
> > related things you've said have been vague comments about how the
> > ReadOrder information wouldn't be that useful anyway, clearly without
> > much clue about whether or how people use it. Are you explicitly saying
> > that in your opinion ReadOrder information should always be treated as
> > completely worthless, should not be provided to programs that try to use
> > the libavformat Matroska demuxer, and should be destroyed when remuxing
> > Matroska files with FFmpeg? Or not?
> 
> I said that we need to know a specific use case first before exporting
> information. There is a lot of stuff in containers;
> some of it is useful, some not. If ReadOrder is useful we will find a
> way to preserve it and export it. If no one can provide a real (compared to
> constructed and obscure) use case then we (in ffmpeg at least) will avoid
> the extra code and complexity to export it.
> It's the difference between
> "I want X" vs. "I need X as it's the only/best way to do Y"
> 
> Also, as a random other example, we don't export the 2 bytes AVI uses to identify
> streams (I mean the "dc" "wb" stuff). The reason is the same: no one asked for
> it with an explanation of what it would be useful for.

And how many people have asked for those AVI values, and how many
programs use them? MPlayer already has code that uses ReadOrder. I
mentioned benefiting from it in my own use even though I don't work much
with subtitles. IMO that attitude of refusing to provide container data,
even when it clearly does get used, unless you're given reasons you
personally agree with is inappropriate.
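
As an illustration of one use of ReadOrder, here is a minimal Python sketch
that rebuilds .ass "Dialogue:" lines in their original script order from
Matroska-style blocks. It assumes the block payload field order
"ReadOrder,Layer,Style,Name,MarginL,MarginR,MarginV,Effect,Text", which is my
understanding of the Matroska ASS mapping; treat that as an assumption. The
function names and the (pts_ms, duration_ms, payload) tuple layout are
hypothetical, not an existing demuxer API.

```python
def format_ass_time(ms: int) -> str:
    """Format milliseconds as the H:MM:SS.cc time used in .ass lines."""
    cs = ms // 10  # .ass timing has centisecond resolution
    hours = cs // 360000
    minutes = (cs // 6000) % 60
    seconds = (cs // 100) % 60
    return f"{hours}:{minutes:02d}:{seconds:02d}.{cs % 100:02d}"


def blocks_to_dialogue_lines(blocks):
    """Rebuild Dialogue lines from (pts_ms, duration_ms, payload) tuples.

    The payload is assumed to start with ReadOrder and Layer, followed by
    the remaining event fields; the output is sorted by ReadOrder so that
    lines come out in the order they had in the original script.
    """
    events = []
    for pts_ms, duration_ms, payload in blocks:
        read_order, layer, rest = payload.split(",", 2)
        start = format_ass_time(pts_ms)
        end = format_ass_time(pts_ms + duration_ms)
        events.append((int(read_order), f"Dialogue: {layer},{start},{end},{rest}"))
    events.sort(key=lambda item: item[0])
    return [line for _, line in events]
```

Without ReadOrder, two events starting at the same timestamp could only be
written out in muxing order, which need not match the order in the original
script.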