[MPlayer-dev-eng] ASS/SSA discussions

Michael Niedermayer michaelni at gmx.at
Sun Oct 12 23:14:07 CEST 2008


On Fri, Sep 26, 2008 at 04:42:38AM +0300, Uoti Urpala wrote:
> On Fri, 2008-09-26 at 01:17 +0200, Michael Niedermayer wrote:
> > On Wed, Sep 24, 2008 at 02:03:38AM +0300, Uoti Urpala wrote:
> > > On Tue, 2008-09-23 at 23:16 +0200, Michael Niedermayer wrote:

[random insult]

> 
> > > Would the demuxer set _anything_ at all in the packets then?
> > > display_duration is at the same level as start time information. Would
> > > you omit both or for some reason parse one but not the other?
> > 
> > I can just repeat that there will be no random code duplication in ffmpeg,
> > and having avi, nut, asf, ... parse the duration would be code duplication.
> 
> If those will use the same format (in case they get a standard format at
> all) they can use the same function to parse it. Talking about the "code
> duplication" of calling the function is silly.

One can have every demuxer call some arbitrary transform function like
zlib too; that still does not make it a good or sane design.
If a solution exists that is simpler and otherwise equivalent, it should
be preferred.
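
To make concrete what that duplication would look like, here is a minimal
sketch (all names are hypothetical, none of this is existing libavformat
code): even if the timing parser itself lives in one shared helper, every
demuxer that wants to fill a duration this way still has to grow the codec
check and the call site around it.

#include <stdint.h>

/* stand-in packet type so the sketch is self-contained */
typedef struct DemuxPacket {
    const uint8_t *data;
    int            size;
    int64_t        duration;
} DemuxPacket;

/* assumed shared helper; the actual timing parsing is elided here */
static int64_t parse_ssa_display_duration(const uint8_t *data, int size)
{
    (void)data; (void)size;
    return 0; /* placeholder */
}

/* this fragment would have to be repeated in the avi, asf, nut, ... demuxers */
static void fill_subtitle_duration(DemuxPacket *pkt, int codec_is_ssa)
{
    if (codec_is_ssa)
        pkt->duration = parse_ssa_display_duration(pkt->data, pkt->size);
}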


> 
> Why are you bringing up those demuxers when talking about .ass? .ass is
> clearly a different format and you do have to parse the line to get any
> information at all.
> 
> > Not to mention that I will try to minimize codec-specific code in demuxers.
> 
> That's hardly an argument when talking about a .ass demuxer.

ASS can be stored in more containers than just the .ass format.


> 
> > > > It seems you missed my past comments ...
> > > > What ffmpeg is heading toward is
> > > > * the demuxers return subtitle packets like any other packet 
> > > > * the subtitle decoder decodes these packets to a common subtitle structure
> > > >   (AVSubtitle) containing utf-8 text, timestamps/durations, positions,
> > > >   effects, bitmaps, font references, ...
> > > > * A common subtitle renderer renders these so they can be displayed or
> > > >   a subtitle encoder encodes them to a possibly different format again.
> > > 
> > > What do you mean by "heading toward"? Is someone going to actually
> > > implement this? Who? I've seen no indication of such work being done.
> > 
> > It will be implemented as subtitle decoders are implemented. You surely
> > can see that the existing decoders already use AVSubtitle and AVSubtitle
> > is a vector based container, not a single bitmap.
> 
> I see no work toward anything applicable to SSA. 

ffmpeg does not support SSA decoding or rendering yet.
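
For what it's worth, a rough sketch of the kind of codec-independent
structure the quoted plan describes might look like the following; the
names are made up for illustration and are not the actual AVSubtitle
definition in libavcodec.

#include <stdint.h>

/* one drawable region of a subtitle event */
typedef struct SubRegion {
    int x, y, w, h;      /* placement on the video frame              */
    char *text;          /* UTF-8 text, NULL for a pure bitmap region */
    uint8_t *bitmap;     /* bitmap data, NULL for a pure text region  */
    char *effect;        /* named effect / style reference, if any    */
} SubRegion;

/* one decoded subtitle event, codec independent */
typedef struct SubEvent {
    int64_t start_time;  /* presentation time                */
    int64_t duration;    /* how long the event stays visible */
    int nb_regions;
    SubRegion *regions;
} SubEvent;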


> 
> > > > Now this is not so much different from video and audio
> > > > the decoder converts a codec specific bitstream into a common and simple
> > > > representation (a bitmap or a bunch of PCM samples).
> > > > 
> > > > Within this framework, subtitles are trivially editable, not only the
> > > 
> > > They won't be trivially editable at least if you want to store the
> > > result in an existing format.
> > > 
> > > There is no "simple representation" for all SSA/ASS effects other than
> > > naming the specific effect. Audio codecs can be decoded to PCM in some
> > > sample format and most video codecs can be decoded to bitmaps, but
> > > subtitles are more like vector graphics. There is no simple format that
> > > could accurately represent every input.
> > 
> > I am not interested in what you would prefer cannot be done or does not exist.
> 
> What I stated were facts, not opinions or preferences. Do you claim that
> some of those facts were false, or are you saying that you are not
> interested in what the facts are? (Your recent behavior in this thread
> does give that impression.)

Yes, I do claim that some of the things you present as "facts" are false.
For example:
"They won't be trivially editable at least if you want to store the
 result in an existing format."

It is surely possible to do
mkv -> ass -> some easily editable struct -> edit the struct -> ass -> mkv
which is like
mpeg-ps -> mpeg2-es -> raw -> filter -> mpeg2-es -> mpeg-ps
or
mkv -> mpeg4 -> raw -> filter -> raw -> mpeg4 -> mkv
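
As a toy illustration of the first chain (and only of the timing part of
it), the following sketch decodes one ASS "Dialogue:" line into an
editable struct, shifts the event, and writes the line back out; it
assumes a well-formed line and leaves everything after the timing fields
untouched.

#include <stdio.h>

typedef struct EventTime {
    int h, m, s, cs;                 /* hours, minutes, seconds, centiseconds */
} EventTime;

/* move a timestamp by a signed number of centiseconds */
static void shift(EventTime *t, long cs)
{
    long total = ((t->h * 60L + t->m) * 60 + t->s) * 100 + t->cs + cs;
    t->h  = (int)(total / 360000);
    t->m  = (int)(total / 6000 % 60);
    t->s  = (int)(total / 100 % 60);
    t->cs = (int)(total % 100);
}

int main(void)
{
    const char *line =
        "Dialogue: 0,0:00:01.50,0:00:05.00,Default,,0,0,0,,Hello";
    int layer;
    EventTime start, end;
    char rest[256];

    /* decode the timing fields into an editable struct */
    sscanf(line, "Dialogue: %d,%d:%d:%d.%d,%d:%d:%d.%d,%255[^\n]",
           &layer,
           &start.h, &start.m, &start.s, &start.cs,
           &end.h,   &end.m,   &end.s,   &end.cs, rest);

    /* edit: move the event 2.5 seconds later */
    shift(&start, 250);
    shift(&end,   250);

    /* re-encode the line; the text/style fields are passed through as-is */
    printf("Dialogue: %d,%d:%02d:%02d.%02d,%d:%02d:%02d.%02d,%s\n",
           layer,
           start.h, start.m, start.s, start.cs,
           end.h,   end.m,   end.s,   end.cs, rest);
    return 0;
}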

Also, you present irrelevant things as if their truth (if they are true)
would somehow be a problem.
For example:
"There is no "simple representation" for all SSA/ASS effects other than
 naming the specific effect."

It is not as if "naming the specific effect" would be a problem or a bad
choice.

Or:
"There is no simple format that
 could accurately represent every input."

The truth of this depends on what one considers "simple" and "every
input". This statement is one of those useless generic things that, taken
literally, means nothing, but read quickly sounds like something big.
What matters is which system is better, not whether one of the systems
fails to achieve the unachievable. With such "arguments" one can make
anything look bad, for example:
 there is no airplane that can get you to every spot


> 
> > > > This is certainly not true, as it is not done currently by any (de)muxer 
> > > > and doing it would add very significant and complex code to every
> > > > demuxer. And yes, I am speaking about the general case here, not just ass; if you
> > > > mean just ass, then I honestly do not understand why it should be a special
> > > > case.
> > > 
> > > The reasons why SSA/ASS subtitles are different from a standard video
> > > codec have already been explained in the thread (a couple of times). But
> > > I'll try once again:
> > 
> > I'll not repeat the same answer again though; you can look it up in the
> > thread.
> 
> You have not given any answers that would show the issues do not apply.
> 
> 1) The reason why SSA/ASS differ from usual video packets (two timed
> events per packet).
> You have not given any "answer" that would contradict this. Your only
> attempts have been stupid excuses interpreting interlaced frame decoding
> as "timed events".

As I've said, I do not have much interest in this whole thread; that is
also why I make no attempt to reply quicker.
And I surely have answered this, but I have the feeling that you just do
not know how MPEG-PS works.
MPEG-PS does not have timestamps for each frame, and it does not even
store each frame in a separate container packet. So from a container's
point of view there surely can be several (even hundreds of) frames in a
single packet; their timestamps are only known from the durations in
their headers.
And if you want a different example, take H.263 PB-frames: these contain
2 frames, that is, a P-frame and a B-frame coded together, and "together"
here means that the macroblocks of both are mixed, not that one could
split the bitstream in the middle to separate them.
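
To illustrate the MPEG-PS situation in code (this is not real FFmpeg
code, just a sketch with made-up names): when the container supplies at
most one timestamp per packet, the timestamps of the remaining frames in
that packet can only be reconstructed by accumulating the durations the
codec-level parser reads out of the frame headers.

#include <stdint.h>

typedef struct Frame {
    int64_t duration;  /* read from the elementary-stream headers */
    int64_t pts;       /* derived, not stored by the container    */
} Frame;

/* assign presentation times to all frames found in one container packet,
 * given only the packet-level timestamp of the first frame */
static void assign_frame_pts(Frame *frames, int nb_frames, int64_t packet_pts)
{
    int64_t t = packet_pts;
    for (int i = 0; i < nb_frames; i++) {
        frames[i].pts = t;
        t += frames[i].duration;
    }
}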


> 
> 2) Your proposed format is unsuitable for muxing because it uses
> absolute timestamps.

Your bike is unsuitable because you painted it pink.


> Earlier in the thread you did seem to understand that this is a serious
> flaw.

> However your only "answer" was to propose reinterpreting the
> meaning of the fields so that duration would be stored by setting the
> start and stop to some values duration apart, instead of using the
> normal semantics of those fields.

That is the second time you have put words in my mouth and then argued
against the result. Can you maybe just continue the discussion without
me? It does not seem to matter much what I reply anyway ...


> This would be incompatible with
> anything using the normal semantics and so strictly inferior to storing
> just a duration field (which would equally differ from lines in .ass
> files, but would make sense as a way of storing duration). 

> Then you said
> the "start" field would always have to exactly match the container pts
> to maintain some level of compatibility with .ass line semantics. This
> in turn has obvious problems with duplication of data, requiring extra
> work to rewrite packets after any changes (and you said you wanted to
> avoid extra parsing?),

I thought we agreed that updating the timestamps is needed either way
when they change ...


> consistency of player behavior when the values do
> not match (handling this robustly would require more parsing), 

When the values do not match, the file is broken; true, two players tend
not to guarantee that they play broken files identically to each other.


> and it's
> also completely inconsistent with the way video codecs are treated.

I am talking to a brick wall, it seems ...


> 
> 3) Your proposed format cannot be used to represent tracks from Matroska
> without losing information.
> You haven't given anything that could be called an answer. The only
> related things you've said have been vague comments about how the
> ReadOrder information wouldn't be that useful anyway, clearly without
> much clue about whether or how people use it. Are you explicitly saying
> that in your opinion ReadOrder information should always be treated as
> completely worthless, should not be provided to programs that try to use
> the libavformat Matroska demuxer, and should be destroyed when remuxing
> Matroska files with FFmpeg? Or not?

I said that we need to know a specific use case before exporting
information. There is a lot of stuff in containers; some of it is useful,
some not. If ReadOrder is useful we will find a way to preserve it and
export it. If no one can provide a real (as opposed to constructed and
obscure) use case, then we (in ffmpeg at least) will avoid the extra code
and complexity of exporting it.
It is the difference between
"I want X" vs. "I need X as it is the only/best way to do Y"

Also, as another random example, we do not export the 2 bytes AVI uses to
identify streams (I mean the "dc"/"wb" stuff). The reason is the same: no
one asked for it with an explanation of what it would be useful for.

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I know you won't believe me, but the highest form of Human Excellence is
to question oneself and others. -- Socrates