[FFmpeg-devel] Status and Plans for Subtitle Filters

Michael Niedermayer michaelni at gmx.at
Thu Feb 27 00:13:08 EET 2020


On Tue, Feb 25, 2020 at 06:40:13PM +0100, Clément Bœsch wrote:
> On Sun, Feb 23, 2020 at 09:59:59PM +0100, Michael Niedermayer wrote:
> [...]
> > > The subtitles refactor requires to see the big picture and all the problems at
> > > once. 
> > 
> > Really?
> > Just hypothetically, and playing devil's advocate here:
> > what would happen if one problem or set of problems were solved at a time?
> 
> The first requirement of everything following is to define a new
> structure/API for holding the subtitles within the AVFrame (which has to
> live in lavu and not lavc like current API). So you have to address all
> the current limitations in that new API first, unless you're ready to
> change that new API 10x in the near future. 

Yes, I realized this implication when I wrote my mail, and while it gave me
pause, I am not sure this is a problem. This would not necessarily be public
API for user applications to use. Rather, it would be a step toward
implementing the final new API, done that way to simplify things.

This, like my other comments, is really just a suggestion to simplify
the work. If it doesn't simplify anything, it makes no sense, of course.



> And even if you keep most of
> the current design, you still have to at least come up with ways to remove
> all the current hacks that would go away while moving to the new design.
> 
> > 
> > Maybe the thinking should not be "what are all the things that might need
> > to be considered"
> > but rather "what is the minimum set of things that need to be considered"
> > to make the first step towards a better API/first git push
> > 
> > 
> > 
> > > Since the core change (subtitles in AVFrame) requires the introduction of
> > > a new subtitles structure and API, it also involves addressing the shortcomings
> > > of the original API (or maybe we could tolerate a new API that actually looks
> > > like the old?). So even if we ignore the subtitle-in-avframe thing, we don't
> > > have a clear answer for a sane API that handles everything. Here is a
> > > non-exhaustive list of stuff that we have to take into account while thinking
> > > about that:
> > > 
> > > - text subtitles with and without markup
> > 
> > > - sparsity, overlapping
> > 
> > heartbeat frames would eliminate sparsity
> 
> Yes, and like many aspects of this refactor: we need to come up with and
> formalize a convention. Of course I can make a suggestion, but there are
> many other cases and exceptions.
> 
> > what happens if you forbid overlapping ?
> 
> You can't, it's too common. The classic "Hello, hello" was already
> mentioned, but I could also mention subtitles used to "legend" the
> environment (you know, like, signposts and stuff) in addition to
> dialogues.

I think I misunderstand something here.
If we have a video with a signpost shown from 0:00 to 1:00
and another shown from 0:30 to 1:30, then the subtitles translating
or commenting on them would overlap.
But the video frames showing these signposts do not overlap, even though
the signposts themselves do. That is what I do not understand.
Video frames don't overlap, and that works fine.
And then there's audio:
someone plays a note on the trumpet and someone else a note on the piano.
Again, the AVFrames carrying both notes do not overlap.
So why should subtitles?
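
To make the non-overlapping view concrete, here is a rough sketch of how
overlapping events could be split at their boundaries into consecutive
segments, each listing which events are visible. SubEvent, SubSegment and
flatten_events() are hypothetical names made up for this illustration; none
of them is FFmpeg API, and the 32-event limit is only there to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical types for illustration only; not FFmpeg API. */
typedef struct SubEvent {
    int64_t start, end;   /* display interval [start, end) */
    const char *text;
} SubEvent;

typedef struct SubSegment {
    int64_t start, end;   /* non-overlapping interval [start, end) */
    uint32_t active;      /* bit i set => ev[i] is visible in this segment */
} SubSegment;

static int cmp_i64(const void *a, const void *b)
{
    int64_t x = *(const int64_t *)a, y = *(const int64_t *)b;
    return (x > y) - (x < y);
}

/*
 * Split up to 32 possibly overlapping events into consecutive segments.
 * out must have room for 2*n - 1 entries. Intervals in which no event is
 * visible are skipped. Returns the number of segments written.
 */
static size_t flatten_events(const SubEvent *ev, size_t n, SubSegment *out)
{
    int64_t bounds[64];
    size_t nb = 0, ns = 0;

    assert(n <= 32);

    /* Every start and end time is a potential segment boundary. */
    for (size_t i = 0; i < n; i++) {
        bounds[nb++] = ev[i].start;
        bounds[nb++] = ev[i].end;
    }
    qsort(bounds, nb, sizeof(bounds[0]), cmp_i64);

    for (size_t b = 0; b + 1 < nb; b++) {
        uint32_t active = 0;

        if (bounds[b] == bounds[b + 1])
            continue; /* duplicate boundary, empty interval */

        /* An event is visible if it covers the whole interval. */
        for (size_t i = 0; i < n; i++)
            if (ev[i].start <= bounds[b] && ev[i].end >= bounds[b + 1])
                active |= UINT32_C(1) << i;

        if (active)
            out[ns++] = (SubSegment){ bounds[b], bounds[b + 1], active };
    }
    return ns;
}
```

With the two signposts above (0 to 60 and 30 to 90, in seconds) this gives
three segments: the first signpost alone, both together, then the second alone.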

One could even argue that it would make sense for audio to be
overlapping, carrying this per-instrument information, as it does in
MIDI and MOD files. A filter writing out notes for the instruments
would benefit from this, and similarly a MIDI encoder.



[...]
> > > - bitmap subtitles and their potential colorspaces (each rectangle as an
> > >   AVFrame is way overkill but technically that's exactly what it is)
> > 
> > Then an AVFrame needs to represent a collection of rectangles.
> > It's either 1 or N for the design, I think.
> > Our current subtitle structures already have a similar design, so this
> > wouldn't really be different.
> 
> Yeah, the new API prototype ended up being:
> 
> +#define AV_NUM_DATA_POINTERS 8
> +
> +/**
> + * This structure describes decoded subtitle rectangle
> + */
> +typedef struct AVFrameSubtitleRectangle {
> +    int x, y;
> +    int w, h;
> +
> +    /* image data for bitmap subtitles, in AVFrame.format (AVPixelFormat) */
> +    uint8_t *data[AV_NUM_DATA_POINTERS];
> +    int linesize[AV_NUM_DATA_POINTERS];
> +
> +    /* decoded text for text subtitles, in ASS */
> +    char *text;
> +

> +    int flags;

Are 32-bit flags enough?
Just bringing this up, as an int64 is less ugly than a flags2.
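
A rough sketch of what the 64-bit variant could look like. The flag names
and the SubRectFlags wrapper are hypothetical, invented only to show that
bits above 31 fit without a second field; nothing here is from the prototype.

```c
#include <stdint.h>

/* Hypothetical flag names for illustration; none of these exist in FFmpeg. */
#define SUBRECT_FLAG_FORCED    (UINT64_C(1) << 0)
#define SUBRECT_FLAG_HIGH_BIT  (UINT64_C(1) << 40)  /* does not fit in 32 bits */

/* Hypothetical stand-in for the flags member of the rectangle struct. */
typedef struct SubRectFlags {
    uint64_t flags;
} SubRectFlags;

/* Set one or more flag bits. */
static inline void subrect_set_flag(SubRectFlags *r, uint64_t flag)
{
    r->flags |= flag;
}

/* Return nonzero if all bits in flag are set. */
static inline int subrect_has_flag(const SubRectFlags *r, uint64_t flag)
{
    return (r->flags & flag) == flag;
}
```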

Thanks

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Elect your leaders based on what they did after the last election, not
based on what they say before an election.
