[FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame for subtitle handling

Sun Dec 5 23:30:59 EET 2021

5 Dec 2021, 20:21 by softworkz at hotmail.com:

>
>
>> -----Original Message-----
>> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of Lynne
>> Sent: Sunday, December 5, 2021 5:40 PM
>> To: FFmpeg development discussions and patches <ffmpeg-devel at ffmpeg.org>
>> Subject: Re: [FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame
>> for subtitle handling
>>
>> 5 Dec 2021, 17:23 by softworkz at hotmail.com:
>>
>> > @@ -491,6 +499,39 @@ typedef struct AVFrame {
>> >  */
>> >  uint64_t channel_layout;
>> >
>> > +    /**
>> > +     * Display start time, relative to packet pts, in ms.
>> > +     */
>> > +    uint32_t subtitle_start_time;
>> > +
>> > +    /**
>> > +     * Display end time, relative to packet pts, in ms.
>> > +     */
>> > +    uint32_t subtitle_end_time;
>> >
>>
>> Milliseconds? Our entire API's based around timestamps
>> with time bases. Plus, we all know what happened when
>> Matroska settled onto milliseconds and ruined a perfectly
>> complex but good container.
>> Make this relative to the PTS field, with the same timebase
>> as the PTS field.
>> There's even a new AVFrame->time_base field for you to
>> set so you wouldn't forget it.
>>
>>
>> > +    /**
>> > +     * Number of items in the @ref subtitle_areas array.
>> > +     */
>> > +    unsigned num_subtitle_areas;
>> > +
>> > +    /**
>> > +     * Array of subtitle areas, may be empty.
>> > +     */
>> > +    AVSubtitleArea **subtitle_areas;
>> >
>>
>> There's no reason why this cannot be handled using the buffer
>> and data fields. If you need more space there, you're free to bump
>> the maximum number of pointers, plus this removes the horrid
>> malloc(1) hack. We've discussed this, and I couldn't follow why
>> this was bad in the email discussion you linked.
>>
>
>
> Hi Lynne,
>
> let me add another note on these two topics.
>
> One of the reasons why this subtitle filtering patchset has proceeded
> rather smoothly so far, causing only a small amount of regressions 
> that were easy to find and fix, is that I have kept as much as possible
> from the current AVSubtitle implementation, projecting it to the 
> new frame-based subtitle implementation, while keeping it mostly 
> compatible with the current logic.
> Interestingly, an earlier attempt towards subtitle filtering has
> gotten stuck right at the point about storing subtitle data in 
> AVFrame buffers. I decided to focus on functionality rather than
> academic kind of details. What has been in place and working
> for such a long time can't be so wrong that it can't be accepted.
>
> We also need to consider that it's not my patchset that is deliberately
> deciding to store data in pointers associate with rects/area and 
> to store start- and end-times in milliseconds: it's all the decoders
> and encoders that are doing this. I'm just adapting to the status
> quo.
>
> These two things - changing the timebase of the ms-fields and 
> changing the location of the buffer pointers - is a gigantic task 
> that would require to change everything that this patchset covers 
> and implements - plus even more code that this patchset hasn't even 
> touched. It would require changes to everything everywhere where
> subtitles are concerned.
> It is a task that is risky as hell, will cause plenty of 
> regressions under  guarantee, require extensive testing and might even 
> drive this whole endeavor into a dead end. 
>
> As I said before, please understand that I'm not ready to enter that
> path.
>

You don't have to break anything to make the start/end times proper
timestamps. We have timebases for that! If you'd like your timestamps
to have millisecond precision, you use a millisecond timebase!

So what if we use ASS internally which rounds timestamps? We have
bitmap formats too, and those are generally following the video
timestamps. You absolutely don't want the timestamps being rounded
in that case, or you risk having your subtitles be a frame late or
a frame early.

You could add a separate timebase for the subtitle start/end fields,
in case the packet's timebase doesn't match the subtitle format's
native timebase. I won't object to that.

You also don't have to risk that much by removing the subtitle_pts
field. Once the start/end times are converted to proper timestamps,
then the AVFrame->pts field can do whatever it wants to, whether
it mirrors the demuxer's timestamp, be constant, non-existent or even
go backwards.

If you just want to have an empty subtitle rect, you can simply
zero the subtitle struct's data and buffer fields.

Calling these issues too academic when the frame API is so very
straightforward and cut-and-paste is like calling computers
too academic when you have several thousand large numbers to add.

I'd really rather have a subtitle API that's usable, maintainable and easy
to extend, even if it's momentarily broken for certain corner cases
and weird files.
We can easily fix broken files one by one. We've got plenty of heuristics,
and have mostly abstracted and documented them well. We can add more.
We cannot remove cruft from an API that's designed to be crufty, however,
because the cruft is the only thing holding it together.