[FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame for subtitle handling

Mon Dec 6 00:38:24 EET 2021

On Sun, Dec 5, 2021 at 6:58 PM Soft Works <softworkz at hotmail.com> wrote:
>
>
>
> > -----Original Message-----
> > From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of Lynne
> > Sent: Sunday, December 5, 2021 5:40 PM
> > To: FFmpeg development discussions and patches <ffmpeg-devel at ffmpeg.org>
> > Subject: Re: [FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame
> > for subtitle handling
> >
> > 5 Dec 2021, 17:23 by softworkz at hotmail.com:
> >
> > > @@ -491,6 +499,39 @@ typedef struct AVFrame {
> > >  */
> > >  uint64_t channel_layout;
> > >
> > > +    /**
> > > +     * Display start time, relative to packet pts, in ms.
> > > +     */
> > > +    uint32_t subtitle_start_time;
> > > +
> > > +    /**
> > > +     * Display end time, relative to packet pts, in ms.
> > > +     */
> > > +    uint32_t subtitle_end_time;
> > >
> >
> > Milliseconds? Our entire API's based around timestamps
> > with time bases. Plus, we all know what happened when
> > Matroska settled onto milliseconds and ruined a perfectly
> > complex but good container.
> > Make this relative to the PTS field, with the same timebase
> > as the PTS field.
> > There's even a new AVFrame->time_base field for you to
> > set so you wouldn't forget it.
>
> The internal format for text subtitles is ASS, and this is
> using a timebase of milliseconds.
>
> All existing decoders and encoders are using this and I'm
> afraid, but I will not go and change them all.
>
> > > +    /**
> > > +     * Number of items in the @ref subtitle_areas array.
> > > +     */
> > > +    unsigned num_subtitle_areas;
> > > +
> > > +    /**
> > > +     * Array of subtitle areas, may be empty.
> > > +     */
> > > +    AVSubtitleArea **subtitle_areas;
> > >
> >
> > There's no reason why this cannot be handled using the buffer
> > and data fields. If you need more space there, you're free to bump
> > the maximum number of pointers, plus this removes the horrid
> > malloc(1) hack. We've discussed this, and I couldn't follow why
> > this was bad in the email discussion you linked.
>
> There are reasons. Some of those had I mentioned in an earlier
> discussion with Hendrik.
> The effort alone to relate the buffers to subtitle areas (which area
> 'owns' which buffer) is not reasonable. Too many cases to consider,
> what's when there are 3 areas and the second area doesn't have any
> buffer? The convention is that the buffer should be used contiguously.
> Managing those relations is error-prone and would require a lot of code.
>
> > > +    /**
> > > +     * Usually the same as packet pts, in AV_TIME_BASE.
> > > +     *
> > > +     * @deprecated This is kept for compatibility reasons and corresponds
> > to
> > > +     * AVSubtitle->pts. Might be removed in the future.
> > > +     */
> > > +    int64_t subtitle_pts;
> > >
> >
> > I'm not going to accept a field which is instantly deprecated.
> > As we've discussed multiple times, please merge this into
> > the regular frame PTS field. We already have _2_ necessary
> > stard/end fields.
>
> --
>
> > I agree with this entirely. Even ignoring the fact that adding a new
> > field thats deprecated is instantly a disqualification, AVSubtitle had
> > one pts field, AVFrame already has one pts field - both are even
> > documented to have the same semantic. They should just contain the
> > exact same data, thats how you achieve compatibility, not by claiming
> > you need a new field for compatibility reasons.
> >
> > - Hendrik
>
> I think the mistake is to declare subtitle_pts as deprecated. I had
> added the deprecation at a very early point in time where I had still
> thought that it can be eliminated.
>
> Even though we are driving subtitle data through the graph attached
> to AVFrame, the behavior involved is very different from audio and
> video frames. Actually there's not one but many different ways how
> subtitle data can appear in a source and travel through a filtergraph:
>
> - Sometimes, subtitle events are muxed into a stream many seconds
>   ahead of display time. In this case, AVFrame.pts is the mux position
>   and AVFrame.subtitle_pts is the actual presentation time.
>   When filtering subtitles to modify something, it would be still desired
>   to retain the offset between mux time and display start
>

This information was impossible to convey in AVSubtitle, as it had
exactly one pts field, so where does it come from now?
Also, isn't that kind of offset what subtitle_start_time is for, a
delta on top of the pts?

Sounds like its just duplicating logic once again.

pts is also by definition and name the "presentation time stamp", if a
timestamp of another meaning can convey additional information, we do
have a dts field in AVFrame, for the decoding timestamp.

>   Now when splitcc sends such a repeated frame, it needs to adjust the
>   frame's pts in order to match the configured output framerate.
>   But that's for keeping the graph running, the actual display start
>   time (subtitle_pts) must not be changed.

This sounds like an implementation detail, and not something we should
have in a core public structure like AVFrame.

- Hendrik