[FFmpeg-devel] [PATCH v16 01/16] global: Prepare AVFrame for subtitle handling

Sat Nov 27 10:19:36 EET 2021

Soft Works:
> 
> 
>> -----Original Message-----
>> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of Andreas
>> Rheinhardt
>> Sent: Friday, November 26, 2021 11:35 AM
>> To: ffmpeg-devel at ffmpeg.org
>> Subject: Re: [FFmpeg-devel] [PATCH v16 01/16] global: Prepare AVFrame for
>> subtitle handling
> 
> Hi Andreas,
> 
> thanks for the detailed review. There was hardly anything to object or disagree.
> Everything I don't respond to is implemented as suggested.
> 
>> As has already been said by others, this should be split: One patch for
>> the actual new libavutil additions, one patch for the lavc decoding API,
>> one patch for the encoding API, one patch for every user that is made to
>> use the new APIs.
> 
> By keeping the AVSubtitle definition in its original location, it was 
> possible to split this as suggested on IRC.
> Thanks.
> 
>> This is not how you deprecate something: You have to add a @deprecated
>> doxygen comment as well as the attribute_deprecated to make some noise.
> 
> I didn’t want the noise before things get serious ;-)
> 
>>
>> Actually, you should deprecate the whole AVSubtitle API.
> 
> Done. Do we deprecate structs as well?
> 
> Also, there's a function avcodec_get_subtitle_rect_class().
> 
> I have no idea whether it ever had any practical use. A similar function
> above (avcodec_get_frame_class) is deprecated.
> 

It never had any practical use. AVSubtitleRect elements can just be set
directly. I always pondered deprecating it, but was somehow too lazy.

> 
>>> +/**
>>> + * Return subtitle format from a codec descriptor
>>> + *
>>> + * @param codec_descriptor codec descriptor
>>> + * @return                 the subtitle type (e.g. bitmap, text)
>>> + */
>>> +enum AVSubtitleType av_get_subtitle_format_from_codecdesc(const
>> AVCodecDescriptor *codec_descriptor);
>>
>> Do we need this function? It seems too trivial and as Anton has pointed
>> out is flawed. And anyway, this would belong into codec_desc.h.
> 
> I had replied to Anton why it's not flawed. Moved it to codec_desc and
> named it as Anton had suggested.
> 
> 
>>>  int attribute_align_arg avcodec_send_packet(AVCodecContext *avctx, const
>> AVPacket *avpkt)
>>>  {
>>>      AVCodecInternal *avci = avctx->internal;
>>> @@ -590,6 +618,9 @@ int attribute_align_arg
>> avcodec_send_packet(AVCodecContext *avctx, const AVPacke
>>>      if (avpkt && !avpkt->size && avpkt->data)
>>>          return AVERROR(EINVAL);
>>>
>>> +    if (avctx->codec_type == AVMEDIA_TYPE_SUBTITLE)
>>> +        return decode_subtitle_shim(avctx, avci->buffer_frame, avpkt);
>>
>> 1. This is missing a check whether buffer_frame is empty or not; in the
>> latter case, you need to return AVERROR(EAGAIN); instead
>> decode_subtitle_shim() destroys what is already there.
>> (Said check is currently offloaded to the BSF API in the audio/video
>> codepath. This is actually a violation of the BSF API.)
> 
> Added that check.
> 
>> 2. Flushing is not really implemented well:
>> a) It is legal to call avcodec_send_packet() with avpkt == NULL to
>> indicate EOF. If I am not mistaken, decode_subtitle2_priv() will just
>> crash in this case.
> 
> Like all subtitle decoders would when supplying a NULL packet.
> What should I return when a null packet is supplied?
> EAGAIN or EOF?
> 

Treat is as an ordinary flush packet and abide by the docs of
avcodec_send_packet(): "Sending the first flush packet will return
success. Subsequent ones are unnecessary and will return AVERROR_EOF."
(avcodec_send_packet() by the way does not abide by its own
documentation: Packets with side data are never considered flush
packets. And in case a BSF outputs multiple packets for an input packet
or a decoder outputs multiple frames for an input packet, sending a
flush packet can actually return errors; furthermore, I also don't see
that subsequent flush packets always return EOF. Will look into this.)

>> b) avcodec_receive_frame() only returns what is in buffer_frame; it
>> never calls decode_subtitle2_priv() or the underlying decode function at
>> all. This basically presumes that a subtitle decoder can only return one
>> subtitle after flushing. I don't know whether this is true for our
>> decoders with the delay cap.
> 
> The only one I could see with that flag is cc_dec, which is a special case
> anyway as it doesn't work on regular streams.

You missed libzvbi.

> (even this one would crash when providing a NULL packet)

Of course they do. The subtitle API treats packets with size == 0 as
flush packets.

> 
>> c) avcodec_receive_frame() seems to never return AVERROR_EOF after
>> flushing, but always AVERROR(EAGAIN) (if the delay cap is set, the first
>> call to avcodec_receive_frame() after flushing can also result in a
>> frame being returned). Yet this makes no sense and is IMO an API
>> violation on lavc's part.
>> (While we have subtitle decoders with the delay cap, I don't know
>> whether this is actually tested by fate and whether all code actually
>> flushes subtitle decoders at all.)
> 
> I don't think so.
> 
> 
>>> diff --git a/libavfilter/vf_subtitles.c b/libavfilter/vf_subtitles.c
>>> index 377160c72b..6b54cf669b 100644
>>> --- a/libavfilter/vf_subtitles.c
>>> +++ b/libavfilter/vf_subtitles.c
>>> @@ -35,14 +35,12 @@
>>>  # include "libavformat/avformat.h"
>>>  #endif
>>>  #include "libavutil/avstring.h"
>>> -#include "libavutil/imgutils.h"
>>>  #include "libavutil/opt.h"
>>>  #include "libavutil/parseutils.h"
>>>  #include "drawutils.h"
>>>  #include "avfilter.h"
>>>  #include "internal.h"
>>>  #include "formats.h"
>>> -#include "video.h"
>>>
>>>  typedef struct AssContext {
>>>      const AVClass *class;
>>> @@ -292,6 +290,29 @@ static int attachment_is_font(AVStream * st)
>>>      return 0;
>>>  }
>>>
>>> +static int decode(AVCodecContext *avctx, AVFrame *frame, int *got_frame,
>> AVPacket *pkt)
>>> +{
>>> +    int ret;
>>> +
>>> +    *got_frame = 0;
>>
>> You don't really need this: You could just return 0 if no frame was
>> returned, 1 if a frame was returned and < 0 on (actual, not EAGAIN) errors.
>>
>>> +
>>> +    if (pkt) {
>>> +        ret = avcodec_send_packet(avctx, pkt);
>>> +        // In particular, we don't expect AVERROR(EAGAIN), because we read
>> all
>>> +        // decoded frames with avcodec_receive_frame() until done.
>>
>> Where do you do this? For this one would need to call
>> avcodec_receive_frame() in a loop until it returns EAGAIN, but you don't
>> do this.
> 
> This decode() function is copied from ffmpeg.c
> 

>From the documentation of decode() in ffmpeg.c: "if you got a frame, you
must call it again with pkt=NULL." Notice that transcode_subtitles() is
not called in a loop, which means that the subtitle decoding API is not
properly drained. The check would need to be modified to "if (repeating
&& pkt)". I wonder whether this would really change anything.

> 
>>> +static int get_subtitle_buffer(AVFrame *frame)
>>> +{
>>> +    // Buffers in AVFrame->buf[] are not used in case of subtitle frames.
>>> +    // To accomodate with existing code, checking ->buf[0] to determine
>>> +    // whether a frame is ref-counted or has data, we're adding a 1-byte
>>> +    // buffer here, which marks the subtitle frame to contain data.
>>
>> This is a terrible hack that might be necessary for now. But I don't see
>> anything
> 
> I'm not sure about the second half sentence, but what I can say is that I 
> had taken the time to try and replace the checks for ->buf[0] to a frame
> field, but it turned out that this change would be so big that it cannot
> be part of the subtitle patchset.
> (also talked about this with Hendrik a while ago)
> 
> 
> 
>>> +    for (i = 0; i < frame->num_subtitle_areas; i++) {
>>> +        area = frame->subtitle_areas[i];
>>> +        if (area) {
>>
>> Who sets an incorrect num_subtitle_areas?
> 
> Maybe code that hasn't gotten reviewed by you ;-)
> 
>>> +    /**
>>> +     * Display start time, relative to packet pts, in ms.
>>> +     */
>>> +    uint32_t subtitle_start_time;
>>> +
>>> +    /**
>>> +     * Display end time, relative to packet pts, in ms.
>>> +     */
>>> +    uint32_t subtitle_end_time;
>>
>> Hardcoding a millisecond timestamp precision into the API doesn't seem
>> good. As does using 32bit.
> 
> Our internal text subtitle format is defined to be ASS and those
> variables are reflecting just that.
> 

But the timing of ASS (both when in AVPackets as well when in
AVSubtitles) is external to the ASS data; so there is no need to impose
this ms constraint on the whole API (even on non-ASS subtitles).

> 
>>> +/**
>>> + * Copies subtitle data from AVFrame to AVSubtitle.
>>> + *
>>> + * @deprecated This is a compatibility method for interoperability with
>>> + * the legacy subtitle API.
>>> + */
>>> +void av_frame_get_subtitle(AVSubtitle *sub, AVFrame *frame);
>>
>> 1. Missing const.
>> 2. Wrong return type.
>> 3. I don't see a need for this compatibility at all. As explained below,
>> the only user that needs something like this is libavcodec and then it
>> should live in libavcodec.
>> The last two points apply to both av_frame_get_subtitle and
>> av_frame_put_subtitle. Notice that the only use of any of these
>> functions outside of libavcodec is remove later again (patch #8 adds an
>> av_frame_get_subtitle() in ffmpeg.c, #16 removes it again) which shows
>> that such a compatibility stuff is not needed outside of libavcodec.
> 
> Moved to avcodec as ff_ functions.
> 
> 
>>> +    uint32_t pal[256]; ///< RGBA palette for the bitmap.
>>> +
>>> +    char *text;        ///< 0-terminated plain UTF-8 text
>>> +    char *ass;         ///< 0-terminated ASS/SSA compatible event line.
>>> +
>>> +} AVSubtitleArea;
>>
>> I presume that sizeof(AVSubtitleArea) is not supposed to be part of the
>> public API/ABI (after all, you used an array of pointers to
>> AVSubtitleArea in AVFrame). This needs to be documented.
> 
> Makes sense, but I'm not sure how and where?
> 

Whether sizeof(struct foo) is public or not is typically documented
directly before the definition of said struct.
Notice that latter patches of yours put AVSubtitleArea on the stack in
lavc, so you seem to want to make its sizeof fixed. Or you just didn't
know what doing this implied.

> 
> 
>>> +/**
>>> + * Return a subtitle format corresponding to name, or AV_SUBTITLE_FMT_NONE
>>> + * on error.
>>> + *
>>> + * @param name Subtitle format name.
>>> + */
>>> +enum AVSubtitleType av_get_subtitle_fmt(const char *name);
>>
>> Do we need this? Given that there are many possible ways to name the
>> subtitles, it doesn't seem useful to me. The only way to be sure to use
>> the "authoritative" name for a subtitle format is by
>> av_get_subtitle_fmt_name(). But then one doesn't need
>> av_get_subtitle_fmt() at all.
> 
> I've tried to implement everything analog to the functionality we have
> for audio and video. This corresponds to av_get_pix_fmt().
> 
> 
>>> + */
>>> +int av_subtitle_area2area(AVSubtitleArea *dst, const AVSubtitleArea *src,
>> int copy_data);
>>
>> This documentation does not mention that dst should be "blank". But this
>> actually poses a problem: How does the user know that an AVSubtitleArea
>> is blank given that ? For AVFrames the answer is this: An AVFrame is
>> blank if it comes directly from av_frame_alloc() or av_frame_unref().
>> But there is no equivalent for these functions here.
>> Furthermore, is this API supposed to be a standalone API or is
>> AVSubtitleArea something that is always supposed to be part of an
>> AVFrame? The usage in this patchset suggests the latter:
>> av_subtitle_area2area() is only used in frame.c.
> 
> AVSubtitleArea is always supposed to be part of an AVFrame.

Then you should have a chat with the author of patch 15 that puts an
AVSubtitleArea on the stack in dvdsubenc.c.
More seriously, this is an important point.

> I had thought the copy function might be useful outside, but I didn't 
> need it in any of the filter.
> Made it private in AVFrame now.
> 
> 
>>> +/**
>>> + * Copy data from @ref AVSubtitleArea to @ref AVSubtitleRect.
>>> + *
>>> + * @param dst        The target rect (@ref AVSubtitleRect).
>>> + * @param src        The source area (@ref AVSubtitleArea).
>>> + *
>>> + * @return 0 on success.
>>> + *
>>> + * @deprecated This is a compatibility method for interoperability with
>>> + * the legacy subtitle API.
>>> + */
>>> +int av_subtitle_area2rect(AVSubtitleRect *dst, const AVSubtitleArea *src);
>>> +
>>> +/**
>>> + * Copy data from @ref AVSubtitleRect to @ref AVSubtitleArea.
>>> + *
>>> + * @param dst        The source area (@ref AVSubtitleArea).
>>> + * @param src        The target rect (@ref AVSubtitleRect).
>>> + *
>>> + * @return 0 on success.
>>> + *
>>> + * @deprecated This is a compatibility method for interoperability with
>>> + * the legacy subtitle API.
>>> + */
>>> +int av_subtitle_rect2area(AVSubtitleArea *dst, const AVSubtitleRect *src);
>>
>> Do we need compatibility with the legacy API at all? I only see two
>> usecases for this:
>>
>> 1. One has a huge project using AVSubtitles that uses AVSubtitle so
>> pervasively that one can't transition to the new API in one step. But I
>> doubt that that's a thing.
>> 2. One is forced to provide compatibility with the legacy API while also
>> providing an API based upon the new API. This is of course the situation
>> with libavcodec, which for this reason needs such compatibility
>> functions (in libavcodec!). But I don't see a second user with the same
>> needs, so I don't see why these functions (which would actually need to
>> be declared as deprecated from the very beginning if public) should be
>> public.
> 
> Made those private in avcodec.
>