[FFmpeg-devel] [PATCH] avcodec/bsf: restructure the internal implementation of the bistream filter API

Fri Apr 17 21:49:15 EEST 2020

James Almer:
> Process input data as soon as it's fed to av_bsf_send_packet(), instead of
> storing a single packet and expecting the user to call av_bsf_receive_packet()
> in order to trigger the decoding process before they are allowed to feed more
> data.
> 
> This puts the bsf API more in line with the decoupled decode API, without
> breaking existing code using it.
> 
> Signed-off-by: James Almer <jamrial at gmail.com>
> ---
> Benefits from this change include less constrains to how to use the API (Now
> you can feed as many packets as the filter will accept, including the flush
> call, before attempting to fetch the first output packet), and actually
> honoring the part of the doxy for av_bsf_receive_packet() that says that pkt is
> not touched on failure.

Given that the packet is supposed to be blank on input, its content will
be preserved even on error. Granted, memcmp() might show same
differences, but I'd rather see this (non-)issue eliminated by stating
that the packet will be blank on error instead of untouched.
For this patch this means that one can directly return
ctx->filter->filter(ctx, pkt) in av_bsf_receive_packet().
> 
> Drawback from this change is that now all bsfs will always receive non-writable
> packets, so filters like noise and mpeg4_unpack_bframes will not be able to
> do their work in-place anymore. This is because av_bsf_send_packet() will now
> trigger the filtering process, and by passing the input packet reference to the
> underlying bsf, any alteration in case of failure would go against the doxy
> statement that pkt is untouched on such scenarios. So a new reference must be
> used instead.
> 
This IMO underestimates the consequences of this change for bsfs that
don't care about the writability of the packets. It makes a bsf thought
to be a simple pass-through bsf considerably more expensive. This
applies to e.g. vp9_superframe which is auto-inserted (without checking
the actual content) for muxing VP9 and it would also apply to the
proposed pgs_frame_merge bsf.

Of course, the null bsf is another one of those supposedly free bsfs and
that's why it's added in ff_decode_bsfs_init() to every decoder if there
is no other bsf to add.

Furthermore, this would also apply to callers that have no use for the
packet besides sending it to a bsf (and that would simply unref it on
failure). Given that (on success) ownership passes to the bsf this
probably includes most current callers.

In other words: I don't like this patch.

I see two solutions for this. The first is outlined at the end of this
mail, but I don't think it is acceptable because it probably amounts to
an API break. The second is an idea off the top of my head. It might be
bullshit.

The second solution mitigates this by adding a new function
av_set_ownership_status() or so. It allows one to send updated
information to the bsf to override the behaviour currently documented in
av_bsf_send/receive_packet() (the default behaviour would of course stay
unchanged). With av_set_ownership_status() one can indicate whether
a) the bsf should not take ownership of the reference at all, but always
make a copy,
b) the bsf should leave the packet untouched in case of an error, but
retain ownership of the reference sent to it in case of success.
c) the bsf should take take ownership of the packet even on error (and
unref the packet sent), unless the error is caused by the bsf already
being full (i.e. if the error would be AVERROR(EAGAIN))
d) the bsf takes ownership of the packet it received in the moment it
consumes the packet, i.e. buffers it internally (your proposed code
would try to return an unmodified packet to the caller if
ctx->filter->filter() fails which is problematic (see the discussion at
the end)).

The new status would apply to all future calls to av_bsf_send_packet().

The same could also be done for avcodec_send_packet() and
avcodec_send_frame() (in which case the function would be allowed to
cast the const from the AVPacket/AVFrame away). One could even use one
function for all three -- one can distinguish AVBSFContext and
AVCodecContext via the AVClass.

This would also allow an easy way to solve the problem that in the
noninterleaved codepath in libavformat/mux.c the input packet is
destroyed when a bsf is autoinserted despite the function not taking
ownership of the packet: In the noninterleaved codepath, we would choose
option a) (this should be done when the bsf is added); in the
interleaved codepath, we would choose option c).

>  libavcodec/avcodec.h |  6 +---
>  libavcodec/bsf.c     | 68 +++++++++++++++++++++++++++-----------------
>  2 files changed, 43 insertions(+), 31 deletions(-)
> 
> diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
> index 55151a0b71..e60f2ac1ce 100644
> --- a/libavcodec/avcodec.h
> +++ b/libavcodec/avcodec.h
> @@ -4720,10 +4720,6 @@ int av_bsf_init(AVBSFContext *ctx);
>  /**
>   * Submit a packet for filtering.
>   *
> - * After sending each packet, the filter must be completely drained by calling
> - * av_bsf_receive_packet() repeatedly until it returns AVERROR(EAGAIN) or
> - * AVERROR_EOF.
> - *
>   * @param pkt the packet to filter. The bitstream filter will take ownership of
>   * the packet and reset the contents of pkt. pkt is not touched if an error occurs.
>   * If pkt is empty (i.e. NULL, or pkt->data is NULL and pkt->side_data_elems zero),

In your changes to av_bsf_send_packet(), you explicitly filter the
return value AVERROR(EAGAIN) out among the return values of the filter,
so that if the bsf has been completely drained before this call, it does
not output AVERROR(EAGAIN). Yet you do not document this whereas
avcodec_send_packet() does:

AVERROR(EAGAIN):   input is not accepted in the current state - user
                   must read output with avcodec_receive_frame() (once
                   all output is read, the packet should be resent, and
                   the call will not fail with EAGAIN).

> @@ -4755,7 +4751,7 @@ int av_bsf_send_packet(AVBSFContext *ctx, AVPacket *pkt);
>   * an error occurs.
>   *
>   * @note one input packet may result in several output packets, so after sending
> - * a packet with av_bsf_send_packet(), this function needs to be called
> + * a packet with av_bsf_send_packet(), this function may need to be called
>   * repeatedly until it stops returning 0. It is also possible for a filter to
>   * output fewer packets than were sent to it, so this function may return
>   * AVERROR(EAGAIN) immediately after a successful av_bsf_send_packet() call.
> diff --git a/libavcodec/bsf.c b/libavcodec/bsf.c
> index 7b96183e64..97d86beb6f 100644
> --- a/libavcodec/bsf.c
> +++ b/libavcodec/bsf.c
> @@ -28,7 +28,8 @@
>  #include "bsf.h"
>  
>  struct AVBSFInternal {
> -    AVPacket *buffer_pkt;
> +    AVPacket *in_pkt;
> +    AVPacket *out_pkt;
>      int eof;
>  };
>  
> @@ -45,8 +46,10 @@ void av_bsf_free(AVBSFContext **pctx)
>      if (ctx->filter->priv_class && ctx->priv_data)
>          av_opt_free(ctx->priv_data);
>  
> -    if (ctx->internal)
> -        av_packet_free(&ctx->internal->buffer_pkt);
> +    if (ctx->internal) {
> +        av_packet_free(&ctx->internal->in_pkt);
> +        av_packet_free(&ctx->internal->out_pkt);
> +    }
>      av_freep(&ctx->internal);
>      av_freep(&ctx->priv_data);
>  
> @@ -110,8 +113,9 @@ int av_bsf_alloc(const AVBitStreamFilter *filter, AVBSFContext **pctx)
>      }
>      ctx->internal = bsfi;
>  
> -    bsfi->buffer_pkt = av_packet_alloc();
> -    if (!bsfi->buffer_pkt) {
> +    bsfi->in_pkt = av_packet_alloc();
> +    bsfi->out_pkt = av_packet_alloc();
> +    if (!bsfi->in_pkt || !bsfi->out_pkt) {
>          ret = AVERROR(ENOMEM);
>          goto fail;
>      }
> @@ -183,7 +187,8 @@ void av_bsf_flush(AVBSFContext *ctx)
>  
>      bsfi->eof = 0;
>  
> -    av_packet_unref(bsfi->buffer_pkt);
> +    av_packet_unref(bsfi->in_pkt);
> +    av_packet_unref(bsfi->out_pkt);
>  
>      if (ctx->filter->flush)
>          ctx->filter->flush(ctx);
> @@ -204,21 +209,38 @@ int av_bsf_send_packet(AVBSFContext *ctx, AVPacket *pkt)
>          return AVERROR(EINVAL);
>      }
>  
> -    if (bsfi->buffer_pkt->data ||
> -        bsfi->buffer_pkt->side_data_elems)
> +    if (bsfi->in_pkt->data ||
> +        bsfi->in_pkt->side_data_elems)
>          return AVERROR(EAGAIN);
>  
> -    ret = av_packet_make_refcounted(pkt);
> +    ret = av_packet_ref(bsfi->in_pkt, pkt);
>      if (ret < 0)
>          return ret;
> -    av_packet_move_ref(bsfi->buffer_pkt, pkt);
> +
> +    if (!bsfi->out_pkt->data && !bsfi->out_pkt->side_data_elems) {
> +        ret = ctx->filter->filter(ctx, bsfi->out_pkt);
> +        if (ret < 0 && ret != AVERROR(EAGAIN) && ret != AVERROR_EOF)
> +            return ret;
> +    }
> +
> +    av_packet_unref(pkt);

Consider the following scenario: You send a packet to a bsf and in the
first call to ctx->filter->filter (happening here above) the bsf take
bsf->in_pkt; let's presume this call is successfull and generates an
out_pkt. Then the caller collects his packet with
av_bsf_receive_packet(). In the current API, the caller would now have
to call av_bsf_receive_packet() until it returns AVERROR(EAGAIN) before
sending another packet. Yet in your new API, the caller may send a new
packet immediately and in this scenario the filter will be called again.

Let's presume the first packet contained enough data for more than one
output packet, so that the bsf will first use the data is has already
cached internally (an example of this is Marton's proposed
pcm_rechunk_bsf) without collecting the new input packet. Let's also
presume that the actual filtering will fail. Then the caller still has
his packet, yet a copy/reference also exists in in_pkt.

What is the caller now supposed to do with the packet he has? Submit it
again for filtering? In the scenario outlined above, this would mean
that a packet is effectively sent twice to the bsf. It would also mean
that for an ordinary one-in-one-out bsf an invalid packet might lead to
an infinite loop if the caller tries to resend it again and again.

If I were to design a new API from scratch, my answer would be: The way
the caller can see if the packet has been accepted is by looking at
whether the packet has been consumed (i.e. whether the returned packet
is blank). This would of course be coupled with "on success the packet
has been consumed" and "if it has not been consumed, the packet is
untouched".
In this case, there would be no need to create new references: We take
ownership of the reference as soon as we know that
av_packet_make_refcounted() succeeds.

- Andreas