[FFmpeg-devel] ffmpeg -af, libavfilter

Fri Feb 24 19:13:33 CET 2012

On date Tuesday 2012-02-21 17:44:28 +0100, Clément Bœsch encoded:
> Hi folks,
> 
> This is the 3rd attempt to get the -af option in ffmpeg (Hemanth → Stefano →
> Clément). And guess what, it's still a work in progress because there are still

You forgot Mina ;).

> a few issues to fix:
> 
>  * -async option is now disabled
>  * -map_channel is now disabled
>  * audio resampling filter doesn't flush
>  * on-the-fly resampling won't work anymore
> 
> The main reason for all of this is that the current do_audio_out() in ffmpeg.c
> implements its own resampling, audio compensation and related stuff. And this
> code relies heavily on the audio decode context. This is a problem because the
> audio buffer in the decoded frame now always contains the data coming out of the
> filtergraph, and not the data just after the decode.
> 
> Example with a channel layout change:
> 
> Current:
> 
>   dec      do_audio_out()      enc
>  [6ch] --- [6ch ~~~~ 2ch] --- [2ch]
> 
> With filtergraph system (always active, even without specifying -af):
> 
>   dec       filtergraph       do_audio_out()      enc
>  [6ch] --- [6ch ~~~~ 2ch] --- [2ch ~~~~ 2ch] --- [2ch]
> 
> So now, do_audio_out() is supposed to receive a sanitized input (which means
> exactly the same sample rate, number of channels, and format as the output) all
> the time, and thus can't do the audio sync anymore, as well as -map_channel;
> the heuristics in do_audio_out() make excessive use of the decode context.
> 
> I can deal with -map_channel: I would "just" have to insert af_pan filters (and
> af_amerge to finally support the merge) at the end of the filtergraph, just like
> we could do for the volume option (with af_volume). So I'll be working on this
> in the next days.

> On the other hand, I don't think I'll be able to correctly deal with the -async
> option, which we should somehow integrate into libavfilter; does anyone want to do
> that?

What exactly should the filter do? (I suppose it would be an A|V -> A|V
filter.)
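To make the question concrete, here is a rough sketch of the decision such a filter might have to take for each incoming buffer. It assumes the filter tracks how many samples it has already output and compares that with the buffer timestamp. The function name, the microsecond timestamp unit and the threshold parameter are all made up for illustration; none of this is existing libavfilter API.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of libavfilter.
 *
 * pts_us          timestamp of the incoming buffer, in microseconds
 * sample_rate     output sample rate in Hz
 * samples_written number of samples already sent downstream
 * threshold       drift (in samples) tolerated before compensating
 *
 * Returns 0 when the drift is within the threshold, a positive count of
 * samples to pad with silence when the audio is late, and a negative
 * count of samples to drop when it is early.
 */
int64_t async_sample_delta(int64_t pts_us, int sample_rate,
                           int64_t samples_written, int64_t threshold)
{
    int64_t expected = pts_us * sample_rate / 1000000;
    int64_t delta    = expected - samples_written;

    if (delta > -threshold && delta < threshold)
        return 0;
    return delta;
}
```

With a 48 kHz stream, a buffer stamped at 1 s and 47000 samples already written, this reports 1000 samples to pad. Whether the filter should pad/drop or stretch the audio instead is exactly the part that needs discussion.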

> 
> Concerning the audio resampling filter which doesn't flush, as Michael said on
> IRC, it's likely to be handled in a request_frame() callback in af_aresample.
> 

> Last issue is the on-the-fly resampling heuristics (check resample_changed).
> Libavfilter is not able to deal with that, right? How are we supposed to
> work around this?

This is dynamic reconfiguration. I have been talking about it for ages
but never implemented it. Maybe we could add it as a GSoC task; some
brainstorming on it may help, so that we at least have a clear design idea.

Currently the abuffersrc source works by normalizing input (same for
the source video buffer).
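As a starting point for that brainstorming, the detection half could be as small as the sketch below: remember the parameters of the last buffer and report when a new buffer differs. The struct and function names are hypothetical, and the hard part (rebuilding or renegotiating the graph once a change is reported) is deliberately left out.

```c
/* Hypothetical stand-in for the per-stream audio parameters. */
typedef struct AudioParams {
    int sample_rate;
    int sample_fmt;
    int channels;
} AudioParams;

/*
 * Compare the parameters of the current buffer with the last ones seen.
 * Returns 1 and updates *last if anything changed, 0 otherwise.
 */
int audio_params_changed(AudioParams *last, const AudioParams *cur)
{
    int changed = last->sample_rate != cur->sample_rate ||
                  last->sample_fmt  != cur->sample_fmt  ||
                  last->channels    != cur->channels;

    if (changed)
        *last = *cur;
    return changed;
}
```

The caller would run this on every decoded frame before pushing it into the source, and trigger the reconfiguration when it returns 1.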

> Hopefully, fixing these issues will allow us to replace a lot of audio "hacks" in
> ffmpeg.c (and we should consider making libavfilter a hard dependency to get
> rid of all the old code).

Are there still use cases where it makes sense to compile ffmpeg
without libavfilter? Benchmarks? But I agree we should get rid of the
non-lavfi path if keeping it makes the code harder to
maintain.

> Anyone motivated to lend me a hand on all of this?
> 
> -- 
> Clément B.

> From 22e02b3b5be58f4e28395a2ff10bf65269da2d8a Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Cl=C3=A9ment=20B=C5=93sch?= <clement.boesch at smartjog.com>
> Date: Tue, 14 Feb 2012 17:00:53 +0100
> Subject: [PATCH 1/2] lavfi/WIP: add
>  avfilter_fill_frame_from_audio_buffer_ref().
> 
> ---
>  libavfilter/avcodec.c |   14 ++++++++++++++
>  libavfilter/avcodec.h |   11 +++++++++++
>  2 files changed, 25 insertions(+), 0 deletions(-)
> 
> diff --git a/libavfilter/avcodec.c b/libavfilter/avcodec.c
> index 2850c4d..5fea5b7 100644
> --- a/libavfilter/avcodec.c
> +++ b/libavfilter/avcodec.c
> @@ -56,6 +56,20 @@ AVFilterBufferRef *avfilter_get_video_buffer_ref_from_frame(const AVFrame *frame
>      return picref;
>  }
>  
> +int avfilter_fill_frame_from_audio_buffer_ref(AVFrame *frame,
> +                                              const AVFilterBufferRef *samplesref)
> +{
> +    if (!samplesref || !samplesref->audio || !frame)
> +        return AVERROR(EINVAL);
> +
> +    memcpy(frame->data, samplesref->data, sizeof(frame->data));
> +    frame->pkt_pos    = samplesref->pos;
> +    frame->format     = samplesref->format;
> +    frame->nb_samples = samplesref->audio->nb_samples;
> +
> +    return 0;
> +}
> +
>  int avfilter_fill_frame_from_video_buffer_ref(AVFrame *frame,
>                                                const AVFilterBufferRef *picref)
>  {
> diff --git a/libavfilter/avcodec.h b/libavfilter/avcodec.h
> index 22dd1a2..36ab917 100644
> --- a/libavfilter/avcodec.h
> +++ b/libavfilter/avcodec.h
> @@ -47,6 +47,17 @@ int avfilter_copy_frame_props(AVFilterBufferRef *dst, const AVFrame *src);
>  AVFilterBufferRef *avfilter_get_video_buffer_ref_from_frame(const AVFrame *frame, int perms);
>  
>  /**
> + * Fill an AVFrame with the information stored in samplesref.
> + *
> + * @param frame an already allocated AVFrame
> + * @param samplesref an audio buffer reference
> + * @return 0 in case of success, a negative AVERROR code in case of
> + * failure
> + */
> +int avfilter_fill_frame_from_audio_buffer_ref(AVFrame *frame,
> +                                              const AVFilterBufferRef *samplesref);
> +
> +/**
>   * Fill an AVFrame with the information stored in picref.
>   *
>   * @param frame an already allocated AVFrame
> -- 
> 1.7.9

OK, but maybe we should unify the A/V API and have a single
avfilter_fill_frame_from_buffer_ref().
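Something along these lines, as a sketch only. The types below are simplified stand-ins I made up so the example is self-contained; they are not the real AVFrame/AVFilterBufferRef definitions, and the real function would use AVERROR(EINVAL) and the actual field names.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical stand-ins for the real lavfi/lavc types. */
enum MediaType { MEDIA_VIDEO, MEDIA_AUDIO };

typedef struct VideoProps { int w, h; }       VideoProps;
typedef struct AudioProps { int nb_samples; } AudioProps;

typedef struct BufferRef {
    enum MediaType type;
    uint8_t *data[8];
    int      format;
    int64_t  pos;
    VideoProps *video; /* set only for video buffers */
    AudioProps *audio; /* set only for audio buffers */
} BufferRef;

typedef struct Frame {
    uint8_t *data[8];
    int      format;
    int64_t  pkt_pos;
    int      width, height;
    int      nb_samples;
} Frame;

/* Single entry point: per-type fields first, then the common ones. */
int fill_frame_from_buffer_ref(Frame *frame, const BufferRef *ref)
{
    if (!frame || !ref)
        return -EINVAL;

    switch (ref->type) {
    case MEDIA_VIDEO:
        if (!ref->video)
            return -EINVAL;
        frame->width  = ref->video->w;
        frame->height = ref->video->h;
        break;
    case MEDIA_AUDIO:
        if (!ref->audio)
            return -EINVAL;
        frame->nb_samples = ref->audio->nb_samples;
        break;
    default:
        return -EINVAL;
    }

    /* Both arrays hold 8 pointers, so only the pointers are copied. */
    memcpy(frame->data, ref->data, sizeof(frame->data));
    frame->pkt_pos = ref->pos;
    frame->format  = ref->format;
    return 0;
}
```

The audio and video variants could then remain as thin wrappers for compatibility.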

> From 6f8dabe1fa5e3611134b7a20063d691d1c429437 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Cl=C3=A9ment=20B=C5=93sch?= <clement.boesch at smartjog.com>
> Date: Tue, 7 Feb 2012 09:36:40 +0100
> Subject: [PATCH 2/2] ffmpeg/WIP: add -af option.
> 

> Based on a patch by Stefano which itself is based on a patch by Hemanth.

Based on a patch by Mina Nagy Zaky which was based on a patch by
Stefano which itself is based on a patch by Hemanth.

> ---
>  ffmpeg.c |  185 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 files changed, 175 insertions(+), 10 deletions(-)
> 
> diff --git a/ffmpeg.c b/ffmpeg.c
> index 8d727a3..0da8461 100644
> --- a/ffmpeg.c
> +++ b/ffmpeg.c
> @@ -62,6 +62,7 @@
>  # include "libavfilter/buffersink.h"
>  # include "libavfilter/buffersrc.h"
>  # include "libavfilter/vsrc_buffer.h"
> +# include "libavfilter/asrc_abuffer.h"
>  #endif
>  
>  #if HAVE_SYS_RESOURCE_H
> @@ -286,9 +287,13 @@ typedef struct OutputStream {
>  #if CONFIG_AVFILTER
>      AVFilterContext *output_video_filter;
>      AVFilterContext *input_video_filter;
> +    AVFilterContext *output_audio_filter;
> +    AVFilterContext *input_audio_filter;
>      AVFilterBufferRef *picref;
> +    AVFilterBufferRef *samplesref;
>      char *avfilter;
>      AVFilterGraph *graph;
> +    AVFilterGraph *agraph;
>  #endif
>  
>      int64_t sws_flags;
> @@ -703,6 +708,96 @@ static int configure_video_filters(InputStream *ist, OutputStream *ost)
>  
>      return 0;
>  }
> +
> +static int configure_audio_filters(InputStream *ist, OutputStream *ost)
> +{
> +    AVFilterContext *last_filter, *filter;
> +    AVCodecContext * const icodec = ist->st->codec;
> +    AVCodecContext * const ocodec = ost->st->codec;
> +    const AVFilterLink *outlink;
> +
> +    const enum AVSampleFormat sample_fmts[] = { AV_SAMPLE_FMT_S16, -1 };
> +    const int packing_fmts[]                = { AVFILTER_PACKED, -1 };
> +    const int64_t *chlayouts                = avfilter_all_channel_layouts;
> +    AVABufferSinkParams *abuffersink_params;
> +
> +    char args[255];
> +    int ret;
> +
> +    ost->agraph = avfilter_graph_alloc();
> +    if (!ost->agraph)
> +        return AVERROR(ENOMEM);
> +
> +    /* input */
> +    if (!icodec->channel_layout)
> +        icodec->channel_layout = av_get_default_channel_layout(icodec->channels);
> +    snprintf(args, sizeof(args), "%d:%d:0x%"PRIx64":packed",
> +             icodec->sample_rate, icodec->sample_fmt, icodec->channel_layout);
> +    ret = avfilter_graph_create_filter(&ost->input_audio_filter,
> +                                       avfilter_get_by_name("abuffer"), "asrc",
> +                                       args, NULL, ost->agraph);
> +    if (ret < 0)
> +        return ret;
> +
> +    /* output  */
> +    abuffersink_params = av_abuffersink_params_alloc();
> +    abuffersink_params->sample_fmts     = sample_fmts;
> +    abuffersink_params->channel_layouts = chlayouts;
> +    abuffersink_params->packing_fmts    = packing_fmts;
> +    ret = avfilter_graph_create_filter(&ost->output_audio_filter,
> +                                       avfilter_get_by_name("abuffersink"), "aout", NULL,
> +                                       abuffersink_params, ost->agraph);
> +    if (ret < 0)
> +        return ret;
> +
> +    /* auto insert resampling in case of sample rate mismatch */
> +    last_filter = ost->input_audio_filter;
> +    if (icodec->sample_rate != ocodec->sample_rate) {
> +        snprintf(args, sizeof(args), "%d", ocodec->sample_rate);
> +        if ((ret = avfilter_graph_create_filter(&filter, avfilter_get_by_name("aresample"),
> +                                                NULL, args, NULL, ost->agraph)) < 0)
> +            return ret;
> +        if ((ret = avfilter_link(last_filter, 0, filter, 0)) < 0)
> +            return ret;
> +        last_filter = filter;
> +    }
> +
> +    /* insert user filter (-af) */
> +    if (ost->avfilter) {
> +        AVFilterInOut *outputs = avfilter_inout_alloc();
> +        AVFilterInOut *inputs  = avfilter_inout_alloc();
> +
> +        outputs->name       = av_strdup("in");
> +        outputs->filter_ctx = last_filter;
> +        outputs->pad_idx    = 0;
> +        outputs->next       = NULL;
> +
> +        inputs->name        = av_strdup("out");
> +        inputs->filter_ctx  = ost->output_audio_filter;
> +        inputs->pad_idx     = 0;
> +        inputs->next        = NULL;
> +
> +        if ((ret = avfilter_graph_parse(ost->agraph, ost->avfilter, &inputs, &outputs, NULL)) < 0)
> +            return ret;
> +        av_freep(&ost->avfilter);
> +    } else {
> +        if ((ret = avfilter_link(last_filter, 0, ost->output_audio_filter, 0)) < 0)
> +            return ret;
> +    }
> +
> +    if ((ret = avfilter_graph_config(ost->agraph, NULL)) < 0)
> +        return ret;
> +
> +    /* set output codec context with settings of the audio buffer sink */
> +    outlink = ost->output_audio_filter->inputs[0];
> +    ocodec->sample_rate    = outlink->sample_rate;
> +    ocodec->channel_layout = outlink->channel_layout;
> +    ocodec->channels       = av_get_channel_layout_nb_channels(outlink->channel_layout);
> +    ocodec->sample_fmt     = outlink->format;
> +
> +    return 0;
> +}

I wonder how much of this could be merged with the
configure_video_filters() code (and if that would be convenient from
the maintenance point of view).
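Whatever ends up shared, the source arguments will presumably stay per media type. For reference, this is how the audio argument string built in the patch expands, pulled out into a standalone helper. The helper name and its error convention are hypothetical; only the format string is taken from the patch.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the "rate:fmt:layout:packed" string that
 * the patch passes to the abuffer source.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int format_abuffer_args(char *buf, size_t size, int sample_rate,
                        int sample_fmt, uint64_t channel_layout)
{
    int n = snprintf(buf, size, "%d:%d:0x%" PRIx64 ":packed",
                     sample_rate, sample_fmt, channel_layout);

    if (n < 0 || (size_t)n >= size)
        return -1;
    return 0;
}
```

For 44100 Hz, sample format 1 and layout 0x3 this yields "44100:1:0x3:packed". Checking for truncation is cheap and avoids passing a cut-off string to the filter.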

> +
>  #endif /* CONFIG_AVFILTER */
>  
>  static void term_exit(void)
> @@ -1123,12 +1218,15 @@ static void do_audio_out(AVFormatContext *s, OutputStream *ost,
>                           InputStream *ist, AVFrame *decoded_frame)
>  {
>      uint8_t *buftmp;
> -    int64_t audio_buf_size, size_out;
> -
> -    int frame_bytes, resample_changed;
> +    int64_t size_out;
>      AVCodecContext *enc = ost->st->codec;
> -    AVCodecContext *dec = ist->st->codec;
>      int osize = av_get_bytes_per_sample(enc->sample_fmt);
> +
> +    /* FIXME: restore -map_channel and -async option when avfilter is on */
> +#if !CONFIG_AVFILTER
> +    int64_t audio_buf_size;
> +    int resample_changed;
> +    AVCodecContext *dec = ist->st->codec;
>      int isize = av_get_bytes_per_sample(dec->sample_fmt);
>      uint8_t *buf = decoded_frame->data[0];
>      int size     = decoded_frame->nb_samples * dec->channels * isize;
> @@ -1260,9 +1358,12 @@ need_realloc:
>              }
>          }
>      } else
> +#else
>          ost->sync_opts = lrintf(get_sync_ipts(ost, ist->pts) * enc->sample_rate) -
>                                  av_fifo_size(ost->fifo) / (enc->channels * osize); // FIXME wrong
> +#endif
>  
> +#if !CONFIG_AVFILTER
>      if (ost->audio_resample || ost->audio_channels_mapped) {
>          buftmp = audio_buf;
>          size_out = swr_convert(ost->swr, (      uint8_t*[]){buftmp}, audio_buf_size / (enc->channels * osize),
> @@ -1274,9 +1375,14 @@ need_realloc:
>      }
>  
>      av_assert0(ost->audio_resample || dec->sample_fmt==enc->sample_fmt);
> +#else
> +    buftmp   = decoded_frame->data[0];
> +    size_out = decoded_frame->nb_samples * enc->channels * osize;
> +#endif
>  
>      /* now encode as many frames as possible */
>      if (!(enc->codec->capabilities & CODEC_CAP_VARIABLE_FRAME_SIZE)) {
> +        int frame_bytes;
>          /* output resampled raw samples */
>          if (av_fifo_realloc2(ost->fifo, av_fifo_size(ost->fifo) + size_out) < 0) {
>              av_log(NULL, AV_LOG_FATAL, "av_fifo_realloc2() failed\n");
> @@ -1969,6 +2075,8 @@ static int transcode_audio(InputStream *ist, AVPacket *pkt, int *got_output)
>      AVCodecContext *avctx = ist->st->codec;
>      int bps = av_get_bytes_per_sample(ist->st->codec->sample_fmt);
>      int i, ret;
> +    int decoded_data_size;
> +    void *samples;
>  
>      if (!ist->decoded_frame && !(ist->decoded_frame = avcodec_alloc_frame()))
>          return AVERROR(ENOMEM);
> @@ -2000,9 +2108,9 @@ static int transcode_audio(InputStream *ist, AVPacket *pkt, int *got_output)
>  
>  
>      // preprocess audio (volume)
> +    decoded_data_size = decoded_frame->nb_samples * avctx->channels * bps;
> +    samples = decoded_frame->data[0];
>      if (audio_volume != 256) {
> -        int decoded_data_size = decoded_frame->nb_samples * avctx->channels * bps;
> -        void *samples = decoded_frame->data[0];
>          switch (avctx->sample_fmt) {
>          case AV_SAMPLE_FMT_U8:
>          {
> @@ -2057,6 +2165,23 @@ static int transcode_audio(InputStream *ist, AVPacket *pkt, int *got_output)
>          }
>      }
>  
> +#if CONFIG_AVFILTER
> +    for (i = 0; i < nb_output_streams; i++) {
> +        OutputStream *ost = &output_streams[i];
> +        if (!check_output_constraints(ist, ost) || !ost->encoding_needed)
> +            continue;
> +        if (av_asrc_buffer_add_buffer(ost->input_audio_filter,
> +                                      samples, decoded_data_size,
> +                                      ist->st->codec->sample_rate,
> +                                      ist->st->codec->sample_fmt,
> +                                      ist->st->codec->channel_layout,
> +                                      0, ist->pts, 0) < 0) {
> +            av_log(NULL, AV_LOG_FATAL, "Failed to inject audio samples into filter network\n");
> +            exit_program(1);
> +        }
> +    }
> +#endif
> +
>      rate_emu_sleep(ist);
>  
>      for (i = 0; i < nb_output_streams; i++) {
> @@ -2064,9 +2189,32 @@ static int transcode_audio(InputStream *ist, AVPacket *pkt, int *got_output)
>  
>          if (!check_output_constraints(ist, ost) || !ost->encoding_needed)
>              continue;
> +
> +#if CONFIG_AVFILTER
> +        while (av_buffersink_poll_frame(ost->output_audio_filter)) {
> +            AVFrame *filtered_frame;
> +

> +            if (av_buffersink_get_buffer_ref(ost->output_audio_filter, &ost->samplesref, 0) < 0) {
> +                av_log(NULL, AV_LOG_WARNING, "AV Filter told us it has audio samples available but failed to output some\n");

This message is a bit lame (think of what "AV Filter" means to the
average user reading the log). Yes, I know this is consistent with the
video path.

> +                goto cont;
> +            }
> +            if (!ist->filtered_frame && !(ist->filtered_frame = avcodec_alloc_frame())) {
> +                ret = AVERROR(ENOMEM);
> +                goto end;
> +            }
> +            filtered_frame  = ist->filtered_frame;
> +            *filtered_frame = *decoded_frame;
> +            avfilter_fill_frame_from_audio_buffer_ref(filtered_frame, ost->samplesref);
> +            do_audio_out(output_files[ost->file_index].ctx, ost, ist, filtered_frame);
> +            cont:
> +            avfilter_unref_buffer(ost->samplesref);
> +        }
> +#else
>          do_audio_out(output_files[ost->file_index].ctx, ost, ist, decoded_frame);
> +#endif
>      }
>  
> +end:
>      return ret;
>  }
>  
> @@ -2577,6 +2725,12 @@ static int transcode_init(OutputFile *output_files, int nb_output_files,
>                  ost->audio_resample      |=    codec->sample_fmt     != icodec->sample_fmt
>                                              || codec->channel_layout != icodec->channel_layout;
>                  icodec->request_channels  = codec->channels;
> +#if CONFIG_AVFILTER
> +                if (configure_audio_filters(ist, ost)) {
> +                    av_log(NULL, AV_LOG_FATAL, "Error opening audio filters!\n");
> +                    exit_program(1);
> +                }
> +#endif
>                  ost->resample_sample_fmt  = icodec->sample_fmt;
>                  ost->resample_sample_rate = icodec->sample_rate;
>                  ost->resample_channels    = icodec->channels;
> @@ -3113,6 +3267,7 @@ static int transcode(OutputFile *output_files, int nb_output_files,
>          }
>  #if CONFIG_AVFILTER
>          avfilter_graph_free(&ost->graph);
> +        avfilter_graph_free(&ost->agraph);
>  #endif
>      }
>  
> @@ -4103,7 +4258,7 @@ static OutputStream *new_audio_stream(OptionsContext *o, AVFormatContext *oc)
>      audio_enc->codec_type = AVMEDIA_TYPE_AUDIO;
>  
>      if (!ost->stream_copy) {
> -        char *sample_fmt = NULL;
> +        char *sample_fmt = NULL, *filters = NULL;
>  
>          MATCH_PER_STREAM_OPT(audio_channels, i, audio_enc->channels, oc, st);
>  
> @@ -4118,6 +4273,12 @@ static OutputStream *new_audio_stream(OptionsContext *o, AVFormatContext *oc)
>  
>          ost->rematrix_volume=1.0;
>          MATCH_PER_STREAM_OPT(rematrix_volume, f, ost->rematrix_volume, oc, st);
> +
> +#if CONFIG_AVFILTER
> +        MATCH_PER_STREAM_OPT(filters, str, filters, oc, st);
> +        if (filters)
> +            ost->avfilter = av_strdup(filters);
> +#endif
>      }
>  
>      /* check for channel mapping for this audio stream */
> @@ -4938,9 +5099,12 @@ static int opt_qscale(OptionsContext *o, const char *opt, const char *arg)
>      return ret;
>  }
>  
> -static int opt_video_filters(OptionsContext *o, const char *opt, const char *arg)
> +static int opt_avfilters(OptionsContext *o, const char *opt, const char *arg)
>  {
> -    return parse_option(o, "filter:v", arg, options);
> +    char *s = av_asprintf("filter:%c", *opt);
> +    int ret = parse_option(o, s, arg, options);
> +    av_free(s);
> +    return ret;
>  }
>  
>  static int opt_vsync(const char *opt, const char *arg)
> @@ -5049,7 +5213,8 @@ static const OptionDef options[] = {
>      { "vstats", OPT_EXPERT | OPT_VIDEO, {(void*)&opt_vstats}, "dump video coding statistics to file" },
>      { "vstats_file", HAS_ARG | OPT_EXPERT | OPT_VIDEO, {(void*)opt_vstats_file}, "dump video coding statistics to file", "file" },
>  #if CONFIG_AVFILTER
> -    { "vf", HAS_ARG | OPT_VIDEO | OPT_FUNC2, {(void*)opt_video_filters}, "video filters", "filter list" },
> +    { "vf", HAS_ARG | OPT_VIDEO | OPT_FUNC2, {(void*)opt_avfilters}, "video filters", "filter list" },
> +    { "af", HAS_ARG | OPT_AUDIO | OPT_FUNC2, {(void*)opt_avfilters}, "audio filters", "filter list" },
>  #endif
>      { "intra_matrix", HAS_ARG | OPT_EXPERT | OPT_VIDEO | OPT_STRING | OPT_SPEC, {.off = OFFSET(intra_matrices)}, "specify intra matrix coeffs", "matrix" },
>      { "inter_matrix", HAS_ARG | OPT_EXPERT | OPT_VIDEO | OPT_STRING | OPT_SPEC, {.off = OFFSET(inter_matrices)}, "specify inter matrix coeffs", "matrix" },

[...]

Note: this requires a FATE test of course. My last (incomplete and
outdated) variant can be found here:
http://gitorious.org/~saste/ffmpeg/sastes-ffmpeg/commit/48d568e058f7356de1ae733ffd53cd63e3d97c69
-- 
FFmpeg = Fantastic Faithful Maxi Patchable Elastic Guru