[FFmpeg-devel] [PATCH] lavfi: add volumedetect filter.

Stefano Sabatini stefasab at gmail.com
Sat Aug 18 20:16:54 CEST 2012


On date Saturday 2012-08-18 18:23:58 +0200, Nicolas George encoded:
> 
> Signed-off-by: Nicolas George <nicolas.george at normalesup.org>
> ---
>  Changelog                     |    1 +
>  doc/filters.texi              |   40 ++++++++++
>  libavfilter/Makefile          |    1 +
>  libavfilter/af_volumedetect.c |  164 +++++++++++++++++++++++++++++++++++++++++
>  libavfilter/allfilters.c      |    1 +
>  5 files changed, 207 insertions(+)
>  create mode 100644 libavfilter/af_volumedetect.c
> 
> diff --git a/Changelog b/Changelog
> index cd73c6d..14e01f3 100644
> --- a/Changelog
> +++ b/Changelog
> @@ -50,6 +50,7 @@ version next:
>  - edge detection filter
>  - framestep filter
>  - ffmpeg -shortest option is now per-output file
> +- volume measurement filter
>  
>  
>  version 0.11:
> diff --git a/doc/filters.texi b/doc/filters.texi
> index 5793100..8847990 100644
> --- a/doc/filters.texi
> +++ b/doc/filters.texi
> @@ -690,6 +690,46 @@ volume=-12dB
>  @end example
>  @end itemize
>  
> + at section volumedetect
> +
> +Detect the volume of the input audio.
> +
> +The filter has no parameters. The input is not modified. Statistics about
> +the volume will be printed in the log when the input stream end is reached.
> +
> +In particular it will show the mean volume (root mean square), maximum
> +volume (on a per-sample basis), and the beginning of a histogram of the
> +registered volume values (from the maximum value to a cumulative 1/1000 of
> +the samples).
> +
> +All volumes are in decibels relative to the maximum PCM value.
> +
> +Here is an excerpt of the output:
> + at example
> +[Parsed_volumedetect_0 @ 0xa23120] mean_volume: -27 dB
> +[Parsed_volumedetect_0 @ 0xa23120] max_volume: -4 dB
> +[Parsed_volumedetect_0 @ 0xa23120] histogram_4db: 6
> +[Parsed_volumedetect_0 @ 0xa23120] histogram_5db: 62
> +[Parsed_volumedetect_0 @ 0xa23120] histogram_6db: 286
> +[Parsed_volumedetect_0 @ 0xa23120] histogram_7db: 1042
> +[Parsed_volumedetect_0 @ 0xa23120] histogram_8db: 2551
> +[Parsed_volumedetect_0 @ 0xa23120] histogram_9db: 4609
> +[Parsed_volumedetect_0 @ 0xa23120] histogram_10db: 8409
> + at end example
> +
> +It means that:
> + at itemize
> + at item
> +The mean square energy is approximately -27 dB, or 10^-2.7.
> + at item
> +The largest sample is at -4 dB, or more precisely between -4 dB and -5 dB.
> + at item
> +There are 6 samples at -4 dB, 62 at -5 dB, 286 at -6 dB, etc.
> + at end itemize
> +
> +In other words, raising the volume by +4 dB does not cause any clipping,
> +raising it by +5 dB causes clipping for 6 samples, etc.
> +
>  @section asyncts
>  Synchronize audio data with timestamps by squeezing/stretching it and/or
>  dropping samples/adding silence when needed.
> diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> index 916e54a..af4fde6 100644
> --- a/libavfilter/Makefile
> +++ b/libavfilter/Makefile
> @@ -67,6 +67,7 @@ OBJS-$(CONFIG_PAN_FILTER)                    += af_pan.o
>  OBJS-$(CONFIG_RESAMPLE_FILTER)               += af_resample.o
>  OBJS-$(CONFIG_SILENCEDETECT_FILTER)          += af_silencedetect.o
>  OBJS-$(CONFIG_VOLUME_FILTER)                 += af_volume.o
> +OBJS-$(CONFIG_VOLUMEDETECT_FILTER)           += af_volumedetect.o
>  
>  OBJS-$(CONFIG_AEVALSRC_FILTER)               += asrc_aevalsrc.o
>  OBJS-$(CONFIG_ANULLSRC_FILTER)               += asrc_anullsrc.o
> diff --git a/libavfilter/af_volumedetect.c b/libavfilter/af_volumedetect.c
> new file mode 100644
> index 0000000..0a6306c
> --- /dev/null
> +++ b/libavfilter/af_volumedetect.c
> @@ -0,0 +1,164 @@
> +/*
> + * Copyright (c) 2012 Nicolas George
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public License
> + * as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public License
> + * along with FFmpeg; if not, write to the Free Software Foundation, Inc.,
> + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +/**
> + * @file
> + * audio volume detection filter
> + */
> +
> +#include "libavutil/audioconvert.h"
> +#include "libavutil/avassert.h"
> +#include "audio.h"
> +#include "avfilter.h"
> +#include "internal.h"
> +
> +typedef struct {
> +    /**
> +     * Number of samples at each PCM value.
> +     * histogram[0x8000 + i] is the number of samples at value i.
> +     * The extra element is there for symmetry.
> +     */
> +    uint64_t histogram[0x10001];
> +} VolDetectContext;
> +
> +static int query_formats(AVFilterContext *ctx)
> +{
> +    enum AVSampleFormat sample_fmts[] = {
> +        AV_SAMPLE_FMT_S16,
> +        AV_SAMPLE_FMT_S16P,
> +        AV_SAMPLE_FMT_NONE
> +    };
> +    AVFilterFormats *formats;
> +
> +    if (!(formats = ff_make_format_list(sample_fmts)))
> +        return AVERROR(ENOMEM);
> +    ff_set_common_formats(ctx, formats);
> +
> +    return 0;
> +}
> +
> +static int filter_samples(AVFilterLink *inlink, AVFilterBufferRef *samples)
> +{
> +    AVFilterContext *ctx = inlink->dst;
> +    VolDetectContext *vd = ctx->priv;
> +    int64_t layout  = samples->audio->channel_layout;
> +    int nb_samples  = samples->audio->nb_samples;
> +    int nb_channels = av_get_channel_layout_nb_channels(layout);
> +    int nb_planes   = nb_channels;
> +    int plane, i;
> +    int16_t *pcm;
> +
> +    if (!av_sample_fmt_is_planar(samples->format)) {
> +        nb_samples *= nb_channels;
> +        nb_planes = 1;
> +    }
> +    for (plane = 0; plane < nb_planes; plane++) {
> +        pcm = (int16_t *)samples->extended_data[plane];
> +        for (i = 0; i < nb_samples; i++)
> +            vd->histogram[pcm[i] + 0x8000]++;
> +    }
> +
> +    return ff_filter_samples(inlink->dst->outputs[0], samples);
> +}
> +

> +#define MAX_DB 91
> +
> +static inline int logdb(uint64_t v)
> +{
> +    double d = v / (double)(0x8000 * 0x8000);
> +    if (!v)
> +        return MAX_DB;
> +    return log(d) * -4.3429448190325182765112891891660508229; /* -10/log(10) */

You may want to return a more exact value (especially useful for the max
value) and round only where required.

Also, I'd find it more natural to return a negative value (and replace
MAX_DB with MIN_DB = -91).

[...]

Looks good to me otherwise, and nice work.
-- 
FFmpeg = Furious and Forgiving Martial Pacific Earthshaking God

