[FFmpeg-devel] One pass volume normalization (ebur128)
Jan Ehrhardt
phpdev at ehrhardt.nl
Wed Jul 17 20:36:44 CEST 2013
Nicolas George in gmane.comp.video.ffmpeg.devel (Wed, 17 Jul 2013 17:11:15 +0200):
>I consider this very bad design. I already explained why I think this is bad
>design from the result point of view, but you are of course free to keep it
>in your software.
>
>From ffmpeg point of view, I consider this bad design for the following
>reasons:
>
>* Two filters with almost identical features and no good reason to separate
> them (there is a good reason to have ebur128 and volumedetect: correctness
> vs. speed).
The reason for the duplicate is clear: I was expecting exactly this
kind of reaction from you.
>* Exposing intermediate values that have no relevancy whatsoever. The final
> results of volumedetect are already quite dubious as volume measurements,
> but at least they have a clear mathematical meaning.
Strange. First you steer me in the direction of volumedetect, and now
it seems to be the other way around.
>* Adding a feature to suit a very personal and specific need.
Here we differ. I do not think that having the equivalent of MEncoder's
'-af volnorm' is a very personal and specific need. For live broadcasts
you need a way to normalize the loudness, and the only way to do that is
by looking back in time at the previous frames.
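To make the idea concrete: per frame, all such a filter has to do is turn
a loudness measurement taken over the preceding audio into a clamped gain
towards a target level. A minimal sketch in plain C (the function name,
the parameters and the clamp are only illustrative, this is not the patch
below):

#include <math.h>

static double gain_from_loudness(double measured_lufs, double target_lufs,
                                 double max_db)
{
    /* correction needed to reach the target, in dB */
    double gain_db = target_lufs - measured_lufs;
    /* clamp so one bad measurement cannot cause a huge gain jump */
    if (gain_db >  max_db) gain_db =  max_db;
    if (gain_db < -max_db) gain_db = -max_db;
    /* convert dB to a linear factor for the sample values */
    return pow(10.0, gain_db / 20.0);
}

For example, gain_from_loudness(-30, -23, 20) gives about 2.24, i.e. a
+7 dB boost.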
>IMHO, the correct design for solving this issue would require some or all
>the following points:
>
>* Dynamic expression evaluation for the volume filter. IIRC, Stefano had a
> patch that was pretty good; in fact, I thought it was already applied
> since a long time ago. The expression should be able to reference
> metadata (at least one item).
As far as I know, this was never implemented. Neither were any of
Clement's proposals. That is exactly the reason why I brought up the
subject once again.
>* A filter to smooth a metadata value over time, so that r128.M can be
> turned into something suitable for volume normalization.
>
>* A switch to volumedetect to inject as metadata the momentary RMS of the
> signal over a configurable frame, to use in place of r128.M and trade
> correctness for speed.
I would welcome those features, but keep in mind that neither r128.I nor
r128.M can be used at all at the moment.
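For what it is worth, the smoothing step you describe could be as simple
as an exponential moving average over the per-frame momentary loudness,
so that the gain does not jump with every new r128.M value. A sketch
only, not FFmpeg code; 'alpha' is a made-up smoothing constant:

static double smooth_loudness(double prev_smoothed, double frame_loudness,
                              double alpha)
{
    /* alpha close to 0 reacts slowly and smoothly, close to 1 quickly */
    return alpha * frame_loudness + (1.0 - alpha) * prev_smoothed;
}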
>* A switch to volumedetect to inject as metadata the final results (on a
> dummy final frame maybe?).
As a switch it is OK, of course. But it will not help for live streams.
>Some of these are fairly easy, other are quite hard, and some pose problems
>of design decisions rather than implementation. Anyone should feel free to
>submit patches implementing any of these points.
My suggestion was to start by implementing a basic filter like the one I
proposed when I started this discussion. It was met not only with the
expected technical comments, but also with arguments against the very
idea of one-pass or on-the-fly normalization.
See my current working version below. I have addressed some of the
technical issues. Do with it whatever you like. My idea is that it is
a good starting point for future patches like the ones you suggested.
Jan
diff --git a/libavfilter/af_volume.c b/libavfilter/af_volume.c
index a2ac1e2..87491ea 100644
--- a/libavfilter/af_volume.c
+++ b/libavfilter/af_volume.c
@@ -51,18 +51,26 @@ static const AVOption volume_options[] = {
{ "fixed", "select 8-bit fixed-point", 0, AV_OPT_TYPE_CONST, { .i64 = PRECISION_FIXED }, INT_MIN, INT_MAX, A|F, "precision" },
{ "float", "select 32-bit floating-point", 0, AV_OPT_TYPE_CONST, { .i64 = PRECISION_FLOAT }, INT_MIN, INT_MAX, A|F, "precision" },
{ "double", "select 64-bit floating-point", 0, AV_OPT_TYPE_CONST, { .i64 = PRECISION_DOUBLE }, INT_MIN, INT_MAX, A|F, "precision" },
+ { "metadata", "set the metadata key for loudness normalization", OFFSET(metadata), AV_OPT_TYPE_STRING, { .str = NULL }, .flags = A|F },
+ { "normvol", "set volume normalization level",
+ OFFSET(normvol), AV_OPT_TYPE_DOUBLE, { .dbl = -23.0 }, INT_MIN, INT_MAX, A|F },
{ NULL },
};
AVFILTER_DEFINE_CLASS(volume);
+static void set_fixed_volume(VolumeContext *vol, double volume)
+{
+ vol->volume_i = (int)(volume * 256 + 0.5);
+ vol->volume = vol->volume_i / 256.0;
+}
+
static av_cold int init(AVFilterContext *ctx)
{
VolumeContext *vol = ctx->priv;
if (vol->precision == PRECISION_FIXED) {
- vol->volume_i = (int)(vol->volume * 256 + 0.5);
- vol->volume = vol->volume_i / 256.0;
+ set_fixed_volume(vol, vol->volume);
av_log(ctx, AV_LOG_VERBOSE, "volume:(%d/256)(%f)(%1.2fdB) precision:fixed\n",
vol->volume_i, vol->volume, 20.0*log(vol->volume)/M_LN10);
} else {
@@ -216,11 +224,31 @@ static int config_output(AVFilterLink *outlink)
static int filter_frame(AVFilterLink *inlink, AVFrame *buf)
{
- VolumeContext *vol = inlink->dst->priv;
- AVFilterLink *outlink = inlink->dst->outputs[0];
+ AVFilterContext *ctx = inlink->dst;
+ VolumeContext *vol = ctx->priv;
+ AVFilterLink *outlink = ctx->outputs[0];
int nb_samples = buf->nb_samples;
AVFrame *out_buf;
+    if (vol->metadata) {
+        double loudness, new_volume, pow_volume, timestamp, mx;
+        AVDictionaryEntry *e;
+
+        /* Limit the correction to +/- 20 dB, ramping the limit up from
+         * 0 dB over the first 20 seconds to avoid abrupt gain jumps.
+         * On audio links the pts is assumed to be in 1/sample_rate units. */
+        mx = 20;
+        timestamp = buf->pts / (double)outlink->sample_rate;
+        mx = fmin(mx, timestamp);
+        e = av_dict_get(buf->metadata, vol->metadata, NULL, 0);
+        if (e) {
+            loudness = av_strtod(e->value, NULL);
+            /* Ignore (near-)silent frames so noise is not boosted. */
+            if (loudness > -69) {
+                new_volume = fmax(-mx, fmin(mx, vol->normvol - loudness));
+                pow_volume = pow(10, new_volume / 20);
+                av_log(ctx, AV_LOG_VERBOSE, "loudness=%f => %f => volume=%f\n",
+                       loudness, new_volume, pow_volume);
+                set_fixed_volume(vol, pow_volume);
+            }
+        }
+    }
+
if (vol->volume == 1.0 || vol->volume_i == 256)
return ff_filter_frame(outlink, buf);
diff --git a/libavfilter/af_volume.h b/libavfilter/af_volume.h
index bd7932e..d79d040 100644
--- a/libavfilter/af_volume.h
+++ b/libavfilter/af_volume.h
@@ -48,6 +48,8 @@ typedef struct VolumeContext {
void (*scale_samples)(uint8_t *dst, const uint8_t *src, int nb_samples,
int volume);
int samples_align;
+ char *metadata;
+ double normvol;
} VolumeContext;
void ff_volume_init_x86(VolumeContext *vol);
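For reference, the intended use is roughly along these lines (an
illustrative command only, not tested as written; the 'metadata' and
'normvol' option names follow the patch above, and ebur128 with metadata
injection enabled tags each frame with its momentary loudness as
lavfi.r128.M):

ffmpeg -i input.wav -af "ebur128=metadata=1,volume=metadata=lavfi.r128.M:normvol=-23" output.wav

The patched volume filter reads that key from the frame metadata and
adjusts the gain towards the -23 target on the fly.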