[FFmpeg-devel] One pass volume normalization (ebur128)

Sat Jul 13 22:15:43 CEST 2013

Nicolas George in gmane.comp.video.ffmpeg.devel (Sat, 13 Jul 2013
21:41:52 +0200):
>On quintidi 25 Messidor, year CCXXI, Jan Ehrhardt wrote:
>> Subject: [FFmpeg-devel] One pass volume normalization (ebur128)
>
>Single-pass volume normalization is not possible, please do not call the
>feature that way.

Call it what you like. I am using it in a single pass transcode. Just
like the -af volnorm filter in MEncoder.

>r128.I is not a good choice, but there is nothing better yet.

You can use any of the r128 variables that ebur128 inserts into the metadata.

>Missing documentation update.

I know.

>> @@ -51,18 +51,24 @@ static const AVOption volume_options[] = {
>>          { "fixed",  "select 8-bit fixed-point",     0, AV_OPT_TYPE_CONST, { .i64 = PRECISION_FIXED  }, INT_MIN, INT_MAX, A|F, "precision" },
>>          { "float",  "select 32-bit floating-point", 0, AV_OPT_TYPE_CONST, { .i64 = PRECISION_FLOAT  }, INT_MIN, INT_MAX, A|F, "precision" },
>>          { "double", "select 64-bit floating-point", 0, AV_OPT_TYPE_CONST, { .i64 = PRECISION_DOUBLE }, INT_MIN, INT_MAX, A|F, "precision" },
>
>> +    { "metadata", "set the metadata key for loudness normalization", OFFSET(metadata), AV_OPT_TYPE_STRING, { .str = NULL }, .flags = A|F },
>
>Inconsistent indentation.

Not really. If you look at the original, you will see that fixed, float
and double are values of the precision option.

>> +    if (vol->metadata) {
>> +        double loudness, new_volume, timestamp, mx;
>> +        AVDictionaryEntry *e;
>> +        mx = 20; 
>> +        timestamp = (float)(1.0 * buf->pts / outlink->sample_rate);
>> +        mx = fmin(mx, timestamp);
>> +        e = av_dict_get(buf->metadata, vol->metadata, NULL, 0);
>> +        if (e) {
>> +            loudness = av_strtod(e->value, NULL);
>> +            if (loudness > -69) {
>> +                new_volume = fmax(-mx,fmin(mx,(-23 - loudness)));
>> +                av_log(NULL, AV_LOG_VERBOSE, "loudness=%f => %f => volume=%f\n",
>> +                    loudness, new_volume, pow(10, new_volume / 20));
>> +                set_fixed_volume(vol, pow(10, new_volume / 20));
>> +            }
>
>This paragraph has several problems. First, it is missing spaces around
>words, that is easy to fix.

ACK.

>Second, it has a duplicated mathematical formula, which is pretty much a
>recipe for inconsistency. That is easy to fix too.

ACK.

>Third, it has several hardcoded values, and that is not good design.

Two of the three hardcoded values should stay hardcoded. The -23 (LUFS)
is part of the EBU R128 specification: http://tech.ebu.ch/loudness

The -69 was suggested by Clement. If there is no sound at all, the
loudness seems to be reported as -71 or something like that, so anything
above -69 means there is sound (at a very low volume).

The 20 is indeed an arbitrary choice. It caps the volume adjustment at
20 dB, and during the first 20 seconds of a video it limits the
adjustment to 1 dB per elapsed second.

>It seems to me that using an expression, evaluated each time the metadata
>value changes and with that value available as a variable would be a much
>nicer design.

I agree, but this is a little above my head.
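A rough, library-free sketch of the shape the reviewer describes: the mapping from loudness to gain is supplied by the user and re-evaluated only when the metadata string changes. In FFmpeg the mapping would more naturally be a parsed expression (libavutil's eval API); here a plain function pointer stands in for it, and all names (struct loudness_state, update_volume, target_minus_loudness) are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for a user-supplied expression: maps loudness (LUFS) to gain (dB). */
typedef double (*gain_expr_fn)(double loudness);

struct loudness_state {
    gain_expr_fn expr;   /* hypothetical: would be a parsed expression */
    char *last_value;    /* last metadata string seen (heap copy), or NULL */
    double gain_db;      /* cached result of the last evaluation */
    int evaluations;     /* how many times expr was actually run */
};

/* Re-evaluate only if the metadata value differs from the previous one. */
static double update_volume(struct loudness_state *s, const char *value)
{
    if (value && (!s->last_value || strcmp(value, s->last_value) != 0)) {
        size_t len = strlen(value) + 1;
        char *copy = malloc(len);
        if (!copy)
            return s->gain_db;           /* keep the previous gain on OOM */
        memcpy(copy, value, len);
        free(s->last_value);
        s->last_value = copy;
        s->gain_db = s->expr(strtod(value, NULL));
        s->evaluations++;
    }
    return s->gain_db;
}

/* Example "expression": the patch's rule without the time-based cap. */
static double target_minus_loudness(double loudness)
{
    return -23.0 - loudness;
}
```

If the measured value stays the same from one frame to the next, the expression is not run again; the filter would then convert the cached dB value to a linear factor as before.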

>AFAIK, this is unneeded since the "evil plan".

I do not even know what the "evil plan" is...

>> diff --git a/libavfilter/f_ebur128.c b/libavfilter/f_ebur128.c
>> index 88d37e8..f4ce6d9 100644
>> --- a/libavfilter/f_ebur128.c
>> +++ b/libavfilter/f_ebur128.c
>
>Unrelated.

Not quite, either. f_ebur128.c hardcodes the log level to verbose if
metadata is set. You do not want to see the intermediate metadata during
a 'one pass' transcode; if needed, you can always raise the loglevel to
view them.

Jan