[FFmpeg-user] Preserving perceived loudness when downmixing audio from 5.1 AC3 to stereo AAC

Wed Aug 14 14:36:32 CEST 2013

> -----Original Message-----
> From: ffmpeg-user-bounces at ffmpeg.org [mailto:ffmpeg-user-
> bounces at ffmpeg.org] On Behalf Of Nicolas George
> Sent: 07 August 2013 20:01
> To: FFmpeg user questions
> Subject: Re: [FFmpeg-user] Preserving perceived loudness when
> downmixing audio from 5.1 AC3 to stereo AAC
> 
> On décadi 20 Thermidor, year CCXXI, Francois Visagie wrote:
> > Is it therefore correct to say that:
> > 	* the only input codec-independent way of downmixing to stereo is
> > ‘-ac 2’/‘-filter:a aformat=channel_layouts=stereo’/‘-filter:a
> > aresample=ocl=3’ (which now all behave the same?), and
> 
> Yes. Note that it is always safe to specify both that and
> "-request_channels 2": codecs that do not support it will just ignore the
> option, and if the option is supported, the filters will just do nothing.
> 
> > 	* if one wants to preserve perceived input volume one needs to adjust
> > gain during encoding?
> 
> Yes, but if you do that, unless your input was never at peak level, you will get
> clipping, and that is probably worse than low volume.
> 
> > Further to that, for a given energy level per input channel, does the
> > current down-mixing mechanism produce differing output energy levels
> > depending on the _number_ of input channels? I.e. is it expected that
> > different input layouts (with the same energy level per channel) would
> > require different gain factors for equally loud outputs, or will one
> > be able to find a suitable gain factor and use that regardless of
> > number of input channels?
> 
> Well, of course, it depends on the number of input channels. If you want to
> mix one channel into one, you do not need to lower the volume. If you want
> to mix forty-two channels into one, you need to divide the amplitude by
> forty-two to avoid clipping. What it does to energy depends on the input. If
> the channels are in phase, the energy is preserved; if they are not, the
> energy of each channel is divided by forty-two squared, and then the
> energies are summed, so the net result is a division by forty-two.
> 
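
To check my understanding of the arithmetic above, here is a rough Python sketch (not anything ffmpeg runs). It mixes forty-two channels into one with a gain of 1/42 each. The helper names are made up for illustration, and I am using sine waves at different whole-number frequencies as a stand-in for "channels that are not in phase", which is my assumption rather than something stated above.

```python
import math

def mean_power(signal):
    """Average energy per sample."""
    return sum(s * s for s in signal) / len(signal)

def mix_to_mono(channels):
    """Sum all channels with gain 1/N each, so the peak can never exceed 1."""
    n = len(channels)
    return [sum(samples) / n for samples in zip(*channels)]

def sine(cycles, length):
    """A full-scale sine with a whole number of cycles over the buffer."""
    return [math.sin(2 * math.pi * cycles * i / length) for i in range(length)]

N = 42
LENGTH = 4096

# Case 1: every channel carries the same signal (perfectly in phase).
in_phase = [sine(1, LENGTH) for _ in range(N)]
# Case 2: every channel carries a different frequency (mutually orthogonal).
orthogonal = [sine(k, LENGTH) for k in range(1, N + 1)]

per_channel = mean_power(in_phase[0])
power_in_phase = mean_power(mix_to_mono(in_phase))
power_orthogonal = mean_power(mix_to_mono(orthogonal))

print("per channel:        ", per_channel)
print("mixed, in phase:    ", power_in_phase)
print("mixed, orthogonal:  ", power_orthogonal)
print("per channel / N:    ", per_channel / N)
```

If I have this right, the in-phase mix keeps the per-channel energy, while the orthogonal mix ends up at the per-channel energy divided by forty-two, which is why no single make-up gain can suit every input.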

Is it possible to normalise audio levels using ffmpeg? The 'pan' filter documentation mentions:

"If the ‘=’ in a channel specification is replaced by ‘<’, then the gains for that specification will be renormalized so that the total is 1, thus avoiding clipping noise."

I.e., having downmixed to stereo, can one expect correct normalisation from '-filter:a pan=stereo:c0<c0:c1<c1'?

If not, does ffmpeg provide a better mechanism, or is something like that in planning?
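
For reference, this is how I read the quoted sentence, written as a small Python model. It is only my interpretation of the documentation text, not ffmpeg's code: the function name `renormalize` is hypothetical, and dividing by the sum of absolute gains is an assumption on my part.

```python
def renormalize(gains):
    """Scale a channel's mixing gains so that they total 1.

    Models the documented effect of '<' in a pan channel specification.
    Assumption: the total is taken over absolute gain values.
    """
    total = sum(abs(g) for g in gains.values())
    if total == 0:
        return dict(gains)  # nothing to scale
    return {ch: g / total for ch, g in gains.items()}

# 'c0<c0': a single gain of 1 already totals 1, so nothing changes.
identity_left = renormalize({"c0": 1.0})

# A hypothetical three-input specification, for contrast.
downmix_left = renormalize({"FL": 1.0, "FC": 0.5, "BL": 0.5})

print(identity_left)  # {'c0': 1.0}
print(downmix_left)   # {'FL': 0.5, 'FC': 0.25, 'BL': 0.25}
```

Under this reading the rescaling applies to the mixing coefficients only and never looks at the audio samples, so 'c0<c0:c1<c1' on an already-stereo stream would leave the levels untouched.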

Thanks,
Francois

> lswr uses rather tricky coefficients to match the standard loudness and
> geometry of speakers. You can get the exact matrix with -loglevel debug. For
> example, here is the 5.1 -> stereo matrix:
> 
> 0.414214 0.000000 0.292893 0.000000 0.292893 0.000000
> 0.000000 0.414214 0.292893 0.000000 0.000000 0.292893
> 
> That means: out_left = 0.414 front_left + 0.293 center + 0.293 back_left and
> the symmetrical formula for right; note that LFE is discarded.
> 
> Regards,
> 
> --
>   Nicolas George
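
A short Python sketch of applying the quoted matrix, to make the formula concrete. The column order FL, FR, FC, LFE, BL, BR is inferred from the explanation above, and the observation that the coefficients match gains of 1, 1/sqrt(2), 1/sqrt(2) divided by their sum is only a numerical check of mine, not a statement about how lswr derives them.

```python
import math

# Assumed column order, inferred from the formula given for out_left.
ORDER = ["FL", "FR", "FC", "LFE", "BL", "BR"]

# The 5.1 -> stereo matrix as printed by -loglevel debug (rows: left, right).
MATRIX = [
    [0.414214, 0.0, 0.292893, 0.0, 0.292893, 0.0],
    [0.0, 0.414214, 0.292893, 0.0, 0.0, 0.292893],
]

def downmix(frame):
    """Map one 6-channel sample frame to [left, right]."""
    return [sum(c * s for c, s in zip(row, frame)) for row in MATRIX]

# Worst case for clipping: every input channel at full scale, in phase.
left, right = downmix([1.0] * len(ORDER))
print("full-scale input ->", left, right)

# Numerical check: gains 1, 1/sqrt(2), 1/sqrt(2), scaled so they total 1.
raw = [1.0, math.sqrt(0.5), math.sqrt(0.5)]
normalized = [g / sum(raw) for g in raw]
print("normalized gains  ->", normalized)
```

Each row totals 1 to within rounding, so the output cannot exceed full scale even in that worst case. For typical, less correlated material that same scaling is what makes the result quieter than the source.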