[FFmpeg-user] Preserving perceived loudness when downmixing audio from 5.1 AC3 to stereo AAC

Nicolas George nicolas.george at normalesup.org
Wed Aug 7 20:00:30 CEST 2013


On décadi 20 Thermidor, year CCXXI, Francois Visagie wrote:
> Is it therefore correct to say that:
> 	* the only input codec-independent way of downmixing to stereo is
> ‘-ac 2’/‘-filter:a
> aformat=channel_layouts=stereo’/‘-filter:a aresample=ocl=3’ (which now all
> behave the same?), and

Yes. Note that it is always safe to specify both that and
"-request_channels 2": codecs that do not support it will just ignore the
option, and if the option is supported, the filters will just do nothing.

> 	* if one wants to preserve perceived input volume one needs to
> adjust gain during encoding?

Yes, but if you do that, unless your input was never at peak level, you will
get clipping, and that is probably worse than low volume.
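
To illustrate the clipping problem, here is a minimal Python sketch. It is not what ffmpeg does internally: the helper name apply_gain, the 16-bit sample format and the sample values are all assumptions made up for the example. A gain of 2 is harmless on quiet input, but saturates samples that were already near peak level:

```python
INT16_MAX = 32767
INT16_MIN = -32768

def apply_gain(samples, gain):
    """Scale integer samples by `gain`, saturating at the 16-bit range.

    Hypothetical helper for illustration only. Returns the scaled
    samples and the number of samples that had to be clipped.
    """
    out = []
    clipped = 0
    for s in samples:
        v = int(round(s * gain))
        if v > INT16_MAX:
            v = INT16_MAX
            clipped += 1
        elif v < INT16_MIN:
            v = INT16_MIN
            clipped += 1
        out.append(v)
    return out, clipped

quiet = [1000, -2000, 3000]      # far below peak level
loud = [1000, -20000, 32767]     # contains a sample at peak level

quiet_out, quiet_clipped = apply_gain(quiet, 2.0)
loud_out, loud_clipped = apply_gain(loud, 2.0)

print(quiet_out, quiet_clipped)  # nothing clipped
print(loud_out, loud_clipped)    # two samples saturated
```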

> Further to that, for a given energy level per input channel, does the
> current down-mixing mechanism produce differing output energy levels
> depending on the _number_ of input channels? I.e. is it expected that
> different input layouts (with the same energy level per channel) would
> require different gain factors for equally loud outputs, or will one be able
> to find a suitable gain factor and use that regardless of number of input
> channels?

Well, of course, it depends on the number of input channels. If you want to
mix one channel into one, you do not need to lower the volume. If you want
to mix forty-two channels into one, you need to divide the amplitude by
forty-two to avoid clipping. What it does to energy depends on the input. If
the channels are in phase, the energy is preserved; if they are not (that is,
if they are uncorrelated), the energy of each channel is divided by forty-two
squared and then the energies are summed, so the net result is a division by
forty-two.
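
The two cases can be checked numerically. The following Python sketch is only an illustration under stated assumptions: "energy" is taken as the mean squared amplitude, the in-phase case uses the same sine wave on every channel, and the other case uses independent Gaussian noise per channel. The function names, the seed and the signal lengths are all made up for the example.

```python
import math
import random

def energy(signal):
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in signal) / len(signal)

def mix_down(channels):
    """Mix N channels into one: sum them and divide the amplitude by N."""
    n = len(channels)
    return [sum(frame) / n for frame in zip(*channels)]

random.seed(0)
N = 42
LENGTH = 10000

# In phase: every channel carries the same sine wave.
sine = [math.sin(2 * math.pi * 440 * t / 48000) for t in range(LENGTH)]
in_phase = [sine] * N

# Not in phase: every channel is independent noise.
noise = [[random.gauss(0.0, 0.5) for _ in range(LENGTH)] for _ in range(N)]

# Ratio of output energy to the energy of a single input channel.
ratio_in_phase = energy(mix_down(in_phase)) / energy(sine)
per_channel = sum(energy(ch) for ch in noise) / N
ratio_uncorrelated = energy(mix_down(noise)) / per_channel

print("in phase:     ", ratio_in_phase)       # close to 1
print("uncorrelated: ", ratio_uncorrelated)   # close to 1/42
print("1/42:         ", 1 / N)
```

The uncorrelated ratio is a statistical estimate, so it only approaches 1/42; it will not match it exactly.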

lswr (libswresample) uses rather tricky coefficients to match the standard loudness and
geometry of speakers. You can get the exact matrix with -loglevel debug. For
example, here is the 5.1 -> stereo matrix:

0.414214 0.000000 0.292893 0.000000 0.292893 0.000000 
0.000000 0.414214 0.292893 0.000000 0.000000 0.292893 

That means: out_left = 0.414 front_left + 0.293 center + 0.293 back_left
and the symmetrical formula for right; note that LFE is discarded.
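
For reference, the matrix above can be applied by hand. This Python sketch is not lswr code; it assumes the column order is FL, FR, FC, LFE, BL, BR, which is consistent with the formula given above, and the downmix helper is hypothetical. It also checks two properties of the numbers themselves: each row sums to 1, so six in-phase full-scale channels cannot clip, and the values match 1/(1+sqrt(2)) and (1/sqrt(2))/(1+sqrt(2)) to the printed precision.

```python
import math

# Assumed column order: FL, FR, FC, LFE, BL, BR.
MATRIX = [
    [0.414214, 0.000000, 0.292893, 0.000000, 0.292893, 0.000000],  # out_left
    [0.000000, 0.414214, 0.292893, 0.000000, 0.000000, 0.292893],  # out_right
]

def downmix(frame):
    """Apply the 2x6 matrix to one frame of six samples (hypothetical helper)."""
    return [sum(c * s for c, s in zip(row, frame)) for row in MATRIX]

# Each row sums to 1: full-scale in-phase input stays at full scale.
row_sums = [sum(row) for row in MATRIX]
full_scale = downmix([1.0] * 6)

# A signal present only on LFE disappears; center goes equally to both sides.
only_lfe = downmix([0.0, 0.0, 0.0, 1.0, 0.0, 0.0])
only_center = downmix([0.0, 0.0, 1.0, 0.0, 0.0, 0.0])

# The coefficients look like 1 and 1/sqrt(2), normalized by 1 + sqrt(2).
front_coeff = 1.0 / (1.0 + math.sqrt(2.0))
other_coeff = (1.0 / math.sqrt(2.0)) / (1.0 + math.sqrt(2.0))

print(row_sums, full_scale)
print(only_lfe, only_center)
print(front_coeff, other_coeff)
```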

Regards,

-- 
  Nicolas George