[FFmpeg-user] Preserving perceived loudness when downmixing audio from 5.1 AC3 to stereo AAC
Francois Visagie
francois.visagie at gmail.com
Thu Aug 8 09:59:25 CEST 2013
> -----Original Message-----
> From: ffmpeg-user-bounces at ffmpeg.org [mailto:ffmpeg-user-
> bounces at ffmpeg.org] On Behalf Of Nicolas George
> Sent: 07 August 2013 20:01
> To: FFmpeg user questions
> Subject: Re: [FFmpeg-user] Preserving perceived loudness when
> downmixing audio from 5.1 AC3 to stereo AAC
>
> On decadi 20 Thermidor, year CCXXI, Francois Visagie wrote:
> > Is it therefore correct to say that:
> > * the only input codec-independent way of downmixing to stereo is
> > ‘-ac 2’/‘-filter:a aformat=channel_layouts=stereo’/‘-filter:a
> > aresample=ocl=3’ (which now all behave the same?), and
>
> Yes. Note that it is always safe to specify both that and
> "-request_channels 2": codecs that do not support it will just ignore the
> option, and if the option is supported, the filters will just do nothing.
>
> > * if one wants to preserve perceived input volume one needs to adjust
> > gain during encoding?
>
> Yes, but if you do that, unless your input was never at peak level, you will get
> clipping, and that is probably worse than low volume.
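To illustrate the clipping point, here is a minimal sketch in plain Python. It assumes float samples with full scale at +/-1.0; the helper names (apply_gain, clipped_fraction), the 1 kHz test tone and the 3 dB gain are made up for the illustration and are not taken from FFmpeg.

```python
import math

def apply_gain(samples, gain_db):
    """Scale float samples (full scale = +/-1.0) by a gain given in dB."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]

def clipped_fraction(samples):
    """Fraction of samples that exceed full scale and would be clipped."""
    return sum(1 for s in samples if abs(s) > 1.0) / len(samples)

RATE = 48000  # hypothetical sample rate
# One second of a 1 kHz sine that reaches full scale
tone = [math.sin(2 * math.pi * 1000 * n / RATE) for n in range(RATE)]
# The same tone at half amplitude, so it never gets near peak level
quiet = [0.5 * s for s in tone]

loud_boosted = apply_gain(tone, 3.0)
quiet_boosted = apply_gain(quiet, 3.0)

print("clipped (input at peak):   ", clipped_fraction(loud_boosted))
print("clipped (input below peak):", clipped_fraction(quiet_boosted))
```

Only the input that already reached peak level clips after the boost; the half-amplitude input has enough headroom for 3 dB.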
>
> > Further to that, for a given energy level per input channel, does the
> > current down-mixing mechanism produce differing output energy levels
> > depending on the _number_ of input channels? I.e. is it expected that
> > different input layouts (with the same energy level per channel) would
> > require different gain factors for equally loud outputs, or will one
> > be able to find a suitable gain factor and use that regardless of
> > number of input channels?
>
> Well, of course, it depends on the number of input channels. If you want to
> mix one channel into one, you do not need to lower the volume. If you want
> to mix forty-two channels into one, you need to divide the amplitude by
> forty-two to avoid clipping. What it does to energy depends on the input. If
> the channels are in phase, the energy is preserved; if they are not, the
> energy of each channel is divided by forty-two squared, and then the
> energies are summed, so the net result is a division by forty-two.
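The forty-two channel argument can be checked numerically. The following is only a sketch under stated assumptions: the channels are modelled as Gaussian noise, "in phase" is taken to mean identical signals, "not in phase" is taken to mean statistically independent signals, and the helper names are hypothetical.

```python
import random

def mean_energy(samples):
    """Mean squared amplitude of a list of samples."""
    return sum(s * s for s in samples) / len(samples)

def mix_to_mono(channels):
    """Mix N channels into one, dividing the amplitude by N to avoid clipping."""
    n = len(channels)
    return [sum(frame) / n for frame in zip(*channels)]

random.seed(0)
N_CHANNELS = 42
N_SAMPLES = 20000

# In-phase case: every channel carries the same signal
base = [random.gauss(0.0, 0.1) for _ in range(N_SAMPLES)]
in_phase = [base] * N_CHANNELS

# Independent case: every channel is separate noise at the same level
independent = [
    [random.gauss(0.0, 0.1) for _ in range(N_SAMPLES)]
    for _ in range(N_CHANNELS)
]

e_channel = mean_energy(base)
ratio_in_phase = mean_energy(mix_to_mono(in_phase)) / e_channel
ratio_independent = mean_energy(mix_to_mono(independent)) / e_channel

print("in phase:    output energy / channel energy =", ratio_in_phase)
print("independent: output energy / channel energy =", ratio_independent)
print("1 / N_CHANNELS =", 1.0 / N_CHANNELS)
```

The in-phase ratio comes out at essentially 1, while the independent ratio lands close to 1/42, up to the statistical noise of a finite sample.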
>
> lswr uses rather tricky coefficients to match the standard loudness and
> geometry of speakers. You can get the exact matrix with -loglevel debug. For
> example, here is the 5.1 -> stereo matrix:
>
> 0.414214 0.000000 0.292893 0.000000 0.292893 0.000000
> 0.000000 0.414214 0.292893 0.000000 0.000000 0.292893
>
> That means: out_left = 0.414 front_left + 0.293 center + 0.293 back_left and
> the symmetrical formula for right; note that LFE is discarded.
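As a small sketch of what that matrix does to one frame of samples: the code below assumes the columns are in the order FL, FR, FC, LFE, BL, BR, which is consistent with the formula above but is not printed in the debug output quoted here. The names MATRIX and downmix_frame are hypothetical, and this is plain Python, not libswresample code.

```python
import math

# 5.1 -> stereo coefficients as quoted above.
# Assumed column order: FL, FR, FC, LFE, BL, BR.
MATRIX = [
    [0.414214, 0.000000, 0.292893, 0.000000, 0.292893, 0.000000],  # out left
    [0.000000, 0.414214, 0.292893, 0.000000, 0.000000, 0.292893],  # out right
]

def downmix_frame(frame):
    """Map one 6-sample frame to a (left, right) pair using MATRIX."""
    return tuple(sum(c * s for c, s in zip(row, frame)) for row in MATRIX)

# Worst case: every input channel at full scale and in phase
left, right = downmix_frame([1.0] * 6)
print("full-scale input ->", left, right)

# A frame that only contains LFE contributes nothing to the output
print("LFE only ->", downmix_frame([0.0, 0.0, 0.0, 1.0, 0.0, 0.0]))

# The quoted numbers match weights 1, 1/sqrt(2), 1/sqrt(2)
# normalized so that each row sums to 1.
norm = 1.0 + 2.0 / math.sqrt(2.0)
front = 1.0 / norm
other = (1.0 / math.sqrt(2.0)) / norm
print("reconstructed coefficients:", front, other)
```

Each row sums to 1, so a full-scale in-phase input lands at full scale on the output and cannot clip. That also means any extra gain applied afterwards can clip on loud material.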
Thanks for the confirmations, Nicolas, and many thanks for Andy Furniss' contributions also.
>
> Regards,
>
> --
> Nicolas George