[FFmpeg-devel] [libav-devel] [PATCH] aacpsy: avoid norm_fac becoming NaN

Mon Apr 20 22:15:44 CEST 2015

On Sat, Apr 18, 2015 at 10:34 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Sat, Apr 18, 2015 at 01:59:53PM +0200, Andreas Cadhalpun wrote:
>> On 18.04.2015 04:40, Michael Niedermayer wrote:
>> > On Sat, Apr 18, 2015 at 12:55:08AM +0200, Andreas Cadhalpun wrote:
>> >> The problem is that minath is not the minimum, only close:
>> >>     minath = ath(3410, ATH_ADD) = -5.24237967
>> >>              ath(3407, ATH_ADD) = -5.24241638
>> >
>> > the exact location of the minimum depends on the "add" value
>> > it's around 3410 for add=0 and around 3407 for add=4
>>
>> True.
>>
>> > for fun, 3407.080774800152 is even closer than 3407 for add=4
>>
>> Yes, but the ath function calculates with floats and thus is
>> inaccurate enough that it doesn't matter if the input is 3407
>> or 3407.1.
>>
>> > but the "add" parameter should probably be user selectable
>>
>> Currently ATH_ADD is a #define, but if that was made user selectable,
>> one could approximate the position of the minimum:
>>     minath = ath(3410 - 0.733 * ATH_ADD, ATH_ADD)
>>
>> > also if you want to prevent coeffs[].ath from becoming negative then
>>
>> That isn't strictly required, because 0 is a valid value and is just
>> as bad as a negative one.
>
>> But I assume the model works better, if it uses something closer to the
>> minimum of the ath function.
>
> that's more a question for Claudio than me, I'd say

TL;DR: band->thr should never be negative; band->thr == 0.0f would
cause lots of issues of its own if band->energy != 0.0f in such a case
(though I don't see how band->thr can be 0.0f if band->energy is not);
and ath <= 0.0f can happen and should be no trouble if it does.

The long version:

ath should approximate the shape of the absolute hearing threshold, so
yes, it's best if it really uses the minimum, since that will prevent
clipping of the ath curve and result in a more accurate threshold
computation.

Still, when

band->thr_quiet = band->thr = FFMAX(band->thr, coeffs[g].ath);

is computed, correct me if I'm wrong, but band->thr is the band's
energy (sum of squares), so I see how that can be zero, but not how it
can be negative.

Thus, if ath became negative, FFMAX would pick band->thr there, so the
negative part of the ath curve would effectively be clipped by band->thr.
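
To make the clipping concrete, a minimal sketch, assuming band->thr is
a sum of squares and therefore never negative. thr_in_quiet is a
hypothetical helper that only mirrors the assignment quoted above; the
macro is written out locally with the same definition libavutil uses.

```c
/* Same definition as the FFMAX macro in libavutil/common.h. */
#define FFMAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical helper mirroring
 *     band->thr_quiet = band->thr = FFMAX(band->thr, coeffs[g].ath);
 * With thr >= 0, the result is never negative, whatever sign ath has. */
static float thr_in_quiet(float thr, float ath)
{
    return FFMAX(thr, ath);
}
```

So a negative ath simply loses against any band->thr >= 0, and only
ath values above band->thr actually shape the threshold.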

The whole point of ath is to avoid spending lots of bits for signals
normally too faint to be heard. The case of band->energy==0 is already
handled by zero flags, but faint noise in higher frequencies needs an
absolute hearing threshold curve to properly decide when not to waste
bits in those bands. But since people can adjust the volume, and you
never know the final SPL at which the signal will be played, a precise
calculation of said threshold is pointless. What I try to do in the
patch series in issue #2686 is to shift it to the equivalent power of
a 16-bit signal's quantization noise, which one would assume is below
the absolute hearing threshold in any sane reproduction environment,
so it's a conservative estimate.

That said, ath should be > 0, not >= 0. But it's hard to enforce that
without clipping it, and it's not worth the trouble attempting it. I
don't believe accurate computation of ath to that point would improve
encoding that much. Any reasonable approximation will do.