[FFmpeg-devel] [PATCH] AAC Decoder - Round 2.

Sun Jun 29 11:42:40 CEST 2008

2008/6/27 Michael Niedermayer <michaelni at gmx.at>:
> On Fri, Jun 27, 2008 at 03:35:06PM +0100, Robert Swain wrote:
>> 2008/6/23 Michael Niedermayer <michaelni at gmx.at>:
>> > On Mon, Jun 23, 2008 at 02:10:56PM +0100, Robert Swain wrote:
>> >> 2008/6/20 Michael Niedermayer <michaelni at gmx.at>:
>> >> > On Thu, Jun 19, 2008 at 04:22:57PM +0100, Robert Swain wrote:
>> >> > [...]
>> >> >> +
>> >> >> +    for (g = 0; g < ics->num_window_groups; g++) {
>> >> >> +        for (i = 0; i < ics->max_sfb; i++) {
>> >> >> +            if (cb[g][i] == NOISE_HCB) {
>> >> >> +                for (group = 0; group < ics->group_len[g]; group++) {
>> >> >> +                    float energy = 0;
>> >> >> +                    float scale = 1.;// / (float)(offsets[i+1] - offsets[i]);
>> >> >> +                    for (k = offsets[i]; k < offsets[i+1]; k++)
>> >> >> +                        energy += (float)icoef[group*128+k] * icoef[group*128+k];
>> >> >> +                    scale *= sf[g][i] / sqrt(energy);
>> >> >
>> >> > are you sure that the random values have to be normalized like that?
>> >> > I suspect energy is supposed tp be a constant.
>> >>
>> >> That's how it is in the spec. From section 4.6.13 Perceptual Noise
>> >> Substitution (PNS):
>> >
>> > Ive checked the spec before my reply, and i belive your code is wrong.
>> >
>> >> The energy information for percpetual noise substitution decoding is
>> >> represented by a "noise energy" value indicating the overall power of
>> >> the substituted spectral coefficients in steps of 1.5 dB. If noise
>> >> substitution coding is active for a particular group and scalefactor
>> >> band, a noise energy value is transmitted instead of the scalefactor
>> >> of the respective channel.
>> >
>> > Doesnt say that the output from the random number generator should be choped
>> > up in bands and each independantly renormalized.
>> >
>> > Heres what the spec says:
>> >    /* Decode noise energies for this group */
>> >    for (sfb=0; sfb<max_sfb; sfb++)
>> >        if (is_noise(g,sfb))
>> >            noise_nrg[g][sfb] = nrg += dpcm_noise_nrg[g][sfb];
>> >    /* Do perceptual noise substitution decoding */
>> >    for (b=0; b<window_group_length[g]; b++) {
>> >        for (sfb=0; sfb<max_sfb; sfb++) {
>> >            if (is_noise(g,sfb)) {
>> >                offs = swb_offset[sfb];
>> >                size = swb_offset[sfb+1] - offs;
>> >                /* Generate random vector */
>> >                gen_rand_vector( &spec[g][b][sfb][0], size );
>> >                scale = 1/(size * sqrt(MEAN_NRG));
>> >                scale *= 2.0^(0.25*noise_nrg [g][sfb]);
>> >                /* Scale random vector to desired target energy */
>> >                for (i=0; i<len; i++)
>> >                    spec[g][b][sfb][i] *= scale;
>> >            }
>> >        }
>> >    }
>> >    ...
>> >    The function gen_rand_vector( addr, size ) generates a vector of length <size> with signed random values of
>> >    average energy MEAN_NRG per random value. A suitable random number generator can be realized using one
>> >    multiplication/accumulation per random value.
>> >
>> > No weird renormalization!
>> > also the size factor is commented out in our code, i guess to cancel the
>> > incorrect normalization mostly out.
>>
>> OK, I understand what you mean now. I see that the current code we
>> have calculates the energy of the band in question (hence there's no
>> need for /size) and scales to that energy rather than scaling to
>> 1/(size * MEAN_NRG). I have a few questions:
>>
>
>> - Is the av_random() & 0xFFFF code OK or should the 32-bit random
>> value rather be scaled to 16-bit? I suspect it should be scaled (i.e.
>> >> 16).
>
> Why dont you use the whole 32bit value? Or was the array int16_t in
> which case >>16 is as good as &0xFFFF given the random number generator
> is good.

I might as well and I'll probably move the av_random() calls to
quant_to_spec_tool().

>> - How does one analytically calculate the mean energy of the noise we
>> generate such that this value can be defined as a constant and used in
>> our code?
>
> How do you define energy?

For a discrete signal x_i for i = [0, N], the energy of the signal is
sum(x_i * x_i, i=0, i=N).

> What is the distribution of the values av_random() outputs?

/** generates a random number on [0,0xffffffff]-interval */

If it claims to generate a random number on that interval then without
stating any bias in the distribution, it should mean generates values
in that interval with equal probability.

>> After assuming white noise and dabbling about a bit with
>> some maths it seems it should be 1/3 * maximum possible energy i.e.
>> #define MEAN_NRG sqrt(3.0) * 32768.0 or something like that. Is this
>> correct?
>
> If you know how you define energy then you can easily check this by
> calculating the exact nummerical energy of many values from av_random()
> your calculation should be close to the value returned ...

I tried my sum for large values of N with x_i on the interval [0.0,
1.0] and it came close to 1/3.

Ideally, for the mean energy, one would use the limit as N->inf of the
above-mentioned sum / N

> If you further assume random() has a flat equi distibuted output than
> a simple
> for(i=0; i<=0xFFFF; i++)
>    find_energy(&energy, i);
> will give you the exact value
> ... the analytical one, assuming you solved the correct integral ;) is
> actually just an approximation for the continuous case ...

Right, and I was just wondering if you could clarify the continuous case. :)

By the way, I've just looked at the FAAD and 3GPP implementations and
they both scale per random vector, i.e. per band as the current code
does. They both actually do the scaling within the random vector
generation function though I guess FAAD is based off the 3GPP
reference code.

So, which should we use?

- Scaling per band as in the reference code
- Scaling with constant mean energy

If the latter, what precision should be used for the sqrt(3.0) * 2^31 constant?

Rob