[FFmpeg-devel] [PATCH] AAC Decoder - Round 2.

Fri Jun 27 16:35:06 CEST 2008

2008/6/23 Michael Niedermayer <michaelni at gmx.at>:
> On Mon, Jun 23, 2008 at 02:10:56PM +0100, Robert Swain wrote:
>> 2008/6/20 Michael Niedermayer <michaelni at gmx.at>:
>> > On Thu, Jun 19, 2008 at 04:22:57PM +0100, Robert Swain wrote:
>> > [...]
>> >> +
>> >> +    for (g = 0; g < ics->num_window_groups; g++) {
>> >> +        for (i = 0; i < ics->max_sfb; i++) {
>> >> +            if (cb[g][i] == NOISE_HCB) {
>> >> +                for (group = 0; group < ics->group_len[g]; group++) {
>> >> +                    float energy = 0;
>> >> +                    float scale = 1.;// / (float)(offsets[i+1] - offsets[i]);
>> >> +                    for (k = offsets[i]; k < offsets[i+1]; k++)
>> >> +                        energy += (float)icoef[group*128+k] * icoef[group*128+k];
>> >> +                    scale *= sf[g][i] / sqrt(energy);
>> >
>> > Are you sure that the random values have to be normalized like that?
>> > I suspect energy is supposed to be a constant.
>>
>> That's how it is in the spec. From section 4.6.13 Perceptual Noise
>> Substitution (PNS):
>
> I've checked the spec before my reply, and I believe your code is wrong.
>
>> The energy information for perceptual noise substitution decoding is
>> represented by a "noise energy" value indicating the overall power of
>> the substituted spectral coefficients in steps of 1.5 dB. If noise
>> substitution coding is active for a particular group and scalefactor
>> band, a noise energy value is transmitted instead of the scalefactor
>> of the respective channel.
>
> It doesn't say that the output from the random number generator should be chopped
> up into bands and each band independently renormalized.
>
> Here's what the spec says:
>    /* Decode noise energies for this group */
>    for (sfb=0; sfb<max_sfb; sfb++)
>        if (is_noise(g,sfb))
>            noise_nrg[g][sfb] = nrg += dpcm_noise_nrg[g][sfb];
>    /* Do perceptual noise substitution decoding */
>    for (b=0; b<window_group_length[g]; b++) {
>        for (sfb=0; sfb<max_sfb; sfb++) {
>            if (is_noise(g,sfb)) {
>                offs = swb_offset[sfb];
>                size = swb_offset[sfb+1] - offs;
>                /* Generate random vector */
>                gen_rand_vector( &spec[g][b][sfb][0], size );
>                scale = 1/(size * sqrt(MEAN_NRG));
>                scale *= 2.0^(0.25*noise_nrg [g][sfb]);
>                /* Scale random vector to desired target energy */
>                for (i=0; i<len; i++)
>                    spec[g][b][sfb][i] *= scale;
>            }
>        }
>    }
>    ...
>    The function gen_rand_vector( addr, size ) generates a vector of length <size> with signed random values of
>    average energy MEAN_NRG per random value. A suitable random number generator can be realized using one
>    multiplication/accumulation per random value.
>
> No weird renormalization!
> Also, the size factor is commented out in our code, I guess to mostly cancel out
> the incorrect normalization.

OK, I understand what you mean now. I see that our current code
measures the energy of each band and normalizes the noise by it (hence
there's no need for the /size) instead of applying the constant factor
1/(size * sqrt(MEAN_NRG)) from the spec. I have a few questions:

- Is the av_random() & 0xFFFF code OK, or should the 32-bit random
value rather be scaled to 16-bit? I suspect it should be scaled
(i.e. >> 16).
- How does one analytically calculate the mean energy of the noise we
generate, so that this value can be defined as a constant and used in
our code? After assuming white noise and dabbling a bit with the
maths, it seems it should be 1/3 of the maximum possible energy, i.e.
#define MEAN_NRG (32768.0 * 32768.0 / 3.0), so that sqrt(MEAN_NRG) is
32768.0 / sqrt(3.0), or something like that. Is this correct? What
precision would be good for this value?
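Both questions can be checked with a small deterministic sketch. It assumes a generic 32-bit random word `r` (standing in for the value from `av_random()`, which is not called here) and a perfectly uniform generator, and it enumerates all 65536 possible 16-bit values instead of sampling. `masked16()` reproduces the `& 0xFFFF` mapping, which is unsigned, while `to_signed16()` is a hypothetical helper that takes the top 16 bits and subtracts 32768, one portable way to get the signed values the spec asks for. For a continuous uniform X on [-A, A], E[X^2] = A^2/3, so the 1/3 estimate applies to the energy itself and the RMS amplitude sqrt(MEAN_NRG) is A/sqrt(3). The sketch compares that approximation, with A = 32768, against the exact mean energy of the integers in [-32768, 32767].

```c
#include <assert.h>
#include <stdint.h>

/* The "& 0xFFFF" mapping: unsigned values in [0, 65535]. */
static int32_t masked16(uint32_t r)
{
    return (int32_t)(r & 0xFFFF);
}

/* Hypothetical signed mapping: top 16 bits re-centred to [-32768, 32767].
 * Subtracting 32768 avoids the implementation-defined conversion that a
 * cast such as (int16_t)(r >> 16) would involve for values above 32767. */
static int32_t to_signed16(uint32_t r)
{
    return (int32_t)(r >> 16) - 32768;
}

/* Mean value (DC offset) of f over all 65536 patterns of the 16 bits it uses;
 * shift places the varying pattern in the low (0) or high (16) half of r. */
static double mean_value(int32_t (*f)(uint32_t), int shift)
{
    int64_t sum = 0;
    for (uint32_t p = 0; p < 65536; p++)
        sum += f(p << shift);
    return (double)sum / 65536.0;
}

/* Mean energy (mean of the square) of f over the same 65536 patterns. */
static double mean_energy(int32_t (*f)(uint32_t), int shift)
{
    int64_t sum = 0;
    for (uint32_t p = 0; p < 65536; p++) {
        int64_t v = f(p << shift);
        sum += v * v;
    }
    return (double)sum / 65536.0;
}

/* Continuous approximation for uniform noise on [-A, A]: A^2 / 3. */
#define MEAN_NRG_APPROX (32768.0 * 32768.0 / 3.0)
```

With a perfectly uniform generator, the masked values have a DC offset of 32767.5, while the signed mapping is centred to within half a step. The exact mean energy of the signed range is 357913941.5, which differs from 32768^2/3 by only 1/6. At that magnitude a single-precision float has a step of 32, so the A^2/3 approximation (about 3.5791394e8, or about 18918.6 for sqrt(MEAN_NRG)) should be precise enough for a float constant.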

I haven't yet found any samples that use this tool to test it with. I
tried to generate one using an old Psytel AAC Encoder Windows binary
that claims to support PNS, but the resulting ADTS file didn't
work with either FAAD or ffaac. As far as I know, none of the files on
mphq use PNS either.

Rob