[FFmpeg-devel] [PATCH] AAC Decoder - Round 2.

Sun Jun 29 15:50:52 CEST 2008

2008/6/29 Michael Niedermayer <michaelni at gmx.at>:
> On Sun, Jun 29, 2008 at 10:42:40AM +0100, Robert Swain wrote:
>> 2008/6/27 Michael Niedermayer <michaelni at gmx.at>:
>> > On Fri, Jun 27, 2008 at 03:35:06PM +0100, Robert Swain wrote:
>> >> 2008/6/23 Michael Niedermayer <michaelni at gmx.at>:
>> >> > On Mon, Jun 23, 2008 at 02:10:56PM +0100, Robert Swain wrote:
>> >> >> 2008/6/20 Michael Niedermayer <michaelni at gmx.at>:
>> >> >> > On Thu, Jun 19, 2008 at 04:22:57PM +0100, Robert Swain wrote:
>> >> >> > [...]
>> >> >> >> +
>> >> >> >> +    for (g = 0; g < ics->num_window_groups; g++) {
>> >> >> >> +        for (i = 0; i < ics->max_sfb; i++) {
>> >> >> >> +            if (cb[g][i] == NOISE_HCB) {
>> >> >> >> +                for (group = 0; group < ics->group_len[g]; group++) {
>> >> >> >> +                    float energy = 0;
>> >> >> >> +                    float scale = 1.;// / (float)(offsets[i+1] - offsets[i]);
>> >> >> >> +                    for (k = offsets[i]; k < offsets[i+1]; k++)
>> >> >> >> +                        energy += (float)icoef[group*128+k] * icoef[group*128+k];
>> >> >> >> +                    scale *= sf[g][i] / sqrt(energy);
>> >> >> >
>> >> >> > are you sure that the random values have to be normalized like that?
>> >> >> > I suspect energy is supposed tp be a constant.
>> >> >>
>> >> >> That's how it is in the spec. From section 4.6.13 Perceptual Noise
>> >> >> Substitution (PNS):
>> >> >
>> >> > Ive checked the spec before my reply, and i belive your code is wrong.
>> >> >
>> >> >> The energy information for percpetual noise substitution decoding is
>> >> >> represented by a "noise energy" value indicating the overall power of
>> >> >> the substituted spectral coefficients in steps of 1.5 dB. If noise
>> >> >> substitution coding is active for a particular group and scalefactor
>> >> >> band, a noise energy value is transmitted instead of the scalefactor
>> >> >> of the respective channel.
>> >> >
>> >> > Doesnt say that the output from the random number generator should be choped
>> >> > up in bands and each independantly renormalized.
>> >> >
>> >> > Heres what the spec says:
>> >> >    /* Decode noise energies for this group */
>> >> >    for (sfb=0; sfb<max_sfb; sfb++)
>> >> >        if (is_noise(g,sfb))
>> >> >            noise_nrg[g][sfb] = nrg += dpcm_noise_nrg[g][sfb];
>> >> >    /* Do perceptual noise substitution decoding */
>> >> >    for (b=0; b<window_group_length[g]; b++) {
>> >> >        for (sfb=0; sfb<max_sfb; sfb++) {
>> >> >            if (is_noise(g,sfb)) {
>> >> >                offs = swb_offset[sfb];
>> >> >                size = swb_offset[sfb+1] - offs;
>> >> >                /* Generate random vector */
>> >> >                gen_rand_vector( &spec[g][b][sfb][0], size );
>> >> >                scale = 1/(size * sqrt(MEAN_NRG));
>> >> >                scale *= 2.0^(0.25*noise_nrg [g][sfb]);
>> >> >                /* Scale random vector to desired target energy */
>> >> >                for (i=0; i<len; i++)
>> >> >                    spec[g][b][sfb][i] *= scale;
>> >> >            }
>> >> >        }
>> >> >    }
>> >> >    ...
>> >> >    The function gen_rand_vector( addr, size ) generates a vector of length <size> with signed random values of
>> >> >    average energy MEAN_NRG per random value. A suitable random number generator can be realized using one
>> >> >    multiplication/accumulation per random value.
>> >> >
>> >> > No weird renormalization!
>> >> > also the size factor is commented out in our code, i guess to cancel the
>> >> > incorrect normalization mostly out.
>> >>
>> >> OK, I understand what you mean now. I see that the current code we
>> >> have calculates the energy of the band in question (hence there's no
>> >> need for /size) and scales to that energy rather than scaling to
>> >> 1/(size * MEAN_NRG). I have a few questions:
>> >>
>> >
>> >> - Is the av_random() & 0xFFFF code OK or should the 32-bit random
>> >> value rather be scaled to 16-bit? I suspect it should be scaled (i.e.
>> >> >> 16).
>> >
>> > Why dont you use the whole 32bit value? Or was the array int16_t in
>> > which case >>16 is as good as &0xFFFF given the random number generator
>> > is good.
>>
>> I might as well and I'll probably move the av_random() calls to
>> quant_to_spec_tool().
>>
>> >> - How does one analytically calculate the mean energy of the noise we
>> >> generate such that this value can be defined as a constant and used in
>> >> our code?
>> >
>> > How do you define energy?
>>
>> For a discrete signal x_i for i = [0, N], the energy of the signal is
>> sum(x_i * x_i, i=0, i=N).
>>
>> > What is the distribution of the values av_random() outputs?
>>
>> /** generates a random number on [0,0xffffffff]-interval */
>>
>> If it claims to generate a random number on that interval then without
>> stating any bias in the distribution, it should mean generates values
>> in that interval with equal probability.
>>
>> >> After assuming white noise and dabbling about a bit with
>> >> some maths it seems it should be 1/3 * maximum possible energy i.e.
>> >> #define MEAN_NRG sqrt(3.0) * 32768.0 or something like that. Is this
>> >> correct?
>> >
>> > If you know how you define energy then you can easily check this by
>> > calculating the exact nummerical energy of many values from av_random()
>> > your calculation should be close to the value returned ...
>>
>> I tried my sum for large values of N with x_i on the interval [0.0,
>> 1.0] and it came close to 1/3.
>>
>> Ideally, for the mean energy, one would use the limit as N->inf of the
>> above-mentioned sum / N
>>
>> > If you further assume random() has a flat equi distibuted output than
>> > a simple
>> > for(i=0; i<=0xFFFF; i++)
>> >    find_energy(&energy, i);
>> > will give you the exact value
>> > ... the analytical one, assuming you solved the correct integral ;) is
>> > actually just an approximation for the continuous case ...
>>
>> Right, and I was just wondering if you could clarify the continuous case. :)
>>
>> By the way, I've just looked at the FAAD and 3GPP implementations and
>> they both scale per random vector, i.e. per band as the current code
>> does. They both actually do the scaling within the random vector
>> generation function though I guess FAAD is based off the 3GPP
>> reference code.
>>
>> So, which should we use?
>>
>> - Scaling per band as in the reference code
>> - Scaling with constant mean energy
>
> the second, because
> 1. thats what the spec we have says
> 2. its faster
> 3. its simpler
> 4. its random, rescaling subsets of numbers "dynamically" is not random
>   anymore.
>
> If of course all files we had with thsis feature sounded better with the
> faad/3gpp variant then we should choose the first but IIRC we dont have
> any such files ...
>
>
>>
>> If the latter, what precision should be used for the sqrt(3.0) * 2^31 constant?
>
> as the output is float, a float constant seems logic ...

See attached.

Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 20080629-1450-noise-use_mean_energy.diff
Type: text/x-diff
Size: 2302 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080629/acf1712c/attachment.diff>