[FFmpeg-devel] [PATCH] AAC decoder

Sun May 25 20:27:31 CEST 2008

2008/5/25 Ivan Kalvachev <ikalvachev at gmail.com>:
> On 5/25/08, Michael Niedermayer <michaelni at gmx.at> wrote:
>> On Sun, May 25, 2008 at 02:55:07PM +0100, Robert Swain wrote:
>>> 2008/5/24 Michael Niedermayer <michaelni at gmx.at>:
>>> > On Sat, May 24, 2008 at 06:35:37PM +0100, Robert Swain wrote:
>>> >> 2008/5/23 Michael Niedermayer <michaelni at gmx.at>:
>>> >> > On Fri, May 23, 2008 at 01:59:41PM +0100, Robert Swain wrote:
>>> >> >> Index: aac.c
>>> >> >> ===================================================================
>>> >> >> --- aac.c     (revision 2185)
>>> >> >> +++ aac.c     (working copy)
>>> >> >> @@ -366,7 +366,7 @@
>>> >> >>      DECLARE_ALIGNED_16(float, sine_short_128[128]);
>>> >> >>      DECLARE_ALIGNED_16(float, pow2sf_tab[256]);
>>> >> >>      DECLARE_ALIGNED_16(float, intensity_tab[256]);
>>> >> >> -    DECLARE_ALIGNED_16(float, ivquant_tab[256]);
>>> >> >> +    DECLARE_ALIGNED_16(float, ivquant_tab[128]);
>>> >> >>      MDCTContext mdct;
>>> >> >>      MDCTContext mdct_small;
>>> >> >>      MDCTContext *mdct_ltp;
>>> >> >> @@ -890,8 +890,11 @@
>>> >> >>      // BIAS method instead needs values -1<x<1
>>> >> >>      for (i = 0; i < 256; i++)
>>> >> >>          ac->intensity_tab[i] = pow(0.5, (i - 100) / 4.);
>>> >> >> -    for (i = 0; i < sizeof(ac->ivquant_tab)/sizeof(ac->ivquant_tab[0]); i++)
>>> >> >> -        ac->ivquant_tab[i] = pow(i, 4./3);
>>> >> >> +    for (i = 0; i < sizeof(ac->ivquant_tab)/(sizeof(ac->ivquant_tab[0])<<1); i++) {
>>> >> >> +        int idx = i<<1;
>>> >> >> +        ac->ivquant_tab[idx]     =  pow(i, 4./3);
>>> >> >> +        ac->ivquant_tab[idx + 1] = -ac->ivquant_tab[idx];
>>> >> >> +    }
>>> >> >>
>>> >> >>      if(ac->dsp.float_to_int16 == ff_float_to_int16_c) {
>>> >> >>          ac->add_bias = 385.0f;
>>> >> >
>>> >> >> @@ -1035,13 +1038,12 @@
>>> >> >>  }
>>> >> >>
>>> >> >>  static inline float ivquant(AACContext * ac, int a) {
>>> >> >
>>> >> >> -    static const float sign[2] = { -1., 1. };
>>> >> >>      int tmp = (a>>31);
>>> >> >>      int abs_a = (a^tmp)-tmp;
>>> >> >> -    if (abs_a < sizeof(ac->ivquant_tab)/sizeof(ac->ivquant_tab[0]))
>>> >> >> -        return sign[tmp+1] * ac->ivquant_tab[abs_a];
>>> >> >> +    if (abs_a < sizeof(ac->ivquant_tab)/(sizeof(ac->ivquant_tab[0])<<1))
>>> >> >> +        return ac->ivquant_tab[(abs_a<<1) + !!tmp];
>>> >> >
>>> >> > ehh... this should be:
>>> >> >
>>> >> > if(a + 127U < 255U)
>>> >> >    return ivquant_tab[a + 127U];
>>> >> >
>>> >> > (or other constants depending on what table size is best ...)
>>> >> >
>>> >> >
>>> >> >>      else
>>> >> >> -        return sign[tmp+1] * pow(abs_a, 4./3);
>>> >> >> +        return (2 * tmp + 1) * pow(abs_a, 4./3);
>>> >> >
>>> >> > pow(fabs(a), 1./3) * a;
>>> >>
>>> >> With those suggestions it is much faster. The alternating sign
>>> >> construction for the table wasn't my idea, but I won't name names. :)
>>> >> Anyway, see attached. Benchmarks on the same FAAC encoded South Park
>>> >> episode:
>>> >>
>>> >> old size 256
>>> > [...]
>>> >> 3956 dezicycles in ivquant, 2096816 runs, 336 skips
>>> >>
>>> >> new size 8
>>> > [...]
>>> >> 4840 dezicycles in ivquant, 2066668 runs, 30484 skips
>>> >>
>>> >> new size 16
>>> > [...]
>>> >> 3650 dezicycles in ivquant, 2093424 runs, 3728 skips
>>> >>
>>> >> new size 32
>>> > [...]
>>> >> 3438 dezicycles in ivquant, 2096888 runs, 264 skips
>>> >>
>>> >> new size 64
>>> > [...]
>>> >> 3447 dezicycles in ivquant, 2096915 runs, 237 skips
>>> >>
>>> >> new size 128
>>> > [...]
>>> >> 3431 dezicycles in ivquant, 2096918 runs, 234 skips
>>> >>
>>> >> new size 256
>>> > [...]
>>> >> 3431 dezicycles in ivquant, 2096953 runs, 199 skips
>>> >>
>>> >> new size 512
>>> > [...]
>>> >> 3438 dezicycles in ivquant, 2097093 runs, 59 skips
>>> >>
>>> >> It looks to me like there's little difference in performance when the
>>> >> table is of size 32 or larger. Should I use size 32?
>>> >
>>> > From the numbers i see, yes 32 seems the best choice.
>>> >
>>> > What bitrate did your test file have? High bitrate files might be faster
>>> > with larger tables, so if it was low bitrate then it might be worth
>>> > retrying with some higher bitrate.
>>>
>>> Same audio source but encoded to 320kbps with QuickTime.
>>
>>> I included
>>> the full listings as some table sizes seem to behave strangely based
>>> on the number of calls.
>>
>> Effects of the skipping of (some) pow() i assume ...
>>
>>
>>>
>>> size 32
>> [...]
>>> 16429 dezicycles in ivquant, 4169262 runs, 25042 skips
>>>
>>> size 64
>> [...]
>>> 11718 dezicycles in ivquant, 4147408 runs, 46896 skips
>>>
>>> size 128
>> [...]
>>> 7687 dezicycles in ivquant, 4148129 runs, 46175 skips
>>>
>>> size 256
>> [...]
>>> 5174 dezicycles in ivquant, 4166995 runs, 27309 skips
>>>
>>> size 512
>> [...]
>>> 3826 dezicycles in ivquant, 4183674 runs, 10630 skips
>>>
>>> size 1024
>> [...]
>>> 3250 dezicycles in ivquant, 4191225 runs, 3079 skips
>>>
>>> size 2048
>> [...]
>>> 3109 dezicycles in ivquant, 4193283 runs, 1021 skips
>>
>> From these numbers a table size of 1024 seems to be the lowest acceptable.
>> I guess the 4kb space wont matter compared to the speed loss a small table
>> would cause with such files.
>>
>>
>>>
>>> > [...]
>>> >> +    for (i = 1; i < IVQUANT_SIZE/2; i++) {
>>> >> +        ac->ivquant_tab[IVQUANT_SIZE/2 - 1 + i] =  pow(i, 4./3);
>>> >> +        ac->ivquant_tab[IVQUANT_SIZE/2 - 1 - i] = -ac->ivquant_tab[IVQUANT_SIZE/2 - 1 + i];
>>> >> +    }
>>> >
>>> > cant that be simplified with pow(fabs(i), 1./3) * i as well?
>
> Isn't i^(1/3) actually the cube root? There is a C99 math.h function, cbrt(),
> that calculates it; it may be a little faster.

A good point.

> BTW, if the results are floats, why are functions that operate on doubles
> used (vs. fabsf, powf, etc.)?

Another good point.

current:
3237 dezicycles in ivquant, 4191580 runs, 2724 skips

cbrt:
3169 dezicycles in ivquant, 4193238 runs, 1066 skips

float funcs without cbrt:
3119 dezicycles in ivquant, 4193791 runs, 513 skips

float funcs with cbrtf:
3070 dezicycles in ivquant, 4194193 runs, 111 skips

I'm not sure it's the best way to test, but I decoded the file to
pcm_s16le with both faad and the new code (float funcs plus cbrtf) and
compared the two outputs with tiny_psnr:

stddev:  0.01 PSNR:136.17 bytes:232734720

Shall I commit with the alterations suggested (table size 1024,
explicit casts to unsigned int where necessary) plus use of cbrtf and
fabsf for these functions?

Shall I also go through the other math calls that are using the double
precision functions and change them to the float functions?

Rob