[FFmpeg-devel] [PATCH] AAC decoder

Sun May 25 21:42:58 CEST 2008

2008/5/25 Michael Niedermayer <michaelni at gmx.at>:
> On Sun, May 25, 2008 at 07:27:31PM +0100, Robert Swain wrote:
>> 2008/5/25 Ivan Kalvachev <ikalvachev at gmail.com>:
>> > On 5/25/08, Michael Niedermayer <michaelni at gmx.at> wrote:
>> >> On Sun, May 25, 2008 at 02:55:07PM +0100, Robert Swain wrote:
>> >>> 2008/5/24 Michael Niedermayer <michaelni at gmx.at>:
>> >>> > On Sat, May 24, 2008 at 06:35:37PM +0100, Robert Swain wrote:
>> >>> >> 2008/5/23 Michael Niedermayer <michaelni at gmx.at>:
>> >>> >> > On Fri, May 23, 2008 at 01:59:41PM +0100, Robert Swain wrote:
>> >>> >> >> Index: aac.c
>> >>> >> >> ===================================================================
>> >>> >> >> --- aac.c     (revision 2185)
>> >>> >> >> +++ aac.c     (working copy)
>> >>> >> >> @@ -366,7 +366,7 @@
>> >>> >> >>      DECLARE_ALIGNED_16(float, sine_short_128[128]);
>> >>> >> >>      DECLARE_ALIGNED_16(float, pow2sf_tab[256]);
>> >>> >> >>      DECLARE_ALIGNED_16(float, intensity_tab[256]);
>> >>> >> >> -    DECLARE_ALIGNED_16(float, ivquant_tab[256]);
>> >>> >> >> +    DECLARE_ALIGNED_16(float, ivquant_tab[128]);
>> >>> >> >>      MDCTContext mdct;
>> >>> >> >>      MDCTContext mdct_small;
>> >>> >> >>      MDCTContext *mdct_ltp;
>> >>> >> >> @@ -890,8 +890,11 @@
>> >>> >> >>      // BIAS method instead needs values -1<x<1
>> >>> >> >>      for (i = 0; i < 256; i++)
>> >>> >> >>          ac->intensity_tab[i] = pow(0.5, (i - 100) / 4.);
>> >>> >> >> -    for (i = 0; i < sizeof(ac->ivquant_tab)/sizeof(ac->ivquant_tab[0]); i++)
>> >>> >> >> -        ac->ivquant_tab[i] = pow(i, 4./3);
>> >>> >> >> +    for (i = 0; i < sizeof(ac->ivquant_tab)/(sizeof(ac->ivquant_tab[0])<<1); i++) {
>> >>> >> >> +        int idx = i<<1;
>> >>> >> >> +        ac->ivquant_tab[idx]     =  pow(i, 4./3);
>> >>> >> >> +        ac->ivquant_tab[idx + 1] = -ac->ivquant_tab[idx];
>> >>> >> >> +    }
>> >>> >> >>
>> >>> >> >>      if(ac->dsp.float_to_int16 == ff_float_to_int16_c) {
>> >>> >> >>          ac->add_bias = 385.0f;
>> >>> >> >
>> >>> >> >> @@ -1035,13 +1038,12 @@
>> >>> >> >>  }
>> >>> >> >>
>> >>> >> >>  static inline float ivquant(AACContext * ac, int a) {
>> >>> >> >
>> >>> >> >> -    static const float sign[2] = { -1., 1. };
>> >>> >> >>      int tmp = (a>>31);
>> >>> >> >>      int abs_a = (a^tmp)-tmp;
>> >>> >> >> -    if (abs_a < sizeof(ac->ivquant_tab)/sizeof(ac->ivquant_tab[0]))
>> >>> >> >> -        return sign[tmp+1] * ac->ivquant_tab[abs_a];
>> >>> >> >> +    if (abs_a < sizeof(ac->ivquant_tab)/(sizeof(ac->ivquant_tab[0])<<1))
>> >>> >> >> +        return ac->ivquant_tab[(abs_a<<1) + !!tmp];
>> >>> >> >
>> >>> >> > ehh... this should be:
>> >>> >> >
>> >>> >> > if(a + 127U < 255U)
>> >>> >> >    return ivquant_tab[a + 127U];
>> >>> >> >
>> >>> >> > (or other constants depending on what table size is best ...)
>> >>> >> >
>> >>> >> >
>> >>> >> >>      else
>> >>> >> >> -        return sign[tmp+1] * pow(abs_a, 4./3);
>> >>> >> >> +        return (2 * tmp + 1) * pow(abs_a, 4./3);
>> >>> >> >
>> >>> >> > pow(fabs(a), 1./3) * a;
>> >>> >>
>> >>> >> With those suggestions it is much faster. The alternating sign
>> >>> >> construction for the table wasn't my idea, but I won't name names. :)
>> >>> >> Anyway, see attached. Benchmarks on the same FAAC encoded South Park
>> >>> >> episode:
>> >>> >>
>> >>> >> old size 256
>> >>> > [...]
>> >>> >> 3956 dezicycles in ivquant, 2096816 runs, 336 skips
>> >>> >>
>> >>> >> new size 8
>> >>> > [...]
>> >>> >> 4840 dezicycles in ivquant, 2066668 runs, 30484 skips
>> >>> >>
>> >>> >> new size 16
>> >>> > [...]
>> >>> >> 3650 dezicycles in ivquant, 2093424 runs, 3728 skips
>> >>> >>
>> >>> >> new size 32
>> >>> > [...]
>> >>> >> 3438 dezicycles in ivquant, 2096888 runs, 264 skips
>> >>> >>
>> >>> >> new size 64
>> >>> > [...]
>> >>> >> 3447 dezicycles in ivquant, 2096915 runs, 237 skips
>> >>> >>
>> >>> >> new size 128
>> >>> > [...]
>> >>> >> 3431 dezicycles in ivquant, 2096918 runs, 234 skips
>> >>> >>
>> >>> >> new size 256
>> >>> > [...]
>> >>> >> 3431 dezicycles in ivquant, 2096953 runs, 199 skips
>> >>> >>
>> >>> >> new size 512
>> >>> > [...]
>> >>> >> 3438 dezicycles in ivquant, 2097093 runs, 59 skips
>> >>> >>
>> >>> >> It looks to me like there's little difference in performance when the
>> >>> >> table is of size 32 or larger. Should I use size 32?
>> >>> >
>> >>> > From the numbers i see, yes 32 seems the best choice.
>> >>> >
>> >>> > What bitrate did your test file have? High-bitrate files might be faster
>> >>> > with larger tables, so if it was low bitrate it might be worth retrying
>> >>> > with a higher bitrate.
>> >>>
>> >>> Same audio source but encoded to 320kbps with QuickTime.
>> >>
>> >>> I included
>> >>> the full listings as some table sizes seem to behave strangely based
>> >>> on the number of calls.
>> >>
>> >> An effect of skipping (some) pow() calls, I assume ...
>> >>
>> >>
>> >>>
>> >>> size 32
>> >> [...]
>> >>> 16429 dezicycles in ivquant, 4169262 runs, 25042 skips
>> >>>
>> >>> size 64
>> >> [...]
>> >>> 11718 dezicycles in ivquant, 4147408 runs, 46896 skips
>> >>>
>> >>> size 128
>> >> [...]
>> >>> 7687 dezicycles in ivquant, 4148129 runs, 46175 skips
>> >>>
>> >>> size 256
>> >> [...]
>> >>> 5174 dezicycles in ivquant, 4166995 runs, 27309 skips
>> >>>
>> >>> size 512
>> >> [...]
>> >>> 3826 dezicycles in ivquant, 4183674 runs, 10630 skips
>> >>>
>> >>> size 1024
>> >> [...]
>> >>> 3250 dezicycles in ivquant, 4191225 runs, 3079 skips
>> >>>
>> >>> size 2048
>> >> [...]
>> >>> 3109 dezicycles in ivquant, 4193283 runs, 1021 skips
>> >>
>> >> From these numbers, a table size of 1024 seems to be the lowest acceptable.
>> >> I guess the 4 kB of space won't matter compared to the speed loss a small
>> >> table would cause with such files.
>> >>
>> >>
>> >>>
>> >>> > [...]
>> >>> >> +    for (i = 1; i < IVQUANT_SIZE/2; i++) {
>> >>> >> +        ac->ivquant_tab[IVQUANT_SIZE/2 - 1 + i] =  pow(i, 4./3);
>> >>> >> +        ac->ivquant_tab[IVQUANT_SIZE/2 - 1 - i] = -ac->ivquant_tab[IVQUANT_SIZE/2 - 1 + i];
>> >>> >> +    }
>> >>> >
>> >>> > can't that be simplified with pow(fabs(i), 1./3) * i as well?
>> >
>> > Isn't i^(1/3) actually the cube root? There is a C99 math.h function,
>> > cbrt(), that calculates it; it may be a little faster.
>>
>> A good point.
>>
>> > BTW, if the results are floats, why are functions that operate on doubles
>> > used (vs. fabsf, powf, etc.)?
>>
>> Another good point.
>>
>>
>> current:
>> 3237 dezicycles in ivquant, 4191580 runs, 2724 skips
>>
>> cbrt:
>> 3169 dezicycles in ivquant, 4193238 runs, 1066 skips
>>
>> float funcs without cbrt:
>> 3119 dezicycles in ivquant, 4193791 runs, 513 skips
>>
>> float funcs with cbrtf:
>> 3070 dezicycles in ivquant, 4194193 runs, 111 skips
>>
>>
>> I'm not sure if it's the best method of testing, but I decoded the file
>> to pcm_s16le with both faad and the code using the float funcs and cbrtf,
>> then compared the two outputs using tiny_psnr:
>>
>> stddev:  0.01 PSNR:136.17 bytes:232734720
>>
>> Shall I commit with the alterations suggested (table size 1024,
>> explicit casts to unsigned int where necessary) plus use of cbrtf and
>> fabsf for these functions?
>
> yes

Committed using doubles in init and floats in ivquant().

>> Shall I also go through the other math calls that are using the double
>> precision functions and change them to the float functions?
>
> where it makes a difference for speed, yes. The init code can keep using
> doubles ...

I'll put this on the todo and leave it for the moment as I'd rather
spend time getting the code into SVN and working on SBR and PS than
optimising. We can optimise it more later.

Rob