[FFmpeg-devel] Fixpoint FFT optimization, with MDCT and IMDCT wrappers for audio optimization

Sun Jul 29 02:47:08 CEST 2007

On 7/27/07, Michael Niedermayer <michaelni at gmx.at> wrote:
> Hi
>
> On Fri, Jul 27, 2007 at 09:56:34PM -0400, Marc Hoffman wrote:
> > Hi,
> >
> > On 7/27/07, Michael Niedermayer <michaelni at gmx.at> wrote:
> > > Hi
> > >
> > > On Fri, Jul 27, 2007 at 05:40:14PM -0400, mmh wrote:
> > > [...]
> > > > +/*
> > > > +  This is a fixedpoint inplace 32bit FFT which accepts 3 arguments:
> > > > +
> > > > +  @param X   - input signal in format 1.31
> > > > +  @param W   - phase factors in 1.31 format
> > > > +  @param lgN - log_2(N) where N is the size of the input data set.
> > > > +
> > > > +*/
> > > > +void ff_fft32 (FFTContext *c, FFTComplex32 *X)
> > > > +{
> > > > +    FFTComplex16 *W = c->exptab16;
> > > > +    int lgN    = c->nbits;
> > > > +    int N      = 1<<lgN;
> > > > +    int s, j, k, m, w, ws;
> > > > +    int64_t wwr,wwi;
> > > > +
> > > > +    for (s=1; s<=lgN; s++) {
> > > > +        m  = 1<<s;
> > > > +        ws = 1<<(lgN-s);
> > > > +        w=0;
> > > > +        for (j=0; j<m/2; j++) {
> > > > +            wwr=W[w].re;
> > > > +            wwi=W[w].im;
> > > > +            w+=ws;
> > > > +            for (k=j; k<N; k+=m) {
> > > > +                long tr,ti;
> > > > +                int k2 = k+m/2;
> > >
> > > signed and /2 is slow either changed to unsiged or use >>1
> > >
> > Compiler should take care of this via strength reduction and to boot
> > its a constant in the loop. I think m/2 is ok but will change if you
> > want me too semantics are identical.
>
> you put a lot of trust in the compiler, this is not a simple optimization
> it has to proof m can not reach values where m>>1 != m/2
>

Come on this is trivial...

>
> >
> > >
> > > > +
> > > > +                tr        = (X[k2].re*wwr - X[k2].im*wwi + 0x4000)>>15;
> > > > +                ti        = (X[k2].re*wwi + X[k2].im*wwr + 0x4000)>>15;
> > > > +
> > > > +                X[k2].re  = (X[k].re - tr);
> > > > +                X[k2].im  = (X[k].im - ti);
> > > > +
> > > > +                X[k].re   = (X[k].re + tr);
> > > > +                X[k].im   = (X[k].im + ti);
> > > > +            }
> > > > +        }
> > > > +    }
> > > > +}
> > >
> > > what about implementing a split radix fft?
> > > (http://en.wikipedia.org/wiki/Split-radix_FFT)
> >
> > I'm thinking this is the begining of the whole fixedpoint thing.  My
> > real goal here is to get some of this working with an audio codec and
> > see what we really need to optimize.
>
> if you agree to implement a split radix fixedpoint & float fft later
> then ive no objections, otherwise i do
>

Sure when the time comes I will do this.

>
> >
> > >
> > > also several iterations have trivial (1/-1/0) w* these should not be done
> > > in this generic way
> > ?
>
> these things called twiddle factors, which are represented by wwr and wwi in
> your code have values like 1/-1/0, well 32768/-32768/0 as its fixedpoint
> in some iterations and its very lame to do a full complex multiplication
> for w being one of 1,-1,i,-i
>

no problem we can factor out the end stages.  I was thinking we would
start simple get one of the audio things working then optimize it.
There are a lot of things we can do this is just one of them.