[FFmpeg-devel] Fixpoint FFT optimization, with MDCT and IMDCT wrappers for audio optimization
Trent Piepho
xyzzy
Sun Jul 29 11:21:51 CEST 2007
On Sun, 29 Jul 2007, Loren Merritt wrote:
> On Sat, 28 Jul 2007, Marc Hoffman wrote:
> > On 7/27/07, Michael Niedermayer <michaelni at gmx.at> wrote:
> >> On Fri, Jul 27, 2007 at 09:56:34PM -0400, Marc Hoffman wrote:
> >>> On 7/27/07, Michael Niedermayer <michaelni at gmx.at> wrote:
> >>>> On Fri, Jul 27, 2007 at 05:40:14PM -0400, mmh wrote:
> >>>> [...]
> >>>>> +/*
> >>>>> + This is a fixedpoint inplace 32bit FFT which accepts 3 arguments:
> >>>>> +
> >>>>> + @param X - input signal in format 1.31
> >>>>> + @param W - phase factors in 1.31 format
> >>>>> + @param lgN - log_2(N) where N is the size of the input data set.
> >>>>> +
> >>>>> +*/
> >>>>> +void ff_fft32 (FFTContext *c, FFTComplex32 *X)
> >>>>> +{
> >>>>> + FFTComplex16 *W = c->exptab16;
> >>>>> + int lgN = c->nbits;
> >>>>> + int N = 1<<lgN;
> >>>>> + int s, j, k, m, w, ws;
> >>>>> + int64_t wwr,wwi;
> >>>>> +
> >>>>> + for (s=1; s<=lgN; s++) {
> >>>>> + m = 1<<s;
> >>>>> + ws = 1<<(lgN-s);
> >>>>> + w=0;
> >>>>> + for (j=0; j<m/2; j++) {
> >>>>> + wwr=W[w].re;
> >>>>> + wwi=W[w].im;
> >>>>> + w+=ws;
> >>>>> + for (k=j; k<N; k+=m) {
> >>>>> + long tr,ti;
> >>>>> + int k2 = k+m/2;
> >>>>
> >>>> signed and /2 is slow either changed to unsiged or use >>1
> >>>>
> >>> Compiler should take care of this via strength reduction and to boot
> >>> its a constant in the loop. I think m/2 is ok but will change if you
> >>> want me too semantics are identical.
> >>
> >> you put a lot of trust in the compiler, this is not a simple optimization
> >> it has to proof m can not reach values where m>>1 != m/2
> >
> > Come on this is trivial...
>
> Nonetheless, gcc 4.1.2 fails to make this optimization, even in cases much
> more trivial than yours.
For signed values, it's not a valid optimization, unless gcc is somehow
supposed to know that it can't be negative. Change it to unsigned and gcc
will use shifting.
I'd be tempted to try lgN-- before the first loop and change s to start at
0. ws will be the same, but m will be 1/2 its previous value. Now all the
m/2 can be changed to just m, and k+=m becomes k+=m+m (which can be
calculated in a single lea instruction, e.g. "lea (%[k],%[m],2), %[k]").
Rather than worry about gcc turning the division into a shift, just get rid
of the operation entirely.
More information about the ffmpeg-devel
mailing list