[FFmpeg-devel] Fixpoint FFT optimization, with MDCT and IMDCT wrappers for audio optimization
Loren Merritt
lorenm
Sun Jul 29 10:51:27 CEST 2007
On Sat, 28 Jul 2007, Marc Hoffman wrote:
> On 7/27/07, Michael Niedermayer <michaelni at gmx.at> wrote:
>> On Fri, Jul 27, 2007 at 09:56:34PM -0400, Marc Hoffman wrote:
>>> On 7/27/07, Michael Niedermayer <michaelni at gmx.at> wrote:
>>>> On Fri, Jul 27, 2007 at 05:40:14PM -0400, mmh wrote:
>>>> [...]
>>>>> +/*
>>>>> + This is a fixedpoint inplace 32bit FFT which accepts 3 arguments:
>>>>> +
>>>>> + @param X - input signal in format 1.31
>>>>> + @param W - phase factors in 1.31 format
>>>>> + @param lgN - log_2(N) where N is the size of the input data set.
>>>>> +
>>>>> +*/
>>>>> +void ff_fft32 (FFTContext *c, FFTComplex32 *X)
>>>>> +{
>>>>> + FFTComplex16 *W = c->exptab16;
>>>>> + int lgN = c->nbits;
>>>>> + int N = 1<<lgN;
>>>>> + int s, j, k, m, w, ws;
>>>>> + int64_t wwr,wwi;
>>>>> +
>>>>> + for (s=1; s<=lgN; s++) {
>>>>> + m = 1<<s;
>>>>> + ws = 1<<(lgN-s);
>>>>> + w=0;
>>>>> + for (j=0; j<m/2; j++) {
>>>>> + wwr=W[w].re;
>>>>> + wwi=W[w].im;
>>>>> + w+=ws;
>>>>> + for (k=j; k<N; k+=m) {
>>>>> + long tr,ti;
>>>>> + int k2 = k+m/2;
>>>>
>>>> signed and /2 is slow either changed to unsiged or use >>1
>>>>
>>> Compiler should take care of this via strength reduction and to boot
>>> its a constant in the loop. I think m/2 is ok but will change if you
>>> want me too semantics are identical.
>>
>> you put a lot of trust in the compiler, this is not a simple optimization
>> it has to proof m can not reach values where m>>1 != m/2
>
> Come on this is trivial...
Nonetheless, gcc 4.1.2 fails to make this optimization, even in cases much
more trivial than yours.
~> cat foo.c
void bar(int x);
void foo0(void) {
int i;
for(i=0; i<100; i++)
bar(i/2);
}
void foo1(void) {
int i;
for(i=0; i<100; i++)
bar(i>>1);
}
~> gcc -O3 -c foo.c -o foo.o
~> objdump -d foo.o
foo.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <foo0>:
0: 53 push %rbx
1: 31 db xor %ebx,%ebx
3: 89 df mov %ebx,%edi
5: c1 ef 1f shr $0x1f,%edi
8: 01 df add %ebx,%edi
a: ff c3 inc %ebx
c: d1 ff sar %edi
e: e8 00 00 00 00 callq 13 <foo0+0x13>
13: 83 fb 64 cmp $0x64,%ebx
16: 75 eb jne 3 <foo0+0x3>
18: 5b pop %rbx
19: c3 retq
0000000000000020 <foo1>:
20: 53 push %rbx
21: 31 db xor %ebx,%ebx
23: 89 df mov %ebx,%edi
25: ff c3 inc %ebx
27: d1 ff sar %edi
29: e8 00 00 00 00 callq 2e <foo1+0xe>
2e: 83 fb 64 cmp $0x64,%ebx
31: 75 f0 jne 23 <foo1+0x3>
33: 5b pop %rbx
34: c3 retq
More information about the ffmpeg-devel
mailing list