[FFmpeg-devel] [PATCH] split-radix FFT

Loren Merritt lorenm
Tue Jul 29 08:22:59 CEST 2008


On Tue, 29 Jul 2008, Michael Niedermayer wrote:
> On Fri, Jul 25, 2008 at 08:14:00PM -0600, Loren Merritt wrote:
>
>> +#ifdef EMULATE_3DNOWEXT
>> +#define PSWAPD(s,d)\
>> +    "movq "#s","#d"\n"\
>> +    "psrlq $32,"#d"\n"\
>> +    "punpckldq "#s","#d"\n"
>
>> +#define PSWAPD_UNARY(s)\
>> +    "sub $8, %%"REG_SP"\n"\
>> +    "movd "#s", 4(%%"REG_SP")\n"\
>> +    "punpckhdq (%%"REG_SP"), "#s"\n"\
>> +    "add $8, %%"REG_SP"\n"
>
> Gcc failed with a "+m" ?

No, I just designed the 3dn1 emulation of 3dn2 for simplicity (including 
code locality) rather than speed. I wouldn't have written it at all 
except that then I wouldn't be able to delete the radix-2 init code. 
(I still can't delete it until someone ports split-radix to altivec, 
but I assume that'll happen.)
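
For anyone checking the emulation against the real instruction, here is a
plain C model of what the two macros are meant to compute. It is only a
sketch: an MMX register is modelled as a uint64_t, the function names are
made up for illustration, and the scratch array stands in for the 8 bytes
the unary version carves out below the stack pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Reference semantics of pswapd: swap the two 32-bit halves. */
static uint64_t pswapd_ref(uint64_t s)
{
    return (s >> 32) | (s << 32);
}

/* Model of PSWAPD(s,d): movq, psrlq $32, punpckldq. */
static uint64_t pswapd_emu(uint64_t s)
{
    uint64_t d = s;                     /* movq      s, d              */
    d >>= 32;                           /* psrlq     $32, d            */
    d = (d & 0xffffffffu) | (s << 32);  /* punpckldq s, d: d.hi = s.lo */
    return d;
}

/* Model of PSWAPD_UNARY(s): bounce the low dword through memory. */
static uint64_t pswapd_unary_emu(uint64_t s)
{
    uint32_t slot[2] = { 0xdeadbeefu, 0 };  /* low dword is never read */
    slot[1] = (uint32_t)s;                  /* movd s, 4(sp)           */
    /* punpckhdq (sp), s: s.lo = s.hi, s.hi = high dword of memory */
    return (s >> 32) | ((uint64_t)slot[1] << 32);
}

/* Both emulations should agree with the reference for any input. */
static void pswapd_selfcheck(uint64_t s)
{
    assert(pswapd_emu(s) == pswapd_ref(s));
    assert(pswapd_unary_emu(s) == pswapd_ref(s));
}
```

The unary form costs a store and a load from memory, which is the
simplicity-over-speed tradeoff mentioned above.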

>> +static void fft4(FFTComplex *z)
>>  {
>> -    int ln = s->nbits;
>> -    long j;
>> -    x86_reg i;
>> -    long nblocks, nloops;
>> -    FFTComplex *p, *cptr;
>> +    T2(z[0], z[1], %%mm0, %%mm1);
>> +    LOAD(z[2], %%mm2);
>> +    LOAD(z[3], %%mm3);
>> +    T4(%%mm0, %%mm1, %%mm2, %%mm3, %%mm4, %%mm5);
>> +    PUNPCK(%%mm0, %%mm1, %%mm4);
>> +    PUNPCK(%%mm2, %%mm3, %%mm5);
>> +    SAVE(z[0], %%mm0);
>> +    SAVE(z[1], %%mm4);
>> +    SAVE(z[2], %%mm2);
>> +    SAVE(z[3], %%mm5);
>> +}
>
> Is there any reason why separate asm() are chained? I think a single
> asm block, or even nasm/yasm if you prefer would be better.

Because it works for me, and I don't see any alternatives that are as 
concise.
yasm, ok.
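
As a rough scalar reference for readers without the T2/T4 macro
definitions at hand, this is a textbook 4-point forward DFT in C. It
assumes natural input order and uses a hypothetical cpx type in place of
FFTComplex; the asm version may pair elements differently depending on
how the input has been permuted, so don't expect a line-for-line match.

```c
#include <assert.h>

/* Hypothetical stand-in for FFTComplex. */
typedef struct { float re, im; } cpx;

/* In-place 4-point DFT, X[k] = sum_n x[n] * exp(-2*pi*i*n*k/4). */
static void dft4_ref(cpx *z)
{
    /* Radix-2 stage: sums and differences of (0,2) and (1,3). */
    cpx a = { z[0].re + z[2].re, z[0].im + z[2].im };
    cpx b = { z[0].re - z[2].re, z[0].im - z[2].im };
    cpx c = { z[1].re + z[3].re, z[1].im + z[3].im };
    cpx d = { z[1].re - z[3].re, z[1].im - z[3].im };

    /* X0 = a + c, X2 = a - c, X1 = b - i*d, X3 = b + i*d.
     * Multiplying by -i maps (re, im) to (im, -re). */
    z[0] = (cpx){ a.re + c.re, a.im + c.im };
    z[2] = (cpx){ a.re - c.re, a.im - c.im };
    z[1] = (cpx){ b.re + d.im, b.im - d.re };
    z[3] = (cpx){ b.re - d.im, b.im + d.re };
}

/* An impulse at n=0 should transform to all ones. */
static void dft4_selfcheck(void)
{
    cpx z[4] = { {1, 0}, {0, 0}, {0, 0}, {0, 0} };
    dft4_ref(z);
    for (int k = 0; k < 4; k++)
        assert(z[k].re == 1.0f && z[k].im == 0.0f);
}
```

There are no multiplies at this size, only adds, subtracts and swaps of
re/im, which is why the asm version is nothing but T2/T4 and unpacks.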

> The way it's written is almost asking for gcc to put something in between,
> I am especially concerned about the -fPIC case and gcc putting all the GOT
> "magic" in between the asms ...

Is gcc so stupid as to emit GOT stuff when dereferencing a pointer that's 
already in a register, no global variables involved?
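
If it does turn out to matter, one way to get a single asm block without
giving up the macros would be to have every macro expand to adjacent
string literals, as PSWAPD already does, and let the compiler concatenate
them into one asm body. The sketch below only builds and inspects the
resulting text rather than executing it, so it compiles anywhere;
two_swaps and count_insns are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the PSWAPD macro in the patch. */
#define PSWAPD(s,d)\
    "movq "#s","#d"\n"\
    "psrlq $32,"#d"\n"\
    "punpckldq "#s","#d"\n"

/* Two macro uses become one string, i.e. one asm() body. */
static const char two_swaps[] =
    PSWAPD(%%mm0, %%mm1)
    PSWAPD(%%mm2, %%mm3);

/* Count instructions by counting newline terminators. */
static size_t count_insns(const char *body)
{
    size_t n = 0;
    for (; *body; body++)
        if (*body == '\n')
            n++;
    return n;
}

/* Three instructions per PSWAPD, two PSWAPDs. */
static void two_swaps_selfcheck(void)
{
    assert(count_insns(two_swaps) == 6);
}
```

The catch is the memory operands: each z[i] would have to become a
numbered or named operand of that one asm statement instead of being
passed straight to LOAD/SAVE, which is where the conciseness goes.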

--Loren Merritt



