[FFmpeg-devel] [PATCH] split-radix FFT
Loren Merritt
lorenm
Tue Jul 29 08:22:59 CEST 2008
On Tue, 29 Jul 2008, Michael Niedermayer wrote:
> On Fri, Jul 25, 2008 at 08:14:00PM -0600, Loren Merritt wrote:
>
>> +#ifdef EMULATE_3DNOWEXT
>> +#define PSWAPD(s,d)\
>> + "movq "#s","#d"\n"\
>> + "psrlq $32,"#d"\n"\
>> + "punpckldq "#s","#d"\n"
>
>> +#define PSWAPD_UNARY(s)\
>> + "sub $8, %%"REG_SP"\n"\
>> + "movd "#s", 4(%%"REG_SP")\n"\
>> + "punpckhdq (%%"REG_SP"), "#s"\n"\
>> + "add $8, %%"REG_SP"\n"
>
> Gcc failed with a "+m" ?

No, I just designed the 3dn1 emulation of 3dn2 for simplicity (including
code locality) rather than speed. I wouldn't have written it at all
except that then I wouldn't be able to delete the radix-2 init code.
(I still can't delete it until someone ports split-radix to altivec,
but I assume that'll happen.)

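For readers following the two emulation macros quoted above, here is a minimal scalar sketch of what they compute, assuming a 64-bit MMX register is modelled as a `uint64_t` whose low 32 bits are the low dword. The function names `pswapd_model` and `pswapd_unary_model` are hypothetical and not part of the patch; this is an illustration of the instruction sequence, not the code under review.

```c
#include <stdint.h>

/* Scalar model of PSWAPD(s,d): d receives s with its 32-bit halves swapped. */
static uint64_t pswapd_model(uint64_t s)
{
    uint64_t d = s;                      /* movq      s, d                     */
    d >>= 32;                            /* psrlq     $32, d : d = {hi(s), 0}  */
    d = (d & 0xffffffffu) | (s << 32);   /* punpckldq s, d   : d = {lo(d), lo(s)} */
    return d;
}

/* Scalar model of PSWAPD_UNARY(s): the same swap, done in place through an
 * 8-byte scratch slot (the macro reserves it by adjusting the stack pointer). */
static uint64_t pswapd_unary_model(uint64_t s)
{
    uint32_t scratch[2] = { 0, 0 };      /* sub $8, sp; the low half is never read */
    scratch[1] = (uint32_t)s;            /* movd s, 4(sp)                      */
    /* punpckhdq (sp), s : s = {hi(s), high dword of the scratch slot} */
    return (s >> 32) | ((uint64_t)scratch[1] << 32);
}
```

Both models return the input with its two dwords exchanged, which is the effect the 3DNowExt `pswapd` instruction has on a 64-bit register.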
>> +static void fft4(FFTComplex *z)
>> {
>> - int ln = s->nbits;
>> - long j;
>> - x86_reg i;
>> - long nblocks, nloops;
>> - FFTComplex *p, *cptr;
>> + T2(z[0], z[1], %%mm0, %%mm1);
>> + LOAD(z[2], %%mm2);
>> + LOAD(z[3], %%mm3);
>> + T4(%%mm0, %%mm1, %%mm2, %%mm3, %%mm4, %%mm5);
>> + PUNPCK(%%mm0, %%mm1, %%mm4);
>> + PUNPCK(%%mm2, %%mm3, %%mm5);
>> + SAVE(z[0], %%mm0);
>> + SAVE(z[1], %%mm4);
>> + SAVE(z[2], %%mm2);
>> + SAVE(z[3], %%mm5);
>> +}
>
> is there any reason why separate asm() are chained? I think a single
> asm block, or even nasm/yasm if you prefer would be better.

Because it works for me, and I don't see any alternatives that are as
concise.
yasm, ok.

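To make the "single asm block" alternative concrete, here is a small sketch, assuming GCC-style extended asm on x86-64 with MMX enabled. The helper `swap_halves_inplace` is hypothetical and only reuses the PSWAPD emulation sequence from the patch. The load, the swap and the store sit in one statement, with the memory operand and the overwritten MMX registers declared, so the compiler has no opportunity to schedule anything between them. Other targets take a plain C fallback.

```c
#include <stdint.h>

/* Hypothetical helper, not from the patch: swap the 32-bit halves of *z. */
static void swap_halves_inplace(uint64_t *z)
{
#if defined(__GNUC__) && defined(__x86_64__) && defined(__MMX__)
    /* One asm statement: the register contents never have to survive
     * across a statement boundary that the compiler controls. */
    __asm__ volatile(
        "movq      %0, %%mm0     \n\t"   /* load both dwords              */
        "movq      %%mm0, %%mm1  \n\t"   /* PSWAPD emulation starts here  */
        "psrlq     $32, %%mm1    \n\t"
        "punpckldq %%mm0, %%mm1  \n\t"
        "movq      %%mm1, %0     \n\t"   /* store the swapped value       */
        "emms                    \n\t"   /* leave MMX state clean         */
        : "+m"(*z)
        :
        : "mm0", "mm1");
#else
    /* Portable fallback with the same result. */
    *z = (*z >> 32) | (*z << 32);
#endif
}
```

The trade-off is the one discussed in this thread: a single statement is safer against compiler interference, but it is less concise than composing small macros that each expand to their own asm().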
> The way it's written is almost asking for gcc to put something in between,
> I am especially concerned about the -fPIC case and gcc putting all the GOT
> "magic" in between the asms ...

Is gcc so stupid as to emit GOT stuff when dereferencing a pointer that's
already in a register, no global variables involved?

--Loren Merritt