[FFmpeg-devel] [PATCH] split-radix FFT
Michael Niedermayer
michaelni
Sat Aug 9 19:16:31 CEST 2008
On Thu, Aug 07, 2008 at 08:22:35PM -0600, Loren Merritt wrote:
> On Thu, 7 Aug 2008, Michael Niedermayer wrote:
> >
> > I am not sure if it's worth it to simplify this, but I think if we don't attempt
> > to mask off the high bits inside the function then the following might work:
> >
> > if(!(i & m)) return split_radix_permutation(i, m, inverse)<<1;
> > m >>= 1;
> > if(inverse == !(i&m)) return (split_radix_permutation(i, m, inverse)<<2) + 1;
> > else return (split_radix_permutation(i, m, inverse)<<2) - 1;
>
> done
>
> > s->revtab[(-split_radix_permutation(i, n, s->inverse)) & (n-1)] = i;
>
> done
>
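For readers following the thread, the recursion and the revtab assignment quoted above can be put together roughly as follows. This is only a sketch: the `n <= 2` base case, the `m = n >> 1` setup, and the helper name `build_revtab` are assumptions that do not appear in the quoted fragment, and multiplication replaces the `<<` shifts because the intermediate results can be negative.

```c
#include <assert.h>

/* Recursive split-radix index mapping. The high bits of i are deliberately
 * not masked here; the caller masks the final result instead.
 * Assumed (not in the quoted fragment): the base case and m = n >> 1. */
static int split_radix_permutation(int i, int n, int inverse)
{
    int m;
    if (n <= 2)
        return i & 1;
    m = n >> 1;
    if (!(i & m))
        return split_radix_permutation(i, m, inverse) * 2;
    m >>= 1;
    if (inverse == !(i & m))
        return split_radix_permutation(i, m, inverse) * 4 + 1;
    else
        return split_radix_permutation(i, m, inverse) * 4 - 1;
}

/* Hypothetical helper: fill revtab[0..n-1] the way the quoted assignment
 * does. The negated result is masked once, here, with n - 1. */
static void build_revtab(unsigned short *revtab, int n, int inverse)
{
    assert(n > 0 && (n & (n - 1)) == 0);   /* n must be a power of two */
    for (int i = 0; i < n; i++) {
        int p = -split_radix_permutation(i, n, inverse);
        revtab[(unsigned)p & (unsigned)(n - 1)] = (unsigned short)i;
    }
}
```

Masking only at the call site is what lets the recursion drop its own masking: any garbage in the high bits of the negated result disappears in the final `& (n-1)`.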
> > It would be nice if the forced duplication could be limited to
> > #ifndef CONFIG_SMALL, unless it's significantly slower that way
>
> I tried several combinations of recursive fft##n and/or re-rolling
> pass{,_big} and/or re-rolling fft16 and/or removing pass or pass_big.
> I can make it smaller and retain speed on core2 or prescott, but not both
> cpus at once.
> k8 is equally happy with any version.
>
>    2^4    2^5    2^6    2^7    2^8    2^9   2^10   2^11   2^12  version  code_size
> penryn:
>    142    417   1120   2837   6589  14935  33433  74609 164273  fft.00        4070
>    142    418   1132   2863   6662  15108  33844  74712 165418  fft.11        3189
>    142    417   1120   2838   6590  14938  46809 114069 282947  fft.10        3133
>    142    462   1231   3011   6982  15769  35297  78270 170920  fft.05        2572
>    142    462   1194   2997   6947  15780  48557 117461 289381  fft.01        2516
>    175    516   1396   3338   7673  17166  51432 123494 301169  fft.03        1652
>    180    542   1411   3414   7853  17452  51895 124489 304666  fft.04        1175
>
> prescott:
>    423   1122   2854   7044  16366  37274  84451 187963 418948  fft.10        2414
>    423   1120   2855   7056  16390  37437  87674 196322 442723  fft.00        3176
>    420   1162   2972   7082  16693  38034  85973 189885 421885  fft.01        1745
>    466   1235   3149   7451  17410  39395  89301 202842 447159  fft.03        1162
>    472   1209   3130   7543  17438  40310  91024 206670 456248  fft.04         830
>    425   1227   3217   8032  18968  43605  98880 219511 487624  fft.11        2532
>    421   1286   3316   8082  19250  44563  99940 223647 495350  fft.05        1872
>
> .00 is the previous patch, all compiled with -Os
> fft.10 (that's removing pass_big) might be a decent compromise if you
> don't care about a huge speed regression in cases that aren't currently
> used by any audio codec.
Pick what you like best; speed on x86 probably does not matter too much
for the CONFIG_SMALL case. It's more useful for devices with ARM
and rather little storage.
The non-CONFIG_SMALL build wouldn't be affected by any changes anyway, if
I understand correctly ...
>
> >> + int n = 1<<s->nbits;
> >> + int i;
> >> + ff_fft_dispatch_3dn2(z, s->nbits);
> >> asm volatile("femms");
> >> + for(i=0; i<n; i+=2)
> >> + FFSWAP(FFTSample, z[i].im, z[i+1].re);
> >> }
> >
> > could you elaborate on why this FFSWAP pass is needed?
>
> Intermediate results are not arrays of complex numbers, but rather group
> reals and imaginaries into blocks according to the simd register size. I
> suppose I could merge the swap pass into the last fft pass, like I did for
> sse.
If the swapping could be done (nearly) for free in the last pass, that would
be great. If, OTOH, it would slow down the IMDCT, it would probably be better
to leave it as is, as we don't really need an FFT anyway. But for others who
might want to borrow our FFT, it surely would be nicer if no extra swapping
were needed.
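To make the swap pass concrete, here is a small sketch of what the quoted loop does. It assumes the SIMD block size is two floats, so each pair of outputs sits in memory as (re0, re1, im0, im1). The type and macro definitions below are local stand-ins written for this example, and `unblock_pairs` is a hypothetical name.

```c
#include <assert.h>

/* Local stand-ins for illustration only. */
typedef float FFTSample;
typedef struct { FFTSample re, im; } FFTComplex;

#define FFSWAP(type, a, b) do { type SWAP_tmp = b; b = a; a = SWAP_tmp; } while (0)

/* Convert n values stored in blocks of (re0, re1, im0, im1) into ordinary
 * interleaved complex numbers (re0, im0, re1, im1). Viewed as a flat float
 * array, each iteration swaps slots 1 and 2 of a four-float group. */
static void unblock_pairs(FFTComplex *z, int n)
{
    assert(n % 2 == 0);   /* values are processed two at a time */
    for (int i = 0; i < n; i += 2)
        FFSWAP(FFTSample, z[i].im, z[i + 1].re);
}
```

Merging this into the last FFT pass would mean having that pass write its outputs already interleaved, so no separate loop over `z` is required afterwards.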
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Those who are best at talking, realize last or never when they are wrong.