[Ffmpeg-devel] [PATCH] SSE counterpart of ff_imdct_calc_3dn2
Loren Merritt
lorenm
Mon Aug 21 03:00:49 CEST 2006
On Sun, 20 Aug 2006, Zuxy Meng wrote:
> The patch is simply a rewrite of Loren's recent work. fft-test shows
> a speed-up of around 18% to 20% on my Pentium M 2G, not very exciting but
> faster indeed. Please review.
> +    /* XXX: Could be vectorized, but can't do better than the compiler */
> +    for(k = 0; k < n8; k++) {
> +        output[2*k] = -z[k].im;
> +        output[n2 - 1 - 2*k] = z[k].im;
> +        output[2*k + 1] = z[-k - 1].re;
> +        output[n2 - 2 - 2*k] = -z[-k - 1].re;
> +        output[n2 + 2*k] = -z[k].re;
> +        output[n - 1 - 2*k] = -z[k].re;
> +        output[n2 + 2*k + 1] = z[-k - 1].im;
> +        output[n - 2 - 2*k] = z[-k - 1].im;
> +    }
If you can't make an SSE version that's faster than C, have you tried MMX?
Just take the one from 3dn2 and change pswapd to pshufw.
--Loren Merritt
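
To illustrate the substitution suggested above: pswapd (extended 3DNow!) swaps
the two 32-bit halves of a 64-bit MMX register, and pshufw with the immediate
0x4E should give the same result, since it selects source words 2, 3, 0, 1.
The following is only a portable C sketch that emulates both instructions on
a uint64_t to check that equivalence. The names emu_pshufw, emu_pswapd and
check_swap_equivalence are hypothetical and not part of FFmpeg; the actual
change would be made in the inline assembly of the 3dn2 routine.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates pshufw (hypothetical helper, not FFmpeg code):
 * destination word i is the source word selected by bits [2i+1:2i] of imm. */
static uint64_t emu_pshufw(uint64_t src, unsigned imm)
{
    uint64_t dst = 0;
    for (int i = 0; i < 4; i++) {
        unsigned sel = (imm >> (2 * i)) & 3;          /* which source word */
        uint64_t word = (src >> (16 * sel)) & 0xFFFF; /* extract it */
        dst |= word << (16 * i);                      /* place it in slot i */
    }
    return dst;
}

/* Emulates pswapd (hypothetical helper): swap the two 32-bit halves. */
static uint64_t emu_pswapd(uint64_t src)
{
    return (src << 32) | (src >> 32);
}

/* Checks that pshufw with immediate 0x4E behaves like pswapd. */
static void check_swap_equivalence(void)
{
    uint64_t v = 0x1122334455667788ULL;
    assert(emu_pswapd(v) == 0x5566778811223344ULL);
    assert(emu_pshufw(v, 0x4E) == emu_pswapd(v));
}
```

In the assembly itself this would presumably amount to replacing each
"pswapd src, dst" with "pshufw $0x4E, src, dst" (AT&T operand order), assuming
the surrounding code does not rely on anything else specific to 3DNow!.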