[Ffmpeg-devel] [PATCH] SSE counterpart of ff_imdct_calc_3dn2

Loren Merritt lorenm
Mon Aug 21 03:00:49 CEST 2006


On Sun, 20 Aug 2006, Zuxy Meng wrote:

> The patch is simply a rewrite of Loren's recent work. fft-test shows
> a speed-up of around 18-20% on my 2 GHz Pentium M, not very exciting but
> faster indeed. Please review.

> + /* XXX: Could be vectorized, but can't do better than the compiler */
> + for(k = 0; k < n8; k++) {
> +     output[2*k] = -z[k].im;
> +     output[n2 - 1 - 2*k] = z[k].im;
> +     output[2*k + 1] = z[-k - 1].re;
> +     output[n2 - 2 - 2*k] = -z[-k - 1].re;
> +     output[n2 + 2*k] = -z[k].re;
> +     output[n - 1 - 2*k] = -z[k].re;
> +     output[n2 + 2*k + 1] = z[-k - 1].im;
> +     output[n - 2 - 2*k] = z[-k - 1].im;
> + }

If you can't make an SSE version that's faster than C, have you tried MMX?
Just take the one from 3dn2 and change pswapd to pshufw.

--Loren Merritt
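
To illustrate the pswapd-to-pshufw substitution suggested above, here is a minimal scalar sketch in plain C. It is not code from the patch or from FFmpeg: the function names (model_pswapd, model_pshufw, check_pswapd_equivalence) are hypothetical, and a uint64_t stands in for a 64-bit MMX register. It assumes the documented behaviour of the two instructions: pswapd swaps the two 32-bit halves of the register, and pshufw picks each 16-bit destination word using a 2-bit field of its 8-bit immediate. Under that assumption, an immediate of 0x4E should reproduce pswapd.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of pswapd: swap the low and high 32-bit halves
 * of a 64-bit value (standing in for an MMX register). */
static uint64_t model_pswapd(uint64_t src)
{
    return (src << 32) | (src >> 32);
}

/* Scalar model of pshufw: destination word i is taken from the
 * source word selected by bits [2*i+1 : 2*i] of imm8. */
static uint64_t model_pshufw(uint64_t src, unsigned imm8)
{
    uint64_t dst = 0;
    for (int i = 0; i < 4; i++) {
        unsigned sel  = (imm8 >> (2 * i)) & 3;          /* which source word */
        uint64_t word = (src >> (16 * sel)) & 0xFFFF;   /* extract it */
        dst |= word << (16 * i);                        /* place it at slot i */
    }
    return dst;
}

/* 0x4E = 01 00 11 10b selects source words 2, 3, 0, 1 for
 * destination words 0, 1, 2, 3, i.e. it swaps the two dwords. */
static void check_pswapd_equivalence(void)
{
    const uint64_t samples[] = {
        0x1111222233334444ULL,
        0x3F800000BF800000ULL,  /* bit patterns of 1.0f (high) and -1.0f (low) */
        0x0000000000000001ULL,
    };
    for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        assert(model_pshufw(samples[i], 0x4E) == model_pswapd(samples[i]));
}
```

In real assembly the change would be along the lines of replacing "pswapd mm0, mm1" with "pshufw mm0, mm1, 0x4E"; since pshufw arrived with the SSE integer extensions (also known as MMXEXT), such a version would need that CPU flag rather than plain MMX.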



