[Ffmpeg-devel] [PATCH] SSE counterpart of ff_imdct_calc_3dn2
Rich Felker
dalias
Thu Aug 24 20:25:06 CEST 2006
On Thu, Aug 24, 2006 at 10:50:41AM -0700, Loren Merritt wrote:
> On Thu, 24 Aug 2006, Luca Barbato wrote:
>
> >Loren Merritt wrote:
> >>On Thu, 24 Aug 2006, Luca Barbato wrote:
> >>
> >>>Zuxy Meng wrote:
> >>>
> >>>>+ n = 1 << s->nbits;
> >>>>+ n8 = n >> 3;
> >>>[...]
> >>>>+ z += n8;
> >>>[...]
> >>>>+ for(k = 0; k < n8; k += 2) {
> >>>[...]
> >>>>+ asm (
> >>>>+ "movaps %4, %%xmm0 \n\t" // xmm0 = 0 1 2 3
> >>>>+ "movaps %5, %%xmm1 \n\t" // xmm1 = 4 5 6 7
> >>>[...]
> >>>>+ :"m"(z[k]), "m"(z[-2 - k])
> >>>
> >>>I'm missing something or it could be unaligned?
> >>>z is 8 byte not 16.
> >>
> >>The array index is even.
> >I know
> >
> >>In order for n8 to be odd you'd need an 8
> >>element fft.
> >
> >I need an odd multiple of 8
>
> But fft size can only be a power of 2.
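The alignment argument quoted above can be checked with plain index arithmetic. This is only a sketch under stated assumptions: the element size of 8 bytes comes from the "z is 8 byte" remark, the base pointer is assumed to be 16-byte aligned, and the helper name imdct_loads_aligned is hypothetical, not something from the patch.

```c
#include <stddef.h>

/* Assumption: each element of z is 8 bytes (two floats), as stated in the thread. */
enum { ELEM_SIZE = 8 };

/* Hypothetical helper: returns 1 if both movaps operands, z[k] and z[-2 - k]
 * taken after "z += n8", sit at 16-byte-aligned offsets for every even k in
 * [0, n8), assuming the original base of z is itself 16-byte aligned.
 * n is the transform size; it is NOT required to be a power of two here,
 * so the odd-multiple-of-8 case can be examined as well. */
static int imdct_loads_aligned(unsigned n)
{
    long n8 = (long)(n >> 3);

    for (long k = 0; k < n8; k += 2) {
        long off_fwd = (n8 + k) * ELEM_SIZE;      /* byte offset of z[k]      */
        long off_bwd = (n8 - 2 - k) * ELEM_SIZE;  /* byte offset of z[-2 - k] */

        if (off_fwd % 16 != 0 || off_bwd % 16 != 0)
            return 0;
    }
    return 1;
}
```

With these assumptions, any power-of-two n of 16 or more gives an even n8 and passes, n = 8 gives n8 = 1 and fails, and an odd multiple of 8 such as n = 24 also fails, which matches both sides of the exchange.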
Strictly speaking, an FFT can be computed for any size, but as the prime
factors of the size get larger the efficiency becomes rather poor, with the
worst case being large prime sizes. Of course, you need a very different
implementation to support sizes that are not powers of two, and very
few people are interested in the non-power-of-two case.
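To put rough numbers on the prime-factor point, here is a small sketch. It assumes a plain mixed-radix decomposition whose work is modelled as N times the sum of the prime factors of N (counted with multiplicity). That model is a simplification chosen for illustration, it ignores constant factors, and it does not cover algorithms such as Bluestein's or Rader's that treat prime sizes differently. The function name mixed_radix_cost is hypothetical.

```c
/* Hypothetical cost model (an assumption, not a measurement):
 * cost(N) = N * (sum of the prime factors of N, with multiplicity).
 * For N = 2^m this gives N * 2m, i.e. the familiar N log N shape;
 * for prime N it degenerates to N * N, the direct DFT cost. */
static unsigned long long mixed_radix_cost(unsigned long long n)
{
    unsigned long long sum = 0;
    unsigned long long m = n;

    /* Trial division: strip each prime factor p and add it once per power. */
    for (unsigned long long p = 2; p * p <= m; p++) {
        while (m % p == 0) {
            sum += p;
            m /= p;
        }
    }
    if (m > 1)      /* whatever remains is a single prime factor */
        sum += m;

    return n * sum;
}
```

Under this model a size of 1024 costs 1024 * 20 = 20480 units and 1000 costs 1000 * 21 = 21000, while the nearby prime 1021 costs 1021 * 1021 = 1042441, roughly fifty times more.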
Rich