[FFmpeg-devel] [PATCH 2/2] mpegaudiodec: add SSE-optimized imdct36()

Vitor Sessak vitor1001 at gmail.com
Sat Aug 27 21:42:21 CEST 2011


On Sat, Aug 27, 2011 at 7:33 AM, Loren Merritt <lorenm at u.washington.edu> wrote:
>> On Sun, Aug 21, 2011 at 04:53:19PM +0200, Vitor Sessak wrote:
>
>> %macro BUTTERF 3
>>     movhlps %2, %1
>>     movlhps %2, %1
>
> pshufd would reduce number of uops, although I haven't checked what it
> would do to number of uops on the bottlenecked execution unit(s) or
> latency.

I was doing an SSE1 version, where pshufd (an SSE2 instruction) is not
available. New patch attached with SSE1, SSE2, SSE3 and AVX versions.
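
For reference, here is a rough scalar C sketch of what the shuffles under
discussion compute. The helper names shuffle4 and swap_halves are made up
for illustration and are not in the patch. shuffle4 models the immediate of
pshufd (and of shufps when both operands are the same register): result
lane i takes source lane (imm >> 2*i) & 3.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical scalar model of "pshufd dst, src, imm" on four floats.
 * It also matches "shufps r, r, imm" when both operands are the same. */
static void shuffle4(float dst[4], const float src[4], unsigned imm)
{
    float tmp[4];
    assert(imm <= 0xff);                 /* the immediate is 8 bits wide */
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm >> (2 * i)) & 3];
    memcpy(dst, tmp, sizeof(tmp));       /* tmp allows dst == src */
}

/* Net effect of "movhlps %2, %1" followed by "movlhps %2, %1":
 * the two 64-bit halves of %1 end up swapped in %2. */
static void swap_halves(float dst[4], const float src[4])
{
    float tmp[4] = { src[2], src[3], src[0], src[1] };
    memcpy(dst, tmp, sizeof(tmp));
}
```

With this model, immediate 0x4e gives the same lane order as the
movhlps/movlhps pair, and 0xb1 (the shufps in the macro) swaps lanes
within each pair.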

>>     xorps  %2, [ps_p1p1m1m1]
>
> Can you xorps %1 instead to reduce dependency chain?

Done.
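
To make the dependency-chain point concrete, here is a scalar C sketch of
the first four instructions of BUTTERF in both forms. It assumes, from the
constant's name only, that ps_p1p1m1m1 flips the sign of lanes 2 and 3. The
function names are hypothetical, and the second variant is just one
possible reading of the suggestion, not necessarily what the new patch
does.

```c
#include <assert.h>

/* As quoted: swap halves into %2, flip signs on %2, then add.
 * The xorps has to wait for the shuffle to finish. */
static void butterf_head_orig(float out[4], const float a[4])
{
    float t[4] = { a[2], a[3], -a[0], -a[1] };  /* movhlps+movlhps, xorps %2 */
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + t[i];                   /* addps %1, %2 */
}

/* Variant: flip signs on %1 instead. The xorps and the shuffle both
 * read only the original %1, so neither waits for the other. */
static void butterf_head_xor1(float out[4], const float a[4])
{
    float t[4] = { a[2], a[3],  a[0],  a[1] };  /* movhlps+movlhps */
    float s[4] = { a[0], a[1], -a[2], -a[3] };  /* xorps %1, mask (assumed) */
    for (int i = 0; i < 4; i++)
        out[i] = s[i] + t[i];                   /* addps %1, %2 */
}

/* The two variants agree in lanes 0,1 and differ only by sign in
 * lanes 2,3 (for finite inputs). */
static void check_heads(const float a[4])
{
    float o[4], x[4];
    butterf_head_orig(o, a);
    butterf_head_xor1(x, a);
    assert(o[0] == x[0] && o[1] == x[1]);
    assert(o[2] == -x[2] && o[3] == -x[3]);
}
```

In this sketch the sign difference in lanes 2 and 3 would have to be
absorbed elsewhere, for example by negating the matching coefficients in
%3 before the mulps.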

>>     addps  %1, %2
>>     mulps  %1, %3
>>     mova   %2, %1
>>     shufps %1, %1, 0xb1
>
> pshufd again
>
>>     xorps  %2, [ps_p1m1p1m1]
>>     addps  %1, %2
>> %endmacro
>
>> %macro SWAP_64BITS 2
>> %ifdef ARCH_X86_64
>>    SWAP %1, %2
>> %endif
>> %endmacro
>
> What good is this doing? There's no %else, so the code must also work
> (with no extra instructions) if you don't swap...?

I was hoping that swapping the temp variable in code like

mova m5, m0
addps m5, m1
mulps m2, m5

SWAP_64BITS m5, m10

mova m5, m3
addps m5, m6
mulps m7, m5

would allow an x86_64 CPU to use out-of-order execution to interleave
the two blocks of instructions in any order.
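
In scalar C terms, the intent looks roughly like the sketch below. The
function and variable names are made up for illustration. The two blocks
share no data, and after the swap they no longer share the name of the
temporary either. Out-of-order cores normally rename registers in
hardware, and "mova m5, m3" overwrites m5 completely, so whether the swap
buys anything in practice would have to be measured.

```c
#include <assert.h>

/* The two blocks from the snippet above, with each xmm register
 * reduced to a single float. */
static void two_blocks(float *m2, float m0, float m1,
                       float *m7, float m3, float m6)
{
    assert(m2 && m7);

    /* Block 1: mova m5, m0 ; addps m5, m1 ; mulps m2, m5 */
    float t_a = m0 + m1;
    *m2 *= t_a;

    /* Block 2: same pattern, but after SWAP_64BITS the name m5 refers
     * to a different physical register on x86_64 (t_b, not t_a). */
    float t_b = m3 + m6;
    *m7 *= t_b;
}
```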

> A bunch of mova (maybe all of them) could be eliminated in avx.

Done.

> On Sat, 27 Aug 2011, Michael Niedermayer wrote:
>
>> The main optimization i see is to interleave a few blocks so as to
>> simplify the shuffling of data
>
> Agreed.

I'm also attaching a pseudo-SIMD C version of the code. I've done my
best to have the minimum number of shuffles, but suggestions are
welcome.

Thanks Michael and Loren for the review.

-Vitor
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-mpegaudiodec-add-SSE-optimized-imdct36.patch
Type: text/x-patch
Size: 12421 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20110827/a7e718fe/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: b.c
Type: text/x-csrc
Size: 5555 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20110827/a7e718fe/attachment-0001.bin>

