[FFmpeg-devel] [PATCH 2/2] mpegaudiodec: add SSE-optimized imdct36()
Vitor Sessak
vitor1001 at gmail.com
Sat Aug 27 21:42:21 CEST 2011
On Sat, Aug 27, 2011 at 7:33 AM, Loren Merritt <lorenm at u.washington.edu> wrote:
>> On Sun, Aug 21, 2011 at 04:53:19PM +0200, Vitor Sessak wrote:
>
>> %macro BUTTERF 3
>> movhlps %2, %1
>> movlhps %2, %1
>
> pshufd would reduce number of uops, although I haven't checked what it
> would do to number of uops on the bottlenecked execution unit(s) or
> latency.
I was doing an SSE1 version (pshufd needs SSE2). New patch attached with SSE1+SSE2+SSE3+AVX.
>> xorps %2, [ps_p1p1m1m1]
>
> Can you xorps %1 instead to reduce dependency chain?
Done.
>> addps %1, %2
>> mulps %1, %3
>> mova %2, %1
>> shufps %1, %1, 0xb1
>
> pshufd again
>
>> xorps %2, [ps_p1m1p1m1]
>> addps %1, %2
>> %endmacro
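For readers following the pshufd suggestion, here is a minimal pseudo-SIMD C sketch (not taken from the patch) that models the shuffles in BUTTERF with plain float arrays. It checks that the movhlps/movlhps pair produces the same lane order as a single pshufd with immediate 0x4e, and shows what shufps with 0xb1 does when both operands are the same register. The vec4 type and the helper names are hypothetical, chosen only for this illustration; the sign masks and arithmetic of the macro are left out.

```c
#include <string.h>

/* Four-lane float vector; lane 0 is the lowest 32 bits of the register. */
typedef struct { float v[4]; } vec4;

/* movhlps dst, src: low half of dst <- high half of src; high half of dst kept. */
static vec4 movhlps(vec4 dst, vec4 src)
{
    dst.v[0] = src.v[2];
    dst.v[1] = src.v[3];
    return dst;
}

/* movlhps dst, src: high half of dst <- low half of src; low half of dst kept. */
static vec4 movlhps(vec4 dst, vec4 src)
{
    dst.v[2] = src.v[0];
    dst.v[3] = src.v[1];
    return dst;
}

/* pshufd dst, src, imm8: lane i of dst <- lane ((imm8 >> 2*i) & 3) of src. */
static vec4 pshufd(vec4 src, unsigned imm8)
{
    vec4 dst;
    for (int i = 0; i < 4; i++)
        dst.v[i] = src.v[(imm8 >> (2 * i)) & 3];
    return dst;
}

/* shufps dst, src, imm8: lanes 0-1 are picked from dst, lanes 2-3 from src. */
static vec4 shufps(vec4 dst, vec4 src, unsigned imm8)
{
    vec4 out;
    out.v[0] = dst.v[imm8 & 3];
    out.v[1] = dst.v[(imm8 >> 2) & 3];
    out.v[2] = src.v[(imm8 >> 4) & 3];
    out.v[3] = src.v[(imm8 >> 6) & 3];
    return out;
}

/* Hypothetical helper: returns 1 if movhlps+movlhps equals pshufd 0x4e.
 * 'scratch' stands for whatever %2 held before; it is fully overwritten. */
static int half_swap_matches_pshufd(vec4 x, vec4 scratch)
{
    vec4 two_step = movlhps(movhlps(scratch, x), x);
    vec4 one_step = pshufd(x, 0x4e);
    return memcmp(two_step.v, one_step.v, sizeof two_step.v) == 0;
}
```

With x = {1, 2, 3, 4}, both forms of the half swap give {3, 4, 1, 2}, and shufps(x, x, 0xb1) gives {2, 1, 4, 3}, which is also what pshufd(x, 0xb1) returns. Whether the single-instruction form is faster depends on the CPU, as noted in the review.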
>
>> %macro SWAP_64BITS 2
>> %ifdef ARCH_X86_64
>> SWAP %1, %2
>> %endif
>> %endmacro
>
> What good is this doing? There's no %else, so the code must also work
> (with no extra instructions) if you don't swap...?
I was hoping that swapping the temp variable in code like
    mova m5, m0
    addps m5, m1
    mulps m2, m5
    SWAP_64BITS m5, m10
    mova m5, m3
    addps m5, m6
    mulps m7, m5
would allow an x86_64 CPU to use out-of-order execution to interleave
the two blocks of instructions in any order.
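The data flow of that snippet can be sketched in pseudo-SIMD C. This is an illustration under assumptions, not code from the patch: the vec4 type and function names are hypothetical, and the two variants stand for the build where SWAP_64BITS expands to nothing (one shared temporary) and the x86_64 build where the second block's temporary is renamed to another register. SWAP in x86inc.asm only renames registers at assembly time and emits no instructions, so both variants compute the same values, which is why the code also works without an %else branch.

```c
/* Four-lane float vector; lane 0 is the lowest 32 bits of the register. */
typedef struct { float v[4]; } vec4;

static vec4 addps(vec4 a, vec4 b)
{
    for (int i = 0; i < 4; i++)
        a.v[i] += b.v[i];
    return a;
}

static vec4 mulps(vec4 a, vec4 b)
{
    for (int i = 0; i < 4; i++)
        a.v[i] *= b.v[i];
    return a;
}

/* One temporary reused by both blocks (SWAP_64BITS is a no-op). */
static void two_blocks_shared_tmp(vec4 m0, vec4 m1, vec4 *m2,
                                  vec4 m3, vec4 m6, vec4 *m7)
{
    vec4 tmp;
    tmp = addps(m0, m1);     /* mova m5, m0 ; addps m5, m1 */
    *m2 = mulps(*m2, tmp);   /* mulps m2, m5 */
    tmp = addps(m3, m6);     /* mova m5, m3 ; addps m5, m6 */
    *m7 = mulps(*m7, tmp);   /* mulps m7, m5 */
}

/* Separate temporaries: the second block no longer touches the first
 * block's temporary, so the two blocks share no register at all. */
static void two_blocks_separate_tmp(vec4 m0, vec4 m1, vec4 *m2,
                                    vec4 m3, vec4 m6, vec4 *m7)
{
    vec4 tmp_a = addps(m0, m1);
    vec4 tmp_b = addps(m3, m6);  /* lives in the renamed register */
    *m2 = mulps(*m2, tmp_a);
    *m7 = mulps(*m7, tmp_b);
}
```

This only models the values computed. Whether the renamed temporary actually helps scheduling on a given CPU is a question for benchmarks, not something the C model can show.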
> A bunch of mova (maybe all of them) could be eliminated in avx.
Done.
> On Sat, 27 Aug 2011, Michael Niedermayer wrote:
>
>> The main optimization i see is to interleave a few blocks so as to
>> simplify the shuffling of data
>
> Agreed.
I'm also attaching a pseudo-SIMD C version of the code. I've done my
best to have the minimum number of shuffles, but suggestions are
welcome.
Thanks Michael and Loren for the review.
-Vitor
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-mpegaudiodec-add-SSE-optimized-imdct36.patch
Type: text/x-patch
Size: 12421 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20110827/a7e718fe/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: b.c
Type: text/x-csrc
Size: 5555 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20110827/a7e718fe/attachment-0001.bin>