[FFmpeg-devel] [PATCH 2/2] mpegaudiodec: add SSE-optimized imdct36()

Loren Merritt lorenm at u.washington.edu
Sat Aug 27 07:33:56 CEST 2011


> On Sun, Aug 21, 2011 at 04:53:19PM +0200, Vitor Sessak wrote:

> %macro BUTTERF 3
>     movhlps %2, %1
>     movlhps %2, %1

pshufd would reduce the number of uops, although I haven't checked what
it would do to latency or to the number of uops on the bottlenecked
execution unit(s).
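
To make the suggestion concrete, here is a minimal scalar model in C of the lane movement involved. It is only a sketch: the `xmm` struct and the helper names are made up for illustration and are not part of the patch. It shows that the quoted movhlps/movlhps pair swaps the two 64-bit halves of the source, which a single pshufd with immediate 0x4e also does. Note that pshufd is an SSE2 instruction, so this assumes SSE2 is acceptable for this code path.

```c
/* Scalar model of a 4-lane XMM register; lane 0 is the lowest 32 bits.
 * The type and helper names are illustrative only. */
typedef struct { float f[4]; } xmm;

/* movhlps dst, src: low 64 bits of dst = high 64 bits of src. */
static xmm movhlps(xmm dst, xmm src)
{
    dst.f[0] = src.f[2];
    dst.f[1] = src.f[3];
    return dst;
}

/* movlhps dst, src: high 64 bits of dst = low 64 bits of src. */
static xmm movlhps(xmm dst, xmm src)
{
    dst.f[2] = src.f[0];
    dst.f[3] = src.f[1];
    return dst;
}

/* pshufd dst, src, imm: dst lane i = src lane ((imm >> 2*i) & 3). */
static xmm pshufd(xmm src, unsigned imm)
{
    xmm dst;
    for (int i = 0; i < 4; i++)
        dst.f[i] = src.f[(imm >> (2 * i)) & 3];
    return dst;
}

/* Returns 1 if all four lanes compare equal, 0 otherwise. */
static int xmm_eq(xmm a, xmm b)
{
    for (int i = 0; i < 4; i++)
        if (a.f[i] != b.f[i])
            return 0;
    return 1;
}

/* The two-instruction sequence from the quoted macro (%2 = dst, %1 = src). */
static xmm swap_halves_two_insns(xmm src)
{
    xmm dst = {{0.0f, 0.0f, 0.0f, 0.0f}};
    dst = movhlps(dst, src);
    dst = movlhps(dst, src);
    return dst;
}

/* The single-instruction alternative: pshufd %2, %1, 0x4e. */
static xmm swap_halves_pshufd(xmm src)
{
    return pshufd(src, 0x4e);
}
```

Both functions map lanes (a0, a1, a2, a3) to (a2, a3, a0, a1). The model says nothing about uop counts, execution ports, or latency, which still have to be checked on the target CPUs.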

>     xorps  %2, [ps_p1p1m1m1]

Can you xorps %1 instead, to shorten the dependency chain?

>     addps  %1, %2
>     mulps  %1, %3
>     mova   %2, %1
>     shufps %1, %1, 0xb1

pshufd again
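
One possible reading of this, sketched as a scalar C model: since pshufd writes its shuffled result to a separate destination, `pshufd %2, %1, 0xb1` could stand in for the mova + shufps pair, with the sign flip then applied to %1 instead of %2. The `xmm` type and helper names below are hypothetical, and the assumption that ps_p1m1p1m1 negates lanes 1 and 3 is a guess from the constant's name; the equivalence shown does not depend on which lanes the mask flips.

```c
/* Scalar model of a 4-lane XMM register; lane 0 is the lowest 32 bits.
 * The type and helper names are illustrative only. */
typedef struct { float f[4]; } xmm;

/* Lane shuffle with one source: dst lane i = src lane ((imm >> 2*i) & 3).
 * This models pshufd, and also shufps when both operands are the same. */
static xmm shuffle(xmm src, unsigned imm)
{
    xmm dst;
    for (int i = 0; i < 4; i++)
        dst.f[i] = src.f[(imm >> (2 * i)) & 3];
    return dst;
}

/* xorps with a sign-bit mask: negate every lane where neg[i] is nonzero. */
static xmm flip_signs(xmm a, const int neg[4])
{
    for (int i = 0; i < 4; i++)
        if (neg[i])
            a.f[i] = -a.f[i];
    return a;
}

/* addps: lane-wise addition. */
static xmm addps(xmm a, xmm b)
{
    for (int i = 0; i < 4; i++)
        a.f[i] += b.f[i];
    return a;
}

/* Assumed layout of ps_p1m1p1m1: lanes 1 and 3 are negated. */
static const int mask_p1m1p1m1[4] = {0, 1, 0, 1};

/* Tail of the quoted macro: mova, shufps, xorps on %2, addps. */
static xmm tail_original(xmm r1)
{
    xmm r2 = r1;                             /* mova   %2, %1           */
    r1 = shuffle(r1, 0xb1);                  /* shufps %1, %1, 0xb1     */
    r2 = flip_signs(r2, mask_p1m1p1m1);      /* xorps  %2, [ps_p1m1p1m1] */
    return addps(r1, r2);                    /* addps  %1, %2           */
}

/* Variant with pshufd: no mova, sign flip applied to %1 instead. */
static xmm tail_pshufd(xmm r1)
{
    xmm r2 = shuffle(r1, 0xb1);              /* pshufd %2, %1, 0xb1     */
    r1 = flip_signs(r1, mask_p1m1p1m1);      /* xorps  %1, [ps_p1m1p1m1] */
    return addps(r1, r2);                    /* addps  %1, %2           */
}
```

Both variants add the pair-swapped vector to the sign-flipped original, only with the operands of the addition exchanged, so the result left in %1 is the same while the variant needs one instruction fewer.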

>     xorps  %2, [ps_p1m1p1m1]
>     addps  %1, %2
> %endmacro

> %macro SWAP_64BITS 2
> %ifdef ARCH_X86_64
>    SWAP %1, %2
> %endif
> %endmacro

What good is this doing? There's no %else, so the code must also work
(with no extra instructions) if you don't swap...?

A bunch of the mova instructions (maybe all of them) could be eliminated
with AVX, since its three-operand forms don't overwrite a source register.


On Sat, 27 Aug 2011, Michael Niedermayer wrote:

> The main optimization i see is to interleave a few blocks so as to
> simplify the shuffling of data

Agreed.

--Loren Merritt

