[Ffmpeg-devel] a little optim for a SSE version of H263_LOOP_FILTER

Michael Niedermayer michaelni
Sun Nov 5 19:20:35 CET 2006


Hi

On Sun, Nov 05, 2006 at 07:06:28PM +0100, Guillaume POIRIER wrote:
> Hi,
> 
> On 11/5/06, Michael Niedermayer <michaelni at gmx.at> wrote:
> >Hi
> >
> >On Sun, Nov 05, 2006 at 04:50:10PM +0100, Guillaume POIRIER wrote:
> 
> [...]
> 
> >> Note that movq is very slow on P4, so any code that removes
> >> mov(q|dqu|..) provides an interesting speed-up.
> >
> >why don't you try to replace all reg, reg movq by pshufw? if there's a
> >speed up then we could make movq a macro which expands depending on
> >cpu type to movq or pshufw $11100100b, ...
> 
> The P4 optimization manual actually advises using shuffle
> operations instead of movs between vector registers.
> 
> However, unconditionally replacing movs by shuffles won't work. mov*
> use the FP_MOV unit, whereas *shuf* uses the MMX_SHIFT unit, which is part
> of FP_EXECUTE (see the diagram here: http://www.tommesani.com/P4MMX.html )
> 
> That means that you'd put pressure on the FP_EXECUTE unit, on port 1 of
> the microarchitecture, whereas FP_MOV is hooked up to port 0...
> 
> As I understand it, if FP_EXECUTE is not too crowded, you could gain
> from using shuffle operations, but only in that case.

this is not entirely true: if the following instructions depend on the
destination register of the movq, they will be delayed by its latency,
so even if your code uses only port 1 it can still benefit from pshufw

also there might be a few
movq mem, reg1
movq reg1, reg2

in the code these could be replaced by
movq mem, reg1
movq mem, reg2

should be easy to try if anyone has a P4 and is bored


> It's hard enough to guess which unit gets used when in a
> massively out-of-order (OOO) CPU such as the P4 that I'm reluctant to spend much
> time trying to see what works best.
> Moreover, it would only help on the P4, which is the only CPU in the x86 world
> with such peculiar instruction latencies.
> 
> On top of that, I don't even own a P4 ;-)

neither do I, and I'm happy about that ;)

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In the past you could go to a library and read, borrow or copy any book
Today you'd get arrested for merely telling someone where the library is