[FFmpeg-devel] [PATCH] Dsputilize some functions from APE decode 2/2 - SSE2

Tue Jul 8 14:18:25 CEST 2008

On Tue, Jul 08, 2008 at 01:48:21PM +0300, Kostya wrote:
> On Tue, Jul 08, 2008 at 12:07:49AM +0200, Michael Niedermayer wrote:
> > On Mon, Jul 07, 2008 at 09:21:07PM +0300, Kostya wrote:
[...]
> > also the loop will likely significantly benefit from being unrolled once
> 
> len is declared as multiple of 8, and loop handles 8 elements 

Is it ever called with a len that is not a multiple of 16?
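To make the unrolling question concrete, here is a minimal portable C sketch of a loop that handles 16 elements per iteration with two independent accumulators. It is only an illustration: the name dot_int16_unrolled is hypothetical, the shift from the patch is left out, and it assumes len is a positive multiple of 16, which is exactly the property being asked about.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not the function from the patch: a plain dot
 * product without the shift, unrolled to 16 elements per iteration. */
static int32_t dot_int16_unrolled(const int16_t *v1, const int16_t *v2, int len)
{
    int32_t acc0 = 0, acc1 = 0;

    /* Assumption: len is a positive multiple of 16. */
    assert(len > 0 && len % 16 == 0);

    for (int i = 0; i < len; i += 16) {
        for (int j = 0; j < 8; j++) {
            /* Two independent accumulators, one per group of 8 elements. */
            acc0 += v1[i + j]     * v2[i + j];
            acc1 += v1[i + 8 + j] * v2[i + 8 + j];
        }
    }
    return acc0 + acc1;
}
```

If len can be 8 modulo 16, a loop like this would need a tail that handles the remaining 8 elements.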

[...]
> > > +        "add     $16,    %0            \n\t"
> > > +        "add     $16,    %1            \n\t"
> > > +        "sub     $16,    %3            \n\t"
> > > +        "jnz     1b                    \n\t"
> > > +        "movd    %%xmm7, %2            \n\t"
> > > +        : "+r"(v1), "+r"(v2), "=r"(res), "+r"(order)
> > 
> > > +        : "m"(sh)
> > 
> > should be in a register
> 
> why? psrad takes either (x)mm register, immediate value or memory
> for input.

memory is likely slower than a register
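One way to read this, sketched with SSE2 intrinsics rather than inline asm: put the shift count into an xmm register once and reuse it, instead of giving psrad a memory operand. The function name sra_int32 is hypothetical, the sketch assumes the shift is applied inside a loop (the quoted patch does not show where it sits), and the compiler still decides the final register allocation.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: arithmetic right shift of every element by sh. */
static void sra_int32(int32_t *v, int len, int sh)
{
    assert(len % 4 == 0);
#if defined(__SSE2__)
    /* The count is moved into a vector register once, before the loop. */
    const __m128i count = _mm_cvtsi32_si128(sh);
    for (int i = 0; i < len; i += 4) {
        __m128i x = _mm_loadu_si128((const __m128i *)(v + i));
        x = _mm_sra_epi32(x, count);            /* psrad with a register count */
        _mm_storeu_si128((__m128i *)(v + i), x);
    }
#else
    /* Scalar fallback; assumes >> on negative values is arithmetic. */
    for (int i = 0; i < len; i++)
        v[i] >>= sh;
#endif
}
```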


[...]
> +static void add_int16_sse2(int16_t * v1, int16_t * v2, int order)
> +{
> +    x86_reg o = order - 8;
> +    asm volatile(
> +        "1:                           \n\t"
> +        "movdqu  (%1,%2,2), %%xmm0    \n\t"
> +        "paddw   (%0,%2,2), %%xmm0    \n\t"
> +        "movdqa  %%xmm0,    (%0,%2,2) \n\t"
> +        "sub     $8,        %2        \n\t"
> +        "jge     1b                   \n\t"
> +        : "+r"(v1), "+r"(v2), "+r"(o)
> +    );

Accessing arrays from end to start is likely slower than start to end
(if you want to keep it, I want to see benchmarks against the equivalent
forward code).
Also, (a,b) addressing might be faster than (a,b,2); an "add %2, %2"
before the loop would avoid the scaled index (the loop would then have to
step by 16 instead of 8).
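A rough C sketch of the forward variant, for comparison: the element count is converted to a byte offset once before the loop, so each access is base plus offset with no scale factor. The name add_int16_forward is hypothetical, and the sketch uses unaligned loads and stores throughout, whereas the patch uses movdqa for v1. Whether it is actually faster would still have to be benchmarked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical forward version: v1[i] += v2[i] for i in [0, order). */
static void add_int16_forward(int16_t *v1, const int16_t *v2, int order)
{
    /* Assumption carried over from the thread: order is a multiple of 8. */
    assert(order > 0 && order % 8 == 0);

    /* Scale the index to bytes once, the equivalent of "add %2, %2". */
    const ptrdiff_t end = (ptrdiff_t)order * 2;

    for (ptrdiff_t off = 0; off < end; off += 16) {
#if defined(__SSE2__)
        char *p1 = (char *)v1 + off;
        const char *p2 = (const char *)v2 + off;
        __m128i a = _mm_loadu_si128((const __m128i *)p1);
        __m128i b = _mm_loadu_si128((const __m128i *)p2);
        _mm_storeu_si128((__m128i *)p1, _mm_add_epi16(a, b)); /* paddw */
#else
        /* Scalar fallback: the same 8 elements, one at a time. */
        for (ptrdiff_t i = off / 2; i < off / 2 + 8; i++)
            v1[i] = (int16_t)(v1[i] + v2[i]);
#endif
    }
}
```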


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Complexity theory is the science of finding the exact solution to an
approximation. Benchmarking OTOH is finding an approximation of the exact