[FFmpeg-devel] [PATCH] get_pixels_sse2

Michael Niedermayer michaelni
Thu Oct 9 19:01:21 CEST 2008


On Wed, Oct 08, 2008 at 06:48:30PM -0700, Baptiste Coudurier wrote:
> Hi
> 
> $subject.
> 
> 1987 dezicycles in get pixels mmx, 131063 runs, 9 skips
> 2014 dezicycles in get pixels mmx, 262129 runs, 15 skips
> 2005 dezicycles in get pixels mmx, 524258 runs, 30 skips
> 2009 dezicycles in get pixels mmx, 1048513 runs, 63 skips
> 2025 dezicycles in get pixels mmx, 2097009 runs, 143 skips
> 
> 1820 dezicycles in get pixels sse2, 131061 runs, 11 skips
> 1828 dezicycles in get pixels sse2, 262125 runs, 19 skips
> 1819 dezicycles in get pixels sse2, 524259 runs, 29 skips
> 1814 dezicycles in get pixels sse2, 1048524 runs, 52 skips
> 1813 dezicycles in get pixels sse2, 2097063 runs, 89 skips
> 
> -- 
> Baptiste COUDURIER                              GnuPG Key Id: 0x5C1ABAAA
> Smartjog USA Inc.                                http://www.smartjog.com
> Key fingerprint                 8D77134D20CC9220201FC5DB0AC9325C5C1ABAAA
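
For a rough sense of the gain, the quoted timings can be averaged and compared. The following is only a sketch: the helper names are made up for illustration, and since each reported line is already a running average over a different number of runs, the unweighted mean is an approximation. A dezicycle is a tenth of a cycle, so the figures are around 200 cycles per call for MMX.

```c
#include <stddef.h>

/* Timings copied from the benchmark output quoted above (in dezicycles). */
static const int mmx_runs[]  = {1987, 2014, 2005, 2009, 2025};
static const int sse2_runs[] = {1820, 1828, 1819, 1814, 1813};

/* Hypothetical helper: unweighted mean of the reported figures. */
static double mean_dezicycles(const int *samples, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return n ? sum / (double)n : 0.0;
}

/* Hypothetical helper: relative saving of "after" versus "before", in percent. */
static double percent_saved(double before, double after)
{
    return 100.0 * (before - after) / before;
}
```

With these numbers the means are 2008.0 and 1818.8 dezicycles, i.e. the SSE2 version takes roughly 9% fewer cycles per call.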

> Index: libavcodec/i386/dsputilenc_mmx.c
> ===================================================================
> --- libavcodec/i386/dsputilenc_mmx.c	(revision 15588)
> +++ libavcodec/i386/dsputilenc_mmx.c	(working copy)
> @@ -56,6 +56,40 @@
>      );
>  }
>  
> +static void get_pixels_sse2(DCTELEM *block, const uint8_t *pixels, int line_size)
> +{
> +    asm volatile(
> +        "pxor %%xmm7,      %%xmm7         \n\t"
> +        "movq (%0),        %%xmm0         \n\t"
> +        "movq (%0, %2),    %%xmm1         \n\t"
> +        "movq (%0, %2,2),  %%xmm2         \n\t"
> +        "movq (%0, %3),    %%xmm3         \n\t"
> +        "punpcklbw %%xmm7, %%xmm0         \n\t"
> +        "punpcklbw %%xmm7, %%xmm1         \n\t"
> +        "punpcklbw %%xmm7, %%xmm2         \n\t"
> +        "punpcklbw %%xmm7, %%xmm3         \n\t"
> +        "movdqa %%xmm0,      (%1)         \n\t"
> +        "movdqa %%xmm1,    16(%1)         \n\t"
> +        "movdqa %%xmm2,    32(%1)         \n\t"
> +        "movdqa %%xmm3,    48(%1)         \n\t"
> +        "lea (%0,%2,4), %0                \n\t"
> +        "movq (%0),        %%xmm0         \n\t"

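My gut feeling is that the code would be faster with the lea moved farther
up, e.g. right after the fourth movq load: nothing between that load and the
lea uses %0, so the next movq (%0) would get its address sooner. But I might
be wrong ...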
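
To illustrate why the reordering is safe, here is a plain C sketch of what the routine computes, next to a variant that advances the source pointer right after the four row loads. This is not the patch itself: the DCTELEM typedef and both function names are assumptions for illustration, and in C the compiler does its own scheduling, so the sketch only shows that the result does not depend on where the pointer update sits.

```c
#include <stdint.h>
#include <string.h>

typedef int16_t DCTELEM; /* assumption: 16-bit coefficients */

/* Reference: widen an 8x8 block of bytes to 16-bit values, row by row. */
static void get_pixels_ref(DCTELEM *block, const uint8_t *pixels, int line_size)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            block[8 * y + x] = pixels[x];
        pixels += line_size;
    }
}

/* Hypothetical variant mirroring the asm layout: four rows per step,
 * with the pointer advance (the lea) placed directly after the loads. */
static void get_pixels_early_advance(DCTELEM *block, const uint8_t *pixels,
                                     int line_size)
{
    for (int half = 0; half < 2; half++) {
        uint8_t rows[4][8];

        for (int r = 0; r < 4; r++)               /* the four movq loads */
            memcpy(rows[r], pixels + r * line_size, 8);

        pixels += 4 * line_size;                  /* the lea, hoisted */

        for (int r = 0; r < 4; r++)               /* punpcklbw + movdqa */
            for (int x = 0; x < 8; x++)
                block[8 * r + x] = rows[r][x];

        block += 32;                              /* next 64 bytes of output */
    }
}
```

Whether the hoisted lea is actually faster in the asm version can only be settled by rerunning the benchmark.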


[...]
> @@ -1332,7 +1366,11 @@
>              }
>          }
>  
> -        c->get_pixels = get_pixels_mmx;
> +        if(mm_flags & MM_SSE2)
> +            c->get_pixels = get_pixels_sse2;
> +        else
> +            c->get_pixels = get_pixels_mmx;
> +
>          c->diff_pixels = diff_pixels_mmx;
>          c->pix_sum = pix_sum16_mmx;

There is already an if(mm_flags & MM_SSE2) block below; the assignment could
go there instead of adding a new if().
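
A minimal sketch of that layout, assuming the init function assigns the MMX defaults first and overrides them later in one SSE2 block. The struct is a cut-down stand-in for the real context, the MM_SSE2 value and the init function name are hypothetical, and the two routines are stubs that only leave a marker so the selection can be observed.

```c
#include <stdint.h>

typedef int16_t DCTELEM;   /* assumption: 16-bit coefficients */
#define MM_SSE2 0x0010     /* hypothetical flag value for this sketch */

/* Cut-down stand-in for the DSP context; only the field discussed here. */
typedef struct DSPContext {
    void (*get_pixels)(DCTELEM *block, const uint8_t *pixels, int line_size);
} DSPContext;

/* Stubs standing in for the real routines; each writes a distinct marker. */
static void get_pixels_mmx(DCTELEM *block, const uint8_t *pixels, int line_size)
{
    (void)pixels; (void)line_size;
    block[0] = 1;
}

static void get_pixels_sse2(DCTELEM *block, const uint8_t *pixels, int line_size)
{
    (void)pixels; (void)line_size;
    block[0] = 2;
}

/* Hypothetical init function showing the suggested structure. */
static void dsputil_init_sketch(DSPContext *c, int mm_flags)
{
    c->get_pixels = get_pixels_mmx;        /* unconditional MMX default */
    /* ... other MMX assignments ... */

    if (mm_flags & MM_SSE2) {              /* the block that already exists */
        c->get_pixels = get_pixels_sse2;   /* override here, no extra if() */
        /* ... existing SSE2 assignments ... */
    }
}
```

This keeps all SSE2 overrides in one place at the cost of one redundant pointer store during init.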

Except for these, looks ok.

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I have often repented speaking, but never of holding my tongue.
-- Xenocrates