[FFmpeg-devel] [PATCH] get_pixels_sse2
Michael Niedermayer
michaelni
Thu Oct 9 19:01:21 CEST 2008
On Wed, Oct 08, 2008 at 06:48:30PM -0700, Baptiste Coudurier wrote:
> Hi
>
> $subject.
>
> 1987 dezicycles in get pixels mmx, 131063 runs, 9 skips
> 2014 dezicycles in get pixels mmx, 262129 runs, 15 skips
> 2005 dezicycles in get pixels mmx, 524258 runs, 30 skips
> 2009 dezicycles in get pixels mmx, 1048513 runs, 63 skip
> 2025 dezicycles in get pixels mmx, 2097009 runs, 143 skips
>
> 1820 dezicycles in get pixels sse2, 131061 runs, 11 skips
> 1828 dezicycles in get pixels sse2, 262125 runs, 19 skips
> 1819 dezicycles in get pixels sse2, 524259 runs, 29 skips
> 1814 dezicycles in get pixels sse2, 1048524 runs, 52 skips
> 1813 dezicycles in get pixels sse2, 2097063 runs, 89 skips
>
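To put the quoted numbers in perspective, here is a small C sketch that averages the five runs for each version. It assumes, as with FFmpeg's timer macros, that "dezicycles" means tenths of a CPU cycle, so dividing by 10 gives cycles per call. The array and function names are only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* The five "dezicycles" figures quoted above for each version. */
static const double mmx_dz[]  = { 1987, 2014, 2005, 2009, 2025 };
static const double sse2_dz[] = { 1820, 1828, 1819, 1814, 1813 };

/* Arithmetic mean of n samples. */
static double mean(const double *v, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += v[i];
    return sum / n;
}

/* Print cycles per call (assuming 1 dezicycle = 0.1 cycle) and the speedup. */
void print_summary(void)
{
    double m = mean(mmx_dz, 5);   /* 2008.0 dezicycles */
    double s = mean(sse2_dz, 5);  /* about 1818.8 dezicycles */

    printf("mmx:  %.1f cycles/call\n", m / 10.0);
    printf("sse2: %.1f cycles/call\n", s / 10.0);
    printf("speedup: %.3fx (%.1f%% fewer cycles)\n", m / s, 100.0 * (m - s) / m);
}
```

Calling print_summary() prints 200.8 cycles per call for MMX and 181.9 for SSE2, a speedup of 1.104x (about 9.4% fewer cycles).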
> --
> Baptiste COUDURIER GnuPG Key Id: 0x5C1ABAAA
> Smartjog USA Inc. http://www.smartjog.com
> Key fingerprint 8D77134D20CC9220201FC5DB0AC9325C5C1ABAAA
> Index: libavcodec/i386/dsputilenc_mmx.c
> ===================================================================
> --- libavcodec/i386/dsputilenc_mmx.c (revision 15588)
> +++ libavcodec/i386/dsputilenc_mmx.c (working copy)
> @@ -56,6 +56,40 @@
> );
> }
>
> +static void get_pixels_sse2(DCTELEM *block, const uint8_t *pixels, int line_size)
> +{
> + asm volatile(
> + "pxor %%xmm7, %%xmm7 \n\t"
> + "movq (%0), %%xmm0 \n\t"
> + "movq (%0, %2), %%xmm1 \n\t"
> + "movq (%0, %2,2), %%xmm2 \n\t"
> + "movq (%0, %3), %%xmm3 \n\t"
> + "punpcklbw %%xmm7, %%xmm0 \n\t"
> + "punpcklbw %%xmm7, %%xmm1 \n\t"
> + "punpcklbw %%xmm7, %%xmm2 \n\t"
> + "punpcklbw %%xmm7, %%xmm3 \n\t"
> + "movdqa %%xmm0, (%1) \n\t"
> + "movdqa %%xmm1, 16(%1) \n\t"
> + "movdqa %%xmm2, 32(%1) \n\t"
> + "movdqa %%xmm3, 48(%1) \n\t"
> + "lea (%0,%2,4), %0 \n\t"
> + "movq (%0), %%xmm0 \n\t"
My gut feeling says that the code should be faster with the lea moved farther
up (so the next movq does not have to wait for the pointer update), but I
might be wrong...
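For reference, a portable sketch (not the patch itself) of what the quoted asm computes: each of the 8 rows of 8 unsigned bytes is loaded into the low half of an XMM register (movq), interleaved with a zero register (punpcklbw) to zero-extend it to eight 16-bit values, and stored. int16_t stands in for FFmpeg's DCTELEM, and both function names are hypothetical. With intrinsics, placement of the address arithmetic (the lea in the asm) is left to the compiler's scheduler; whether that beats hand-placed asm would have to be measured with the same timer harness.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Plain C reference: widen an 8x8 block of unsigned bytes to 16 bits,
 * advancing `line_size` bytes per row. int16_t stands in for DCTELEM. */
static void get_pixels_ref(int16_t *block, const uint8_t *pixels, int line_size)
{
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            block[8 * y + x] = pixels[x];
        pixels += line_size;
    }
}

#ifdef __SSE2__
/* Intrinsics version of the same technique as the quoted asm. The patch's
 * movdqa assumes a 16-byte-aligned block; this sketch uses an unaligned
 * store so it makes no such assumption. */
static void get_pixels_sse2_sketch(int16_t *block, const uint8_t *pixels, int line_size)
{
    const __m128i zero = _mm_setzero_si128();                  /* pxor      */
    for (int y = 0; y < 8; y++) {
        __m128i row = _mm_loadl_epi64((const __m128i *)(pixels + y * line_size)); /* movq */
        __m128i wide = _mm_unpacklo_epi8(row, zero);           /* punpcklbw */
        _mm_storeu_si128((__m128i *)(block + 8 * y), wide);    /* store 8 x int16 */
    }
}
#else
/* Targets without SSE2 fall back to the reference version. */
static void get_pixels_sse2_sketch(int16_t *block, const uint8_t *pixels, int line_size)
{
    get_pixels_ref(block, pixels, line_size);
}
#endif
```

Interleaving with zero rather than using a signed unpack is what keeps pixel values above 127 positive.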
[...]
> @@ -1332,7 +1366,11 @@
> }
> }
>
> - c->get_pixels = get_pixels_mmx;
> + if(mm_flags & MM_SSE2)
> + c->get_pixels = get_pixels_sse2;
> + else
> + c->get_pixels = get_pixels_mmx;
> +
> c->diff_pixels = diff_pixels_mmx;
> c->pix_sum = pix_sum16_mmx;
There is an if(mm_flags & MM_SSE2) below; it could be used instead
of adding a new if().
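A simplified sketch of the suggested dispatch, assuming (as the review implies) that the existing SSE2 check comes after the MMX assignments: install the MMX version unconditionally, then override it inside the SSE2 branch that is already there. The flag values, struct, and stub functions are hypothetical stand-ins, not FFmpeg's real definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for illustration; FFmpeg defines its own MM_* constants. */
#define SKETCH_FLAG_MMX  0x0001
#define SKETCH_FLAG_SSE2 0x0010

typedef void (*get_pixels_fn)(int16_t *block, const uint8_t *pixels, int line_size);

/* Stubs: each tags block[0] so the caller can tell which one ran. */
static void get_pixels_mmx_stub(int16_t *block, const uint8_t *pixels, int line_size)
{
    (void)pixels; (void)line_size;
    block[0] = 1;
}

static void get_pixels_sse2_stub(int16_t *block, const uint8_t *pixels, int line_size)
{
    (void)pixels; (void)line_size;
    block[0] = 2;
}

/* Simplified stand-in for the DSP context. */
typedef struct {
    get_pixels_fn get_pixels;
} DSPContextSketch;

static void dsp_init_sketch(DSPContextSketch *c, int mm_flags)
{
    c->get_pixels = 0;
    if (mm_flags & SKETCH_FLAG_MMX) {
        c->get_pixels = get_pixels_mmx_stub;   /* default for MMX CPUs */
        /* ... other MMX assignments (diff_pixels, pix_sum, ...) ... */

        if (mm_flags & SKETCH_FLAG_SSE2) {     /* the already existing branch */
            /* ... existing SSE2 assignments ... */
            c->get_pixels = get_pixels_sse2_stub;  /* override, no extra if/else */
        }
    }
}
```

Overriding inside the existing branch keeps every SSE2 choice in one place instead of scattering separate if/else pairs through the init function.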
Except for these, looks OK.
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
I have often repented speaking, but never of holding my tongue.
-- Xenocrates