[FFmpeg-devel] [PATCH] get_pixels_sse2
Baptiste Coudurier
baptiste.coudurier
Thu Oct 9 19:47:48 CEST 2008
Hi Michael,
Michael Niedermayer wrote:
> On Wed, Oct 08, 2008 at 06:48:30PM -0700, Baptiste Coudurier wrote:
>> Hi
>>
>> $subject.
>>
>> 1987 dezicycles in get pixels mmx, 131063 runs, 9 skips
>> 2014 dezicycles in get pixels mmx, 262129 runs, 15 skips
>> 2005 dezicycles in get pixels mmx, 524258 runs, 30 skips
>> 2009 dezicycles in get pixels mmx, 1048513 runs, 63 skips
>> 2025 dezicycles in get pixels mmx, 2097009 runs, 143 skips
>>
>> 1820 dezicycles in get pixels sse2, 131061 runs, 11 skips
>> 1828 dezicycles in get pixels sse2, 262125 runs, 19 skips
>> 1819 dezicycles in get pixels sse2, 524259 runs, 29 skips
>> 1814 dezicycles in get pixels sse2, 1048524 runs, 52 skips
>> 1813 dezicycles in get pixels sse2, 2097063 runs, 89 skips
>>
>> --
>> Baptiste COUDURIER GnuPG Key Id: 0x5C1ABAAA
>> Smartjog USA Inc. http://www.smartjog.com
>> Key fingerprint 8D77134D20CC9220201FC5DB0AC9325C5C1ABAAA
>
>> Index: libavcodec/i386/dsputilenc_mmx.c
>> ===================================================================
>> --- libavcodec/i386/dsputilenc_mmx.c (revision 15588)
>> +++ libavcodec/i386/dsputilenc_mmx.c (working copy)
>> @@ -56,6 +56,40 @@
>> );
>> }
>>
>> +static void get_pixels_sse2(DCTELEM *block, const uint8_t *pixels, int line_size)
>> +{
>> + asm volatile(
>> + "pxor %%xmm7, %%xmm7 \n\t"
>> + "movq (%0), %%xmm0 \n\t"
>> + "movq (%0, %2), %%xmm1 \n\t"
>> + "movq (%0, %2,2), %%xmm2 \n\t"
>> + "movq (%0, %3), %%xmm3 \n\t"
>> + "punpcklbw %%xmm7, %%xmm0 \n\t"
>> + "punpcklbw %%xmm7, %%xmm1 \n\t"
>> + "punpcklbw %%xmm7, %%xmm2 \n\t"
>> + "punpcklbw %%xmm7, %%xmm3 \n\t"
>> + "movdqa %%xmm0, (%1) \n\t"
>> + "movdqa %%xmm1, 16(%1) \n\t"
>> + "movdqa %%xmm2, 32(%1) \n\t"
>> + "movdqa %%xmm3, 48(%1) \n\t"
>> + "lea (%0,%2,4), %0 \n\t"
>> + "movq (%0), %%xmm0 \n\t"
>
> my gut feeling says that the code should be faster with the lea moved farther
> up, but i might be wrong ...
Changed. I don't really see a difference in the benchmark, though.
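For anyone following the quoted asm, here is a minimal scalar sketch of what the routine computes, written in plain C. It is not the code from the patch: get_pixels_ref is a hypothetical name, DCTELEM is assumed to be a 16-bit integer, and the third asm operand (%3) is assumed to hold line_size*3, since the operand list is not part of the quoted hunk. x86 addressing only scales an index by 1, 2, 4 or 8, so a 3*line_size offset would have to come from a separate register.

```c
#include <stdint.h>

typedef int16_t DCTELEM; /* assumption: 16-bit coefficient type */

/* Hypothetical scalar reference: widen an 8x8 block of bytes to 16 bits. */
static void get_pixels_ref(DCTELEM *block, const uint8_t *pixels, int line_size)
{
    /* Two passes of four rows each, mirroring the two halves of the asm. */
    for (int half = 0; half < 2; half++) {
        for (int row = 0; row < 4; row++) {
            /* rows at (%0), (%0,%2), (%0,%2,2) and, by assumption, (%0,%3) */
            const uint8_t *src = pixels + row * line_size;
            for (int x = 0; x < 8; x++)
                block[row * 8 + x] = src[x]; /* zero-extend, as punpcklbw against a zeroed register does */
        }
        pixels += 4 * line_size; /* the "lea (%0,%2,4), %0" step */
        block  += 32;            /* four rows of eight 16-bit values */
    }
}
```

The pointer advance only depends on the four row loads before it, so placing the lea right after those loads or after the stores should give the same result. Only the instruction scheduling differs.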
> [...]
>> @@ -1332,7 +1366,11 @@
>> }
>> }
>>
>> - c->get_pixels = get_pixels_mmx;
>> + if(mm_flags & MM_SSE2)
>> + c->get_pixels = get_pixels_sse2;
>> + else
>> + c->get_pixels = get_pixels_mmx;
>> +
>> c->diff_pixels = diff_pixels_mmx;
>> c->pix_sum = pix_sum16_mmx;
>
> there is a if(mm_flags & MM_SSE2) below, this could be used instead
> of adding a new if()
Ok, done.
Updated patch attached.
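A rough sketch of the reworked selection, assuming the init function already contains an if(mm_flags & MM_SSE2) block further down as Michael says. Everything here is a stand-in for illustration: DSPContextSketch, dsputilenc_init_sketch and the two stub functions are hypothetical names, and the MM_SSE2 value is only a placeholder bit.

```c
#include <stdint.h>

typedef int16_t DCTELEM;   /* assumption: 16-bit coefficient type */

#define MM_SSE2 0x0010     /* placeholder flag value for this sketch */

/* Hypothetical, cut-down stand-in for the real DSP context. */
typedef struct DSPContextSketch {
    void (*get_pixels)(DCTELEM *block, const uint8_t *pixels, int line_size);
} DSPContextSketch;

/* Stubs that only tag block[0] so the chosen variant is observable. */
static void get_pixels_mmx_stub(DCTELEM *block, const uint8_t *pixels, int line_size)
{
    (void)pixels; (void)line_size;
    block[0] = 1;
}

static void get_pixels_sse2_stub(DCTELEM *block, const uint8_t *pixels, int line_size)
{
    (void)pixels; (void)line_size;
    block[0] = 2;
}

static void dsputilenc_init_sketch(DSPContextSketch *c, int mm_flags)
{
    /* Default stays unconditional, as before the patch. */
    c->get_pixels = get_pixels_mmx_stub;
    /* ... other MMX assignments would sit here ... */

    if (mm_flags & MM_SSE2) {
        /* Pre-existing SSE2 block: the override goes here, no extra if(). */
        c->get_pixels = get_pixels_sse2_stub;
    }
}
```

The MMX pointer is assigned first and overwritten later when SSE2 is available, so the function keeps a single SSE2 check.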
--
Baptiste COUDURIER GnuPG Key Id: 0x5C1ABAAA
Smartjog USA Inc. http://www.smartjog.com
Key fingerprint 8D77134D20CC9220201FC5DB0AC9325C5C1ABAAA
-------------- next part --------------
A non-text attachment was scrubbed...
Name: get_pixels_sse2_2.patch
Type: text/x-diff
Size: 2147 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20081009/462ce700/attachment.patch>