[FFmpeg-devel] MMX version for put_no_rnd_h264_chroma_mc8_c
Christophe GISQUET
christophe.gisquet
Sun Nov 25 00:35:12 CET 2007
Good evening,
Michael Niedermayer wrote:
> also the //START_TIMER dont belong in the patch
They were left in deliberately in this case, to show how I compared the
versions, and it seems to have been worth it. Here are the new results.
Before:
VC-1: 2085 dezicycles in rnd, 1047692 runs, 884 skips
h264: 1093 dezicycles in rnd, 2096936 runs, 216 skips
Patch applied:
VC-1: 2106 dezicycles in rnd, 1047537 runs, 1039 skips
VC-1: 2119 dezicycles in no_rnd, 1047384 runs, 1192 skips
h264: 1097 dezicycles in rnd, 2096867 runs, 285 skips
A global benchmark, without the *_TIMER macros, shows no measurable
difference.
>> const int dxy = x ? 1 : stride;
>>
>> asm volatile(
>> + "movq %2, %%mm6\n\t"
>> "movd %0, %%mm5\n\t"
>> "movq %1, %%mm4\n\t"
>> "punpcklwd %%mm5, %%mm5\n\t"
>> "punpckldq %%mm5, %%mm5\n\t" /* mm5 = B = x */
>> - "movq %%mm4, %%mm6\n\t"
>> "pxor %%mm7, %%mm7\n\t"
>> "psubw %%mm5, %%mm4\n\t" /* mm4 = A = 8-x */
>> - "psrlw $1, %%mm6\n\t" /* mm6 = 4 */
>> - :: "rm"(x+y), "m"(ff_pw_8));
>> + "psrlw $3, %%mm6" /* mm6 = rnd */
>> + :: "rm"(x+y), "m"(ff_pw_8), "m"(*rnd_reg));
>
> the psrlw can be avoided by shifting the constant right
The bilinear case further down doesn't do that psrlw and uses the
constant as is. Still, I applied your suggestion, as you can see in
the attached patch.
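
For readers without the attachment, here is a rough scalar sketch of the
one-dimensional case discussed above. It is not code from the patch. The
names chroma_1d, rnd_1d, RND_2D and NO_RND_2D are made up for
illustration, and the values 32 and 28 are assumed to be what *rnd_reg
holds for the rnd and no_rnd variants. Only the shift by 3 and the
resulting rnd value of 4 come from the quoted asm comments.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 2-D rounding constants (hypothetical names, not from the patch). */
enum { RND_2D = 32, NO_RND_2D = 28 };

/* What "psrlw $3" computes at run time; storing this value directly
 * as the constant makes the shift unnecessary. */
static int rnd_1d(int rnd_2d)
{
    return rnd_2d >> 3;
}

/* One-dimensional chroma filter over 8 pixels.
 * x    : fractional offset, 0..7 (B = x, A = 8 - x as in the asm comments)
 * step : 1 for horizontal filtering, the stride for vertical filtering
 * rnd  : 1-D rounding term added before the shift by 3 */
static void chroma_1d(uint8_t *dst, const uint8_t *src, int step, int x, int rnd)
{
    assert(x >= 0 && x < 8);
    const int A = 8 - x, B = x;
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)((A * src[i] + B * src[i + step] + rnd) >> 3);
}
```

Calling chroma_1d with rnd_1d(RND_2D) or with a constant already stored
as 4 gives the same output, which is why the shift can be folded into
the constant.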
Best regards,
--
Christophe GISQUET
-------------- next part --------------
A non-text attachment was scrubbed...
Name: h264.2.diff
Type: text/x-patch
Size: 5355 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20071125/1ce4589d/attachment.bin>