[FFmpeg-devel] MMX accelerated DSP functions for VC1/WMV3 decoders

Sun Jul 1 11:53:48 CEST 2007

Hello,

Zuxy Meng wrote:
>>>> +     "psllw     $1, %%mm1               \n\t"                   \
>>>> +     "psllw     $1, %%mm2               \n\t"                   \
>>> paddw
>> Is that always faster?
> 
> According to Intel & AMD's manuals, same latency on P6/Pentium 4/Core
> 2/K7/K8/K10, more throughput on Core 2. So paddw is good.

Thanks for the information!

I managed to get access to an Athlon 1200. The execution time is indeed
the same there: 10.7 s for 1 billion iterations. So this may not be a
pairing issue but rather a matter of throughput.

> a codec expert). Of course as the author who really understands what
> you're doing you can do better than that. So would u mind providing an
> SSE2 optimization at the very beginning?

I would prefer to wait until the plain-MMX version is accepted: it may
still be far from anything Michael would accept, so a lot of time could
be wasted before then. Once the issues in the plain code are fixed, I
may provide a patch adding SSE2 code to vc1dsp_mmx.c.

Best regards,
Christophe GISQUET