[FFmpeg-devel] MMX accelerated DSP functions for VC1/WMV3 decoders
Christophe GISQUET
christophe.gisquet
Sun Jul 1 11:53:48 CEST 2007
Hello,
Zuxy Meng wrote:
>>>> + "psllw $1, %%mm1 \n\t" \
>>>> + "psllw $1, %%mm2 \n\t" \
>>> paddw
>> Is that always faster?
>
> According to Intel & AMD's manuals, same latency on P6/Pentium 4/Core
> 2/K7/K8/K10, more throughput on Core 2. So paddw is good.
Thanks for the information!
I managed to get access to an Athlon 1200. The execution time is indeed
the same there: 10.7s for 1 billion iterations. So this may not be a
pairing issue but rather a matter of throughput.
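
For reference, the substitution discussed above (replacing "psllw $1" with
a paddw of a register to itself) relies on both operations doubling each
16-bit lane with wrap-around. Below is a minimal scalar sketch of that
per-lane behaviour in plain C. The function names are hypothetical, it
models a single lane rather than a full MMX register, and it says nothing
about timing:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 16-bit lane of "psllw $1": shift left by one bit,
 * discarding the bit shifted out of the lane. (Hypothetical helper.) */
static uint16_t lane_shift_left_1(uint16_t x)
{
    return (uint16_t)((uint32_t)x << 1);
}

/* Scalar model of one 16-bit lane of "paddw reg, reg": add the lane to
 * itself with wrap-around, no saturation. (Hypothetical helper.) */
static uint16_t lane_add_self(uint16_t x)
{
    return (uint16_t)((uint32_t)x + (uint32_t)x);
}

/* Exhaustively compare the two models over every 16-bit value.
 * Returns 1 if they agree everywhere, 0 otherwise. */
static int lanes_agree_for_all_inputs(void)
{
    for (uint32_t v = 0; v <= 0xFFFFu; v++) {
        if (lane_shift_left_1((uint16_t)v) != lane_add_self((uint16_t)v))
            return 0;
    }
    return 1;
}
```

Calling lanes_agree_for_all_inputs() checks all 65536 lane values, so the
two forms are interchangeable as far as results go; which one is faster is
a separate question that only measurements on the target CPU can answer.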
> a codec expert). Of course as the author who really understands what
> you're doing you can do better than that. So would u mind providing an
> SSE2 optimization at the very beginning?
I think I would rather wait for the plain-MMX version to be accepted
first: I may still be far from anything acceptable to Michael, so a lot
of time could be wasted before then. Once the pure code issues are fixed,
I may provide a patch adding SSE2 code to vc1dsp_mmx.c.
Best regards,
Christophe GISQUET