[FFmpeg-devel] [PATCH] VC-1 MMX DSP functions

Christophe GISQUET christophe.gisquet
Sun Jul 8 15:26:06 CEST 2007


Hello,

Zuxy Meng wrote:
> I did a quick test on 64-bit K8 tonight thanks to Stephan's testbed.

And I ran the same test on an x86-64 Core 2 system.

> The result wasn't promising. In short, from fastest to slowest:
> MMX > SSE2 w/o sw pipelining > SSE2 w/ sw pipelining

I haven't tested the middle variant (SSE2 without software pipelining),
but I can confirm the ordering of the other two. Using START/STOP_TIMER
on a 1080p sequence, the figures are ~2800 dezicycles for MMX and ~3800
for SSE2.
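
For anyone unfamiliar with how such figures are obtained: START/STOP_TIMER
bracket the code under test and report an average cost per run, with
"dezicycles" meaning tenths of a cycle. The sketch below only illustrates
that idea and is not FFmpeg's implementation. The names BenchTimer,
bench_now, bench_record and bench_average_x10 are hypothetical, the clock
is a portable nanosecond clock rather than the CPU cycle counter, and the
rule of discarding samples above 8x the running average is an assumption
made for the example.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical accumulator for repeated timings of one code region. */
typedef struct {
    uint64_t sum;    /* total ticks over the runs that were kept */
    uint64_t count;  /* number of runs kept */
    uint64_t skips;  /* number of runs discarded as outliers */
} BenchTimer;

/* Monotonic clock in nanoseconds (assumption: stands in for a cycle counter). */
static uint64_t bench_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Record one sample. Samples far above the running average are dropped,
 * since they usually mean an interrupt or context switch hit the region. */
static void bench_record(BenchTimer *t, uint64_t ticks)
{
    assert(t != NULL);
    if (t->count < 2 || ticks < 8 * t->sum / t->count) {
        t->sum += ticks;
        t->count++;
    } else {
        t->skips++;
    }
}

/* Average cost in tenths of a tick, analogous to "dezicycles". */
static uint64_t bench_average_x10(const BenchTimer *t)
{
    return t->count ? t->sum * 10 / t->count : 0;
}
```

A caller would take bench_now() before and after the function being
measured, pass the difference to bench_record(), and print
bench_average_x10() once enough runs have accumulated. Under this
convention a reading of ~2800 corresponds to about 280 ticks per call.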

> So the conclusion is that I can't draw a conclusion. Any suggestions?

Maybe have a look at the attached opannotate (based on 4 runs) for your
s/w pipelined SSE2 functions?

The 1/4 and 3/4 cases seem well pipelined; only the output stage is
costly. However, if opannotate is to be believed (some of its timings
are very surprising), the 1/2 case suffers quite a lot of stalls,
probably to the point where they account for most of the execution time.

Best regards,
-- 
Christophe GISQUET
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sse2_64.txt
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070708/8a43decf/attachment.txt>


