[FFmpeg-devel] [PATCH] VC-1 MMX DSP functions
Sat Jul 7 18:55:20 CEST 2007
2007/7/7, Christophe GISQUET <christophe.gisquet at free.fr>:
> here are the MMX functions now licensed under the MIT license.
> Zuxy Meng has been working on SSE2 versions of those; I'm not sure if he
> would agree to contribute to this file using MIT license. In that case,
> I don't mind the license being changed, but I would prefer having the
> MIT licensing available in the svn history.
I care less about license issues than raw performance :-)
I did a quick test on 64-bit K8 tonight thanks to Stephan's testbed.
The result wasn't promising. In short, from fastest to slowest:
MMX > SSE2 w/o sw pipelining > SSE2 w/ sw pipelining
The reason may be that on K8 SSE2 is throughput bound (K8 can decode 3
MMX instructions per cycle, but only 1.5 SSE2 ones), and sw pipelining
increases the number of instructions per loop. If AMD does what they've
promised on their upcoming K10, I guess the result will be:
SSE2 w/o sw pipelining > SSE2 w/ sw pipelining > MMX
And IIRC on your 32-bit Conroe, where SSE2 is latency bound (punpcklbw
and unaligned movq are slow), the list is somewhat different:
SSE2 w/ sw pipelining > MMX > SSE2 w/o sw pipelining
On my Dothan:
MMX > SSE2 w/ sw pipelining > SSE2 w/o sw pipelining
So the conclusion is that I can't make a conclusion. Any suggestions?
Beauty is truth,
While truth is beauty.
PGP KeyID: E8555ED6