[FFmpeg-devel] gcc 2.95.3 support plan
Loren Merritt
lorenm
Wed Feb 11 21:43:35 CET 2009
On Wed, 11 Feb 2009, Ivan Kalvachev wrote:
> So on AMD the plain asm cmov function is faster than mmxext.
> Can you show benchmarks?
Benchmarking add_hfyu_median_prediction with width=1280, over 2**20 runs.
The stddev is about 4 cycles.

Intel Core2 e6600, 64bit, gcc-4.2.3:
  24929 cycles in plain c
  19086 cycles in c with HAVE_CMOV (i.e. asm mid_pred())
  16489 cycles in cmov asm
   8869 cycles in mmx

AMD K8 3400+, 64bit, gcc-4.2.3:
  21165 cycles in plain c
  14361 cycles in c with HAVE_CMOV
   9398 cycles in cmov asm
  14048 cycles in mmx

The numbers are easily explained:
My mmx version doesn't use any SIMD; it just applies mmx ops with one value
per register. (SIMD decoding is impossible in huffyuv. I'm writing a new
format to remedy that, among other improvements. Not ready yet.)
On P3, PM, and Core2, cmov has latency 2 and pmaxub has latency 1.
On K7, K8, and K10, cmov has latency 1 and pmaxub has latency 2.
The critical path is made almost entirely of those instructions.
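
To make the dependency chain concrete, here is a minimal C sketch of median
prediction. It is not the FFmpeg source: the function names (median3_select,
median3_minmax, add_median_prediction_sketch) and the parameter layout are
assumptions chosen for illustration. The select form is the kind of code a
compiler can turn into cmov; the min/max form is the shape that byte ops like
pminub/pmaxub allow. In both cases each output byte feeds the next iteration
through `l`, which is why the loop cannot be spread across SIMD lanes and why
the latency of those instructions dominates.

```c
#include <stdint.h>

/* Median of three using selects (candidates for cmov). Hypothetical helper. */
static int median3_select(int a, int b, int c)
{
    int lo = a < b ? a : b;
    int hi = a < b ? b : a;
    int m  = c < lo ? lo : c;   /* clamp c from below */
    return m > hi ? hi : m;     /* clamp from above */
}

/* Same median using only unsigned byte min/max. Hypothetical helpers. */
static uint8_t min_u8(uint8_t a, uint8_t b) { return a < b ? a : b; }
static uint8_t max_u8(uint8_t a, uint8_t b) { return a > b ? a : b; }

static uint8_t median3_minmax(uint8_t a, uint8_t b, uint8_t c)
{
    return max_u8(min_u8(a, b), min_u8(max_u8(a, b), c));
}

/* Sketch of the decode loop: dst[i] = median(left, top, left+top-topleft)
 * plus the stored residual, all modulo 256. Assumed signature. */
static void add_median_prediction_sketch(uint8_t *dst, const uint8_t *top,
                                         const uint8_t *diff, int w,
                                         int *left, int *left_top)
{
    uint8_t l  = (uint8_t)*left;
    uint8_t lt = (uint8_t)*left_top;

    for (int i = 0; i < w; i++) {
        uint8_t grad = (uint8_t)(l + top[i] - lt);   /* gradient predictor */
        /* l depends on the previous l: this is the serial critical path. */
        l  = (uint8_t)(median3_minmax(l, top[i], grad) + diff[i]);
        lt = top[i];
        dst[i] = l;
    }

    *left     = l;
    *left_top = lt;
}
```

Swapping median3_minmax for median3_select in the loop gives the same output;
only the instructions on the critical path change.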
--Loren Merritt