[MPlayer-dev-eng] [PATCH] SSE2 optimizations for libmpeg2

Loren Merritt lorenm at u.washington.edu
Sun Feb 17 18:26:07 CET 2008


On Sat, 16 Feb 2008, Diego Biurrun wrote:

> I found this patch on the libmpeg2 mailing list, here it is, slightly
> adapted and cleaned up.  I could only test compilation without SSE2 as I
> don't have a SSE2 processor.  I'd be happy to hear about test results
> and benchmarks.

CPU detection is broken; the attached patch fixes it (apply it on top of
the previous patch). Furthermore, CPU detection is doubly redundant: not
only is libmpeg2's own detection unused, but libmpeg2's wrapper for
mplayer's detection is also unused. vd_libmpeg2 overrides both.
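
For readers following the attached diff: the removed SSE2 test sits before
the cpuid(0x00000001) call, where edx still holds part of the vendor string
rather than the feature flags. A minimal sketch of the intended mapping,
assuming the standard CPUID leaf 1 bit positions (MMX = bit 23, SSE = bit
25, SSE2 = bit 26). The ACCEL_* names and values and the function name are
hypothetical stand-ins, not libmpeg2's own definitions:

```c
#include <stdint.h>

/* Hypothetical flag values for illustration only; libmpeg2 defines its
 * own MPEG2_ACCEL_X86_* constants. */
#define ACCEL_MMX    0x1u
#define ACCEL_MMXEXT 0x2u
#define ACCEL_SSE2   0x4u

/* Feature bits reported in edx by CPUID leaf 1. */
#define CPUID1_EDX_MMX  (1u << 23)  /* 0x00800000 */
#define CPUID1_EDX_SSE  (1u << 25)  /* 0x02000000 */
#define CPUID1_EDX_SSE2 (1u << 26)  /* 0x04000000 */

/* Map the edx value returned by CPUID leaf 1 to acceleration flags.
 * The caller must pass edx from leaf 1, not from leaf 0 (vendor string). */
static uint32_t caps_from_cpuid1_edx(uint32_t edx)
{
    uint32_t caps;

    if (!(edx & CPUID1_EDX_MMX))
        return 0;                   /* no MMX: report nothing */

    caps = ACCEL_MMX;
    if (edx & CPUID1_EDX_SSE)       /* SSE implies the MMX extensions */
        caps |= ACCEL_MMXEXT;
    if (edx & CPUID1_EDX_SSE2)
        caps |= ACCEL_SSE2;
    return caps;
}
```

As a sanity check on why the placement matters: the AMD vendor word
0x69746e65 that appears in the diff context has bit 26 clear, so testing
that value for SSE2 would not report the feature.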

What is DO_NOT_MIX_MMX_AND_SSE2 for? There's nothing wrong with mixing mmx 
and sse2. Pick whichever is fastest for any given function, or even mix 
them in the same function.
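
One way to read "pick whichever is fastest for any given function" is a
per-function dispatch table filled in once at init time. The sketch below
is only an illustration: the struct, the stub functions, and the ACCEL_*
flags are hypothetical and do not come from libmpeg2 or the patch. It
selects an SSE2 idct while leaving motion compensation on the MMX version:

```c
#include <stdint.h>

/* Hypothetical capability flags, for illustration only. */
#define ACCEL_MMX  0x1u
#define ACCEL_SSE2 0x4u

typedef void (*idct_fn)(int16_t *block);
typedef void (*mc_fn)(uint8_t *dst, const uint8_t *src, int stride, int height);

/* Empty placeholder implementations; real ones would do the work. */
static void idct_c(int16_t *b)    { (void)b; }
static void idct_mmx(int16_t *b)  { (void)b; }
static void idct_sse2(int16_t *b) { (void)b; }
static void mc_put_c(uint8_t *d, const uint8_t *s, int st, int h)
{ (void)d; (void)s; (void)st; (void)h; }
static void mc_put_mmx(uint8_t *d, const uint8_t *s, int st, int h)
{ (void)d; (void)s; (void)st; (void)h; }

struct dsp {
    idct_fn idct;
    mc_fn   mc_put;
};

/* Choose each routine independently of the others. */
static struct dsp dsp_init(uint32_t accel)
{
    struct dsp d = { idct_c, mc_put_c };

    if (accel & ACCEL_MMX) {
        d.idct   = idct_mmx;
        d.mc_put = mc_put_mmx;
    }
    if (accel & ACCEL_SSE2)
        d.idct = idct_sse2;   /* mc_put deliberately stays on MMX */
    return d;
}
```

With this layout no global "do not mix" switch is needed; each slot is
decided by measurement of that routine alone.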

sse2 output is not identical to mmx output (65 dB). Is the idct supposed 
to differ?
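
For scale, assuming the 65 dB figure is a PSNR over 8-bit samples (the mail
does not say so explicitly): PSNR = 10*log10(255^2 / MSE), so 65 dB would
correspond to a mean squared error of about 0.02, roughly one sample in
fifty differing by 1. A minimal sketch of the MSE half of that comparison
(the function name is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Mean squared error between two buffers of 8-bit samples.
 * PSNR in dB would then be 10*log10(255.0*255.0 / mse) for mse > 0. */
static double mean_squared_error(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        int d = (int)a[i] - (int)b[i];
        sum += (uint64_t)(d * d);
    }
    return n ? (double)sum / (double)n : 0.0;
}
```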

quick test on a single 414 second dvd source, core2 e6600:
ffmpeg2:       18.33 +/- .11 sec (678 fps)
libmpeg2 mmx:  16.91 +/- .09 sec (734 fps)
libmpeg2 sse2: 14.75 +/- .05 sec (842 fps)

oprofile (filtered to merge the many mc functions into a single line)
ffmpeg2:
samples  %        symbol name
 128869  38.4785  mpeg_decode_mb
  58947  17.6007  ff_simple_idct_add_mmx
  32611   9.7372  ff_simple_idct_put_mmx
  30083   8.9823  put_pixels_mmx2
  12113   3.6168  MPV_decode_mb
  11071   3.3056  MPV_motion
  10564   3.1543  clear_blocks_mmx
   9792   2.9238  add_pixels_clamped_mmx
   8167   2.4386  demux_pattern_3
   8034   2.3988  mpeg_decode_motion
   7522   2.2460  decode_dc
   6130   1.8303  mpeg_decode_slice
   3191   0.9528  prefetch_mmx2
   3157   0.9426  fast_memcpy
   1136   0.3393  avg_pixels_mmx2

libmpeg2 mmx:
  75716  25.2977  mmxext_idct
  75500  25.2255  get_non_intra_block
  33795  11.2913  MC_put_mmxext
  30040  10.0368  get_intra_block_B15
  20569   6.8724  mpeg2_slice
  15603   5.2132  mpeg2_idct_add_mmxext
  8583   2.8677  mpeg2_parse
  8566   2.8620  slice_intra_DCT
  8067   2.6953  demux_pattern_3
  6589   2.2015  motion_fr_frame_420
  5965   1.9930  mpeg2_idct_copy_mmxext
  3080   1.0291  motion_fr_field_420
  2955   0.9873  fast_memcpy
  1489   0.4975  MC_avg_mmxext
   649   0.2168  motion_reuse_420

libmpeg2 sse2:
  73447  28.2600  get_non_intra_block
  41666  16.0317  mpeg2_idct_add_sse2
  33042  12.7135  MC_put_sse2
  29754  11.4484  get_intra_block_B15
  22896   8.8096  mpeg2_idct_copy_sse2
  19205   7.3895  mpeg2_slice
  8828   3.3967  mpeg2_parse
  8248   3.1736  demux_pattern_3
  6420   2.4702  motion_fr_frame_420
  6355   2.4452  slice_intra_DCT
  2952   1.1358  motion_fr_field_420
  2918   1.1228  fast_memcpy
  1422   0.5473  MC_avg_sse2
   613   0.2359  motion_reuse_420

... seems to show that the mc part is useless, and speedup is due entirely 
to idct.
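
The arithmetic behind that reading can be reproduced from the sample counts
above. The grouping is an assumption: "idct" here means mmxext_idct plus
the idct_add/idct_copy symbols, and "mc" means the MC_put and MC_avg
symbols. It also assumes the two runs were sampled at the same rate. The
struct and helper names are hypothetical:

```c
/* oprofile sample counts taken from the two libmpeg2 listings. */
struct profile {
    long idct;  /* idct-related symbols */
    long mc;    /* MC_put + MC_avg */
};

static const struct profile mmx_run = {
    75716 + 15603 + 5965,  /* mmxext_idct + idct_add_mmxext + idct_copy_mmxext */
    33795 + 1489           /* MC_put_mmxext + MC_avg_mmxext */
};

static const struct profile sse2_run = {
    41666 + 22896,         /* idct_add_sse2 + idct_copy_sse2 */
    33042 + 1422           /* MC_put_sse2 + MC_avg_sse2 */
};

/* Samples saved by switching from the "before" to the "after" build. */
static long samples_saved(long before, long after)
{
    return before - after;
}
```

Under that grouping, idct drops from 97284 to 64562 samples (32722 fewer,
about a third), while mc drops from 35284 to 34464 (820 fewer, a little
over 2%).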

--Loren Merritt
-------------- next part --------------
--- libmpeg2/cpu_accel.c~	2008-02-17 08:54:13.000000000 -0700
+++ libmpeg2/cpu_accel.c	2008-02-17 09:06:06.000000000 -0700
@@ -97,8 +97,6 @@
     if (!eax)			/* vendor string only */
 	return 0;
 
-    if (edx & 0x04000000)	/* SSE2 */
-	accel |= MPEG2_ACCEL_X86_SSE2;
     AMD = (ebx == 0x68747541) && (ecx == 0x444d4163) && (edx == 0x69746e65);
 
     cpuid (0x00000001, eax, ebx, ecx, edx);
@@ -108,6 +106,8 @@
     caps = MPEG2_ACCEL_X86_MMX;
     if (edx & 0x02000000)	/* SSE - identical to AMD MMX extensions */
 	caps = MPEG2_ACCEL_X86_MMX | MPEG2_ACCEL_X86_MMXEXT;
+    if (edx & 0x04000000)	/* SSE2 */
+	caps |= MPEG2_ACCEL_X86_SSE2;
 
     cpuid (0x80000000, eax, ebx, ecx, edx);
     if (eax < 0x80000001)	/* no extended capabilities */
Index: libmpcodecs/vd_libmpeg2.c
===================================================================
--- libmpcodecs/vd_libmpeg2.c	(revision 26016)
+++ libmpcodecs/vd_libmpeg2.c	(working copy)
@@ -72,6 +72,8 @@
        accel |= MPEG2_ACCEL_X86_MMX;
     if(gCpuCaps.hasMMX2)
        accel |= MPEG2_ACCEL_X86_MMXEXT;
+    if(gCpuCaps.hasSSE2)
+       accel |= MPEG2_ACCEL_X86_SSE2;
     if(gCpuCaps.has3DNow)
        accel |= MPEG2_ACCEL_X86_3DNOW;
     if(gCpuCaps.hasAltiVec)

