[MPlayer-dev-eng] [RFC] make vo_draw_alpha_yuy2 MMX match C code closer

Reimar Döffinger Reimar.Doeffinger at stud.uni-karlsruhe.de
Wed Oct 3 13:04:51 CEST 2007


Hello,
at least without FAST_OSD, the C code of vo_draw_alpha_yuy2 filters the
U and V components as well as Y. The attached patch makes the MMX code
do the same.
Disadvantage: even on my CPU it is about 10% slower, and probably more
on older ones.
Do you think it is okay to apply?
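
For reference, the per-pixel behaviour described above can be sketched in
plain C. This is a minimal sketch under stated assumptions, not a copy of
the MPlayer source: the names scale_chroma and blend_yuy2_row are made up
for illustration, srca[x] == 0 is assumed to mean "skip this pixel", and
>> on a negative int is assumed to be an arithmetic shift (as with GCC
and Clang).

```c
#include <assert.h>
#include <stdint.h>

/* Scale a chroma byte towards its neutral value 128 by a/256.
 * Hypothetical helper, not an MPlayer function. Assumes >> on a
 * negative int is an arithmetic shift. */
static uint8_t scale_chroma(uint8_t c, uint8_t a)
{
    int centered = (int)c - 128;            /* range -128..127 */
    return (uint8_t)(((centered * a) >> 8) + 128);
}

/* Blend one row of OSD data onto packed YUY2 (Y0 U Y1 V ...).
 * Each pixel x owns the luma byte at 2*x and the chroma byte at 2*x+1.
 * Assumption: srca[x] == 0 means the pixel is left untouched. */
static void blend_yuy2_row(uint8_t *dst, const uint8_t *src,
                           const uint8_t *srca, int w)
{
    assert(w >= 0);
    for (int x = 0; x < w; x++) {
        if (!srca[x])
            continue;
        /* luma: attenuate the background, then add the OSD value */
        dst[2 * x]     = (uint8_t)(((dst[2 * x] * srca[x]) >> 8) + src[x]);
        /* chroma: attenuate around 128 so the colour fades to grey */
        dst[2 * x + 1] = scale_chroma(dst[2 * x + 1], srca[x]);
    }
}
```

With srca = 128, a chroma byte of 0 becomes 64 and one of 255 becomes 191,
so both move towards the neutral value 128.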

Greetings,
Reimar Döffinger

P.S.: My speed measurements:

Values from 3 runs with the current code:
1098150 dezicycles in draw_alpha yuy2, 1 runs, 0 skips
1136995 dezicycles in draw_alpha yuy2, 2 runs, 0 skips
1134700 dezicycles in draw_alpha yuy2, 4 runs, 0 skips
1153833 dezicycles in draw_alpha yuy2, 8 runs, 0 skips
1143333 dezicycles in draw_alpha yuy2, 16 runs, 0 skips
1142766 dezicycles in draw_alpha yuy2, 32 runs, 0 skips
1116784 dezicycles in draw_alpha yuy2, 64 runs, 0 skips
848987 dezicycles in draw_alpha yuy2, 128 runs, 0 skips

1069290 dezicycles in draw_alpha yuy2, 1 runs, 0 skips
1059705 dezicycles in draw_alpha yuy2, 2 runs, 0 skips
1056630 dezicycles in draw_alpha yuy2, 4 runs, 0 skips
1073432 dezicycles in draw_alpha yuy2, 8 runs, 0 skips
1064333 dezicycles in draw_alpha yuy2, 16 runs, 0 skips
1064457 dezicycles in draw_alpha yuy2, 32 runs, 0 skips
1035626 dezicycles in draw_alpha yuy2, 64 runs, 0 skips
814684 dezicycles in draw_alpha yuy2, 128 runs, 0 skips

1057750 dezicycles in draw_alpha yuy2, 1 runs, 0 skips
1050555 dezicycles in draw_alpha yuy2, 2 runs, 0 skips
1049690 dezicycles in draw_alpha yuy2, 4 runs, 0 skips
1067691 dezicycles in draw_alpha yuy2, 8 runs, 0 skips
1077448 dezicycles in draw_alpha yuy2, 16 runs, 0 skips
1067423 dezicycles in draw_alpha yuy2, 32 runs, 0 skips
1040845 dezicycles in draw_alpha yuy2, 64 runs, 0 skips
819728 dezicycles in draw_alpha yuy2, 128 runs, 0 skips

3 runs with my patch:
1102110 dezicycles in draw_alpha yuy2, 1 runs, 0 skips
1117680 dezicycles in draw_alpha yuy2, 2 runs, 0 skips
1115292 dezicycles in draw_alpha yuy2, 4 runs, 0 skips
1144153 dezicycles in draw_alpha yuy2, 8 runs, 0 skips
1127935 dezicycles in draw_alpha yuy2, 16 runs, 0 skips
1143373 dezicycles in draw_alpha yuy2, 32 runs, 0 skips
1105286 dezicycles in draw_alpha yuy2, 64 runs, 0 skips
856815 dezicycles in draw_alpha yuy2, 128 runs, 0 skips

1130070 dezicycles in draw_alpha yuy2, 1 runs, 0 skips
1171290 dezicycles in draw_alpha yuy2, 2 runs, 0 skips
1166595 dezicycles in draw_alpha yuy2, 4 runs, 0 skips
1180088 dezicycles in draw_alpha yuy2, 8 runs, 0 skips
1174431 dezicycles in draw_alpha yuy2, 16 runs, 0 skips
1174303 dezicycles in draw_alpha yuy2, 32 runs, 0 skips
1152215 dezicycles in draw_alpha yuy2, 64 runs, 0 skips
899249 dezicycles in draw_alpha yuy2, 128 runs, 0 skips

1188150 dezicycles in draw_alpha yuy2, 1 runs, 0 skips
1172545 dezicycles in draw_alpha yuy2, 2 runs, 0 skips
1171170 dezicycles in draw_alpha yuy2, 4 runs, 0 skips
1187917 dezicycles in draw_alpha yuy2, 8 runs, 0 skips
1178810 dezicycles in draw_alpha yuy2, 16 runs, 0 skips
1185929 dezicycles in draw_alpha yuy2, 32 runs, 0 skips
1150871 dezicycles in draw_alpha yuy2, 64 runs, 0 skips
892911 dezicycles in draw_alpha yuy2, 128 runs, 0 skips
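
One way to turn these listings into a single slowdown figure is sketched
below. It compares only the 128-run lines, which is a choice made for this
sketch and not something the measurements prescribe; the helper names mean
and slowdown_percent are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Relative slowdown of `after` versus `before`, in percent. */
static double slowdown_percent(double before, double after)
{
    return (after / before - 1.0) * 100.0;
}

/* 128-run lines copied from the listings above (dezicycles). */
static const double current_128[] = { 848987, 814684, 819728 };
static const double patched_128[] = { 856815, 899249, 892911 };
```

Calling slowdown_percent(mean(current_128, 3), mean(patched_128, 3)) gives
roughly 6.7 percent for these six numbers; taken pairwise in order, the
runs differ by about 1 to about 10 percent.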

-------------- next part --------------
Index: libvo/osd_template.c
===================================================================
--- libvo/osd_template.c	(revision 24688)
+++ libvo/osd_template.c	(working copy)
@@ -108,7 +137,7 @@
         "pcmpeqb %%mm5, %%mm5\n\t" // F..F
         "movq %%mm5, %%mm6\n\t"
         "movq %%mm5, %%mm4\n\t"
-        "psllw $8, %%mm5\n\t" //FF00FF00FF00
+        "psllw $15, %%mm5\n\t" //800080008000
         "psrlw $8, %%mm4\n\t" //00FF00FF00FF
         ::);        
 #endif
@@ -136,7 +165,10 @@
 		"punpcklbw %%mm7, %%mm2\n\t"	//srca 0D0C0B0A
 		"pmullw	%%mm2, %%mm0\n\t"
 		"psrlw	$8, %%mm0\n\t"
-		"pand %%mm5, %%mm1\n\t" 	//U0V0U0V0
+		"pxor   %%mm5, %%mm1\n\t"  // U, V - 128
+		"pmulhw %%mm2, %%mm1\n\t"
+		"psllw  $8, %%mm1\n\t"
+		"pxor   %%mm5, %%mm1\n\t"  // U, V + 128
 		"movd %2, %%mm2\n\t"		//src 0000DCBA
 		"punpcklbw %%mm7, %%mm2\n\t"	//srca 0D0C0B0A
 		"por %%mm1, %%mm0\n\t"

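The four added instructions can be modelled for a single 16-bit lane in
scalar C. This is a sketch under assumptions that the hunk does not show:
that mm1 still holds the unmodified destination word (Y in the low byte,
U or V in the high byte) and that mm2 holds the per-pixel srca value as a
word in 0..255. The function name mmx_lane_chroma is hypothetical, and the
code relies on two's complement conversion and arithmetic right shift.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 16-bit lane of the patched sequence.
 * word: Y in the low byte, U or V in the high byte.
 * a:    srca multiplier for this pixel, 0..255. */
static uint8_t mmx_lane_chroma(uint16_t word, unsigned a)
{
    assert(a <= 255);
    /* pxor with 0x8000: the high byte now holds chroma - 128 (signed) */
    uint16_t w = (uint16_t)(word ^ 0x8000);
    /* pmulhw: signed 16x16 multiply, keep the high 16 bits */
    int32_t prod = (int32_t)(int16_t)w * (int32_t)a;
    int16_t hi = (int16_t)(prod >> 16);
    /* psllw $8: move the result back into the high byte */
    w = (uint16_t)((uint16_t)hi << 8);
    /* pxor with 0x8000: add 128 back */
    w ^= 0x8000;
    return (uint8_t)(w >> 8);
}
```

Because the luma byte is still in the low half of the word during the
multiply, the result is either equal to (((c - 128) * a) >> 8) + 128 or
one higher. For example, chroma 255 with a = 255 gives 254 when Y is 0 and
255 when Y is 255.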
