[MPlayer-dev-eng] [RFC] make vo_draw_alpha_yuy2 MMX match C code closer
Reimar Döffinger
Reimar.Doeffinger at stud.uni-karlsruhe.de
Wed Oct 3 13:04:51 CEST 2007
Hello,
at least without FAST_OSD, the C code of vo_draw_alpha_yuy2 also filters
the U and V components. The attached patch adds the same filtering to the
MMX code.
Disadvantage: even on my CPU it is about 10% slower, and probably more on
older ones.
Do you think it is okay to apply?
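For readers without the source at hand: judging by the patch comments ("U, V - 128" and "U, V + 128"), filtering the chroma presumably means scaling U and V toward the neutral value 128 by the same alpha value that is applied to luma. The following is a minimal sketch of that per-sample operation. The helper name filter_chroma is hypothetical, and the code is reconstructed from the patch, not copied from osd_template.c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a libvo function: scale one U or V sample
 * toward the neutral chroma value 128 by the alpha value srca.
 * Assumes >> on a negative int is an arithmetic shift, as on common compilers.
 */
static uint8_t filter_chroma(uint8_t chroma, uint8_t srca)
{
    int centered = (int)chroma - 128;          /* -128..127 */
    int scaled = (centered * (int)srca) >> 8;  /* attenuate the color offset */

    assert(scaled >= -128 && scaled <= 127);   /* still fits a signed byte */
    return (uint8_t)(scaled + 128);            /* back to the unsigned range */
}
```

With srca = 0 the sample collapses to 128 (no color), and with srca = 255 it stays within one step of its input, e.g. filter_chroma(255, 255) gives 254.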
Greetings,
Reimar Döffinger
P.S.: My speed measurements:
Values from 3 runs with the current code:
1098150 dezicycles in draw_alpha yuy2, 1 runs, 0 skips
1136995 dezicycles in draw_alpha yuy2, 2 runs, 0 skips
1134700 dezicycles in draw_alpha yuy2, 4 runs, 0 skips
1153833 dezicycles in draw_alpha yuy2, 8 runs, 0 skips
1143333 dezicycles in draw_alpha yuy2, 16 runs, 0 skips
1142766 dezicycles in draw_alpha yuy2, 32 runs, 0 skips
1116784 dezicycles in draw_alpha yuy2, 64 runs, 0 skips
848987 dezicycles in draw_alpha yuy2, 128 runs, 0 skips
1069290 dezicycles in draw_alpha yuy2, 1 runs, 0 skips
1059705 dezicycles in draw_alpha yuy2, 2 runs, 0 skips
1056630 dezicycles in draw_alpha yuy2, 4 runs, 0 skips
1073432 dezicycles in draw_alpha yuy2, 8 runs, 0 skips
1064333 dezicycles in draw_alpha yuy2, 16 runs, 0 skips
1064457 dezicycles in draw_alpha yuy2, 32 runs, 0 skips
1035626 dezicycles in draw_alpha yuy2, 64 runs, 0 skips
814684 dezicycles in draw_alpha yuy2, 128 runs, 0 skips
1057750 dezicycles in draw_alpha yuy2, 1 runs, 0 skips
1050555 dezicycles in draw_alpha yuy2, 2 runs, 0 skips
1049690 dezicycles in draw_alpha yuy2, 4 runs, 0 skips
1067691 dezicycles in draw_alpha yuy2, 8 runs, 0 skips
1077448 dezicycles in draw_alpha yuy2, 16 runs, 0 skips
1067423 dezicycles in draw_alpha yuy2, 32 runs, 0 skips
1040845 dezicycles in draw_alpha yuy2, 64 runs, 0 skips
819728 dezicycles in draw_alpha yuy2, 128 runs, 0 skips
3 runs with my patch:
1102110 dezicycles in draw_alpha yuy2, 1 runs, 0 skips
1117680 dezicycles in draw_alpha yuy2, 2 runs, 0 skips
1115292 dezicycles in draw_alpha yuy2, 4 runs, 0 skips
1144153 dezicycles in draw_alpha yuy2, 8 runs, 0 skips
1127935 dezicycles in draw_alpha yuy2, 16 runs, 0 skips
1143373 dezicycles in draw_alpha yuy2, 32 runs, 0 skips
1105286 dezicycles in draw_alpha yuy2, 64 runs, 0 skips
856815 dezicycles in draw_alpha yuy2, 128 runs, 0 skips
1130070 dezicycles in draw_alpha yuy2, 1 runs, 0 skips
1171290 dezicycles in draw_alpha yuy2, 2 runs, 0 skips
1166595 dezicycles in draw_alpha yuy2, 4 runs, 0 skips
1180088 dezicycles in draw_alpha yuy2, 8 runs, 0 skips
1174431 dezicycles in draw_alpha yuy2, 16 runs, 0 skips
1174303 dezicycles in draw_alpha yuy2, 32 runs, 0 skips
1152215 dezicycles in draw_alpha yuy2, 64 runs, 0 skips
899249 dezicycles in draw_alpha yuy2, 128 runs, 0 skips
1188150 dezicycles in draw_alpha yuy2, 1 runs, 0 skips
1172545 dezicycles in draw_alpha yuy2, 2 runs, 0 skips
1171170 dezicycles in draw_alpha yuy2, 4 runs, 0 skips
1187917 dezicycles in draw_alpha yuy2, 8 runs, 0 skips
1178810 dezicycles in draw_alpha yuy2, 16 runs, 0 skips
1185929 dezicycles in draw_alpha yuy2, 32 runs, 0 skips
1150871 dezicycles in draw_alpha yuy2, 64 runs, 0 skips
892911 dezicycles in draw_alpha yuy2, 128 runs, 0 skips
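One possible way to condense the log above into a single figure is to compare the reading printed after 128 runs in each of the three passes. Picking those lines, and the helper names below, are assumptions made for illustration; this is not necessarily how the estimate in the mail was obtained.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n timer readings (in dezicycles). */
static double mean(const long *v, size_t n)
{
    double sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += (double)v[i];
    return sum / (double)n;
}

/* "128 runs" reading of each pass, copied from the log above. */
static const long current_128[] = { 848987, 814684, 819728 };
static const long patched_128[] = { 856815, 899249, 892911 };

/* Relative slowdown of the patched code, e.g. 0.05 means 5% slower. */
static double relative_slowdown(void)
{
    return mean(patched_128, 3) / mean(current_128, 3) - 1.0;
}
```

On these six values the slowdown comes out a bit under 7%. Other lines of the log give somewhat different figures, so this is only a rough cross-check of the estimate quoted in the mail.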
-------------- next part --------------
Index: libvo/osd_template.c
===================================================================
--- libvo/osd_template.c (revision 24688)
+++ libvo/osd_template.c (working copy)
@@ -108,7 +137,7 @@
"pcmpeqb %%mm5, %%mm5\n\t" // F..F
"movq %%mm5, %%mm6\n\t"
"movq %%mm5, %%mm4\n\t"
- "psllw $8, %%mm5\n\t" //FF00FF00FF00
+ "psllw $15, %%mm5\n\t" //800080008000
"psrlw $8, %%mm4\n\t" //00FF00FF00FF
::);
#endif
@@ -136,7 +165,10 @@
"punpcklbw %%mm7, %%mm2\n\t" //srca 0D0C0B0A
"pmullw %%mm2, %%mm0\n\t"
"psrlw $8, %%mm0\n\t"
- "pand %%mm5, %%mm1\n\t" //U0V0U0V0
+ "pxor %%mm5, %%mm1\n\t" // U, V - 128
+ "pmulhw %%mm2, %%mm1\n\t"
+ "psllw $8, %%mm1\n\t"
+ "pxor %%mm5, %%mm1\n\t" // U, V + 128
"movd %2, %%mm2\n\t" //src 0000DCBA
"punpcklbw %%mm7, %%mm2\n\t" //srca 0D0C0B0A
"por %%mm1, %%mm0\n\t"
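To make the new instruction sequence easier to follow, here is a sketch that emulates a single 16-bit lane of mm1 in plain C. It assumes that each lane holds a luma byte in the low half and a U or V byte in the high half, that mm5 holds 0x8000 per lane (from the first hunk), and that mm2 holds the zero-extended srca byte for the same pixel. The function name mmx_chroma_lane is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Emulates one 16-bit lane of the patched sequence:
 *   pxor mm5, mm1 / pmulhw mm2, mm1 / psllw $8, mm1 / pxor mm5, mm1
 * Assumes two's-complement conversion to int16_t and an arithmetic right
 * shift of negative values, as on common compilers.
 */
static uint8_t mmx_chroma_lane(uint8_t luma, uint8_t chroma, uint8_t srca)
{
    uint16_t lane = (uint16_t)((chroma << 8) | luma);
    int16_t hi;

    lane ^= 0x8000;                        /* pxor: high byte is now chroma - 128 (signed) */
    hi = (int16_t)(((int32_t)(int16_t)lane * srca) >> 16);  /* pmulhw: high word of the product */
    assert(hi >= -128 && hi <= 127);       /* result fits a signed byte */
    lane = (uint16_t)((uint16_t)hi << 8);  /* psllw $8: move it back into the high byte */
    lane ^= 0x8000;                        /* pxor: add 128 again */

    return (uint8_t)(lane >> 8);           /* low byte is zero, free for the luma via por */
}
```

With a luma byte of 0 this reproduces the scalar expression (((chroma - 128) * srca) >> 8) + 128 exactly. Because the luma byte is still in the low half of the lane during pmulhw, the emulated result can come out one higher than that expression for some inputs, which would be consistent with the subject line saying "closer" and not "identical"; this is worth checking against the real code.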