[MPlayer-dev-eng] [PATCH] yadif SSE2/SSSE3 optimization
Zhou, Zongyi
zz65 at cornell.edu
Tue Nov 18 17:15:05 CET 2008
Hi all,
I have split the patch into two parts. The first part adds CPU detection for SSSE3 and SSE4a.
I found that SSE2 is much faster than MMX2 on Phenom, so SSE2 will be enabled on all Intel CPUs and on Phenom.
I am still working on the second part, which adds SSE2/SSSE3 code to yadif. All duplicated code will be removed.
The first part of the patch follows. I think other filters and codecs can also use it.
Regards,
ZZ
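The selection policy described in the message (SSE2 on all Intel CPUs and on Phenom, MMX2 otherwise) could be sketched roughly as follows. This is only an illustration: the `caps_sketch_t` struct, its `is_intel` field, the `pick_impl` helper, and the use of the SSE4a flag as a stand-in for "Phenom" are all assumptions, since the message does not say how the second part of the patch makes this decision.

```c
/* Hypothetical subset of the capability flags. is_intel is NOT part of
 * the patch; it stands for "vendor string is GenuineIntel". */
typedef struct {
    int hasMMX2;
    int hasSSE2;
    int hasSSE4a;
    int is_intel;
} caps_sketch_t;

typedef enum { IMPL_C, IMPL_MMX2, IMPL_SSE2 } impl_t;

/* Pick the fastest code path under the stated policy.
 * Assumption: SSE4a is used as a proxy for Phenom-class AMD CPUs. */
static impl_t pick_impl(const caps_sketch_t *c)
{
    if (c->hasSSE2 && (c->is_intel || c->hasSSE4a))
        return IMPL_SSE2;   /* Intel, or AMD with SSE4a */
    if (c->hasMMX2)
        return IMPL_MMX2;   /* e.g. older AMD parts where SSE2 is slow */
    return IMPL_C;          /* plain C fallback */
}
```

A filter would call `pick_impl` once at initialization and store a function pointer for the chosen path, so the check stays out of the per-line loop.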
Index: cpudetect.c
===================================================================
--- cpudetect.c (revision 27949)
+++ cpudetect.c (working copy)
@@ -144,6 +144,8 @@
caps->hasMMX = (regs2[3] & (1 << 23 )) >> 23; // 0x0800000
caps->hasSSE = (regs2[3] & (1 << 25 )) >> 25; // 0x2000000
caps->hasSSE2 = (regs2[3] & (1 << 26 )) >> 26; // 0x4000000
+ caps->hasSSSE3 = (regs2[2] & (1 << 9 )) >> 9; // 0x00000200, ECX bit 9 of CPUID leaf 1
+ caps->hasSSE4a = (regs2[2] & (1 << 6 )) >> 6; // 0x00000040, valid only as ECX bit 6 of CPUID 0x80000001, not of leaf 1
caps->hasMMX2 = caps->hasSSE; // SSE cpus supports mmxext too
cl_size = ((regs2[1] >> 8) & 0xFF)*8;
if(cl_size) caps->cl_size = cl_size;
@@ -496,6 +498,8 @@
caps->has3DNowExt=0;
caps->hasSSE=0;
caps->hasSSE2=0;
+ caps->hasSSSE3=0;
+ caps->hasSSE4a=0;
caps->isX86=0;
caps->hasAltiVec = 0;
#ifdef HAVE_ALTIVEC
Index: cpudetect.h
===================================================================
--- cpudetect.h (revision 27949)
+++ cpudetect.h (working copy)
@@ -44,6 +44,8 @@
int has3DNowExt;
int hasSSE;
int hasSSE2;
+ int hasSSSE3;
+ int hasSSE4a;
int isX86;
unsigned cl_size; /* size of cache line */
int hasAltiVec;
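For reference, here is a minimal standalone sketch of where the two feature bits live, independent of MPlayer's `do_cpuid` helper. It assumes GCC or Clang, whose `<cpuid.h>` provides `__get_cpuid`; the `simd_caps_t` struct and the `decode_caps` and `detect_caps` functions are hypothetical names used only for this example. SSSE3 is reported in ECX bit 9 of CPUID leaf 1, and SSE4a in ECX bit 6 of the AMD extended leaf 0x80000001.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang helper header, x86 only */
#endif

/* Hypothetical result struct for this sketch. */
typedef struct {
    int hasSSSE3;
    int hasSSE4a;
} simd_caps_t;

/* Decode the two flags from raw ECX values:
 *   leaf1_ecx = ECX returned by CPUID leaf 0x00000001
 *   ext1_ecx  = ECX returned by CPUID leaf 0x80000001 */
static simd_caps_t decode_caps(unsigned leaf1_ecx, unsigned ext1_ecx)
{
    simd_caps_t c;
    c.hasSSSE3 = (leaf1_ecx >> 9) & 1;  /* mask 0x00000200 */
    c.hasSSE4a = (ext1_ecx  >> 6) & 1;  /* mask 0x00000040 */
    return c;
}

/* Query the CPU; on non-x86 targets both flags stay 0. */
static simd_caps_t detect_caps(void)
{
    unsigned leaf1_ecx = 0, ext1_ecx = 0;
#if defined(__i386__) || defined(__x86_64__)
    unsigned eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 when the requested leaf is unsupported. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        leaf1_ecx = ecx;
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        ext1_ecx = ecx;
#endif
    return decode_caps(leaf1_ecx, ext1_ecx);
}
```

Keeping the bit decoding separate from the CPUID query makes the masks easy to check without depending on the machine the code runs on.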