[MPlayer-dev-eng] [PATCH] yadif SSE2/SSSE3 optimization

Zhou, Zongyi zz65 at cornell.edu
Tue Nov 18 17:15:05 CET 2008


Hi all,

I have split the patch into two parts. The first part adds CPU detection for SSSE3 and SSE4a.
I found that on Phenom, SSE2 is much faster than MMX2, so SSE2 will be enabled on all Intel CPUs and on Phenom.
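
A minimal sketch of that selection rule, for illustration only. The helper names (cpu_family, prefer_sse2) are hypothetical and not part of the patch, and identifying Phenom as AMD family 0x10 from the leaf 1 EAX signature is an assumption about how the check could be done:

```c
#include <stdint.h>
#include <string.h>

/* Effective family from CPUID leaf 1 EAX: the extended family field
 * (bits 27:20) is added only when the base family (bits 11:8) is 0xF. */
static unsigned cpu_family(uint32_t leaf1_eax)
{
    unsigned base = (leaf1_eax >> 8) & 0xF;
    unsigned ext  = (leaf1_eax >> 20) & 0xFF;
    return base == 0xF ? base + ext : base;
}

/* Hypothetical policy helper (assumption, not taken from the patch):
 * prefer SSE2 over MMX2 on every Intel CPU that has SSE2, and on AMD
 * only for family 0x10, which is the Phenom generation. */
static int prefer_sse2(const char *vendor, uint32_t leaf1_eax, int has_sse2)
{
    if (!has_sse2)
        return 0;
    if (strcmp(vendor, "GenuineIntel") == 0)
        return 1;
    if (strcmp(vendor, "AuthenticAMD") == 0)
        return cpu_family(leaf1_eax) == 0x10;
    return 0;
}
```

The vendor string would come from CPUID leaf 0 and the signature from EAX of leaf 1. Older AMD parts (family 0xF and below) keep using the MMX2 path under this rule.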

I am still working on the second part, which adds SSE2/SSSE3 code to yadif. All duplicated code will be removed.

The first part of the patch follows. I think other filters and codecs can use it as well.

Regards,

ZZ

Index: cpudetect.c 
=================================================================== 
--- cpudetect.c (revision 27949) 
+++ cpudetect.c (working copy) 
@@ -144,6 +144,8 @@ 
  caps->hasMMX  = (regs2[3] & (1 << 23 )) >> 23; // 0x0800000 
  caps->hasSSE  = (regs2[3] & (1 << 25 )) >> 25; // 0x2000000 
  caps->hasSSE2 = (regs2[3] & (1 << 26 )) >> 26; // 0x4000000 
+ caps->hasSSSE3 = (regs2[2] & (1 << 9 )) >>  9; // 0x00000200, ECX of leaf 1 
+ caps->hasSSE4a = (regs2[3] & (1 << 6 )) >>  6; // 0x00000040 FIXME: SSE4a is ECX bit 6 of leaf 0x80000001, not of this leaf 
  caps->hasMMX2 = caps->hasSSE; // SSE cpus supports mmxext too 
  cl_size = ((regs2[1] >> 8) & 0xFF)*8; 
  if(cl_size) caps->cl_size = cl_size; 
@@ -496,6 +498,8 @@ 
  caps->has3DNowExt=0; 
  caps->hasSSE=0; 
  caps->hasSSE2=0; 
+ caps->hasSSSE3=0; 
+ caps->hasSSE4a=0; 
  caps->isX86=0; 
  caps->hasAltiVec = 0; 
 #ifdef HAVE_ALTIVEC    
Index: cpudetect.h 
=================================================================== 
--- cpudetect.h (revision 27949) 
+++ cpudetect.h (working copy) 
@@ -44,6 +44,8 @@ 
  int has3DNowExt; 
  int hasSSE; 
  int hasSSE2; 
+ int hasSSSE3; 
+ int hasSSE4a; 
  int isX86; 
  unsigned cl_size; /* size of cache line */ 
         int hasAltiVec;
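
For reference, a small sketch of where the two feature bits live in the CPUID output. SSSE3 is reported in ECX bit 9 of leaf 0x00000001, and SSE4a in ECX bit 6 of the AMD extended leaf 0x80000001. The struct and function names below (SimdCaps, decode_simd_caps) are hypothetical; only the two field names mirror the patch. The function decodes raw register values, so it does not execute CPUID itself:

```c
#include <stdint.h>

/* Hypothetical container; only the field names mirror the patch. */
typedef struct {
    int hasSSSE3;
    int hasSSE4a;
} SimdCaps;

/* Decode feature bits from raw CPUID register values.
 * std_ecx: ECX returned by CPUID leaf 0x00000001
 * ext_ecx: ECX returned by CPUID leaf 0x80000001
 *          (pass 0 if the CPU does not support that leaf) */
static SimdCaps decode_simd_caps(uint32_t std_ecx, uint32_t ext_ecx)
{
    SimdCaps caps;
    caps.hasSSSE3 = (std_ecx >> 9) & 1;  /* leaf 1, ECX bit 9 */
    caps.hasSSE4a = (ext_ecx >> 6) & 1;  /* leaf 0x80000001, ECX bit 6 */
    return caps;
}
```

With GCC or Clang, the raw values could be obtained through __get_cpuid() from <cpuid.h>. In cpudetect.c they would be regs2[2] after the corresponding do_cpuid() call.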

