[FFmpeg-devel] [PATCH] VP8 luma(16) inner-MB H/V loopfilter MMX/SSE2

Sun Jul 11 18:52:04 CEST 2010

On Sun, 11 Jul 2010, Ronald S. Bultje wrote:

> You'll notice that the sse2 is significantly slower here, my rough
> guess is that this is because of my shitty CPU which pretty much
> emulates xmm-ops through mmx-ops, so it doesn't add a lot of benefit
> other than not having to setup the loop for doing the second 8 pixels,
> combined with the added complexity of a 8x16 transpose before the
> actual filter. I'm betting that on an actual sse2-supporting CPU
> (Jason?), this would still be faster, but we might want to put this
> under a FF_MM_SSE2_NOT_SHITTY flag or something along those lines. If
> you think my code is shitty, comments are welcome also. ;-).

Rather than special-casing most of the functions, we at x264 declared that
Core1 doesn't have sse2, and changed the cpuid parser accordingly.
If you want to support the few cases where sse2 is slightly faster than
mmx, I recommend picking a different flag for that and applying it only
where you've tested on Core1, so that FF_MM_SSE2 can be trusted to do the
right thing in the usual case.
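
A minimal standalone sketch of that idea, for illustration only: decode
family/model from the CPUID leaf 1 EAX value and mask the sse2/sse3 bits on
the slow parts. The SKETCH_* flag values and the sketch_* helper names are
hypothetical (they are not FFmpeg or x264 identifiers), and the raw EAX value
is passed in as a parameter instead of being read with the cpuid instruction,
so the logic can be checked on any machine.

```c
#include <string.h>

/* Hypothetical flag bits for this sketch; not FFmpeg's real FF_MM_* values. */
#define SKETCH_MMX  0x1
#define SKETCH_SSE2 0x2
#define SKETCH_SSE3 0x4

/* Family from CPUID leaf 1 EAX: base family (bits 8-11) plus
 * extended family (bits 20-27). */
static int sketch_family(unsigned eax)
{
    return ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
}

/* Model from CPUID leaf 1 EAX: base model (bits 4-7) plus
 * extended model (bits 16-19) placed in the high nibble. */
static int sketch_model(unsigned eax)
{
    return ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0);
}

/* Clear the sse2/sse3 bits on Intel family 6 models 9, 13 and 14
 * (Pentium-M "Banias"/"Dothan" and Core1 "Yonah"); leave everything
 * else untouched. vendor is the 12-byte CPUID vendor string. */
static int sketch_filter_flags(int flags, const char *vendor, unsigned eax)
{
    int family = sketch_family(eax);
    int model  = sketch_model(eax);

    if (!strncmp(vendor, "GenuineIntel", 12) &&
        family == 6 && (model == 9 || model == 13 || model == 14))
        flags &= ~(SKETCH_SSE2 | SKETCH_SSE3);

    return flags;
}
```

With this shape, callers keep testing a single sse2 bit and never need to
know which CPUs were blacklisted; a separate "slow sse2" flag could be set in
the same branch if someone wants to opt back in per function.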

--Loren Merritt
-------------- next part --------------

diff --git a/libavcodec/x86/cpuid.c b/libavcodec/x86/cpuid.c
index 1ed4d2e..8cd6714 100644
--- a/libavcodec/x86/cpuid.c
+++ b/libavcodec/x86/cpuid.c
@@ -42,6 +42,8 @@ int mm_support(void)
     int rval = 0;
     int eax, ebx, ecx, edx;
     int max_std_level, max_ext_level, std_caps=0, ext_caps=0;
+    int family=0, model=0;
+    union { int i[3]; char c[12]; } vendor;
 
 #if ARCH_X86_32
     x86_reg a, c;
@@ -70,10 +72,12 @@ int mm_support(void)
         return 0; /* CPUID not supported */
 #endif
 
-    cpuid(0, max_std_level, ebx, ecx, edx);
+    cpuid(0, max_std_level, vendor.i[0], vendor.i[2], vendor.i[1]);
 
     if(max_std_level >= 1){
         cpuid(1, eax, ebx, ecx, std_caps);
+        family = ((eax>>8)&0xf) + ((eax>>20)&0xff);
+        model  = ((eax>>4)&0xf) + ((eax>>12)&0xf0);
         if (std_caps & (1<<23))
             rval |= FF_MM_MMX;
         if (std_caps & (1<<25))
@@ -108,6 +112,14 @@ int mm_support(void)
             rval |= FF_MM_MMX2;
     }
 
+    if (!strncmp(vendor.c, "GenuineIntel", 12) &&
+        family == 6 && (model == 9 || model == 13 || model == 14)) {
+        /* 6/9 (pentium-m "banias"), 6/13 (pentium-m "dothan"), and 6/14 (core1 "yonah")
+         * theoretically support sse2, but it's usually slower than mmx,
+         * so let's just pretend they don't. */
+        rval &= ~(FF_MM_SSE2|FF_MM_SSE3);
+    }
+
 #if 0
     av_log(NULL, AV_LOG_DEBUG, "%s%s%s%s%s%s%s%s%s%s\n",
         (rval&FF_MM_MMX) ? "MMX ":"",