[Mplayer-dev-eng] [LONG] gcc-3.0.x: problems with mmx & ffdivx + patch

pl p_l at tfz.net
Tue Jul 24 18:58:52 CEST 2001


Hi,

On Mon, Jul 23, 2001 at 02:16:27PM +0200, pl wrote:
[...]
> Btw, it compiles with 3.0.x _but_ I think there is a gcc-3.0.x bug (?)
> with the ffdivx driver when mmx support is enabled => the image
> is completely garbled...
[...]

Well... Since I don't have a C bible available I can't say who's
wrong/right but here is what I found.

gcc-3.0 does not produce the code expected at optimization level -Ox (x>=1)
for main/libavcodec/i386/mpegvideo.c with MMX enabled.

The problem comes from the use of the const attribute for the two data arrays
(mm_wabs[], mm_wone[]).

As far as I can tell, expected code (first part of Nick's inline asm) is:

[...]
        movd    %edi, %mm6
        punpckldq %mm6, %mm6
        movq    mm_wabs, %mm4		*
        movq    %mm6, %mm7
        movq    mm_wone, %mm5		*
        packssdw %mm6, %mm7
        pxor    %mm6, %mm6
[...]
which effectively corresponds to what is expected load register %mm? with the
8 bytes of mm_w???[].

gcc-3.0 at -O4 generate the following "evil" code:
[...]
.LC113:
        .value  1
        .align 2
.LC114:
        .value  65535
        .text
        .align 16
[...]
        movd    %edi, %mm6
        punpckldq %mm6, %mm6
        movq    .LC114, %mm4		*	
        movq    %mm6, %mm7
        movq    .LC113, %mm5		*
        packssdw %mm6, %mm7
        pxor    %mm6, %mm6
[...]

And there it does not load completely what it should...
Therefore, it sucks.
The image is completely garbled, women yell, children cry, ...


So... as I don't know how the C standard defines the behavior of 
constant arrays VS inlined asm, here are possible fixes:

Fixes:
------
 - wait for gcc team to fix it

 - explicitely not use gcc-3.0.x for this file

 - use -O0 for this file

 - remove the const attribute for:
     static const unsigned short mm_wabs[4] __attribute__ ...
     static const unsigned short mm_wone[4] __attribute__ ...

 - change unsigned short[4] to unsigned long long
   (see patch enclosed) (preferred and maybe cleaner)


Best regards.
--
pl
-------------- next part --------------
--- mpegvideo.c	Tue Jul 24 18:09:47 2001
+++ mpegvideo-fixed.c	Tue Jul 24 18:09:31 2001
@@ -19,8 +19,8 @@
  * Optimized for ia32 cpus by Nick Kurshev <nickols_k at mail.ru>
  */
 #ifdef HAVE_MMX 
-static const unsigned short mm_wabs[4] __attribute__ ((aligned(8))) = { 0xffff, 0xffff, 0xffff, 0xffff };
-static const unsigned short mm_wone[4] __attribute__ ((aligned(8))) = { 0x1, 0x1, 0x1, 0x1 };
+static const unsigned long long mm_wabs __attribute__ ((aligned(8))) = 0xFFFFFFFFFFFFFFFF;
+static const unsigned long long mm_wone __attribute__ ((aligned(8))) = 0x0001000100010001;
 
 /*
   NK:
@@ -89,7 +89,7 @@
 	"movq	%1, %%mm5\n\t"
 	"packssdw %%mm6, %%mm7\n\t" /* mm7 = qscale | qscale | qscale | qscale */
 	"pxor	%%mm6, %%mm6\n\t"
-	::"g"(qscale),"m"(mm_wone[0]),"m"(mm_wabs[0]):"memory");
+	::"g"(qscale),"m"(mm_wone),"m"(mm_wabs):"memory");
         for(;i<64;i+=4) {
 		__asm __volatile(
 			"movq	%1, %%mm0\n\t"
@@ -142,7 +142,7 @@
 	"movq	%1, %%mm5\n\t"
 	"packssdw %%mm6, %%mm7\n\t" /* mm7 = qscale | qscale | qscale | qscale */
 	"pxor	%%mm6, %%mm6\n\t"
-	::"g"(qscale),"m"(mm_wone[0]),"m"(mm_wabs[0]):"memory");
+	::"g"(qscale),"m"(mm_wone),"m"(mm_wabs):"memory");
         for(;i<64;i+=4) {
 		__asm __volatile(
 			"movq	%1, %%mm0\n\t"


More information about the MPlayer-dev-eng mailing list