[Mplayer-dev-eng] [LONG] gcc-3.0.x: problems with mmx & ffdivx + patch
pl
p_l at tfz.net
Tue Jul 24 18:58:52 CEST 2001
Hi,
On Mon, Jul 23, 2001 at 02:16:27PM +0200, pl wrote:
[...]
> Btw, it compiles with 3.0.x _but_ I think there is a gcc-3.0.x bug (?)
> with the ffdivx driver when mmx support is enabled => the image
> is completely garbled...
[...]
Well... Since I don't have a C bible available I can't say who's
wrong/right but here is what I found.
gcc-3.0 does not produce the code expected at optimization level -Ox (x>=1)
for main/libavcodec/i386/mpegvideo.c with MMX enabled.
The problem comes from the use of the const attribute for the two data arrays
(mm_wabs[], mm_wone[]).
As far as I can tell, expected code (first part of Nick's inline asm) is:
[...]
movd %edi, %mm6
punpckldq %mm6, %mm6
movq mm_wabs, %mm4 *
movq %mm6, %mm7
movq mm_wone, %mm5 *
packssdw %mm6, %mm7
pxor %mm6, %mm6
[...]
which effectively corresponds to what is expected load register %mm? with the
8 bytes of mm_w???[].
gcc-3.0 at -O4 generate the following "evil" code:
[...]
.LC113:
.value 1
.align 2
.LC114:
.value 65535
.text
.align 16
[...]
movd %edi, %mm6
punpckldq %mm6, %mm6
movq .LC114, %mm4 *
movq %mm6, %mm7
movq .LC113, %mm5 *
packssdw %mm6, %mm7
pxor %mm6, %mm6
[...]
And there it does not load completely what it should...
Therefore, it sucks.
The image is completely garbled, women yell, children cry, ...
So... as I don't know how the C standard defines the behavior of
constant arrays VS inlined asm, here are possible fixes:
Fixes:
------
- wait for gcc team to fix it
- explicitely not use gcc-3.0.x for this file
- use -O0 for this file
- remove the const attribute for:
static const unsigned short mm_wabs[4] __attribute__ ...
static const unsigned short mm_wone[4] __attribute__ ...
- change unsigned short[4] to unsigned long long
(see patch enclosed) (preferred and maybe cleaner)
Best regards.
--
pl
-------------- next part --------------
--- mpegvideo.c Tue Jul 24 18:09:47 2001
+++ mpegvideo-fixed.c Tue Jul 24 18:09:31 2001
@@ -19,8 +19,8 @@
* Optimized for ia32 cpus by Nick Kurshev <nickols_k at mail.ru>
*/
#ifdef HAVE_MMX
-static const unsigned short mm_wabs[4] __attribute__ ((aligned(8))) = { 0xffff, 0xffff, 0xffff, 0xffff };
-static const unsigned short mm_wone[4] __attribute__ ((aligned(8))) = { 0x1, 0x1, 0x1, 0x1 };
+static const unsigned long long mm_wabs __attribute__ ((aligned(8))) = 0xFFFFFFFFFFFFFFFF;
+static const unsigned long long mm_wone __attribute__ ((aligned(8))) = 0x0001000100010001;
/*
NK:
@@ -89,7 +89,7 @@
"movq %1, %%mm5\n\t"
"packssdw %%mm6, %%mm7\n\t" /* mm7 = qscale | qscale | qscale | qscale */
"pxor %%mm6, %%mm6\n\t"
- ::"g"(qscale),"m"(mm_wone[0]),"m"(mm_wabs[0]):"memory");
+ ::"g"(qscale),"m"(mm_wone),"m"(mm_wabs):"memory");
for(;i<64;i+=4) {
__asm __volatile(
"movq %1, %%mm0\n\t"
@@ -142,7 +142,7 @@
"movq %1, %%mm5\n\t"
"packssdw %%mm6, %%mm7\n\t" /* mm7 = qscale | qscale | qscale | qscale */
"pxor %%mm6, %%mm6\n\t"
- ::"g"(qscale),"m"(mm_wone[0]),"m"(mm_wabs[0]):"memory");
+ ::"g"(qscale),"m"(mm_wone),"m"(mm_wabs):"memory");
for(;i<64;i+=4) {
__asm __volatile(
"movq %1, %%mm0\n\t"
More information about the MPlayer-dev-eng
mailing list