[MPlayer-dev-eng] ppc runtime-cpu-detection fails with gcc 3.3
Alan Curry
pacman at TheWorld.com
Sat Feb 25 01:14:26 CET 2006
Luca Barbato writes the following:
>
>Jan Paul Schmidt wrote:
>>> BTW, if somebody has OS X 10.3, please report...
>>
>> $gcc -maltivec -mabi=altivec -S -o - gccbug.c | grep stv
>
>-faltivec ....
Do you mean that Apple has a compiler that not only doesn't recognize the
upstream -m flags, but also _ignores_ unrecognized -m flags, allowing the
mistake to go undetected? Ouch.
Anyway, I'm now convinced that $cc_major and $cc_minor are pretty useless,
since people are out there making forks of gcc with version numbers that
don't indicate a particular relationship with the real gcc of that version.
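A probe along the following lines could replace a version check: ask the compiler itself whether altivec code generation is switched on. This is only a sketch. It relies on __ALTIVEC__, which upstream gcc predefines when -maltivec is in effect; whether every fork defines it for its own flag is an assumption, and the function name is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical probe: returns 1 if this translation unit was compiled
 * with altivec code generation enabled, 0 otherwise.
 * Assumption: the compiler predefines __ALTIVEC__ in that case, as
 * upstream gcc does for -maltivec. */
int altivec_codegen_enabled(void)
{
#ifdef __ALTIVEC__
    return 1;
#else
    return 0;
#endif
}

/* Prints the result so a configure-style script could grep for it. */
void report_altivec_codegen(void)
{
    printf("altivec codegen: %s\n",
           altivec_codegen_enabled() ? "on" : "off");
}
```

If the flag was silently ignored, the probe reports "off" even though the compile succeeded, which is exactly the mistake that would otherwise go undetected.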
Furthermore, I've made a new discovery (thanks to Andrea's work on the Amiga)
that is bad for runtime-cpudetection. When runtime-cpudetection is enabled,
we compile everything with -maltivec -mabi=altivec. -maltivec is necessary to
get at the functions in <altivec.h>, and I'm not sure what -mabi=altivec
actually does...
The problem is that -maltivec also advises gcc that it may emit altivec
instructions to implement ordinary C code. As of 4.0.2 it doesn't do very
much of that, but it does have a special builtin memory-zeroing implementation
that uses vxor v0,v0,v0 followed by a series of stvx v0,rFOO,rBAR to zero
smallish blocks of memory in vector-sized bursts. It can only do this if it
knows that the memory to be zeroed (rFOO+rBAR) is aligned on a 16-byte
boundary, which usually is not known at compile time.
$ cat /tmp/t.c
#include <string.h>
extern char buf[16] __attribute__((aligned(16)));
void foo(void)
{
    memset(buf, 0, sizeof buf);
}
$ /usr/local/gcc4/bin/gcc -maltivec -S -O -o - /tmp/t.c
        .file   "t.c"
        .section        ".text"
        .align 2
        .globl foo
        .type   foo, @function
foo:
        stwu 1,-16(1)
        vxor 0,0,0
        lis 9,buf@ha
        la 9,buf@l(9)
        stvx 0,0,9
        addi 1,1,16
        blr
        .size   foo, .-foo
        .ident  "GCC: (GNU) 4.0.2"
        .section        .note.GNU-stack,"",@progbits
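For contrast, here is a minimal sketch of the two situations side by side. The function and buffer names are invented for illustration, and what a particular gcc emits for the second function is an assumption, not a tested result: the point is only that a bare pointer carries no alignment guarantee, so the compiler has no basis for a 16-byte vector store there.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Alignment is declared, so the compiler is allowed to zero this
 * with a single aligned vector store when altivec is enabled. */
char aligned_buf[16] __attribute__((aligned(16)));

void zero_known_alignment(void)
{
    memset(aligned_buf, 0, sizeof aligned_buf);
}

/* Only a bare pointer is visible here: nothing tells the compiler
 * that p sits on a 16-byte boundary, so it cannot assume it. */
void zero_unknown_alignment(char *p)
{
    assert(p != NULL);
    memset(p, 0, 16);
}
```

Compiling both with -maltivec -S -O and grepping for stvx, as in the transcript above, would show which of the two a given compiler vectorizes.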
Apparently the Amiga gcc has better knowledge of the alignment of things
(like stack alignment maybe?) so it puts altivec instructions in lots of
places where ordinary buffers are zeroed, including FD_ZERO'ing of fd_sets.
This is a sign that runtime-cpudetection for altivec currently works only by
luck. We're using -maltivec and expecting the altivec instructions to appear
only where vec_*() functions are explicitly called. That's a bad assumption,
and it's only going to get worse as gcc gets smarter and becomes capable of
using altivec instructions to implement ordinary C loops (autovectorization).
The only way around this is going to be to keep all the altivec-enhanced code
in separate files, and only compile those files with -maltivec, when
runtime-cpudetection is enabled. I haven't looked too closely, but I bet
that's not going to be easy. It definitely conflicts with the idea of
swscale_template.c being #included multiple times in swscale.c, with and
without altivec enabled. The whole file needs to be compiled with -maltivec to
make the altivec parts work, and that means altivec instructions may appear
in the "non-altivec" code too.
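A minimal sketch of the separate-files layout, assuming a function-pointer dispatch. Every name here is hypothetical and nothing is taken from the MPlayer tree; the three pieces are shown in one listing only so that it builds anywhere, and the "altivec" variant is a plain C stand-in for what would really be vec_*() code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*clear_fn)(unsigned char *dst, size_t n);

/* Would live in clear_c.c, compiled WITHOUT -maltivec, so no vector
 * instruction can end up in it. */
void clear_c(unsigned char *dst, size_t n)
{
    assert(dst != NULL);
    memset(dst, 0, n);
}

/* Would live in clear_altivec.c, the only file compiled WITH
 * -maltivec. Stand-in body: a real version would use vec_*() calls. */
void clear_altivec(unsigned char *dst, size_t n)
{
    size_t i;
    assert(dst != NULL);
    for (i = 0; i < n; i++)
        dst[i] = 0;
}

/* Would live in the generic dispatcher, also built WITHOUT -maltivec.
 * altivec_present comes from whatever runtime detection is in use;
 * how that detection is done is outside this sketch. */
clear_fn select_clear(int altivec_present)
{
    return altivec_present ? clear_altivec : clear_c;
}
```

The caller picks the pointer once at startup and never touches the altivec file's code unless detection succeeded. The cost is that a template file #included twice into one translation unit no longer fits, since the flag applies to the whole file.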
Then there's the possibility of ditching the C language extensions and just
writing the altivec code with asm() like the MMX guys do...