[FFmpeg-devel] [PATCH] Fix apply_welch_window_sse2 compilation on Mac OS X/x86
Loren Merritt
lorenm
Thu Oct 18 13:17:48 CEST 2007
On Thu, 18 Oct 2007, Pierre d'Herbemont wrote:
> On Oct 18, 2007, at 2:23 AM, Trent Piepho wrote:
>> On Thu, 18 Oct 2007, Pierre d'Herbemont wrote:
>>> On Oct 17, 2007, at 10:05 PM, Loren Merritt wrote:
>>>
>>>> But I don't think the fix is correct: It's not valid to jmp from
>>>> one asm block to another. If you split the asm block, you also have
>>>> to write the loop part in C. And the reason I put it in asm is that
>>>> gcc doesn't generate as efficient a loop.
>>>
>>> There is no problem in jumping from one asm block to an other.
>>> However there can be problem with labels.
>>
>> Suppose gcc _did_ spill a register to the stack?
>
> Right. But again, in this case, we are only assuming that %xmmx
> registers won't be messed up, we have don't use the stack directly,
> nor any general registers. We may not rely on this for sure.
> Experience prove that in this case it just work.
The whole point of splitting the asm block was to allow gcc to spill
registers in between, because it doesn't have 6 general regs free. And
look at your own disassembly: it did. So you jump from the 2nd asm block
to the 1st without running the appropriate spilling code, and run the 1st
block with the register values from the 2nd. Then you run the
initialization code for the 2nd block again, which gcc expected to only
run once.
BTW, spilling shouldn't be needed. It's possible to write the loop with 5
regs, but that's slower than 6 if you have 6. Ideally I'd be able write
the loop part in C and gcc would use 5 or 6 regs for addressing depending
on what's available, but that's not what happens in practice.
>> You also have no guarantee that gcc won't move code around so that
>> what was before or after an asm is now on the other side. And no,
>> making the asm volatile does not work! Suppose you code this:
>>
>> for(i=0;i<100;i++) {
>> asm volatile("1: ..." : : "=r"(i));
>> asm volatile("... je 1b" : :);
>> }
>>
>> gcc might decide the perfect place to put the i++ is between the
>> two asm
>> blocks, which will completely mess up the for loop, since now i is
>> incremented inside the asm loop.
>
> I am not sure of that. Since we are using "=r"(i), gcc can't put the i
> ++ part afterwards. Or that's a bug.
> Again, in our specific case we don't have that kind of problem anyway.
Details. Here's a trivial case that breaks in gcc-4.1.2:
void func0(void)
{
int i=9, j=0;
asm volatile("1: \n inc %0 \n dec %1 \n jg 1b \n" :"+r"(j), "+r"(i));
}
void func1(void)
{
int i=9, j=0;
asm volatile("1: \n inc %0 \n" :"+r"(j));
asm volatile("dec %0 \n jg 1b \n" :"+r"(i));
}
Disassembly of section .text:
00000000 <func0>:
0: 31 c0 xor %eax,%eax
2: ba 09 00 00 00 mov $0x9,%edx
7: 40 inc %eax
8: 4a dec %edx
9: 7f fc jg 7 <func0+0x7>
b: c3 ret
00000010 <func1>:
10: 31 c0 xor %eax,%eax
12: 40 inc %eax
13: ba 09 00 00 00 mov $0x9,%edx
18: 4a dec %edx
19: 7f f7 jg 12 <func1+0x2>
1b: c3 ret
Note how the initialization of i was moved inside the loop, thus making
it an infinite loop. And this is the same kind of breakage as occurs in
your patch.
--Loren Merritt
More information about the ffmpeg-devel
mailing list