[FFmpeg-devel] Subject: Re: swscale-test segfault with 64-bit icc 11.1
Måns Rullgård
mans
Wed Jul 21 01:23:41 CEST 2010
"Winterton, Richard" <richard.winterton at intel.com> writes:
>> On Sat, Jul 17, 2010 at 04:50:10PM -0300, Ramiro Polla wrote:
>>> Hi,
>>>
>>> swscale-test segfaults when built with 64-bit icc 11.1 (20100414). The
>>> function that fails is hyscale_fast_MMX2(). Here's a disassembly of
>>> the function:
>>> a4b0: 53 push %rbx
>>> a4b1: 48 8b 87 c8 30 00 00 mov 0x30c8(%rdi),%rax
>>> a4b8: 4c 8b 9f a8 30 00 00 mov 0x30a8(%rdi),%r11
>>> a4bf: 48 89 74 24 d8 mov %rsi,-0x28(%rsp)
>>> a4c4: 45 89 ca mov %r9d,%r10d
>>> a4c7: 48 89 54 24 e0 mov %rdx,-0x20(%rsp)
>>> a4cc: 41 f7 da neg %r10d
>>> a4cf: 83 bf 10 31 00 00 00 cmpl $0x0,0x3110(%rdi)
>>> a4d6: 48 89 4c 24 e8 mov %rcx,-0x18(%rsp)
>>> a4db: 48 89 44 24 d0 mov %rax,-0x30(%rsp)
>>> a4e0: 48 8b 87 00 31 00 00 mov 0x3100(%rdi),%rax
>>> a4e7: 4c 89 5c 24 f0 mov %r11,-0x10(%rsp)
>>> a4ec: 48 89 44 24 f8 mov %rax,-0x8(%rsp)
>>> a4f1: 0f 84 05 01 00 00 je a5fc <hyscale_fast_MMX2+0x14c>
>>> a4f7: 0f ef ff pxor %mm7,%mm7
>>> a4fa: 48 8b 4c 24 e8 mov -0x18(%rsp),%rcx
>>> a4ff: 48 8b 7c 24 d8 mov -0x28(%rsp),%rdi
>>> a504: 48 8b 54 24 f0 mov -0x10(%rsp),%rdx
>>> a509: 48 8b 5c 24 d0 mov -0x30(%rsp),%rbx
>>> a50e: 48 31 c0 xor %rax,%rax
>>> a511: 0f 18 01 prefetchnta (%rcx)
>>> a514: 0f 18 41 20 prefetchnta 0x20(%rcx)
>>> a518: 0f 18 41 40 prefetchnta 0x40(%rcx)
>>> a51c: 8b 33 mov (%rbx),%esi
>>> a51e: ff 54 24 f8 callq *-0x8(%rsp)
>>> a522: 8b 34 03 mov (%rbx,%rax,1),%esi
>>> a525: 48 01 f1 add %rsi,%rcx
>>> a528: 48 01 c7 add %rax,%rdi
>>> a52b: 48 31 c0 xor %rax,%rax
>>> a52e: 8b 33 mov (%rbx),%esi
>>> a530: ff 54 24 f8 callq *-0x8(%rsp)
>>> [...]
>>>
>>> Since no functions are being called in C inside hyscale_fast_MMX2(),
>>> the compiler decides it's ok to use -0x8(%rsp) instead of properly
>>> sub'ing rsp, as it supposedly won't get overwritten. But in this case
>>> we call the mmx2 code inside asm, overwriting -0x8(%rsp). The second
>>> callq goes to a522, and when run again, it tries to run some random
>>> code that was the next pointer on the stack. gcc does the same thing,
>>> but it seems it leaves -0x8(%rsp) alone and uses the stack -0x10(%rsp)
>>> and below.
>>>
>>> Is this a compiler bug (as in should it detect a call inside asm)?
>>> Could (or should) we hint to the compiler that a call is being made
>>> inside the asm block (I don't even know if this is possible)?
>> I would suggest that you ask Intel (after checking the manual).
>> It's surely possible to work around this in various ways, but that
>> feels unclean.
>
> I believe I was able to reproduce the issue described, replicating
> the segmentation fault with a small snippet. I checked with a compiler
> engineer and he replied with the following:
>
> The compiler is unable to detect which stack locations the user uses
> in inline asm, so it cannot avoid them. As a workaround, you can use
> -mno-red-zone to disable the optimization where we use the area
> below the stack pointer (RSP) in leaf functions, but this will disable
> the red zone for all other leaf functions as well, and may cost
> performance.
>
> I can look into a modification of the assembly to work around the
> problem if you still have the issue.
This problem could potentially appear with any gcc version as well; I
ran into it on PPC64 a while ago. There is no point in making only one
compiler safe in this manner, since we'd still need to solve it for the
other ones.
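
One compiler-independent way to do that is to have the asm itself step
over the red zone before issuing the call. The following is only a minimal
sketch, assuming the x86-64 System V ABI (128-byte red zone, 16-byte stack
alignment at a call) and GCC-style extended asm. The names add_one and
call_from_asm are hypothetical stand-ins, not swscale code, and this is not
necessarily the fix that should go into the tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the runtime-generated scaler code. */
static long add_one(long x)
{
    return x + 1;
}

/* Hypothetical helper: calls fn(arg) from inside an asm block while
 * keeping the pushed return address out of the caller's red zone. */
static long call_from_asm(long (*fn)(long), long arg)
{
    long ret;

    assert(fn != NULL);
#if defined(__x86_64__) && !defined(_WIN32)
    __asm__ volatile (
        "mov  %%rsp, %%rbx \n\t"  /* save rsp in a callee-saved register */
        "sub  $128, %%rsp  \n\t"  /* step over the 128-byte red zone */
        "and  $-16, %%rsp  \n\t"  /* keep rsp 16-byte aligned at the call */
        "call *%[fn]       \n\t"  /* return address lands below the red zone */
        "mov  %%rbx, %%rsp \n\t"  /* restore the original stack pointer */
        : "=a"(ret), "+D"(arg)    /* result in rax, first argument in rdi */
        : [fn] "r"(fn)
        : "rbx", "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11",
          "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
          "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14",
          "xmm15", "memory", "cc");
#else
    ret = fn(arg);                /* plain C call on other targets */
#endif
    return ret;
}
```

With this, call_from_asm(add_one, 41) returns 42, and whatever the compiler
keeps at -0x8(%rsp) and below survives the call. The clobber list covers
the caller-saved integer and SSE registers; code that touches MMX or x87
state would need to account for that as well.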
--
Måns Rullgård
mans at mansr.com