[FFmpeg-devel] [RFC] clobbers for XMM registers

Alexander Strange astrange
Thu Sep 30 21:36:08 CEST 2010


On Sep 30, 2010, at 12:59 PM, Måns Rullgård wrote:

> Alexander Strange <astrange at ithinksw.com> writes:
> 
>> On Thursday, September 30, 2010, Måns Rullgård <mans at mansr.com> wrote:
>>> "Ronald S. Bultje" <rsbultje at gmail.com> writes:
>>> 
>>>> Hi,
>>>> 
>>>> 2010/9/28 Måns Rullgård <mans at mansr.com>:
>>>>> Michael Niedermayer <michaelni at gmx.at> writes:
>>>>>> On Tue, Sep 28, 2010 at 09:36:40AM -0400, Ronald S. Bultje wrote:
>>>>>>> On Tue, Sep 28, 2010 at 8:34 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
>>>>>>>> you want to execute code from vp3dsp_sse2.c on a pre SSE cpu?
>>>>>>> 
>>>>>>> All _sse2 files are template files that are included in dsputil_mmx.c
>>>>>>> or similar.
>>>>>> 
>>>>>> we could add the flags to dsputil_mmx then
>>>>> 
>>>>> That would allow the compiler to use SSE instructions in functions
>>>>> that should be MMX only.
>>>> 
>>>> I'm gonna start kicking this subject until it's solved. Come on guys,
>>>> keep this moving. Why don't we make it (the clobbering) a macro and
>>>> only enable this on x86-64. Don't forget all xmm registers are
>>>> caller-save on x86-32 and x86-64 has no issues with marking clobbers
>>> 
>>> The issue is not fundamentally about caller vs callee saved
>>> registers.  It is about telling the compiler which registers are
>>> clobbered, so that it can save and restore them if necessary.
>>> 
>>> The missing clobber lists caused the FFT to fail with suncc, despite
>>> all the used registers being caller-saved.  Apparently the compiler
>>> was using them for something outside the asm block.
>>> 
>>>> (and even if it did, -msse is fine, there is no single x86-64 CPU that
>>>> does not support SSE). We could consider making it as simple as :::
>>>> CLOBBER_IF_X86_64("%xmm6", "%xmm7",) "%eax" which evaluates to the
>>>> string in it (including commas) on x86-64 and nothing on x86-32 (and
>>>> omit the comma if that's the only thing in the clobberlist).
>>> 
>>> We obviously need a conditional of some kind, but it should be tested
>>> in configure and applied whenever the compiler recognises xmm registers.
>>> It is, however, not quite as straightforward as you make it out.
>>> Stray commas are not allowed, nor is an empty list.
>>> 
>>> One possible solution is to have the macro always include "cc".  Most
>>> of the asm blocks do clobber the condition flags, and for any that do
>>> not, it is unlikely to make any difference.  It also seems that
>>> including the stack pointer in the clobber list is ignored, although
>>> relying on this seems dubious at best.
>> 
>> asm blocks always clobber cc whether or not you put it in the list, so
>> the "cc" clobber is a no-op.
> 
> In that case always adding it is certainly harmless, and allows a
> single macro to be used.  Where is this documented BTW?  I couldn't
> find anything on the specifics of "cc" clobbers on x86.

It isn't. I don't think the behavior is very surprising, though: pretty much all the arithmetic instructions on x86 set EFLAGS, so if the clobber weren't implicit, any asm block using 'add' without a "cc" clobber would be broken.
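To make that concrete, here is a minimal sketch of the kind of block I mean. The function names are made up for illustration, and the non-x86 branch is only there so the file builds everywhere. The asm uses 'add' and lists no "cc" clobber; as far as I can tell, GCC and clang on x86 still assume the flags are gone after the block, so the comparison computed before it stays correct.

```c
/* Adds b to a with a single x86 ADD, which rewrites EFLAGS.
 * Note that no "cc" clobber is listed. */
static int add_asm(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("addl %1, %0"
             : "+r"(a)      /* read-write operand, receives the sum */
             : "r"(b));     /* clobber list intentionally left empty */
#else
    a += b;                 /* portable fallback, not the case under discussion */
#endif
    return a;
}

/* A comparison is made before the asm and consumed after it.
 * The compiler must not assume the flags survive the block. */
static int add_then_branch(int a, int b, int limit)
{
    int below = a < limit;
    int sum = add_asm(a, b);
    return below ? sum : -sum;
}
```

If the flags were not treated as clobbered, a compiler would be free to keep the result of the comparison in EFLAGS across the asm, and the branch afterwards could go the wrong way.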

The implicit clobber applies to any architecture that doesn't have a "real" condition register. So PPC needs explicit "cc" clobbers (because it has cr0), but x86 doesn't.
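Since "cc" costs nothing on x86, the single-macro idea quoted above could look roughly like the sketch below. The names XMM_CLOBBERS_CC and HAVE_XMM_CLOBBERS_SKETCH are placeholders, and the __x86_64__ test only stands in for whatever configure would actually detect. Because the macro always ends in "cc", it never expands to an empty list and never leaves a stray comma.

```c
/* Placeholder for a configure result meaning
 * "the compiler accepts xmm registers in clobber lists". */
#if defined(__x86_64__)
#define HAVE_XMM_CLOBBERS_SKETCH 1
#else
#define HAVE_XMM_CLOBBERS_SKETCH 0
#endif

/* Always yields a non-empty clobber list that ends in "cc". */
#if HAVE_XMM_CLOBBERS_SKETCH
#define XMM_CLOBBERS_CC(...) __VA_ARGS__, "cc"
#else
#define XMM_CLOBBERS_CC(...) "cc"
#endif

/* Zeroes xmm6 and copies its low 32 bits out, declaring xmm6 as clobbered. */
static int zero_from_xmm6(void)
{
    int out = 0;
#if defined(__x86_64__)
    __asm__ volatile(
        "pxor %%xmm6, %%xmm6 \n\t"
        "movd %%xmm6, %0     \n\t"
        : "=r"(out)
        :
        : XMM_CLOBBERS_CC("%xmm6"));
#endif
    return out;
}
```

Further clobbers can simply follow the macro, e.g. XMM_CLOBBERS_CC("%xmm6", "%xmm7"), "%eax".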

If you care about this, it's probably easiest to compile with clang and -emit-llvm and see what it emits for the asm statement.
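A small probe file is enough for that. The file and function names here are arbitrary. On an x86 target I would expect the constraint string of the inline asm call in the IR to contain ~{flags} (alongside ~{dirflag} and ~{fpsr}) even though the source lists no clobbers at all.

```c
/* cc_probe.c
 *
 * Inspect with:
 *   clang -O1 -S -emit-llvm cc_probe.c -o -
 * and look at the constraint string on the line containing "asm".
 */
unsigned inc_asm(unsigned x)
{
#if defined(__i386__) || defined(__x86_64__)
    /* INC modifies the flags; no clobber list is given. */
    __asm__ ("incl %0" : "+r"(x));
#else
    x += 1;  /* fallback so the file still builds on other targets */
#endif
    return x;
}
```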



