[FFmpeg-devel] [RFC] clobbers for XMM registers

Ramiro Polla ramiro.polla
Thu Sep 30 19:19:49 CEST 2010

2010/9/30 Måns Rullgård <mans at mansr.com>:
> Alexander Strange <astrange at ithinksw.com> writes:
>> On Thursday, September 30, 2010, Måns Rullgård <mans at mansr.com> wrote:
>>> "Ronald S. Bultje" <rsbultje at gmail.com> writes:
>>>> 2010/9/28 Måns Rullgård <mans at mansr.com>:
>>>>> Michael Niedermayer <michaelni at gmx.at> writes:
>>>>>> On Tue, Sep 28, 2010 at 09:36:40AM -0400, Ronald S. Bultje wrote:
>>>>>>> On Tue, Sep 28, 2010 at 8:34 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
>>>>>>> > you want to execute code from vp3dsp_sse2.c on a pre SSE cpu?
>>>>>>> All _sse2 files are templates files that are included in dsputil_mmx.c
>>>>>>> or similar.
>>>>>> we could add the flags to dsputil_mmx then
>>>>> That would allow the compiler to use SSE instructions in functions
>>>>> that should be MMX only.
>>>> I'm gonna start kicking this subject until it's solved. Come on guys,
>>>> keep this moving. Why don't we make it (the clobbering) a macro and
>>>> only enable this on x86-64. Don't forget all xmm registers are
>>>> caller-save on x86-32 and x86-64 has no issues with marking clobbers
>>> The issue is not fundamentally about caller vs callee saved
>>> registers.  It is about telling the compiler which registers are
>>> clobbered, so that it can save and restore them if necessary.
>>> The missing clobber lists caused the FFT to fail with suncc, despite
>>> all the used registers being caller-saved.  Apparently the compiler
>>> was using them for something outside the asm block.
>>>> (and even if it did, -msse is fine, there is no single x86-64 CPU that
>>>> does not support SSE). We could consider making it as simple as :::
>>>> CLOBBER_IF_X86_64("%xmm6", "%xmm7",) "%eax" which evaluates to the
>>>> string in it (including commas) on x86-64 and nothing on x86-32 (and
>>>> omit the comma if that's the only thing in the clobberlist).
>>> We obviously need a conditional of some kind, but it should be tested
>>> in configure and applied whenever the compiler recognises xmm registers.
>>> It is, however, not quite as straightforward as you make it out.
>>> Stray commas are not allowed, nor is an empty list.
>>> One possible solution is to have the macro always include "cc".  Most
>>> of the asm blocks do clobber the condition flags, and for any that do
>>> not, it is unlikely to make any difference.  It also seems that
>>> including the stack pointer in the clobber list is ignored, although
>>> relying on this seems dubious at best.
>> asm blocks always clobber cc whether or not you put it in the list, so
>> the "cc" clobber is a no-op.
> In that case always adding it is certainly harmless, and allows a
> single macro to be used.

What about
#if HAVE_XMM_CLOBBERS /* condition name assumed; to be set by a configure test */
#    define XMM_CLOBBERS(a, ...) __VA_ARGS__
#else
#    define XMM_CLOBBERS(a, ...) a
#endif

to be used as in lavc/x86/fft_sse.c:
        :"+r"(j), "+r"(k)
        :"r"(output+n4), "r"(output+n4*3),
        XMM_CLOBBERS(, : "%xmm0", "%xmm1", "%xmm7")
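
To make the two expansions easier to compare, here is a rough sketch that
stringifies what the macro leaves behind in each configuration. The names
XMM_CLOBBERS_ON, XMM_CLOBBERS_OFF, STR, STR2 and print_clobber_variants are
made up for this illustration only; in the proposal both variants share the
name XMM_CLOBBERS and a configure-time test would pick one of them.

```c
#include <stdio.h>

/* Two-level stringification so the argument is macro-expanded first. */
#define STR2(...) #__VA_ARGS__
#define STR(...)  STR2(__VA_ARGS__)

/* Hypothetical names: the proposal calls both variants XMM_CLOBBERS. */
#define XMM_CLOBBERS_ON(a, ...)  __VA_ARGS__  /* compiler accepts xmm clobbers */
#define XMM_CLOBBERS_OFF(a, ...) a            /* compiler does not accept them */

/* Text that would follow the input operands in each configuration. */
const char *const clobbers_on  = STR(XMM_CLOBBERS_ON(, : "%xmm0", "%xmm1", "%xmm7"));
const char *const clobbers_off = STR(XMM_CLOBBERS_OFF(, : "%xmm0", "%xmm1", "%xmm7"));

/* Print both expansions for inspection. */
void print_clobber_variants(void)
{
    printf("on:  [%s]\n", clobbers_on);
    printf("off: [%s]\n", clobbers_off);
}
```

Because the first argument is empty, the "off" variant expands to nothing at
all, so neither a stray comma nor an empty clobber list is left behind; the
"on" variant supplies the colon together with the register list.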
