[FFmpeg-devel] [PATCH] update doc/optimization.txt
Ronald S. Bultje
Wed Sep 22 23:13:37 CEST 2010
On Wed, Sep 22, 2010 at 11:31 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Wed, Sep 22, 2010 at 09:54:42AM -0400, Ronald S. Bultje wrote:
>> +Do not use multiple inline asm blocks in a single C function. The compiler is
>> +not required to maintain register values between asm blocks, and depending on
>> +this behaviour can break with any future version of gcc.
> Using multiple asm blocks in a C function and having code that depends on
> register values to be maintained across asm blocks are 2 different things.
> Please use precisse language and make sure what you mean is also what is
> written there
How about the following:
Do not expect a compiler to maintain values in your registers between separate
(inline) asm code blocks. It is not required to. For example, this is bad:
__asm__("movdqa $0, %%xmm7" : src);
/* do something */
__asm__("movdqa %%xmm7, $1" : dst);
- first of all, you're assuming that the compiler will not use xmm7 in
between the two asm blocks. It probably won't when you test it, but it's
a poor assumption that will break at some point for some --cpu compiler flag
- secondly, you didn't mark xmm7 as clobbered. If you did, the compiler would
have restored the original value of xmm7 after the first asm block, thus
rendering the whole block of code invalid
Code that depends on data in registries being untouched, should be written as
a single __asm__() statement. Ideally, a single function contains only one
>> +For x86, mark registers that are clobbered in your asm. This means both
>> +general x86 registers (e.g. eax) as well as XMM registers. This last one is
>> +particularly important on Win64, where xmm6-15 are callee-save, and not
>> +restoring their contents leads to undefined results. In external asm, you do
>> +this by using: "cglobal functon_name, num_args, num_regs, num_xmm_regs". In
>> +inline asm, you specify clobbered registers at the end of your asm:
>> +__asm__(".." ::: "%eax").
> This recommandition has to be cross checked with generated code, for example
> we must make sure gcc does not emmit a *emms for mmx register clobbers.
For the example given here, we already use this, e.g. such code is
used in two places in libavcodec/cabac.h.
As for marking mmx/xmm clobbers, note that they're missing in this
example for inline asm. The reason I'm not (trying to) mark(ing)
mmx/xmm register clobbers in the inline asm example is because we
don't have a good system for that yet. Adding a clobber for "%xmm7"
breaks on some systems, so this requires some #ifdef'ery, which should
probably be in a utility header, and then use macros in these
functions. Reimar (ping, hi!) had a patch for that and I'm hoping that
my recent ""progress"" on making Win64 pass fate will motivate him to
get that patch going. What you're suggesting + updating this example
to include a clobber-mark for xmm registers should probably be part of
>> -Use __asm__() instead of intrinsics. The latter requires a good optimizing compiler
>> -which gcc is not.
>> +For x86, use yasm or __asm__(), do not use intrinsics. The latter requires a
>> +good optimizing compiler which gcc is not.
> non x86 intrinsics are none the better and the recommandition against them
> should stay, also i see no reason to mention x86 anyway, if someone manages
> to generate sparc asm with yasm cleanly, why not ...
Oops, you're right, sorry for adding "for x86", I don't know why I did that.
>> +Inline asm vs. external asm
>> +Both inline asm (__asm__("..") in a .c file, handled by a compiler such as gcc)
>> +and external asm (.s or .asm files, handled by an assembler such as yasm/nasm)
>> +are accepted in FFmpeg. Which one to use differs per specific case.
>> +- if your code is intended to be inlined in a C function, inline asm is always
>> + ? better, because external asm cannot be inlined
>> +- if your code calls external functions, yasm is always better
>> +- if your code takes huge and complex structs as function arguments (e.g.
>> + ? MpegEncContext; note that this is not ideal and is discouraged if there
>> + ? are alternatives), then inline asm is always better, because predicting
>> + ? member offsets in complex structs is almost impossible. It's safest to let
>> + ? the compiler take care of that
>> +- in many cases, both can be used and it just depends on the preference of the
>> + ? person writing the asm. For new asm, the choice is up to you. For existing
>> + ? asm, you'll likely want to maintain whatever form it is currently in unless
>> + ? there is a good reason to change it.
> also add that inline<->yasm changes must go in seperate patches please
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 3286 bytes
Desc: not available
More information about the ffmpeg-devel