[FFmpeg-devel] [PATCH] update doc/optimization.txt

Michael Niedermayer michaelni
Wed Sep 22 23:50:32 CEST 2010


On Wed, Sep 22, 2010 at 05:13:37PM -0400, Ronald S. Bultje wrote:
> Hi,
> 
> On Wed, Sep 22, 2010 at 11:31 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Wed, Sep 22, 2010 at 09:54:42AM -0400, Ronald S. Bultje wrote:
> >> +Do not use multiple inline asm blocks in a single C function. The compiler is
> >> +not required to maintain register values between asm blocks, and depending on
> >> +this behaviour can break with any future version of gcc.
> >
> > Using multiple asm blocks in a C function and having code that depends on
> > register values to be maintained across asm blocks are 2 different things.
> > Please use precise language and make sure what you mean is also what is
> > written there
> 
> How about the following:
> 
> Do not expect a compiler to maintain values in your registers between separate
> (inline) asm code blocks. It is not required to. For example, this is bad:
> __asm__ volatile("movdqa %0, %%xmm7" :: "m"(src));
> /* do something */
> __asm__ volatile("movdqa %%xmm7, %0" : "=m"(dst));

OK, the above is a bit long, but OK.
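
For contrast with the bad example quoted above, here is a minimal sketch that keeps the load and the store in a single asm statement, so the value never has to survive in xmm7 across a block boundary. The helper name copy16 is hypothetical. movdqu is used instead of movdqa so the buffers need no 16-byte alignment, and the memcpy fallback exists only so the sketch also builds on targets without GCC-style SSE2 inline asm.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the patch): copies 16 bytes via xmm7. */
static void copy16(uint8_t *dst, const uint8_t *src)
{
#if defined(__GNUC__) && defined(__SSE2__)
    /* Load and store sit in ONE asm statement, so the compiler never gets
     * a chance to reuse xmm7 in between. The clobber list tells it that
     * xmm7 is overwritten by this statement. */
    __asm__ volatile(
        "movdqu %1, %%xmm7 \n\t"
        "movdqu %%xmm7, %0 \n\t"
        : "=m"(*(uint8_t (*)[16])dst)
        : "m"(*(const uint8_t (*)[16])src)
        : "xmm7");
#else
    /* Portable fallback so the sketch builds without SSE2 inline asm. */
    memcpy(dst, src, 16);
#endif
}
```

If the work in between really has to be C code, the other option is to hand the value back to the compiler through an output operand rather than leaving it in a fixed register.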


[...]
> 
> >> +For x86, mark registers that are clobbered in your asm. This means both
> >> +general x86 registers (e.g. eax) as well as XMM registers. This last one is
> >> +particularly important on Win64, where xmm6-15 are callee-save, and not
> >> +restoring their contents leads to undefined results. In external asm, you do
> >> +this by using: "cglobal function_name, num_args, num_regs, num_xmm_regs". In
> >> +inline asm, you specify clobbered registers at the end of your asm:
> >> +__asm__(".." ::: "%eax").
> >
> > This recommendation has to be cross-checked with generated code; for example,
> > we must make sure gcc does not emit a *emms for mmx register clobbers.
> 
> For the example given here, we already use this, e.g. such code is
> used in two places in libavcodec/cabac.h.
> 
> As for marking mmx/xmm clobbers, note that they're missing in this
> example for inline asm. The reason I'm not (trying to) mark(ing)
> mmx/xmm register clobbers in the inline asm example is because we
> don't have a good system for that yet. Adding a clobber for "%xmm7"
> breaks on some systems, so this requires some #ifdef'ery, which should
> probably be in a utility header, and then use macros in these
> functions. Reimar (ping, hi!) had a patch for that and I'm hoping that
> my recent "progress" on making Win64 pass fate will motivate him to
> get that patch going. What you're suggesting + updating this example
> to include a clobber-mark for xmm registers should probably be part of
> that patch.
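
The "#ifdef'ery in a utility header" idea quoted above could look roughly like the sketch below. This is not the patch being referred to: the macro names SKETCH_X86_ASM and SKETCH_XMM_CLOBBERS and the helper zero16 are hypothetical, and keying the macro on __SSE2__ is an assumption standing in for whatever configure-time check a real build would use. It also assumes the CPU supports SSE2 whenever the x86 path is compiled.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: GCC-style inline asm on an x86 target. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#    define SKETCH_X86_ASM 1
#else
#    define SKETCH_X86_ASM 0
#endif

/* Hypothetical: expand to the XMM clobber names only where the compiler
 * is assumed to accept them; otherwise expand to nothing. */
#if SKETCH_X86_ASM && defined(__SSE2__)
#    define SKETCH_XMM_CLOBBERS(...) __VA_ARGS__
#else
#    define SKETCH_XMM_CLOBBERS(...)
#endif

/* Hypothetical helper: writes 16 zero bytes to dst using xmm7. */
static void zero16(uint8_t *dst)
{
#if SKETCH_X86_ASM
    __asm__ volatile(
        "pxor   %%xmm7, %%xmm7 \n\t"
        "movdqu %%xmm7, (%0)   \n\t"
        :
        : "r"(dst)
        /* The trailing comma lives inside the macro argument, so the
         * list stays valid when the macro expands to nothing. */
        : SKETCH_XMM_CLOBBERS("xmm7",) "memory");
#else
    memset(dst, 0, 16);
#endif
}
```

Whether such a clobber costs anything would still have to be checked against the generated code.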

I'd like to point out that xmm/mmx clobbers are only ok if either
1. they are needed for correct functioning of the actual binary code, or
2. they cause no speed loss and add no instructions.

Or, to say it the other way around, I am against making our code slower
for language-lawyering correctness.

This applies to register saving in yasm as well, so yes, I oppose saving
xmm registers on Win64 if it's unneeded. You can do such saving & restoring
at an outer layer, like the slice decode function, so the clobbering cannot
leak, but doing it in each optimized function is silly if it's unneeded
(of course it's not silly if it is needed ...).


[...]



-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Asymptotically faster algorithms should always be preferred if you have
asymptotical amounts of data