[FFmpeg-devel] Question about -fPIC usage for some files
Sun Feb 10 04:23:41 CET 2008
On Fri, 8 Feb 2008, Michael Niedermayer wrote:
> On Fri, Feb 08, 2008 at 01:03:56PM -0800, Trent Piepho wrote:
> > On Fri, 8 Feb 2008, Thorsten Jordan wrote:
> > > > Why does it fail with pic for you?
> > > the same problem that was discussed several times on this list, gcc
> > > fails to generate the code because it runs out of registers (ebx is used
> > > with -fPIC):
> > Since version 3 something, gcc can use other registers besides ebx, and
> > might not use ebx at all if the function doesn't do anything that requires
> > access to the pic pointer. If a function accesses no globals, does not
> > take the address of a function, or call a function in another shared
> > library, it shouldn't need to load the pic register.
> > The real problem isn't ebx, it's accessing globals. In non-PIC code, a
> > memory reference to a global takes zero registers. In PIC code, it takes
> > one register. In some cases multiple global references can share the same
> > register(s), so gcc doesn't always need one per global. But this could
> > still easily add a half dozen extra registers to an asm block.
> Let's look at the actual code that fails:
> :"+&r"(i), "=m"(autoc[j]), "=m"(autoc[j+1]), "=m"(autoc[j+2])
> :"r"(data1+len), "r"(data1+len-j), "m"(*ff_pd_1)
There aren't always going to be 8 registers. First of all, esp isn't a
general purpose register on x86. It's also somewhat problematic to save
and restore esp in a thread safe manner or receive signals with an invalid
stack. Regardless of why, gcc doesn't support esp as a general register.
So that leaves 7.
If the frame pointer is turned on, gcc doesn't support saving and restoring
that either. That would complicate stack traces in debugging, and if
you're not interested in that, you can always turn the frame pointer off.
So that might leave 6.
If gcc needs a pic pointer for the function, that takes another register.
I'm pretty sure gcc isn't able to flush and restore the pic pointer. And
as Loren pointed out, while gcc can use a register other than ebx for the
pic pointer, and it won't load a pic pointer if it's not needed, it still
reserves ebx. It's too bad that isn't fixed. That drops the number of
registers down to 5.
So in pic mode, there are only 5 or 6 registers, depending on the frame pointer.
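The tally above can be written out as a short sketch. The function name and
its two flags are hypothetical, and the numbers are only the ones counted in
this message (8 registers, minus esp, minus ebp when the frame pointer is
kept, minus the reserved ebx in pic mode). Nothing here is queried from gcc.

```c
/* Hypothetical helper: counts the 32-bit x86 integer registers left for
 * an asm block, following the reasoning in the text above. */
static int x86_32_free_regs(int frame_pointer, int pic)
{
    int regs = 8;      /* eax, ebx, ecx, edx, esi, edi, ebp, esp */

    regs -= 1;         /* esp is never available as a general register */
    if (frame_pointer)
        regs -= 1;     /* ebp is tied up as the frame pointer */
    if (pic)
        regs -= 1;     /* ebx stays reserved for the pic pointer */
    return regs;
}
```

With both flags set this gives 5, and with only pic set it gives 6, matching
the "5 or 6" figure above.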
> needs 1 register
> needs 1 register
> needs 1 register
> is a global, and in PIC mode as you say might need 1 register, let's
> give it one
"m"(static global) doesn't need any registers beyond the pic pointer, which
you are already stuck with in pic mode. "m"(non-static non-hidden global)
takes another register. If ff_pd_1 is static or has its visibility set to
hidden so it's not exported from the library, it won't need another register.
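As a minimal sketch of the three cases, assuming a gcc-compatible compiler
and an ELF target: the variable names and the HYP_HIDDEN macro below are
made up for illustration and are not how ff_pd_1 is actually declared.

```c
/* Exported global: in a PIC shared object its address is normally loaded
 * from the GOT, so an "m" operand on it costs an extra register. */
double hyp_pd_exported[2] = { 1.0, 1.0 };

/* File-local global: addressable relative to the pic pointer, so no
 * register beyond the pic pointer itself is needed. */
static const double hyp_pd_static[2] = { 1.0, 1.0 };

/* Hidden visibility: still usable from other files of the same library,
 * but not exported, so it can be addressed like the static one. */
#if defined(__GNUC__) && !defined(_WIN32)
#define HYP_HIDDEN __attribute__((visibility("hidden")))
#else
#define HYP_HIDDEN /* assumption: no visibility support, plain global */
#endif
HYP_HIDDEN double hyp_pd_hidden[2] = { 1.0, 1.0 };

/* Touches all three so the compiler has to emit each kind of access. */
static double hyp_sum_first(void)
{
    return hyp_pd_exported[0] + hyp_pd_static[0] + hyp_pd_hidden[0];
}
```

Compiling this with -m32 -fPIC -S and comparing the three loads is one way
to check which accesses go through the GOT on a given gcc version.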
> "=m"(autoc[j]), "=m"(autoc[j+1]), "=m"(autoc[j+2])
> needs 1 register (%%eax) 8(%%eax) 16(%%eax) for example
This is probably the problem. Suppose you have autoc and j in registers
before the asm block, and you want them in registers after the asm block.
The fastest code would be to use (%[autoc],%[j],8), 8(%[autoc],%[j],8),
etc. That avoids combining autoc + j*8 into a register and then re-loading
them afterwards. If there were enough registers, and gcc didn't do it this
way, you'd say it was stupid for generating bad code.
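The trade-off can be sketched with two hypothetical helpers (not the ffmpeg
code). Both store three values at out[j], out[j+1] and out[j+2]. The first
hands gcc three "=m" operands, so it is free to pick base+index addressing
and keep both values live. The second folds base and index into one pointer
first and passes a single "r" operand, which is the one-register layout
suggested in the quoted text. It assumes 4-byte int and falls back to plain
C when not built for x86 with a gcc-compatible compiler.

```c
#include <stddef.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define HYP_HAVE_X86_ASM 1
#else
#define HYP_HAVE_X86_ASM 0
#endif

/* Variant A: three memory operands; gcc chooses the addressing modes and
 * may keep both `out` and `j` in registers for (base,index,scale) forms. */
static void hyp_store3_mem_operands(int *out, size_t j)
{
#if HYP_HAVE_X86_ASM
    __asm__ volatile(
        "movl $1, %0\n\t"
        "movl $2, %1\n\t"
        "movl $3, %2\n\t"
        : "=m"(out[j]), "=m"(out[j + 1]), "=m"(out[j + 2]));
#else
    out[j] = 1; out[j + 1] = 2; out[j + 2] = 3;
#endif
}

/* Variant B: one register operand holding &out[j]; the offsets are written
 * by hand, and the "memory" clobber tells gcc that memory was modified. */
static void hyp_store3_one_base(int *out, size_t j)
{
#if HYP_HAVE_X86_ASM
    int *p = out + j;  /* combine base + index before the asm block */
    __asm__ volatile(
        "movl $1, 0(%0)\n\t"
        "movl $2, 4(%0)\n\t"
        "movl $3, 8(%0)\n\t"
        :
        : "r"(p)
        : "memory");
#else
    out[j] = 1; out[j + 1] = 2; out[j + 2] = 3;
#endif
}
```

Variant B guarantees the block needs only one address register, at the cost
of the extra add before it and a blanket memory clobber, which is the kind
of slower but smaller allocation gcc won't pick on its own.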
But there aren't enough registers. gcc's register allocator is designed to
find the fastest code, not use the fewest registers. Normally gcc only has
to worry about one instruction at a time and this is the best course of
action. The inline asm construct was intended just for using extra machine
instructions or simple sequences like atomic spin-locks: code with few
constraints. When you have complex constraints that require an allocation
strategy designed to minimize register usage and not cpu cycles, gcc fails.
It certainly would be nice if gcc could cope with this. But it's not
something that many users outside of ffmpeg have a problem with, and I
guess there is no one with the necessary skill motivated enough to provide
You could always not use gcc style inline asm, and then be forced to code
an entire function in asm. No register allocation problems then.