[FFmpeg-devel] [PATCH] split-radix FFT
Måns Rullgård
mans
Tue Jul 29 20:39:25 CEST 2008
Michael Niedermayer <michaelni at gmx.at> writes:
> On Tue, Jul 29, 2008 at 05:20:15PM +0100, Måns Rullgård wrote:
>>
>> Michael Niedermayer wrote:
>> > On Tue, Jul 29, 2008 at 06:26:49PM +0300, Uoti Urpala wrote:
>> >> On Tue, 2008-07-29 at 17:10 +0200, Michael Niedermayer wrote:
>> >> > And just to clarify, yes what i considered a good argument was the
>> >> > sentence above where my reply is. That is to use MANGLE in speed
>> >> > critical code.
>> >> > That way most textrels are avoided while minimizing the speed impact.
>> >> >
>> >> > I do not think you ever argued for that.
>> >>
>> >> IIRC I did mention the possibility of omitting -fPIC for a subset of
>> >> files.
>> >>
>> >> > I remember you strongly arguing
>> >> > toward replacing all MANGLE by "m" knowing that it would break gcc 2.95
>> >> > and not really caring that it would slow down code compiled with -fPIC.
>> >>
>> >> Of course the code would be slower on x86. If you want it to be as fast
>> >> as possible then compile it with -fPIC on x86. I don't think it's
>> >> worthwhile to pick only the globals used inside asm for such special
>> >> treatment.
>> >
>> > x86-64 shared libs require -fPIC, unless that has been fixed.
>>
>> The x86-64 instruction set hasn't been "fixed", and I doubt it ever
>> will be. You simply can't fit a 64-bit offset in a 32-bit immediate
>> operand.
>
> That's not what I meant.
>
>>
>> > so the user does not always have the option to omit -fPIC
>>
>> But in these cases, forcing a textrel will break the build.
>
> MANGLE forces rip relative addressing on x86-64 and thus avoids the
> occasional GOT indirection gcc adds.
>
> Here's an example:
> long globivar;
>
> void func(){
>     asm(
>         "mov globivar(%rip), %rax\n\t"
>     );
>     asm(
>         "mov %0, %%rax\n\t"
>         :: "m"(globivar)
>     );
> }
>
> results in:
> 0000000000000554 <func>:
>  554:   55                      push   %rbp
>  555:   48 89 e5                mov    %rsp,%rbp
>  558:   48 8b 05 d1 02 20 00    mov    0x2002d1(%rip),%rax        # 200830 <globivar>
>  55f:   48 8b 05 8a 02 20 00    mov    0x20028a(%rip),%rax        # 2007f0 <_DYNAMIC+0x1b8>
>  566:   48 8b 00                mov    (%rax),%rax
>  569:   c9                      leaveq
>  56a:   c3                      retq
>
> you can see the second needs 2 instructions, the first just 1.
There is no guarantee that &globivar is reachable with a 32-bit offset
from %rip (or any other register).
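
To make the arithmetic concrete, here is a minimal sketch in C. The
function names rip_rel_target and fits_rel32 are hypothetical helpers
written for this illustration, not FFmpeg code. The first reproduces the
address computation visible in the disassembly above (displacement added
to the address of the next instruction). The second checks whether a
given target could be encoded as a signed 32-bit displacement at all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address referenced by a RIP-relative operand. The displacement is
 * relative to the address of the *next* instruction, i.e. the address
 * of the current instruction plus its encoded length. */
static uint64_t rip_rel_target(uint64_t insn_addr, unsigned insn_len,
                               int32_t disp32)
{
    /* Sign-extend the displacement, then add modulo 2^64. */
    return insn_addr + insn_len + (uint64_t)(int64_t)disp32;
}

/* True if `target` is reachable from `next_ip` with a signed 32-bit
 * displacement, i.e. the distance lies in [-2^31, 2^31 - 1]. */
static bool fits_rel32(uint64_t next_ip, uint64_t target)
{
    /* Unsigned subtraction wraps, so negative distances show up as
     * values at the top of the 64-bit range. */
    uint64_t delta = target - next_ip;
    return delta <= UINT64_C(0x7fffffff) ||
           delta >= UINT64_C(0xffffffff80000000);
}
```

With the numbers from the listing, rip_rel_target(0x558, 7, 0x2002d1)
gives 0x200830 (globivar) and rip_rel_target(0x55f, 7, 0x20028a) gives
0x2007f0 (the GOT slot). Both are well within range here, but
fits_rel32 returns false as soon as the two addresses are 2 GiB or more
apart, which is the case the hard-coded (%rip) form cannot handle.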
--
Måns Rullgård
mans at mansr.com