[FFmpeg-devel] [PATCH] split-radix FFT

Michael Niedermayer michaelni at gmx.at
Tue Jul 29 21:36:23 CEST 2008


On Tue, Jul 29, 2008 at 08:03:51PM +0100, Måns Rullgård wrote:
> Michael Niedermayer <michaelni at gmx.at> writes:
> 
> > On Tue, Jul 29, 2008 at 07:39:25PM +0100, Måns Rullgård wrote:
> >> Michael Niedermayer <michaelni at gmx.at> writes:
> >> 
> >> > On Tue, Jul 29, 2008 at 05:20:15PM +0100, Måns Rullgård wrote:
> >> >> 
> >> >> Michael Niedermayer wrote:
> >> >> > On Tue, Jul 29, 2008 at 06:26:49PM +0300, Uoti Urpala wrote:
> >> >> >> On Tue, 2008-07-29 at 17:10 +0200, Michael Niedermayer wrote:
> >> >> >> > And just to clarify, yes, what I considered a good argument
> >> >> >> > was the sentence above where my reply is. That is, to use
> >> >> >> > MANGLE in speed-critical code.  That way most textrels are
> >> >> >> > avoided while minimizing the speed impact.
> >> >> >> >
> >> >> >> > I do not think you ever argued for that.
> >> >> >>
> >> >> >> IIRC I did mention the possibility of omitting -fPIC for a subset of
> >> >> >> files.
> >> >> >>
> >> >> >> >  I remember you strongly arguing toward replacing all
> >> >> >> > MANGLE by "m" knowing that it would break gcc 2.95 and not
> >> >> >> > really caring that it would slow down code compiled with
> >> >> >> > -fPIC.
> >> >> >>
> >> >> >> Of course the code would be slower on x86. If you want it to
> >> >> >> be as fast as possible then compile it with -fPIC on x86. I
> >> >> >> don't think it's worthwhile to pick only the globals used
> >> >> >> inside asm for such special treatment.
> >> >> >
> >> >> > x86-64 shared libs require -fPIC, unless that has been fixed.
> >> >> 
> >> >> The x86-64 instruction set hasn't been "fixed", and I doubt it ever
> >> >> will be.  You simply can't fit a 64-bit offset in a 32-bit immediate
> >> >> operand.
> >> >
> >> > That's not what I meant.
> >> >
> >> >> 
> >> >> > so the user does not always have the option to omit -fPIC
> >> >> 
> >> >> But in these cases, forcing a textrel will break the build.
> >> >
> >> > MANGLE forces RIP-relative addressing on x86-64 and thus avoids the
> >> > occasional GOT indirection gcc adds.
> >> >
> >> > Here's an example:
> >> > long globivar;
> >> >
> >> > void func(){
> >> >     asm(
> >> >         "mov globivar(%rip), %rax\n\t"
> >> >     );
> >> >     asm(
> >> >         "mov %0, %%rax\n\t"
> >> >         :: "m"(globivar)
> >> >     );
> >> > }
> >> >
> >> > results in:
> >> > 0000000000000554 <func>:
> >> >  554:	55                   	push   %rbp
> >> >  555:	48 89 e5             	mov    %rsp,%rbp
> >> >  558:	48 8b 05 d1 02 20 00 	mov    0x2002d1(%rip),%rax        # 200830 <globivar>
> >> >  55f:	48 8b 05 8a 02 20 00 	mov    0x20028a(%rip),%rax        # 2007f0 <_DYNAMIC+0x1b8>
> >> >  566:	48 8b 00             	mov    (%rax),%rax
> >> >  569:	c9                   	leaveq 
> >> >  56a:	c3                   	retq   
> >> >
> >> > You can see that the second access needs 2 instructions, the first just 1.
> >> 
> >> There is no guarantee that &globivar is reachable with a 32-bit offset
> >> from %rip (or any other register).
> >
> > libavcodec is still smaller than 4 GB, so it would work fine within the
> > library, and that's the only case we really care about. I do not think any
> > of our asm() accesses globals from outside, and if it does, that's a
> > separate thing that can use "m".
> 
> There is still no guarantee that the data section will be mapped
> within 4GB of the text section.

If I add static before long globivar, then I get:

0000000000000514 <func>:
 514:	55                   	push   %rbp
 515:	48 89 e5             	mov    %rsp,%rbp
 518:	48 8b 05 c9 02 20 00 	mov    0x2002c9(%rip),%rax        # 2007e8 <globivar>
 51f:	48 8b 05 c2 02 20 00 	mov    0x2002c2(%rip),%rax        # 2007e8 <globivar>
 526:	c9                   	leaveq 
 527:	c3                   	retq   

So I would assume that the data and text sections must be within 4 GB of
each other. Otherwise I wonder how the above would work ...
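
The displacement arithmetic in that listing can be checked by hand. A
RIP-relative operand is resolved against the address of the next instruction,
and the displacement is a signed 32-bit field, so it reaches roughly 2 GiB in
either direction. Here is a minimal sketch of that calculation; rip_target and
reachable_rel32 are names made up for illustration, not anything from FFmpeg
or binutils:

```c
#include <assert.h>
#include <stdint.h>

/* Address a RIP-relative operand resolves to: the address of the *next*
 * instruction plus the sign-extended 32-bit displacement. */
static uint64_t rip_target(uint64_t insn_addr, unsigned insn_len, int32_t disp32)
{
    assert(insn_len > 0 && insn_len <= 15);   /* x86 instructions are 1..15 bytes */
    return insn_addr + insn_len + (uint64_t)(int64_t)disp32;
}

/* Nonzero if `target` can be encoded as a signed 32-bit displacement
 * from the instruction at insn_addr. */
static int reachable_rel32(uint64_t insn_addr, unsigned insn_len, uint64_t target)
{
    int64_t delta = (int64_t)(target - (insn_addr + insn_len));
    return delta >= INT32_MIN && delta <= INT32_MAX;
}
```

With the numbers from the listing, rip_target(0x518, 7, 0x2002c9) and
rip_target(0x51f, 7, 0x2002c2) both give 0x2007e8, the address objdump prints
for globivar.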

Also,
static long globivar;

long func(){
    return globivar;
}
results in:

0000000000000514 <func>:
 514:	55                   	push   %rbp
 515:	48 89 e5             	mov    %rsp,%rbp
 518:	48 8b 05 c9 02 20 00 	mov    0x2002c9(%rip),%rax        # 2007e8 <globivar>
 51f:	c9                   	leaveq 
 520:	c3                   	retq   

Again, this is limited to 4 GB.

Compiled with (-fPIC -DPIC -shared -Wl,-Bsymbolic).
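
For anyone who wants to reproduce the comparison, here is a self-contained
sketch of the two access styles, assuming GCC or Clang on an x86-64 ELF
target. The names read_rip_relative and read_m_constraint are illustrative
only, and the hard-coded "(%rip)" operand stands in for what MANGLE is
described as producing above; it is not the macro itself. On other targets
the functions fall back to a plain C read so the file still builds.

```c
#if defined(__GNUC__)
#define KEEP __attribute__((used))   /* keep the symbol even if only asm names it */
#else
#define KEEP
#endif

static long globivar KEEP = 42;

/* Symbol named directly in the asm string: always one RIP-relative load. */
long read_rip_relative(void)
{
    long v;
#if defined(__x86_64__) && defined(__ELF__) && defined(__GNUC__)
    __asm__ volatile ("mov globivar(%%rip), %0" : "=r"(v) : : "memory");
#else
    v = globivar;                    /* fallback for other targets (assumption) */
#endif
    return v;
}

/* "m" constraint: the compiler picks the addressing mode, which may
 * involve a GOT load first when the symbol is not known to be local. */
long read_m_constraint(void)
{
    long v;
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile ("mov %1, %0" : "=r"(v) : "m"(globivar));
#else
    v = globivar;
#endif
    return v;
}
```

Building this as a shared object with the flags above and running objdump -d
on it should show whether the two functions end up with the same load; drop
the static to compare against the non-static case.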

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Republics decline into democracies and democracies degenerate into
despotisms. -- Aristotle