[FFmpeg-devel] [PATCH] fft_tab_neon is not PIC enough

Sun Jul 15 23:46:27 CEST 2012

On 07/15/2012 10:34 AM, Reimar Döffinger wrote:
> On Sun, Jul 15, 2012 at 07:55:34AM -0700, John Reiser wrote:
>> On 07/15/2012 02:04 AM, Reimar Döffinger wrote:
>>> On 14 Jul 2012, at 17:24, John Reiser <jreiser at bitwagon.com> wrote:
>>>> fft_tab_neon has addresses in .rodata, which is bad for shared libraries.
>>>
>>> Could you please be more specific about what exactly "bad" means?
>>
>> "Bad" means "does not fulfill the expectations of residing in a read-only
>> segment."  The question becomes, "After what time does 'read-only' apply?"
>> That is, what is the earliest time after which there are no more writes?
> 
> First, do you mean .rodata or .text? You switch between those.

.text has two meanings:  the .text section (Elf32_Shdr with .s_name ==>
".text", and other properties), and the .text segment (Elf32_Phdr with
.p_type==PT_LOAD and .p_flags &= ~PF_W).  In the address-space layout that
is available in every environment, then there are two segments.  One segment
(PT_LOAD) is read-only and is commonly known as the .text segment, although
a segment has no designated name.  The other segment is read-write and is
commonly known as the .data segment.  In the standard model, the .rodata
section belongs to the ".text segment" because the read-only property of
.rodata is valued more highly than the possible lack of suitability for
being fed into the instruction decoder.

> First of all .rodata IMHO means "the application will not try to write into
> that".

The definition of .rodata implies _nobody_ is allowed to write into it,
not even the dynamic linking mechanism.

> If the operating system/linker wants to write into it, that's something
> they should deal with amongst themselves, it shouldn't really be the
> task of the application to anticipate that (ideally).

I believe that you are talking about a section named .data.rel.ro, where
the dynamic linking mechanism is allowed to write into it, but which
is read-only after that.  One problem is that in some environments "the
dynamic linking mechanism" can be hard to distinguish from "ordinary" code.
Another problem is that the conventions of .data.rel.ro are relatively
recent (circa 2006 or so), about as recent as some assemblers for ARM
that accept two-operand "sub r4, #2" as a short-hand for three-operand
"sub r4, r4, #2".  See http://www.airs.com/blog/archives/189  (May 9, 2008)
and note the comment that .data.rel.ro (and the implied PT_GNU_RELRO or
equivalent) is a lot of work for somewhat minor benefit.

> 
>> For a physical ROM, the time is immediately after hardware manufacturing.
> 
> As far as this case is concerned I would wonder if you'd really have
> to/want to do runtime relocation and dynamic linking for code in ROM,
> that seems like a extremely special case (though I guess you are aiming
> at in-place execution of a full Linux system and e.g. EEPROM).
> 
>> My environment wants "no more writes to .text" after placement of the
>> library into the file system of an operating system, and current code
>> does not work with this restriction.  Current code requires writing
>> those addresses during dlopen() or LoadLibrary() into a process.
> 
> Not that I see any issue with the assembler/linker detecting that and just not
> putting that into the normal .rodata (and actually I thought they
> already do, by creating a .rodata.relro subsection - though possibly
> that's only handled at the compiler level yet).
> 
>>>> The attached patch provides the additional relocation by changing
>>>> the table to an array of branch instructions.  The patch also
>>>> checks the bounds of the index.  In Thumb mode the 11-bit displacement
>>>> field of a short unconditional branch can span +/- 1024 words
>>>> which is +/- 2048 bytes.  By moving the routine to near the middle, >>>> then the other code could double in size before exceeding
>>>> the limit on span.
>>>
>>> This seems extremely brittle to me, especially since I am not aware of a reason why the linker couldn't reorder the functions in whatever way it wants, so the absolute minimum would be supporting an offset that can handle the size of the whole .text segment (or reorganize the code so the linker will not be able to separate it, e.g. by putting everything in a single function, but that is still ugly and brittle).
>>> For other architectures we have solved this problem by storing the difference between a label and the symbols in a table (those will always be real constants), at full 32 or 64 bits and manually add those onto the PC, then jump to the calculated address.

Looking closely at the code of fft_neon.S, the linker must *not* change
the relative distance between any of the functions defined in that file.
There are internal calls among the fft<N>_neon and other functions.
These calls use the pc-relative addressing of 'bl' and 'b' opcodes, and
they are _not_ marked with relocations in the output of assembly by
GNU binutils-2.20.1-16, which is a current toolchain on Debian.  Thus the
linker cannot re-order correctly unless it exhaustively disassembles all
the code.  This stymies any function-by- function re-ordering at link time.
Re-ordering which keeps each compilation unit (object file, such as .o)
intact is OK; but such cannot affect the patch.

>>
>> If you want the property that the linker can change placement of the fft<N>
>> functions arbitrarily,
> 
> I'd rather say that with the trend to link-time optimization adding code
> that can't handle the linker optimizing things seems like aiming the gun at
> your foot.
> And making it .data everywhere when it can be in .rodata on all systems
> people use it commonly on seems overkill.

Making it .rodata costs a surprisingly large amount on Linux and Android.
Yes, it is functional, but the cost is high.  dlopen() must do the dance
of mprotect(,,PROT_WRITE), modify, mprotect(,,&~PROT_WRITE).  This takes
measurable time and battery energy.  Also, any page that is modified is
no longer shared, and some MMU have a page size of 16KB or larger, not 4KB.
Mis-using 1/1000 of RAM must be avoided.

> One thing that is trivial and I wouldn't have objections against is
> changing const/endconst for these cases to e.g. relconst/endrelconst
> so it's easy to change it all in one place.

Defining new macros (text or pseudo-ops) tends to remove those names
from the user symbol space. Usually that's OK, but every once in a while
it hurts.

> 
>> The current code is broken for my environment.  The patched code works in
>> my environment and all other known current cases.
> 
> That makes me wonder what "all other know current cases" means, how much
> did you test?
> Because I have another minor nit, that the new code uses the 2-operand
> "sub" mnemonic. I'm quite sure that the remaining code does not use that
> because some assemblers still do not support it but only the 3-operand form.

Where is the "style guide" or other written list of the *detailed*
restrictions of this kind that the project has decided to abide by?

> 
>> Enhancements are welcome, but DT_TEXTREL is prohibited.
> 
> I don't think I will have time, and also I certainly don't have
> any system I could actually test the changes on.
> So I was obviously hoping you'd be motivated to improve on it.
> I don't claim to have the last call here, but IMHO as long
> as it's only "your system" that is affected the requirements
> for a patch are "pretty damn near perfect" (or at least unlikely to
> cause any issues in any imaginable case) which I think yours
> doesn't quite fulfill.

Patches accepted.  Code trumps discussion. :-)

--