[FFmpeg-devel] [PATCH] Fix mm_flags, mm_support for ARM
Laurent Desnogues
laurent.desnogues
Tue Jul 1 12:56:13 CEST 2008
On Tue, Jul 1, 2008 at 12:25 PM, Siarhei Siamashka
<siarhei.siamashka at gmail.com> wrote:
>> What you say is not always true: when you have data close
>> to instructions, you pollute your Icache with data, and your
>> Dcache with instructions; on top of that you make sure you
>> need one Itlb *plus* one Dtlb entry.
>
> In order to reduce instruction/data cache pollution, data and code can
> be aligned at cache line boundaries, hence the use of .balign
> directives.
Doing so does not prevent the above-mentioned TLB
thrashing; running short of TLB entries is something you
really don't want, and it can hurt performance by a large
factor. And you will still lose some cache words to the
forced alignment.
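To put a number on the cache words lost to forced alignment, here is a
minimal sketch. The helper name alignment_padding is hypothetical, and the
offsets in the usage note are made-up illustrations, not measurements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of padding bytes inserted when data that
 * would start at `offset` is pushed up to the next multiple of `alignment`
 * (as a .balign directive would do). `alignment` must be a power of two. */
static size_t alignment_padding(size_t offset, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (alignment - (offset & (alignment - 1))) & (alignment - 1);
}
```

For example, code ending at offset 40 followed by data aligned to a
32-byte line leaves alignment_padding(40, 32) == 24 bytes unused.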
> Do you know any way of generating code for ARM which would not
> intermix instructions with data? You should keep in mind that all the
> ARM instructions (I'm not considering thumb here) have fixed size
> which is 32-bit. You can't fit any arbitrary constant immediate
> operand in it. Moreover, you can't encode some absolute address into
> instruction and get it fixed by applying relocations. So absolute
> addresses are always stored intermixed with code and accessed using
> pc-relative addressing. Please try to compile something like the
> following fragment to see what is generated (pay attention to how
> external variables are accessed so that this code can be linked with
> other object files):
>
> extern int x;
> extern int y;
> extern int z;
>
> void set_global_variables()
> {
> x = 0x12345678;
> y = 0x1234;
> z = 0x12;
> }
I know ARM well enough, thanks :-)
On ARMv7 you have the movw/movt instructions, which load a
32-bit constant using two instructions (or an unsigned
16-bit constant using one instruction). And ARM ELF defines
relocation tags for these (R_ARM_MOV{T,W}).
The latest CSL gcc release uses these instructions.
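As a rough illustration of how a 32-bit constant is split across the
movw/movt pair, here is a small C model. The helper names are hypothetical
and this only mimics the arithmetic, it is not what any compiler emits:
movw writes a zero-extended 16-bit immediate, movt then replaces the top
halfword and leaves the bottom one alone.

```c
#include <stdint.h>

/* Hypothetical helper: the 16-bit immediate a movw would carry
 * (low half of the constant). */
static uint16_t movw_imm(uint32_t value)
{
    return (uint16_t)(value & 0xFFFFu);
}

/* Hypothetical helper: the 16-bit immediate a movt would carry
 * (high half of the constant). */
static uint16_t movt_imm(uint32_t value)
{
    return (uint16_t)(value >> 16);
}

/* Register contents after "movw rd, #lo" followed by "movt rd, #hi". */
static uint32_t movw_movt_result(uint16_t lo, uint16_t hi)
{
    uint32_t rd = lo;                        /* movw: zero-extend lo */
    rd = (rd & 0xFFFFu) | ((uint32_t)hi << 16); /* movt: set top half */
    return rd;
}
```

For 0x12345678 the two immediates are 0x5678 and 0x1234; for a constant
such as 0x1234 the movt immediate is zero, so movw alone is enough.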
>> I think both approaches have to be benchmarked in real
>> life situation, and on several processors.
>
> Please do it. Any improvements are very much welcome. Based on your
> previous posts, I assume that you have ARM hardware to run these
> tests.
Heh, I was just commenting on a claim you made that has
not been proven. I know some ARM designs well enough to
know that loading constants with movt/movw is better in
some places. If I had access to older designs such
as ARM11 or ARM9, I would benchmark on them.
>> Also when loading from memory, if your data side is blocking
>> then you are basically stalling your pipeline while the data is
>> loaded.
>
> When all the data fits into a single cache line, adding one more
> constant so that this data set still fits cache line, will not
> introduce extra cache misses. Is there anything wrong in this
> statement (except for my English grammar)? Cache line is 32 bytes on
> ARM9/ARM11 and 64 bytes on Cortex-A8.
Yeah I agree.
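The single-cache-line condition can be sketched as a simple check. The
helper name is hypothetical, and the start address and sizes in the usage
note are made up; only the 32-byte and 64-byte line sizes come from the
discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if the byte range [start, start + size)
 * lies entirely within one cache line of `line_size` bytes.
 * `line_size` must be a power of two and `size` must be non-zero. */
static bool fits_in_one_cache_line(uintptr_t start, size_t size,
                                   size_t line_size)
{
    assert(size > 0);
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    uintptr_t first_line = start / line_size;
    uintptr_t last_line  = (start + size - 1) / line_size;
    return first_line == last_line;
}
```

With a line-aligned 28-byte data set, adding one more 4-byte constant
still fits a 32-byte line; adding a second one spills over on ARM9/ARM11
but not with a 64-byte line as on Cortex-A8.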
Don't take what I said as criticism of you. I just want
to make sure people don't take for granted that what is true
on a given processor is also true on later generations.
Laurent