[FFmpeg-devel] [PATCH] Fix mm_flags, mm_support for ARM

Tue Jul 1 00:30:30 CEST 2008

matthieu castet <castet.matthieu at free.fr> writes:

>> 
>> Could you or anybody else having compatible ARM device just do some
>> benchmarking to confirm my results (I posted benchmarks here multiple 
>> times already). It would be a really good help. Because I feel that
>> some people here still doubt that it provides a major performance
>> improvement.
> For dct-test (yes, I know it is not a benchmark) on an arm926ejs, the svn
> implementation got 126.7 kdct/s, yours 154.6 kdct/s.

For reference, what figure do you get with C simple_idct (-idct simple)?

>> Once/if the performance improvement is confirmed, help with
>> integration would be really needed. That's not a joke, I really
>> fail to see any problems with the "balign/ASMALIGN/stack alignment"
>> stuff, so I can't fix them. A good example of a solution (a working
>> patch) is very much welcome.
>> 
> Could you list the integration problems that remain?
> For the stack alignment, maybe for old eabi you could use ldm/stm
> instead of double load/store instructions, but still use double load/store
> instructions on EABI.

The ARM ABI requires the stack pointer to be 8-byte aligned at
external interfaces.  There should be no problem using ldrd there.

> For the memory pool, why don't you use only one memory pool?
> With good packing, this could avoid a lot of balign.
>
> Did you benchmark the improvement from using double load/store instructions?
> My manual (DDI0222B_9EJS_r1p2.pdf) says that for arm9js:
> - The LDRD instruction behaves in the same way as an LDM of two registers.
> - The STRD instruction behaves in the same way as an STM of two registers.

That is true on ARM9.  On ARM11, an 8-byte aligned LDRD takes one
cycle with a latency of three cycles (same as a single LDR), so it
should be faster for aligned data.
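
To make "aligned data" concrete, here is a minimal C sketch of the
8-byte alignment condition. The names is_aligned8 and load_pair are
hypothetical and not taken from any patch in this thread, and whether
the compiler actually emits LDRD for the copy depends on the target
and compiler flags, so treat this as an illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if p is 8-byte aligned. */
static int is_aligned8(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

/* Hypothetical helper: copy two 32-bit words from an 8-byte aligned
 * source.  A compiler targeting ARMv5TE or later may turn this into a
 * single LDRD, but that is an assumption, not a guarantee. */
static void load_pair(const int32_t *src, int32_t out[2])
{
    assert(is_aligned8(src));
    memcpy(out, src, 2 * sizeof(int32_t));
}
```

Data that is only 4-byte aligned fails the check and would have to be
loaded with two LDRs or an LDM instead.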

-- 
Måns Rullgård
mans at mansr.com