[FFmpeg-devel] [PATCH] Fix mm_flags, mm_support for ARM
Måns Rullgård
mans
Tue Jul 1 00:30:30 CEST 2008
matthieu castet <castet.matthieu at free.fr> writes:
>>
>> Could you or anybody else who has a compatible ARM device just do some
>> benchmarking to confirm my results (I posted benchmarks here multiple
>> times already)? It would be a really good help, because I feel that
>> some people here still doubt that it provides a major performance
>> improvement.
> For dct-test (yes, I know it is not a benchmark) on an arm926ejs, the svn
> implementation got 126.7 kdct/s, yours 154.6 kdct/s.
For reference, what figure do you get with C simple_idct (-idct simple)?
>> Once/if the performance improvement is confirmed, help with
>> integration would really be needed. That's not a joke, I really
>> fail to see any problems with the "balign/ASMALIGN/stack alignment"
>> stuff, so I can't fix them. A good example of a solution (a working
>> patch) is very much welcome.
>>
> Could you list the integration problems that remain?
> For the stack alignment, maybe for old eabi you could use ldm/stm
> instead of double load/store instructions, but still use double load/store
> instructions on EABI.
The ARM ABI requires the stack pointer to be 8-byte aligned at
external interfaces. There should be no problem using ldrd there.
> For the memory pool, why don't you use only one memory pool?
> With good packing, this could avoid lots of balign.
>
> Did you benchmark the improvement from using double load/store instructions?
> My manual (DDI0222B_9EJS_r1p2.pdf) says that for arm9js:
> - The LDRD instruction behaves in the same way as an LDM of two registers.
> - The STRD instruction behaves in the same way as an STM of two registers.
That is true on ARM9. On ARM11, an 8-byte aligned LDRD takes one
cycle with a latency of three cycles (same as a single LDR), so it
should be faster for aligned data.
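
To make the "8-byte aligned" condition concrete, here is a minimal C sketch
of how one might test or enforce it on a data pointer before taking an
LDRD/STRD path. The helper names is_aligned8 and align_up8 are hypothetical
and are not part of FFmpeg; the sketch only shows the address arithmetic and
says nothing about how the patch under discussion handles alignment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an FFmpeg function):
 * returns 1 if p sits on an 8-byte boundary, 0 otherwise. */
static int is_aligned8(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

/* Hypothetical helper (not an FFmpeg function):
 * rounds p up to the next 8-byte boundary; an already
 * aligned pointer is returned unchanged. The caller must
 * have reserved up to 7 spare bytes in the buffer. */
static void *align_up8(void *p)
{
    return (void *)(((uintptr_t)p + 7u) & ~(uintptr_t)7u);
}
```

A caller could over-allocate a buffer by 7 bytes, pass its start through
align_up8, and then rely on is_aligned8 holding for every 8-byte step from
that address.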
--
Måns Rullgård
mans at mansr.com