[FFmpeg-devel] [PATCH] ARM: NEON optimised simple_idct

Måns Rullgård mans
Mon Aug 25 16:53:29 CEST 2008


Michael Niedermayer <michaelni at gmx.at> writes:

> On Mon, Aug 25, 2008 at 04:06:33AM +0100, Mans Rullgard wrote:
>> ---
>>  libavcodec/Makefile                  |    2 +
>>  libavcodec/armv4l/dsputil_arm.c      |   15 ++
>>  libavcodec/armv4l/simple_idct_neon.S |  383 ++++++++++++++++++++++++++++++++++
>>  libavcodec/avcodec.h                 |    1 +
>>  libavcodec/utils.c                   |    1 +
>>  5 files changed, 402 insertions(+), 0 deletions(-)
>>  create mode 100644 libavcodec/armv4l/simple_idct_neon.S
>> 
>
> is this idct binary identical in output to the C/MMX simple idct?

Yes.

>> +#ifdef HAVE_NEON
>> +        } else if (idct_algo==FF_IDCT_SIMPLENEON){
>> +            c->idct_put= ff_simple_idct_put_neon;
>> +            c->idct_add= ff_simple_idct_add_neon;
>> +            c->idct    = ff_simple_idct_neon;
>> +            c->idct_permutation_type = FF_NO_IDCT_PERM;
>> +#endif
>
> I do not know NEON at all, but I've never seen a SIMD instruction set for
> which the identity permutation would have been optimal.
>
> Also, I suspect that the MMX simple idct is a better basis from which to
> write other SIMD variants of the simple idct than the C one.

I can't read MMX code.  Could you explain briefly what optimisations
are possible with permuted input?  NEON has more and wider registers
than MMX, so it is reasonable to expect the optimal code to be quite
different.
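
For reference, a minimal sketch of what "permuted input" means for an
8x8 block, under stated assumptions.  The function names below are
hypothetical and are not taken from the patch or from dsputil.  The
identity table corresponds to what FF_NO_IDCT_PERM implies (coefficient
i is stored at index i); the transposed table is only one example of a
non-identity layout, not a claim about what the MMX code uses or what
would be best for NEON.  The general idea is that if the decoder stores
coefficients in the order the IDCT wants them, the IDCT itself may be
able to skip a shuffle or transpose step.

```c
#include <assert.h>
#include <stdint.h>

enum { BLOCK_SIZE = 64 };   /* 8x8 coefficients, row-major */

/* Identity permutation: coefficient i is stored at index i. */
static void init_identity_perm(uint8_t perm[BLOCK_SIZE])
{
    for (int i = 0; i < BLOCK_SIZE; i++)
        perm[i] = (uint8_t)i;
}

/* Transposed layout: swap the row (i >> 3) and column (i & 7) of each index. */
static void init_transpose_perm(uint8_t perm[BLOCK_SIZE])
{
    for (int i = 0; i < BLOCK_SIZE; i++)
        perm[i] = (uint8_t)(((i & 7) << 3) | (i >> 3));
}

/* Store each source coefficient at its permuted position, as a decoder
 * would when filling a block for an IDCT that expects that layout. */
static void permute_block(int16_t dst[BLOCK_SIZE],
                          const int16_t src[BLOCK_SIZE],
                          const uint8_t perm[BLOCK_SIZE])
{
    for (int i = 0; i < BLOCK_SIZE; i++) {
        assert(perm[i] < BLOCK_SIZE);
        dst[perm[i]] = src[i];
    }
}
```

With the transposed table, the coefficient at row 0, column 1 (index 1)
ends up at row 1, column 0 (index 8), so a routine that loads rows of
the stored block is effectively reading columns of the original one.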

I should probably revisit the IDCT.  It was one of the first NEON
things I did, and I've probably picked up another trick or two since.

-- 
Måns Rullgård
mans at mansr.com



