[FFmpeg-devel] [PATCH] ARM: NEON optimised simple_idct

Måns Rullgård mans
Mon Aug 25 22:04:27 CEST 2008


Michael Niedermayer <michaelni at gmx.at> writes:

> On Mon, Aug 25, 2008 at 07:47:16PM +0100, Måns Rullgård wrote:
>> Michael Niedermayer <michaelni at gmx.at> writes:
> [...]
>> > 2. depending on the pattern of non-zero / all-zero rows, one of 8
>> > optimized column transforms is used.  This may be a bad idea though
>> > for a CPU with a small code cache ...
>> >
>> > also maybe it would make sense to look at i386/idct_sse2_xvid.c
>> > which uses SSE2 (128-bit registers). This one uses only 16-bit operations
>> > for the column transform, so it may be faster when the tricks of the simple
>> > idct aren't applicable
>> 
>> Do you expect any sane person to be able to read that?  
>
> well, a little insanity may be needed
>
>> That's also
>> not bitexact, right?
>
> it is supposed to be bitexact, and i cannot remember a case where any
> input led to different output. Also the MMX one is used in the
> regression tests and they match between MMX and non-x86 cpus ...

All the different IDCT variants (int, simple, simplemmx, libmpeg2mmx,
xvidmmx, faani) give different output on my machine with current
FFmpeg.  Which one is correct?
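
A minimal sketch of how two IDCT implementations could be checked
against each other for bit-exactness. This is not FFmpeg code: the
idct_fn signature, the two toy transforms and count_mismatches() are
hypothetical names made up for illustration. It feeds identical random
8x8 blocks to both functions and counts the blocks whose outputs
differ. The two toy transforms share the same double-precision math
and differ only in the final rounding, which is already enough to
break bit-exactness.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical signature: transform one 8x8 coefficient block in place. */
typedef void (*idct_fn)(int16_t block[64]);

/* cos(k*pi/16) for k = 0..8; a table avoids depending on libm. */
static const double cos_tab[9] = {
    1.0,                0.9807852804032304, 0.9238795325112867,
    0.8314696123025452, 0.7071067811865476, 0.5555702330196022,
    0.3826834323650898, 0.19509032201612825, 0.0
};

/* cos(k*pi/16) for any k >= 0, using the symmetries of cosine. */
static double cos16(int k)
{
    k %= 32;
    if (k > 16)
        k = 32 - k;              /* cos(2*pi - t) == cos(t)  */
    if (k > 8)
        return -cos_tab[16 - k]; /* cos(pi - t)   == -cos(t) */
    return cos_tab[k];
}

/* Direct (slow) 2-D IDCT in double precision; only the rounding varies. */
static void idct_generic(int16_t block[64], int round_nearest)
{
    double out[64];

    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            double sum = 0.0;
            for (int v = 0; v < 8; v++) {
                for (int u = 0; u < 8; u++) {
                    double cu = u ? 1.0 : cos_tab[4]; /* 1/sqrt(2) for DC */
                    double cv = v ? 1.0 : cos_tab[4];
                    sum += cu * cv * block[v * 8 + u]
                         * cos16((2 * x + 1) * u)
                         * cos16((2 * y + 1) * v);
                }
            }
            out[y * 8 + x] = sum / 4.0;
        }
    }
    for (int i = 0; i < 64; i++) {
        if (round_nearest)
            block[i] = (int16_t)(out[i] + (out[i] >= 0 ? 0.5 : -0.5));
        else
            block[i] = (int16_t)out[i]; /* truncates toward zero */
    }
}

static void idct_round(int16_t block[64]) { idct_generic(block, 1); }
static void idct_trunc(int16_t block[64]) { idct_generic(block, 0); }

/* Run both transforms on the same random blocks; count differing blocks. */
static int count_mismatches(idct_fn a, idct_fn b, int trials, unsigned seed)
{
    int mismatches = 0;

    assert(trials > 0);
    srand(seed);
    for (int t = 0; t < trials; t++) {
        int16_t in_a[64], in_b[64];

        for (int i = 0; i < 64; i++)
            in_a[i] = (int16_t)(rand() % 512 - 256);
        memcpy(in_b, in_a, sizeof(in_a));
        a(in_a);
        b(in_b);
        if (memcmp(in_a, in_b, sizeof(in_a)) != 0)
            mismatches++;
    }
    return mismatches;
}
```

Calling count_mismatches(idct_round, idct_trunc, 100, 1) should report
a non-zero count, while comparing a function against itself reports
zero. A check like this only shows whether two variants agree; it does
not say which of them is the correct one.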

>> > also
>> >
>> >     Intel 64 and IA-32 Architectures
>> >     Software Developers Manual
>> >                               Volume 2A (and B)
>> >            Instruction Set Reference
>> >
>> > contains very readable and unambiguous explanations of what all the
>> > MMX, SSE* instructions do, if you ever want to decipher mmx or sse code
>> 
>> I have those documents, and reading Chinese is easier.
>
> This is great, so you can help me communicate with zhentan who is a SoC
> student and IIRC Chinese.

No, but maybe he can explain mmx to me.

-- 
Måns Rullgård
mans at mansr.com



