[FFmpeg-devel] [PATCH] ARM: NEON optimised simple_idct

Mon Aug 25 22:20:08 CEST 2008

Alexander Strange <astrange at ithinksw.com> writes:

> On Aug 25, 2008, at 4:04 PM, Måns Rullgård wrote:
>
>> Michael Niedermayer <michaelni at gmx.at> writes:
>>
>>> On Mon, Aug 25, 2008 at 07:47:16PM +0100, Måns Rullgård wrote:
>>>> Michael Niedermayer <michaelni at gmx.at> writes:
>>> [...]
>>>>> 2. depending on the pattern of non zero / all zero rows one of 8
>>>>> optimized column transforms is used.  This may be a bad idea though
>>>>> for a CPU with a small code cache ...
>>>>>
>>>>> also maybe it would make sense to look at i386/idct_sse2_xvid.c
>>>>> which uses SSE2 (128-bit registers); this one uses only 16-bit
>>>>> operations for the column transform, so it may be faster when the
>>>>> tricks of the simple idct aren't applicable
>>>>
>>>> Do you expect any sane person to be able to read that?
>>>
>>> well, a little insanity may be needed
>>>
>>>> That's also not bitexact, right?
>>>
>>> it is supposed to be bitexact, and i cannot remember a case where any
>>> input led to different output. Also the MMX one is used in the
>>> regression tests and they match between MMX and non x86 cpus ...
>>
>> All the different IDCT variants (int, simple, simplemmx, libmpeg2mmx,
>> xvidmmx, faani) give different output on my machine with current
>> FFmpeg.  Which one is correct?
>
> All of them are correct; none of the IDCT-using codecs specify exact
> rounding.

Yes, I know.

> simple* and xvid* should be the same as their C versions, though.

Well, they're not.

> It's best to stick with simpleidct so we can at least have bit-exact
> compatibility with ffmpeg-encoded files.

Agreed.
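
To illustrate what "bit-exact" means in this discussion, here is a minimal
sketch of a check that feeds the same random coefficient blocks to two IDCT
implementations and counts the blocks whose output differs. It is not
FFmpeg's code or test harness: the names idct_1d, idct_2d, idct_nearest,
idct_truncate and count_mismatches are hypothetical, and the two variants
differ only in how the final value is converted to an integer, which is
assumed here as a stand-in for the rounding differences between real
implementations.

```python
import math
import random

N = 8  # block size of the usual 8x8 IDCT


def idct_1d(v):
    """Floating-point 1-D inverse DCT of an N-element sequence."""
    out = []
    for x in range(N):
        s = 0.0
        for u in range(N):
            scale = math.sqrt(0.5) if u == 0 else 1.0
            s += scale * v[u] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
        out.append(s / 2.0)
    return out


def idct_2d(block, to_int):
    """Separable 2-D IDCT: rows first, then columns, then integer conversion."""
    rows = [idct_1d(row) for row in block]
    cols = [idct_1d([rows[y][x] for y in range(N)]) for x in range(N)]
    # cols[x][y] holds the sample at row y, column x.
    return [[to_int(cols[x][y]) for x in range(N)] for y in range(N)]


def idct_nearest(block):
    """Hypothetical variant A: round to nearest integer."""
    return idct_2d(block, lambda s: math.floor(s + 0.5))


def idct_truncate(block):
    """Hypothetical variant B: truncate toward zero."""
    return idct_2d(block, int)


def count_mismatches(idct_a, idct_b, trials=50, seed=1):
    """Count random blocks on which the two implementations disagree."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        block = [[rng.randint(-256, 255) for _ in range(N)] for _ in range(N)]
        if idct_a(block) != idct_b(block):
            bad += 1
    return bad
```

Two implementations are bit-exact with respect to each other only if
count_mismatches returns 0 for them; comparing an implementation with
itself, as in count_mismatches(idct_nearest, idct_nearest), trivially does.
Both variants above would count as "correct" for a codec that does not
specify exact rounding, yet they are not interchangeable if identical
decoder output is required.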

-- 
Måns Rullgård
mans at mansr.com