[FFmpeg-devel] [PATCH] SPARC VIS simple_idct

Sat Aug 25 18:48:46 CEST 2007

Hi!

On Saturday 25 August 2007 at 14:00, Michel Lespinasse wrote:
> On Sat, Aug 25, 2007 at 01:52:49PM +0200, Balatoni Denes wrote:
> > But unfortunately I see a problem: you are using unsigned multiplies,
> > which are AFAIK not available on SPARC. This also means that the code
> > might not actually comply with ieee1180, because you are using the sign
> > bit for data, but you can't.
>
> If I remember, the idea was that muls can be implemented easily (looking at
> the VIS specs right now, I think it's vis_fmul8sux16 + vis_fmul8ulx16).
>
> For mulu you could write mulu(x,y) = muls(x,y) + ((x >= 32768) ? y : 0)
> or when working with vectors, and given that x is a known constant,
> mulu(x,y) = muls(x,y) + (y & mask) where the mask would be a constant
> with each 16-bit element being either 0 or 65535.
> About half of the const vectors had all their elements >= 32768 too so
> these would not even need the mask, actually.
>
> I have not looked much at your current code, maybe this trick might
> apply there as well?

Yes, exactly. This trick should make simple_idct_vis more accurate, and
probably IEEE 1180 compliant, given how much it helped your code. Your code
would probably also be faster than mine once properly converted to asm,
because it uses half as many multiplications.
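
A minimal sketch of the mulu trick quoted above, in plain C rather than VIS.
It assumes that muls(x,y) means the high 16 bits of the signed 16x16 product
and mulu(x,y) the same with x read as unsigned, both with plain floor
rounding. The real vis_fmul8sux16 + vis_fmul8ulx16 pair rounds differently,
so this only illustrates the identity, not the VIS precision. All function
names are hypothetical and come from neither patch.

```c
#include <assert.h>
#include <stdint.h>

/* floor(p / 65536), written without relying on signed right shift. */
static int32_t floor_hi16(int64_t p)
{
    int64_t q = p / 65536;          /* truncates toward zero */
    if (p % 65536 < 0)
        q--;                        /* adjust to floor for negative p */
    return (int32_t)q;
}

/* Idealised signed multiply: high 16 bits of x * y (assumed meaning). */
static int32_t muls(int16_t x, int16_t y)
{
    return floor_hi16((int64_t)x * y);
}

/* Reference unsigned-by-signed multiply: x is read as 0..65535. */
static int32_t mulu_ref(uint16_t x, int16_t y)
{
    return floor_hi16((int64_t)x * y);
}

/* Scalar form: mulu(x,y) = muls(x,y) + ((x >= 32768) ? y : 0). */
static int32_t mulu_via_muls(uint16_t x, int16_t y)
{
    /* Reinterpret the 16 bits of x as a signed value. */
    int16_t xs = (int16_t)(x >= 32768 ? (int32_t)x - 65536 : (int32_t)x);
    return muls(xs, y) + (x >= 32768 ? y : 0);
}

/* One lane of the vector form: mulu(x,y) = muls(x,y) + (y & mask),
 * where mask is 0 or 65535 and is precomputed from the constant x. */
static int32_t mulu_via_mask(uint16_t x, int16_t y, uint16_t mask)
{
    int16_t  xs     = (int16_t)(x >= 32768 ? (int32_t)x - 65536 : (int32_t)x);
    uint16_t masked = (uint16_t)((uint16_t)y & mask);
    /* Read the masked bits back as a signed 16-bit value. */
    int32_t  add    = masked >= 32768 ? (int32_t)masked - 65536 : (int32_t)masked;
    return muls(xs, y) + add;
}

/* Check both forms against the reference for one (x, y) pair. */
static void check_identity(uint16_t x, int16_t y)
{
    uint16_t mask = x >= 32768 ? 0xFFFF : 0;
    assert(mulu_via_muls(x, y) == mulu_ref(x, y));
    assert(mulu_via_mask(x, y, mask) == mulu_ref(x, y));
}
```

The identity is exact here because reading x as unsigned adds 65536*y to the
product whenever x >= 32768, which is exactly y after dropping the low 16
bits. With the VIS rounding the two sides may differ slightly.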

> The argument order is important here, muls is not commutative because
> of the annoying VIS rounding. Using the wrong order loses a lot of
> precision, I don't claim to understand why...

I can confirm this. At one point I also swapped the mul arguments in
simple_idct_vis, and it became less accurate. I thought it was just some
random error, but since you noticed it too, there may be an explanation.

bye
Denes