[FFmpeg-devel] [PATCH] SPARC VIS simple_idct
Sat Aug 25 18:48:46 CEST 2007
On Saturday 25 August 2007 at 14:00, Michel Lespinasse wrote:
> On Sat, Aug 25, 2007 at 01:52:49PM +0200, Balatoni Denes wrote:
> > But unfortunately I see a problem: you are using unsigned multiplies,
> > which are AFAIK not available on SPARC. This also means that the code
> > might not actually comply with IEEE 1180, because you are using the sign
> > bit for data, which you can't do.
> If I remember correctly, the idea was that muls can be implemented
> easily (looking at the VIS specs right now, I think it's
> vis_fmul8sux16 + vis_fmul8ulx16).
> For mulu you could write mulu(x,y) = muls(x,y) + ((x >= 32768) ? y : 0),
> or, when working with vectors and given that x is a known constant,
> mulu(x,y) = muls(x,y) + (y & mask), where mask is a constant with each
> 16-bit element being either 0 or 65535 (65535 wherever x >= 32768).
> About half of the const vectors had all their elements >= 32768 too, so
> these would not even need the mask, actually.
> I have not looked much at your current code; maybe this trick
> applies there as well?
Yes, exactly. This trick should make simple_idct_vis more accurate, probably
even IEEE 1180 compliant, given how much it helped your code. On the other
hand, your code is probably faster than mine once it is properly converted to
asm, because it has half as many multiplications.
> The argument order is important here: muls is not commutative because
> of the annoying VIS rounding. Using the wrong order loses a lot of
> precision; I don't claim to understand why...
I can confirm this. I also swapped the mul arguments in simple_idct_vis at
one point, and it was less accurate. I thought it was just some random error,
but if you noticed it too, there might be an explanation for it after all.