[FFmpeg-devel] [PATCH] SPARC VIS simple_idct try #2

Thu Aug 23 02:48:41 CEST 2007

Hi!

On Thursday 23 August 2007 at 01:29, Michael Niedermayer wrote:
> > In the row iteration it is not only permuted, but also shifted right four
> > bits. But there is no shift instruction. So if you know a significantly
> > faster way to shift the input right four bits, then do tell me.
>
> there is a shift instruction, sllx; where's the problem with using that?

Well, you can't move values directly between floating-point and integer
registers. So there would be an additional store to memory and a load back,
some masking is still needed, and then the shift. All in all it's the same
speed or slower than 4 adds, an option I already said I don't really like
because of its marginal speedup and added complexity.
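
To illustrate the masking point only: the sketch below is plain C, not code
from the patch, and the function name is hypothetical. It assumes four
unsigned 16-bit values packed into one 64-bit integer word. A single 64-bit
shift drags the low four bits of each value into the value below it, so a
mask is needed afterwards. Signed coefficients would need extra sign
handling that is not shown here, and the store/load round trip between
register files has no equivalent in C.

```c
#include <stdint.h>

/* Hypothetical helper: logical right shift by 4 of four unsigned
 * 16-bit lanes packed into one 64-bit word.
 * The plain 64-bit shift moves the low 4 bits of every lane into the
 * top 4 bits of the lane below it; the mask clears those stray bits. */
static uint64_t shr4_lanes_u16(uint64_t packed)
{
    const uint64_t lane_mask = 0x0FFF0FFF0FFF0FFFULL;
    return (packed >> 4) & lane_mask;
}
```

Without the mask, the lane 0xABCD would come out as 0x4ABC instead of 0x0ABC
when the lane above it ends in the digit 4.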

> also I am realizing now that you read and work with just 32 bits at a time
> while the registers really are 64-bit
> so unless SPARC needs 2x as much time for 64-bit instructions this is very
> inefficient

Now I am kind of puzzled. I am using 64-bit registers: f0+f1 together form
one 64-bit register, and f32, f34, ... f62 are 64-bit registers (these can't
even be accessed in 32-bit parts). So I really don't understand what you are
saying. The big macro computes 4 rows in parallel; how could it do that
without using 64-bit registers?
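
As a rough picture of what "4 rows in parallel" means for a 64-bit register:
the sketch below is a plain C emulation, not VIS code and not taken from the
patch, and the names pack4 and add_lanes_u16 are hypothetical. It assumes
each row contributes one 16-bit value, so four rows fit in one 64-bit word
and a single partitioned add (in the spirit of the VIS fpadd16 instruction)
updates all four at once.

```c
#include <stdint.h>

/* Hypothetical helper: pack one 16-bit value per row into a 64-bit
 * word, row 0 in the lowest 16 bits. */
static uint64_t pack4(uint16_t r0, uint16_t r1, uint16_t r2, uint16_t r3)
{
    return (uint64_t)r0 | ((uint64_t)r1 << 16) |
           ((uint64_t)r2 << 32) | ((uint64_t)r3 << 48);
}

/* Lane-wise 16-bit add with wraparound inside each lane.
 * The top bit of every lane is cleared before the 64-bit add, so no
 * carry can cross a lane boundary; the top bits are then restored
 * with XOR, which is addition without carry-out. */
static uint64_t add_lanes_u16(uint64_t a, uint64_t b)
{
    const uint64_t high = 0x8000800080008000ULL;
    uint64_t low_sum = (a & ~high) + (b & ~high);
    return low_sum ^ ((a ^ b) & high);
}
```

One call to add_lanes_u16 handles the values of all four rows, which is only
possible when the full 64-bit width is in use.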

> [...]

bye
Denes