[FFmpeg-devel] [PATCH] SPARC VIS simple_idct try #2
Michael Niedermayer
michaelni
Thu Aug 23 03:31:42 CEST 2007
Hi
On Thu, Aug 23, 2007 at 03:01:34AM +0200, Michael Niedermayer wrote:
[...]
> > > also I am realizing now that you read and work with just 32 bits at a time
> > > while the registers really are 64 bit
> > > so unless SPARC needs 2x as much time for 64-bit instructions this is very
> > > inefficient
> >
> > Now I am kind of puzzled. I am using 64 bit registers. Like f0+f1 is one 64bit
> > register. f32, f34, ...f62 are 64 bit registers (these can't even be accessed
> > in 32 bit parts). So I really don't understand what you are saying. The big
> > macro computes 4 rows in parallel, how could it do that, without using 64 bit
> > registers?
>
> hmm ok the registers are split in 2, I didn't know that (this design is
> extremely bad for a RISC CPU as it makes out-of-order execution very hard)
>
> still the loads are 32bit
> so let's check if I finally figured out how SPARC asm works
> 1. you load everything by using 32-bit loads into the low and high
> halves of 64-bit registers
> 2. you duplicate the input and, on each side, mask away half of the 16-bit
> values
> 3. you use fpackfix to shift half the input left by 4 bits and pack the 2 16-bit
> values, which are separated by 16 zero bits, into a 32-bit register
> 4. you subtract the other half from 2048
> 5. you do the same fpackfix on the second half
> ...
>
> ok let's see
> 1. 8 instructions are useless, you can use 64-bit loads
> 2. all 16 instructions are unneeded
> 3+5 (16 instructions) are unneeded, you can quickly shift the coeffs up to
> block_last_index by using C code
> 4. the subtract is effectively done using 32 bits (half of the registers
> are 0, aka unused)
>
> so again
> fix the permutation
> shift left by 4 bits using C code or asm, both stopping after block_last_index
> do the 2048*(1<<4) subtraction, if needed, per 64 bits
>
> at this point you should have pretty much the same data as in your case
> but very significantly faster
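
To make the "shift left by 4 bits using C code" step concrete, here is a rough
sketch. The function name prescale_coeffs and the scan argument (a table
mapping scan position to storage position) are my assumptions for
illustration, not existing FFmpeg API; only block_last_index comes from the
discussion above. Overflow of the 16-bit coefficients is not handled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: scale the coefficients at scan positions
 * 0..last_index by 16 (a left shift by 4 bits) and leave the rest of the
 * block untouched. 'scan' is an assumed table that maps a scan position to
 * the position of that coefficient in 'block'. */
static void prescale_coeffs(int16_t *block, const uint8_t *scan, int last_index)
{
    assert(last_index >= 0 && last_index < 64);
    for (int i = 0; i <= last_index; i++) {
        int16_t *c = &block[scan[i]];
        /* Multiply instead of shifting so negative values stay well defined. */
        *c = (int16_t)(*c * 16);
    }
}
```

The loop stops after last_index, so a block with few nonzero coefficients
costs only a few iterations instead of 16 SIMD instructions per block.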
Also, if you need to transpose the stuff between the row and column transforms,
the following can be used (minus typos, high/low half errors and such):
f0   A0A1 A2A3 A4A5 A6A7
f2   B0B1 B2B3 B4B5 B6B7
f4   C0C1 C2C3 C4C5 C6C7
f6   D0D1 D2D3 D4D5 D6D7
f8   E0E1 E2E3 E4E5 E6E7
f10  F0F1 F2F3 F4F5 F6F7
f12  G0G1 G2G3 G4G5 G6G7
f14  H0H1 H2H3 H4H5 H6H7
fpmerge f0,  f4,  f16    A0C0 A1C1 A2C2 A3C3
fpmerge f2,  f6,  f18    B0D0 B1D1 B2D2 B3D3
fpmerge f16, f18, f0     A0B0 C0D0 A1B1 C1D1
fpmerge f17, f19, f2     A2B2 C2D2 A3B3 C3D3
fpmerge f0,  f1,  f16    A0A1 B0B1 C0C1 D0D1
fpmerge f2,  f3,  f18    A2A3 B2B3 C2C3 D2D3
...
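
To check the byte shuffling above without a SPARC at hand, the six merges can
be modelled in plain C. This is only a sketch: fpmerge() below is a software
model of the VIS instruction (interleave the 4 bytes of two 32-bit inputs,
most significant byte first), and the names hi32, lo32 and
transpose_high_halves are made up for illustration. It covers just the six
merges listed, that is, the first two 16-bit elements of rows A to D.

```c
#include <stdint.h>

/* Software model of VIS fpmerge: result bytes are a0 b0 a1 b1 a2 b2 a3 b3,
 * where a0/b0 are the most significant bytes of the two inputs. */
static uint64_t fpmerge(uint32_t a, uint32_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t ab = (a >> (24 - 8 * i)) & 0xff;
        uint64_t bb = (b >> (24 - 8 * i)) & 0xff;
        r = (r << 16) | (ab << 8) | bb;
    }
    return r;
}

/* High and low 32-bit halves of a 64-bit register, e.g. f16 and f17. */
static uint32_t hi32(uint64_t v) { return (uint32_t)(v >> 32); }
static uint32_t lo32(uint64_t v) { return (uint32_t)v; }

/* The six merges from the listing. a..d are rows A..D as 64-bit registers;
 * col0 receives A0A1 B0B1 C0C1 D0D1 and col1 receives A2A3 B2B3 C2C3 D2D3. */
static void transpose_high_halves(uint64_t a, uint64_t b, uint64_t c, uint64_t d,
                                  uint64_t *col0, uint64_t *col1)
{
    uint64_t t0 = fpmerge(hi32(a), hi32(c));    /* A0C0 A1C1 A2C2 A3C3 */
    uint64_t t1 = fpmerge(hi32(b), hi32(d));    /* B0D0 B1D1 B2D2 B3D3 */
    uint64_t u0 = fpmerge(hi32(t0), hi32(t1));  /* A0B0 C0D0 A1B1 C1D1 */
    uint64_t u1 = fpmerge(lo32(t0), lo32(t1));  /* A2B2 C2D2 A3B3 C3D3 */
    *col0 = fpmerge(hi32(u0), lo32(u0));        /* A0A1 B0B1 C0C1 D0D1 */
    *col1 = fpmerge(hi32(u1), lo32(u1));        /* A2A3 B2B3 C2C3 D2D3 */
}
```

Feeding in rows whose bytes spell their own names (0xA0A1A2A3A4A5A6A7 for
row A and so on) makes the output directly comparable with the comments in
the listing.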
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
If you really think that XML is the answer, then you definitly missunderstood
the question -- Attila Kinali