[Ffmpeg-devel] [RFC] AltiVec optimizations, try 2
Luca Barbato
lu_zero
Thu Aug 3 11:44:08 CEST 2006
Guillaume POIRIER wrote:
>
> Just out of curiosity, is it necessary to use vec_splat_s32 explicitly
> so that gcc emits the "splat" asm instruction, or will it otherwise
> allocate 64, 7, ... on the stack and load each register with these
> constants?
You don't want to use the stack at all; you want the constant built
inline by a direct operation, since vec_splat_(s|u)(8|16|32) doesn't
require any memory access at all.
>
> Also, as far as I understood how vec_splat_s32 works, it should be
> possible to generate a vector full of "64" with a single
> vec_splat_s32(64)...
Nope. An immediate encoded in a PPC instruction here can only hold a
value in the range -16 .. 15, and vec_splat_* takes an immediate, not a
register.
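To make the range concrete, here is a minimal sketch in plain C. The
helper name fits_splat_immediate is made up for illustration (it is not
part of altivec.h); it only models the 5-bit signed field that the
splat-immediate instructions (vspltisb/vspltish/vspltisw) encode.

```c
/* Hypothetical helper, not an AltiVec API: returns 1 when `value` can be
 * encoded in the 5-bit signed immediate used by vec_splat_(s|u)(8|16|32),
 * and 0 otherwise. */
static int fits_splat_immediate(int value)
{
    return value >= -16 && value <= 15;
}
```

So 4 and 7 can be splatted directly, while 64 cannot and has to be
built some other way.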
> so why is it desirable to use the form with more
> instructions (more decoding bw, more dependencies, more computation unit
> slots used up)... is this an optimization specific to G4 or to Altivec
> in general?
It's a generic optimization: in AltiVec the most expensive operation is
memory access (I think it is about 3-4 times slower than any other
instruction).
>
> Or am I just too blind to see the obvious solution?
>
Not blind, just not used to it.
In theory you'd like to get those constant values into registers without
paying a visit to memory, and then keep them there.
Since we already splatted 4 somewhere, the vec_sl will just reuse that
register as long as there are no dependencies on it too close by, so
even splatting 64 would be a single algebraic op.
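As a rough sketch of the idea, assuming 32-bit lanes: 64 is 4 << 4, so
one splat of 4 can serve both as the value and as the shift count. The
u32x4 type and the model_* functions below are hypothetical scalar
stand-ins so the arithmetic can be checked on any machine; only the
block under __ALTIVEC__ uses the real vec_splat_u32 and vec_sl
intrinsics, and it is not necessarily what the patch itself does.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __ALTIVEC__
#include <altivec.h>
#endif

/* Hypothetical 4-lane stand-in for a vector unsigned int. */
typedef struct { uint32_t lane[4]; } u32x4;

/* Scalar model of vec_splat_u32: only a 5-bit signed immediate fits. */
static u32x4 model_splat_u32(int imm)
{
    u32x4 v;
    int i;
    assert(imm >= -16 && imm <= 15);
    for (i = 0; i < 4; i++)
        v.lane[i] = (uint32_t)imm;
    return v;
}

/* Scalar model of vec_sl on 32-bit lanes: each lane is shifted left by
 * the low 5 bits of the corresponding lane of `count`. */
static u32x4 model_sl(u32x4 a, u32x4 count)
{
    u32x4 r;
    int i;
    for (i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] << (count.lane[i] & 31);
    return r;
}

/* 64 in every lane, built as 4 << 4 from a single splat of 4. */
static u32x4 model_sixty_four(void)
{
    u32x4 four = model_splat_u32(4);
    return model_sl(four, four);
}

#ifdef __ALTIVEC__
/* The same construction with the real intrinsics: no memory access. */
static vector unsigned int altivec_sixty_four(void)
{
    const vector unsigned int four = vec_splat_u32(4);
    return vec_sl(four, four);
}
#endif
```

If the splat of 4 is already live in a register, the only extra
instruction needed for 64 is the shift.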
lu
--
Luca Barbato
Gentoo/linux Gentoo/PPC
http://dev.gentoo.org/~lu_zero