[MPlayer-dev-eng] [PATCH] (new version) AltiVec: dct64 for mp3lib, IMDCT for liba52, detection code
degger at fhm.edu
Mon Jan 20 02:15:32 CET 2003
On Sun, 2003-01-19 at 19:44, Romain Dolbeau wrote:
> For the xy2, maybe I should check if ((address & 0X1F) < 8), in that
> case I have both 8 pixels block in a single vector and I can avoid the
> second load from "pixels". What do you think ?
Why 0x1f? To check for 16-byte alignment you'd use something like
(address & 0x0f) == 0. The nice thing about the code blocks calculating
the mean of 4 adjacent pixels is that with AltiVec one can trivially
do something like:
a = first line
b = a << 8 bits
c = next line
d = c << 8 bits
e = mean of a, b, c, d
a = c
b = d
as long as the stride is a multiple of 16.
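To make the a/b/c/d rotation above concrete, here is a rough scalar C sketch of the same idea. It is not the AltiVec code from the patch: the function name avg4_block, the 8-pixel block width, the "+2" rounding term and the shared stride for source and destination are all assumptions made for illustration. The point is only that the horizontal pair sums of the lower line are reused as the upper line of the next row, so every source line is read once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { BLOCK_W = 8 };   /* assumed block width, in pixels */

/* Hypothetical helper: rounded mean of each 2x2 pixel neighbourhood.
 * src must have BLOCK_W + 1 readable bytes per row and h + 1 rows. */
void avg4_block(uint8_t *dst, const uint8_t *src, ptrdiff_t stride, int h)
{
    uint16_t prev[BLOCK_W]; /* a + b: pair sums of the upper line */
    uint16_t cur[BLOCK_W];  /* c + d: pair sums of the lower line */

    assert(dst != NULL && src != NULL && h > 0);

    /* a = first line, b = the same line shifted by one pixel */
    for (int x = 0; x < BLOCK_W; x++)
        prev[x] = (uint16_t)(src[x] + src[x + 1]);

    for (int y = 0; y < h; y++) {
        src += stride;                      /* c = next line */
        for (int x = 0; x < BLOCK_W; x++) {
            cur[x] = (uint16_t)(src[x] + src[x + 1]);
            /* e = mean of a, b, c, d (rounding term is an assumption) */
            dst[x] = (uint8_t)((prev[x] + cur[x] + 2) >> 2);
            prev[x] = cur[x];               /* a = c, b = d */
        }
        dst += stride;
    }
}
```

In a vector version the inner loop would collapse into a handful of vector operations per line; the scalar form is only meant to show the data flow.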
It gets quickly complicated when the start of the useful data in memory
is between 0x.......7 and 0x.......f because then one needs to either
special case or generally fetch 2x16 bytes and align them. The alignment
vector can be calculated in advance and reused unless stride % 16 != 0.
Of course there is a small distinction between the to-be-applied data
and the picture itself because the former will be en bloc while the
latter uses the stride.
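The "fetch 2x16 bytes and align them" case can be sketched in portable C as below. This only emulates the load-two-vectors-then-permute pattern with memcpy; it does not use any AltiVec intrinsics, and the helper names (vec_offset, is_vec_aligned, offset_is_row_invariant, load_unaligned16) are made up for this example. Note that the emulation always reads the full 32 aligned bytes, so the caller has to guarantee that they are readable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { VEC_BYTES = 16 };

/* Byte offset of addr within its 16-byte vector (0..15). */
unsigned vec_offset(const void *addr)
{
    return (unsigned)((uintptr_t)addr & 0x0f);
}

/* Nonzero when addr sits on a 16-byte boundary. */
int is_vec_aligned(const void *addr)
{
    return vec_offset(addr) == 0;
}

/* The offset (and thus the alignment vector) is the same for every
 * row only when the stride is a multiple of 16. */
int offset_is_row_invariant(ptrdiff_t stride)
{
    return stride % VEC_BYTES == 0;
}

/* Emulates two aligned 16-byte loads followed by a permute that
 * extracts the 16 bytes starting at addr. */
void load_unaligned16(uint8_t out[VEC_BYTES], const uint8_t *addr)
{
    unsigned offset = vec_offset(addr);
    const uint8_t *base = addr - offset;    /* aligned-down address */
    uint8_t pair[2 * VEC_BYTES];

    assert(out != NULL && addr != NULL);

    memcpy(pair, base, VEC_BYTES);                          /* first load  */
    memcpy(pair + VEC_BYTES, base + VEC_BYTES, VEC_BYTES);  /* second load */
    memcpy(out, pair + offset, VEC_BYTES);                  /* "permute"   */
}
```

When offset_is_row_invariant(stride) holds, the offset can be computed once before the row loop and reused for every line.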
> My code seems faster than
> the C code, but I'm not sure it's always true - for fully out-of-cache
> data, the 32 bytes loaded per line for "pixels" may be too costly (you
> only really need 9 of them).
This does not necessarily matter, you can mark the data as uncacheable
if in doubt. Also the i-cache footprint and the schedulability of the
code make a whole lot of difference. Normally AltiVec code will yield
at least a fourfold improvement in instruction count, which can
easily pay off, especially when considering that CPUs normally fetch
memory in sizes of a whole cacheline which is 16 bytes for embedded
PPC CPUs, 32 bytes in our case or even 128 bytes on PPC64.
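As a back-of-the-envelope illustration of the cacheline argument, the following C sketch counts how many cachelines a memory access touches for a given line size. The function name cache_lines_touched is hypothetical and the addresses in the note below are made-up examples; the line sizes (16, 32, 128) are the ones mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of cachelines covered by nbytes starting at addr.
 * line_size must be a power of two. */
size_t cache_lines_touched(uintptr_t addr, size_t nbytes, size_t line_size)
{
    assert(nbytes > 0);
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);

    uintptr_t mask  = ~(uintptr_t)(line_size - 1);
    uintptr_t first = addr & mask;                  /* line of first byte */
    uintptr_t last  = (addr + nbytes - 1) & mask;   /* line of last byte  */

    return (size_t)((last - first) / line_size) + 1;
}
```

With 32-byte lines, 9 useful bytes starting at 0x1008 sit in one line, and so do the two aligned 16-byte loads covering 0x1000..0x101f; 9 bytes starting at 0x1018 straddle two lines no matter how they are fetched. With 128-byte lines both cases fit in a single line.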