[MPlayer-dev-eng] [PATCH] (new version) AltiVec: dct64 for mp3lib, IMDCT for liba52, detection code

Sun Jan 19 22:05:03 CET 2003

On Sun, 2003-01-19 at 19:44, Romain Dolbeau wrote:

> The MC functions are the pair of "*pixels8_xy2_c", right ?

Almost all of the functions in dsputil.c.

[me looks...]

Jesus, they have changed almost the whole damn thing....

> I've just sent a patch to ffmpeg for the first one
> ("put_pixels8_xy2_c"). The main problem is the C version is totally
> unreadable, I had to guess what I was supposed to do.

Well, I figured out what they do by preprocessing the source and using
the expanded (and reformatted) source for ideas.

> Sure the alignment is wrong, but it's mostly for reading so it's less of
> a problem.

Huh? You either need to align it (costly, especially if you need to read
2x16 bytes which is often the case) or you'll end up with wrong pixels
on the screen.
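
To make the "2x16 bytes" cost concrete, here is a minimal scalar sketch of
how an unaligned 16-byte read is usually assembled from two aligned ones
(on AltiVec this is commonly done with vec_ld, vec_lvsl and vec_perm). It
is plain C so it runs anywhere; load_unaligned16 is a hypothetical helper
name, not something from dsputil.c, and the code assumes both aligned
blocks touched by the read are inside readable memory.

```c
#include <stdint.h>
#include <string.h>

/* Scalar emulation of an unaligned 16-byte load (hypothetical helper).
 * Two aligned 16-byte blocks are fetched and the wanted bytes are then
 * selected by the misalignment of src, as a vector permute would do. */
static void load_unaligned16(const uint8_t *src, uint8_t out[16])
{
    uintptr_t addr = (uintptr_t) src;
    /* Aligned block containing the first byte. */
    const uint8_t *lo = (const uint8_t *) (addr & ~(uintptr_t) 15);
    /* Aligned block containing the last byte (same as lo if src is aligned). */
    const uint8_t *hi = (const uint8_t *) ((addr + 15) & ~(uintptr_t) 15);
    unsigned shift = (unsigned) (addr & 15);
    uint8_t both[32];

    memcpy(both, lo, 16);          /* first aligned load  */
    memcpy(both + 16, hi, 16);     /* second aligned load */
    memcpy(out, both + shift, 16); /* "permute": pick 16 bytes at the offset */
}
```

Skipping the second load and the selection step is what gives wrong pixels:
the result would start at the aligned address instead of at src.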

>  The output block is 8 bytes-aligned, so it's almost OK. OTOH
> I tried "put_pixels8_c" but it was slower than the C code.

Yes, this is the only one which doesn't benefit from AltiVec, AFAIR.
Although writing it as (assuming this is the correct function, as it
has been renamed and I haven't been tracking the changes):

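  /* Copy 8 bytes per row as two 32-bit words, fully unrolled. */
  UINT32 *p = (UINT32 *) block;
  const UINT32 *pix = (const UINT32 *) pixels;

  p[0] = pix[0];
  p[1] = pix[1];
  /* Advance both pointers by one row; line_size is in bytes. */
  p = (UINT32 *) ((UINT8 *) p + line_size);
  pix = (const UINT32 *) ((const UINT8 *) pix + line_size);
  p[0] = pix[0];
  p[1] = pix[1];
  p = (UINT32 *) ((UINT8 *) p + line_size);
  pix = (const UINT32 *) ((const UINT8 *) pix + line_size);
  p[0] = pix[0];
  p[1] = pix[1];
  p = (UINT32 *) ((UINT8 *) p + line_size);
  pix = (const UINT32 *) ((const UINT8 *) pix + line_size);
  p[0] = pix[0];
  p[1] = pix[1];
  p = (UINT32 *) ((UINT8 *) p + line_size);
  pix = (const UINT32 *) ((const UINT8 *) pix + line_size);
  p[0] = pix[0];
  p[1] = pix[1];
  p = (UINT32 *) ((UINT8 *) p + line_size);
  pix = (const UINT32 *) ((const UINT8 *) pix + line_size);
  p[0] = pix[0];
  p[1] = pix[1];
  p = (UINT32 *) ((UINT8 *) p + line_size);
  pix = (const UINT32 *) ((const UINT8 *) pix + line_size);
  p[0] = pix[0];
  p[1] = pix[1];
  p = (UINT32 *) ((UINT8 *) p + line_size);
  pix = (const UINT32 *) ((const UINT8 *) pix + line_size);
  p[0] = pix[0];
  p[1] = pix[1];
  p = (UINT32 *) ((UINT8 *) p + line_size);
  pix = (const UINT32 *) ((const UINT8 *) pix + line_size);

  /* 8 rows done; the remaining 8 are only needed for the taller block. */
  if (h == 8)
    return;

  p[0] = pix[0];
  p[1] = pix[1];
  p = (UINT32 *) ((UINT8 *) p + line_size);
  pix = (const UINT32 *) ((const UINT8 *) pix + line_size);
  p[0] = pix[0];
  p[1] = pix[1];
  p = (UINT32 *) ((UINT8 *) p + line_size);
  pix = (const UINT32 *) ((const UINT8 *) pix + line_size);
  p[0] = pix[0];
  p[1] = pix[1];
  p = (UINT32 *) ((UINT8 *) p + line_size);
  pix = (const UINT32 *) ((const UINT8 *) pix + line_size);
  p[0] = pix[0];
  p[1] = pix[1];
  p = (UINT32 *) ((UINT8 *) p + line_size);
  pix = (const UINT32 *) ((const UINT8 *) pix + line_size);
  p[0] = pix[0];
  p[1] = pix[1];
  p = (UINT32 *) ((UINT8 *) p + line_size);
  pix = (const UINT32 *) ((const UINT8 *) pix + line_size);
  p[0] = pix[0];
  p[1] = pix[1];
  p = (UINT32 *) ((UINT8 *) p + line_size);
  pix = (const UINT32 *) ((const UINT8 *) pix + line_size);
  p[0] = pix[0];
  p[1] = pix[1];
  p = (UINT32 *) ((UINT8 *) p + line_size);
  pix = (const UINT32 *) ((const UINT8 *) pix + line_size);
  p[0] = pix[0];
  p[1] = pix[1];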

improves the code a bit by taking advantage of the many registers.
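
For reference, the same copy written as a loop in a self-contained form
that can be compiled and checked on its own. This is only a sketch: the
name put_pixels8_sketch and its signature are assumptions (the real
function in dsputil.c may differ), and it uses memcpy for the 32-bit
accesses so that it makes no alignment or aliasing assumptions. Whether
the compiler unrolls it as aggressively as the hand-written version is up
to the compiler.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the 8-pixel-wide copy routine.
 * Copies 8 bytes per row for h rows; line_size is the row stride in bytes. */
static void put_pixels8_sketch(uint8_t *block, const uint8_t *pixels,
                               int line_size, int h)
{
    int i;

    for (i = 0; i < h; i++) {
        uint32_t a, b;

        /* Load the row as two 32-bit words... */
        memcpy(&a, pixels, 4);
        memcpy(&b, pixels + 4, 4);
        /* ...and store them to the destination row. */
        memcpy(block, &a, 4);
        memcpy(block + 4, &b, 4);

        pixels += line_size;
        block += line_size;
    }
}
```

Called with h == 8 or h == 16 it covers the same two cases as the unrolled
code above.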

> For the xy2, maybe I should check if ((address & 0X1F) < 8), in that
> case I have both 8 pixels block in a single vector and I can avoid the
> second load from "pixels". What do you think ? My code seems faster than
> the C code, but I'm not sure it's always true - for fully out-of-cache
> data, the 32 bytes loaded per line for "pixels" may be too costly (you
> only really need 9 of them).

Nice idea, unfortunately I have to go right now; I'll send my more
detailed reply to this later.
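
In the meantime, a small sketch of the check being proposed, assuming
16-byte AltiVec vectors and the 9 bytes per row that xy2 needs. The helper
name fits_in_one_vector is hypothetical. Note that it masks with 0xF
rather than the 0x1F quoted above: with 0x1F the test is still safe, but it
also rejects offsets 16..23 of a 32-byte window, which would fit in a
single vector as well.

```c
#include <stdint.h>

/* Returns nonzero if the count bytes starting at p lie inside a single
 * aligned 16-byte block, so one aligned vector load covers all of them.
 * Hypothetical helper; assumes count <= 16. */
static int fits_in_one_vector(const uint8_t *p, unsigned count)
{
    unsigned offset = (unsigned) ((uintptr_t) p & 15);

    return offset + count <= 16;
}
```

For count == 9 this is true exactly when the offset within the vector is
less than 8, which is the condition under which the second load from
"pixels" could be skipped.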

-- 
Servus,
       Daniel