[FFmpeg-devel] [PATCH] SPARC VIS simple_idct try #2

Wed Aug 22 03:47:52 CEST 2007

Hi

On Tue, Aug 21, 2007 at 11:37:38PM +0200, Balatoni Denes wrote:
> Hi!
> 
> Ah, one DECLARE_ALIGNED_8 was missed. Updated patch attached.
> 
> bye
> Denes
> 
> On Tuesday 21 August 2007 22:35, Balatoni Denes wrote:
> > Hi!
> >
> > Here is a patch adding a SPARC VIS-optimized simple_idct. Speed-wise it is
> > about halfway between the C and the mlib versions, slightly on the faster
> > side. Although the mlib version is faster, this one is more accurate, and I
> > honestly don't know why it isn't faster on SPARC: by my estimates it should
> > be, but it isn't.

it would be very interesting to find out why it's not faster ...

> > Also, although it's the same algorithm as simple_idct, it might be less
> > accurate on rare occasions because of the primitive, slow split 8+8-bit
> > multiply operation in VIS. I am hoping the idct won't overflow, too.
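
To make the accuracy concern concrete: VIS has no full 16x16-bit multiply, so a 16-bit coefficient product is assembled from two 8x16-bit partial products (VIS has fmul8sux16 and fmul8ulx16 for the signed upper and unsigned lower byte). The C sketch below is a simplified model, not the exact VIS rounding: it assumes each partial product is truncated (floored) on its own before the two are added, and compares that with the exact (a*b)>>16. All function names are hypothetical. Under that assumption the split result is never larger than the exact one and falls short by at most 1 LSB, which is the kind of occasional off-by-one described above.

```c
#include <assert.h>
#include <stdint.h>

/* Floor division by 2^n that is well defined for negative x
 * (a plain >> on a negative signed value is implementation-defined in C). */
static int32_t floor_shift(int32_t x, int n)
{
    return x >= 0 ? (x >> n) : ~(~x >> n);
}

/* Reference: exact fixed-point product floor(a * b / 2^16). */
static int32_t mul_exact(int16_t a, int16_t b)
{
    return floor_shift((int32_t)a * b, 16);
}

/* Model of a split 8+8-bit multiply (an assumption, not exact VIS rounding):
 * a = hi * 256 + lo with hi signed and lo in 0..255, and each partial
 * product is truncated separately before the two are summed. */
static int32_t mul_split(int16_t a, int16_t b)
{
    int32_t hi = floor_shift(a, 8);   /* signed upper byte   */
    int32_t lo = a - hi * 256;        /* unsigned lower byte */
    return floor_shift(hi * b, 8) + floor_shift(lo * b, 16);
}

/* Largest (exact - split) difference over every 16-bit input for one
 * coefficient b. The assert encodes the invariant that splitting can
 * only lose precision, never gain it. */
static int32_t max_split_error(int16_t b)
{
    int32_t worst = 0;
    for (int32_t a = INT16_MIN; a <= INT16_MAX; a++) {
        int32_t d = mul_exact((int16_t)a, b) - mul_split((int16_t)a, b);
        assert(d >= 0);
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

The bound follows from adding two floors: the two discarded remainders together are less than two units of the final result, so at most one unit is lost.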

what happens with the regression tests if it's forced to be used wherever the
C simple idct is normally used?

and what does dct-test.c output for the idct?
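
An accuracy test of this kind runs an IDCT over many blocks and compares it against a reference. As a rough sketch of the figures such a test reports (this is not dct-test.c's actual code; the struct and function names are hypothetical), the peak absolute error, the mean squared error and the mean signed error (bias) between a candidate output and a reference output can be accumulated like this:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical accumulator for IDCT accuracy statistics. */
typedef struct {
    int     peak;      /* largest |candidate - reference| seen */
    int64_t sum_err;   /* sum of signed errors (bias)          */
    int64_t sum_sq;    /* sum of squared errors                */
    long    samples;   /* number of coefficients compared      */
} idct_err_stats;

/* Add one 8x8 block worth of errors to the running totals. */
static void err_stats_add(idct_err_stats *s,
                          const int16_t cand[64], const int16_t ref[64])
{
    for (int i = 0; i < 64; i++) {
        int e = cand[i] - ref[i];
        if (abs(e) > s->peak)
            s->peak = abs(e);
        s->sum_err += e;
        s->sum_sq  += (int64_t)e * e;
        s->samples++;
    }
}

/* Overall mean squared error per sample. */
static double err_stats_mse(const idct_err_stats *s)
{
    assert(s->samples > 0);
    return (double)s->sum_sq / s->samples;
}

/* Overall mean signed error per sample; nonzero means a systematic bias. */
static double err_stats_mean(const idct_err_stats *s)
{
    assert(s->samples > 0);
    return (double)s->sum_err / s->samples;
}
```

A rare 1-LSB slip from the split multiply would show up as a peak of 1 with a small MSE; a nonzero mean would point to biased rounding instead.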

[...]
>    if (accel & ACCEL_SPARC_VIS) {
> +      if(avctx->idct_algo==FF_IDCT_AUTO || avctx->idct_algo==FF_IDCT_SIMPLEVIS){
> +                c->idct_put = ff_simple_idct_put_vis;
> +                c->idct_add = ff_simple_idct_add_vis;
> +                c->idct     = ff_simple_idct_vis;
> +                c->idct_permutation_type = FF_NO_IDCT_PERM;
[...]
> +static DECLARE_ALIGNED_8(int16_t, coeffs[28]) = {
> +    32138, 32138, 32138, 32138,
> +    30274, 30274, 30274, 30274,
> +    27246, 27246, 27246, 27246,
> +    23170, 23170, 23170, 23170,
> +    18205, 18205, 18205, 18205,
> +    12540, 12540, 12540, 12540,
> +     6393,  6393,  6393,  6393
> +};

const static
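
A minimal sketch of the table with const added. The values are cos(k*pi/16) * 2^15, rounded, for k = 1..7, each repeated four times, presumably so a single 64-bit load fills all four 16-bit lanes of a VIS register. SKETCH_ALIGNED_8 is a hypothetical stand-in for FFmpeg's DECLARE_ALIGNED_8 so the example builds on its own (the GCC-attribute form is an assumption), and coeff_for is a hypothetical accessor.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for DECLARE_ALIGNED_8 (assumed GCC attribute form). */
#define SKETCH_ALIGNED_8(t, v) t v __attribute__((aligned(8)))

/* cos(k*pi/16) * 2^15 for k = 1..7, four lanes each.
 * "static const" lets the table live in read-only data. */
static const SKETCH_ALIGNED_8(int16_t, coeffs[28]) = {
    32138, 32138, 32138, 32138,
    30274, 30274, 30274, 30274,
    27246, 27246, 27246, 27246,
    23170, 23170, 23170, 23170,
    18205, 18205, 18205, 18205,
    12540, 12540, 12540, 12540,
     6393,  6393,  6393,  6393
};

/* Scalar coefficient for cos(k*pi/16), k = 1..7. */
static int16_t coeff_for(int k)
{
    assert(k >= 1 && k <= 7);
    return coeffs[4 * (k - 1)];
}
```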

[...]
> +#define IDCT4ROWS(in, shift, label, s1, s2, bi, ma) \
> +    /* order input */\
> +        "ld [" in "], %%f0           \n\t"\
> +        "ld [" in "+4], %%f4         \n\t"\
> +        "ld [" in "+8], %%f8         \n\t"\
> +        "ld [" in "+12], %%f12       \n\t"\
> +        "ld [" in "+16], %%f1        \n\t"\
> +        "ld [" in "+4+16], %%f5      \n\t"\
> +        "ld [" in "+8+16], %%f9      \n\t"\
> +        "ld [" in "+12+16], %%f13    \n\t"\
> +        "ld [" in "+32], %%f2        \n\t"\
> +        "ld [" in "+4+32], %%f6      \n\t"\
> +        "ld [" in "+8+32], %%f10     \n\t"\
> +        "ld [" in "+12+32], %%f14    \n\t"\
> +        "ld [" in "+48], %%f3        \n\t"\
> +        "ld [" in "+4+48], %%f7      \n\t"\
> +        "ld [" in "+8+48], %%f11     \n\t"\
> +        "ld [" in "+12+48], %%f15    \n\t"\
> +        "ldd [%0], %%f60             \n\t"\
> +        "ldd [%0" ma "], %%f62       \n\t"\
> +        "fzero %%f30                 \n\t"\
> +        "wr %%g0," s1 ", %%gsr       \n\t"\
> +        label "1:                    \n\t"\
> +        "fand %%f0,%%f60, %%f32      \n\t"\
> +        "fand %%f2,%%f60, %%f34      \n\t"\
> +        "fand %%f4,%%f60, %%f36      \n\t"\
> +        "fand %%f6,%%f60, %%f38      \n\t"\
> +        "fand %%f8,%%f60, %%f40      \n\t"\
> +        "fand %%f10,%%f60, %%f42     \n\t"\
> +        "fand %%f12,%%f60, %%f44     \n\t"\
> +        "fand %%f14,%%f60, %%f46     \n\t"\
> +        "fand %%f0,%%f62, %%f48      \n\t"\
> +        "fand %%f2,%%f62, %%f50      \n\t"\
> +        "fand %%f4,%%f62, %%f52      \n\t"\
> +        "fand %%f6,%%f62, %%f54      \n\t"\
> +        "fand %%f8,%%f62, %%f56      \n\t"\
> +        "fand %%f10,%%f62, %%f58     \n\t"\
> +        "fand %%f12,%%f62, %%f60     \n\t"\
> +        "fand %%f14,%%f62, %%f62     \n\t"\
> +        "fpackfix %%f32, %%f0        \n\t"\
> +        "fpackfix %%f34, %%f1        \n\t"\
> +        "fpackfix %%f36, %%f4        \n\t"\
> +        "fpackfix %%f38, %%f5        \n\t"\
> +        "fpackfix %%f40, %%f8        \n\t"\
> +        "fpackfix %%f42, %%f9        \n\t"\
> +        "fpackfix %%f44, %%f12       \n\t"\
> +        "fpackfix %%f46, %%f13       \n\t"\

well, I don't know SPARC asm at all, but don't you read a few things in at
the top and then just overwrite those registers?

also you permute the input explicitly instead of setting idct_permutation_type
properly
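
The idea behind idct_permutation_type, roughly: dsputil builds a 64-entry permutation table once at init, and the decoders use it to write coefficients directly in the order the selected IDCT expects, so the IDCT itself needs no reorder pass. The sketch below shows that idea with a transpose layout as an example only; a VIS IDCT would need whatever order its loads expect. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Build a coefficient permutation once, at init time.
 * Example layout: transpose (row and column indices swapped). */
static void build_transpose_perm(uint8_t perm[64])
{
    for (int i = 0; i < 64; i++)
        perm[i] = (uint8_t)(((i & 7) << 3) | (i >> 3));
}

/* Store natural-order coefficient i at its permuted position, so the
 * block is already in the IDCT's preferred layout when it is called. */
static void store_coeff(int16_t block[64], const uint8_t perm[64],
                        int i, int16_t value)
{
    assert(i >= 0 && i < 64);
    block[perm[i]] = value;
}
```

The table costs 64 entries built once; after that every coefficient store is a single lookup instead of a per-block shuffle inside the IDCT.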

please don't submit trash

[...]
> +void ff_simple_idct_put_vis(uint8_t *dest, int line_size, DCTELEM *data) {
> +    ff_simple_idct_vis(data);
> +    ff_put_pixels_clamped_vis(data, dest, line_size);
> +}
> +
> +void ff_simple_idct_add_vis(uint8_t *dest, int line_size, DCTELEM *data) {
> +    ff_simple_idct_vis(data);
> +    ff_add_pixels_clamped_vis(data, dest, line_size);
> +}

check that gcc inlines these 4 calls, and if it doesn't, do something so that
it does; they should be inlined
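
One way to get the inlining, sketched under assumptions: put the IDCT core and the clamped put/add helpers in the same file as the wrappers, make them static inline and, with gcc, mark them __attribute__((always_inline)). All function names are hypothetical, idct_core is a no-op stand-in for the VIS transform, the pixel helpers are plain C rather than VIS, and DCTELEM is typedef'd locally as a 16-bit type for self-containment.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t DCTELEM;  /* assumed 16-bit coefficient type */

/* Clamp an IDCT output sample to the 0..255 pixel range. */
static inline uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical stand-in for the VIS IDCT core; a real one transforms
 * the block in place. Left as a no-op so the sketch compiles. */
static inline __attribute__((always_inline)) void idct_core(DCTELEM *block)
{
    (void)block;
}

static inline __attribute__((always_inline))
void put_pixels_clamped(const DCTELEM *block, uint8_t *dest, int line_size)
{
    assert(line_size >= 8);
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            dest[x] = clip_uint8(block[8 * y + x]);
        dest += line_size;
    }
}

static inline __attribute__((always_inline))
void add_pixels_clamped(const DCTELEM *block, uint8_t *dest, int line_size)
{
    assert(line_size >= 8);
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            dest[x] = clip_uint8(dest[x] + block[8 * y + x]);
        dest += line_size;
    }
}

/* Exported entry points: with the helpers forced inline, each wrapper
 * compiles to one function body with no internal calls. */
void idct_put_sketch(uint8_t *dest, int line_size, DCTELEM *block)
{
    idct_core(block);
    put_pixels_clamped(block, dest, line_size);
}

void idct_add_sketch(uint8_t *dest, int line_size, DCTELEM *block)
{
    idct_core(block);
    add_pixels_clamped(block, dest, line_size);
}
```

Whether gcc actually inlined everything can be checked by compiling with gcc -S and looking for remaining call instructions inside the two wrappers.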

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Concerning the gods, I have no means of knowing whether they exist or not
or of what sort they may be, because of the obscurity of the subject, and
the brevity of human life -- Protagoras