[FFmpeg-devel] MMX accelerated DSP functions for VC1/WMV3 decoders

Tue Jul 3 18:08:37 CEST 2007

Hi

On Sun, Jul 01, 2007 at 04:50:46PM +0200, Christophe GISQUET wrote:
> Hello,
> 
> here's a new version, where the license header now specifies strictly
> v2.0 of the _Library_ General Public License, which is probably the most
> I would accept. Its text is not directly available anymore from the
> Licenses pages of the fsf and gnu sites.
> 
> The question whether I accept version 2.1 or 3.0 of the LGPL or GPL is
> moot, as you pointed out, their texts offer a kind of loophole to select
> another version, which I disagree with.
> 
> Michael Niedermayer a écrit :
> > currently the code does run a dummy do nothing filter in 6 out of 15 cases
> > this is not good, if there were a general variable tap offset supported
> > then i think it should be easier to skip these dummy filter_0 copy thing
> 
> So I implemented something that should be akin to your suggestion. Here
> are the execution times for 5 runs: 5.26 5.27 5.32 5.16 5.25. Therefore
> it's a tiny bit faster (around 1%).
> 
> As for the code size of the object file, not sure what is the best
> indication:
> ls: 75696 -> 42480 (with debug symbols though)
> size (only text is present): 32869 -> 3552
[...]
> Index: libavcodec/i386/dsputil_mmx.c
> ===================================================================
> --- libavcodec/i386/dsputil_mmx.c	(révision 9451)
> +++ libavcodec/i386/dsputil_mmx.c	(copie de travail)
> @@ -410,7 +410,7 @@
>          );
>  }
>  
> -static void put_pixels8_mmx(uint8_t *block, const uint8_t *pixels, int line_size, int h)
> +void put_pixels8_mmx(uint8_t *block, const uint8_t *pixels, int line_size, int h)
>  {

non-static functions need an ff_ prefix
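
A minimal plain-C sketch of the naming rule: once a function loses `static`
its symbol is visible to everything linked against the library, so the
prefix keeps it from colliding with unrelated symbols. The body below is
only a stand-in for the MMX routine, and `copy_row8` is a hypothetical
helper introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* File-local helper (hypothetical): 'static' keeps the symbol private to
 * this object file, so it needs no prefix. */
static void copy_row8(uint8_t *dst, const uint8_t *src)
{
    memcpy(dst, src, 8);
}

/* Visible to other object files, so it carries the ff_ prefix.
 * Plain-C stand-in for put_pixels8_mmx: copies an 8-pixel-wide block
 * of h rows, both buffers using the same line_size. */
void ff_put_pixels8_mmx(uint8_t *block, const uint8_t *pixels,
                        int line_size, int h)
{
    assert(h >= 0);
    for (int i = 0; i < h; i++) {
        copy_row8(block, pixels);
        block  += line_size;
        pixels += line_size;
    }
}
```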

[...]
> +#if defined(CONFIG_VC1_DECODER) || defined(CONFIG_WMV3_DECODER)
> +            ff_vc1dsp_init_mmx(c, avctx);
> +#endif

shouldn't these be enabled for a VC1/WMV3 encoder too?

[...]
> +/** Interpolates fractional pel values using MMX */
> +static void vc1_mspel_mc_mmx(uint8_t *dst, const uint8_t *src, int stride, int mode, int rnd)
> +{
> +    const uint8_t *tptr;
> +    int           tptrstr;
> +    int           mode1 = mode & 3;
> +    int           mode2 = (mode >> 2) & 3;

you could pass mode1 and mode2 as parameters; this would avoid the calculation above
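
One way this could look, as a sketch only: each per-position wrapper hands
the horizontal and vertical modes to the worker as constants, so nothing
has to be unpacked from a combined value (mode1 = mode & 3,
mode2 = (mode >> 2) & 3 in the patch). The names `mspel_mc`, `DECLARE_MC`
and `put_mspel_mcHV` are hypothetical, and the worker body is a stub that
merely records the modes it received.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical worker: receives both modes directly instead of a packed
 * 'mode'. This stub only stores them in dst so the call can be observed. */
static void mspel_mc(uint8_t *dst, const uint8_t *src, int stride,
                     int hmode, int vmode, int rnd)
{
    assert(hmode >= 0 && hmode <= 3 && vmode >= 0 && vmode <= 3);
    (void)src; (void)stride; (void)rnd;
    dst[0] = (uint8_t)hmode;
    dst[1] = (uint8_t)vmode;
}

/* One thin wrapper per (horizontal, vertical) pair; the modes are
 * compile-time constants at each call site. */
#define DECLARE_MC(h, v)                                              \
static void put_mspel_mc##h##v(uint8_t *dst, const uint8_t *src,      \
                               int stride, int rnd)                   \
{                                                                     \
    mspel_mc(dst, src, stride, h, v, rnd);                            \
}

DECLARE_MC(1, 0)   /* horizontal mode 1, no vertical filter */
DECLARE_MC(2, 3)   /* horizontal mode 2, vertical mode 3 */
```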

[...]
> +    /* Translation: tmp=src-stride, tmp+8=src, ... */
> +    if (mode1) { /* Horizontal filter to apply */
> +        if (mode2) { /* Vertical filter to apply, output to tmp */
> +            vc1_put_shift[mode1-1](tmp, 8, src-stride, stride, 11, rnd, 1);
> +            tptr = tmp+8;
> +            tptrstr = 8;
> +        }
> +        else { /* No vertical filter, output 8 lines to dst */
> +            //fprintf(stderr, "mode1 noV\n"); fflush(stderr);
> +            vc1_put_shift[mode1-1](dst, stride, src, stride, 8, rnd, 1);
> +            return;
> +        }
> +    }
> +    else {
> +        /* No horizontal filter, use directly src as input */
> +        tptr = src;
> +        tptrstr = stride;
> +        /* put_vc1_mspel_mc00_mmx directly calls put_pixels8_mmx */
> +    }
> +
> +    vc1_put_shift[mode2-1](dst, stride, tptr, tptrstr, 8, 1-rnd, tptrstr);
> +}

int dst_stride = stride; /* keep the destination stride, stride is reused below */
if (mode1) { /* Horizontal filter to apply */
    if (mode2) { /* Vertical filter to apply, output to tmp */
        vc1_put_shift[mode1-1](tmp, 8, src-stride, stride, 11, rnd, 1);
        src = tmp+8;
        stride = 8;
    }
    else { /* No vertical filter, output 8 lines to dst */
        //fprintf(stderr, "mode1 noV\n"); fflush(stderr);
        vc1_put_shift[mode1-1](dst, stride, src, stride, 8, rnd, 1);
        return;
    }
}

vc1_put_shift[mode2-1](dst, dst_stride, src, stride, 8, 1-rnd, stride);
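
The same control flow as a self-contained sketch, under loud assumptions:
`avg2` is a hypothetical 2-tap stand-in (the real filters take more taps,
which is why the patch filters 11 lines starting at src-stride, while 9
lines from src suffice here), the rnd argument is dropped, and a guard on
the vertical mode is added so the (0,0) case is simply a no-op.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 2-tap averaging filter on an 8-pixel-wide block.
 * 'step' is the distance between the two taps: 1 for a horizontal pass,
 * the source stride for a vertical pass. */
static void avg2(uint8_t *dst, int dst_stride, const uint8_t *src,
                 int src_stride, int lines, int step)
{
    for (int y = 0; y < lines; y++)
        for (int x = 0; x < 8; x++)
            dst[y * dst_stride + x] =
                (uint8_t)((src[y * src_stride + x] +
                           src[y * src_stride + x + step] + 1) >> 1);
}

static void mc_sketch(uint8_t *dst, const uint8_t *src, int stride,
                      int hmode, int vmode)
{
    uint8_t tmp[9 * 8];       /* 8 rows plus 1 extra row for the vertical tap */
    int dst_stride = stride;  /* saved before 'stride' is repurposed */

    assert(stride >= 9);      /* the horizontal tap reads column 8 */
    if (hmode) {
        if (vmode) {          /* both passes: horizontal output goes to tmp */
            avg2(tmp, 8, src, stride, 9, 1);
            src    = tmp;     /* the vertical pass now reads the temporary */
            stride = 8;
        } else {              /* horizontal only: write straight to dst */
            avg2(dst, stride, src, stride, 8, 1);
            return;
        }
    }
    if (vmode)                /* guard added for this sketch only */
        avg2(dst, dst_stride, src, stride, 8, stride);
}
```

Reusing src and stride removes the two extra locals; only the destination
stride has to be remembered separately.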

also the -1 in [...-1] can be avoided by putting a NULL at place 0 of the
array
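
A small sketch of the NULL-at-index-0 idea, with hypothetical stand-in
filters (`shift1` to `shift3`) and a simplified signature: mode 0 means
"no filter", so the table can be indexed with the mode itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*shift_fn)(uint8_t *dst, const uint8_t *src, int n);

/* Stand-in filters: each adds its mode number to n samples, only so the
 * three table entries are distinguishable. */
static void shift1(uint8_t *dst, const uint8_t *src, int n)
{
    for (int i = 0; i < n; i++) dst[i] = (uint8_t)(src[i] + 1);
}
static void shift2(uint8_t *dst, const uint8_t *src, int n)
{
    for (int i = 0; i < n; i++) dst[i] = (uint8_t)(src[i] + 2);
}
static void shift3(uint8_t *dst, const uint8_t *src, int n)
{
    for (int i = 0; i < n; i++) dst[i] = (uint8_t)(src[i] + 3);
}

/* Slot 0 is NULL, so put_shift[mode] replaces put_shift[mode-1]. */
static const shift_fn put_shift[4] = { NULL, shift1, shift2, shift3 };

/* Returns 1 if a filter ran, 0 if mode 0 left dst untouched. */
static int apply_shift(uint8_t *dst, const uint8_t *src, int n, int mode)
{
    assert(mode >= 0 && mode <= 3);
    if (!put_shift[mode])
        return 0;
    put_shift[mode](dst, src, n);
    return 1;
}
```

The cost is one unused pointer; callers that already branch on a zero mode
never touch slot 0.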

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

No snowflake in an avalanche ever feels responsible. -- Voltaire