[FFmpeg-devel] [PATCH] mmx implementation of vc-1 inverse transformations

Mon Jul 21 06:51:08 CEST 2008

On Mon, Jul 21, 2008 at 02:50:10AM +0200, Michael Niedermayer wrote:
> On Thu, Jul 17, 2008 at 11:29:59PM +0200, Victor Pollex wrote:
> > Michael Niedermayer schrieb:
> >> On Mon, Jul 07, 2008 at 09:02:28PM +0200, Victor Pollex wrote:
> >>   
> >>> Michael Niedermayer schrieb:
> >>>     
> >>>> On Thu, Jul 03, 2008 at 02:51:18PM +0200, Victor Pollex wrote:
> >>>>       
> >> [...]
> >>   
> >>>>> +/*
> >>>>> +    precondition:
> >>>>> +        for all values v in r0, r1, r2, r3: -3971 <= v <= 3971
> >>>>> +
> >>>>> +    postcondition:
> >>>>> +        r3 = ((17 * (r0 + r2) + (22 * r1 + 10 * r3) + c) >> 3)
> >>>>> +        r4 = ((17 * (r0 - r2) - (10 * r1 - 22 * r3) + c) >> 3)
> >>>>> +        r1 = ((17 * (r0 - r2) + (10 * r1 - 22 * r3) + c) >> 3)
> >>>>> +        r2 = ((17 * (r0 + r2) - (22 * r1 + 10 * r3) + c) >> 3)
> >>>>> +        r0 undefined
> >>>>> +        r5 undefined
> >>>>> +        r6 undefined
> >>>>> +        r7 undefined
> >>>>> +*/
> >>>>> +#define TRANSFORM_4X4_ROW(r0,r1,r2,r3,r4,r5,r6,r7,c)\
> >>>>> +    TRANSPOSE4(r0,r1,r2,r3,r4)\
> >>>>> +    TRANSFORM_4X4_COMMON(r0,r3,r4,r2,r1,r5,r6,r7,c)\
> >>>>> +    "paddw "#r4", "#r4"\n\t" /* 2 * (r0 + r2) */\
> >>>>> +    SUMSUB_BA(r3,r4)\
> >>>>> +    "paddw "#r1", "#r3"\n\t"\
> >>>>> +    "paddw "#r7", "#r4"\n\t"\
> >>>>> +    "paddw "#r0", "#r0"\n\t" /* 2 * (r0 - r2) */\
> >>>>> +    SUMSUB_BA(r2,r0)\
> >>>>> +    "paddw "#r5", "#r0"\n\t"\
> >>>>> +    "paddw "#r6", "#r2"\n\t"\
> >>>>> +    TRANSPOSE4(r3,r0,r2,r4,r1)
> >>>>>             
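
For reference, the postcondition in the comment above can be read as plain scalar C. This is only a sketch, not code from the patch: the function name vc1_row4_ref, the separate in/out arrays, and the natural output order (r3, r1, r4, r2 in the macro's terms) are assumptions made for illustration. It also assumes that >> on a negative int is an arithmetic shift, which holds for the usual compilers but is implementation-defined in C.

```c
#include <stdint.h>

/* Scalar reading of the TRANSFORM_4X4_ROW postcondition.
 * in[0..3] are the original r0..r3; c is the rounding constant.
 * Hypothetical helper, not part of the patch. */
static void vc1_row4_ref(const int16_t in[4], int16_t out[4], int c)
{
    int e = 17 * (in[0] + in[2]);      /* even part, sum        */
    int f = 17 * (in[0] - in[2]);      /* even part, difference */
    int g = 22 * in[1] + 10 * in[3];   /* odd part              */
    int h = 10 * in[1] - 22 * in[3];   /* odd part              */

    out[0] = (int16_t)((e + g + c) >> 3);  /* "r3" in the comment */
    out[1] = (int16_t)((f + h + c) >> 3);  /* "r1" in the comment */
    out[2] = (int16_t)((f - h + c) >> 3);  /* "r4" in the comment */
    out[3] = (int16_t)((e - g + c) >> 3);  /* "r2" in the comment */
}
```

With inputs inside the stated range of -3971..3971 every intermediate fits comfortably in an int, so a reference like this can be used to cross-check the 16-bit MMX results.
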
> >>>> It should be possible to merge one transpose into the scantable (the
> >>>> MPEG-1/2/4 decoders do that too).
> >>>>
> >>>>         
> >>> I'm not sure if this should be done as I found the following lines in 
> >>> decode_sequence_header in vc1.c
> >>>    if (!v->res_fasttx)
> >>>    {
> >>>        v->s.dsp.vc1_inv_trans_8x8 = ff_simple_idct;
> >>>        v->s.dsp.vc1_inv_trans_8x4 = ff_simple_idct84_add;
> >>>        v->s.dsp.vc1_inv_trans_4x8 = ff_simple_idct48_add;
> >>>        v->s.dsp.vc1_inv_trans_4x4 = ff_simple_idct44_add;
> >>>    }
> >>>     
> >>
> >> The permutation used should of course depend on the IDCT used.
> >>   
> >
> > ok, changed it. I could also try to do the same with the scantables for the 
> > 8x4 and 4x8 transformations if desired.
> 
> Yes, please do. Review and approval of the changes to the non-asm VC-1 code
> is left to Kostya; he is the maintainer of that.

I have to test how it performs with AC prediction (and whether it breaks it).
I may also have to add support for permutation tables in block decoding
before applying this patch.
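
For reference, merging one transpose into the scan table amounts to something like the sketch below. It is not the code from the patch: the helper name transpose_scantable and its parameters are hypothetical, and it assumes the scan table stores raster positions as row * stride + col with the whole block kept in a stride x stride buffer (for example, stride 8 for a sub-block inside an 8x8 block).

```c
#include <stdint.h>

/* Hypothetical helper: rewrite a scan table so that each coefficient is
 * stored at the transposed position while it is being decoded. The
 * transform can then skip its input transpose.
 * Assumes positions are row * stride + col and that the transposed
 * position still fits in a uint8_t. */
static void transpose_scantable(uint8_t *dst, const uint8_t *src,
                                int n, int stride)
{
    for (int i = 0; i < n; i++) {
        int row = src[i] / stride;
        int col = src[i] % stride;
        dst[i] = (uint8_t)(col * stride + row);  /* swap row and column */
    }
}
```

Anything else that indexes coefficients by position, AC prediction in particular, would have to use the same permutation, which is the part that still needs testing.
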

[...]
> 
> [...]
> -- 
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> 
> When you are offended at any man's fault, turn to yourself and study your
> own failings. Then you will forget your anger. -- Epictetus