[FFmpeg-devel] [PATCH] mmx implementation of vc-1 inverse transformations
Kostya
kostya.shishkov
Mon Jul 21 06:51:08 CEST 2008
On Mon, Jul 21, 2008 at 02:50:10AM +0200, Michael Niedermayer wrote:
> On Thu, Jul 17, 2008 at 11:29:59PM +0200, Victor Pollex wrote:
> > Michael Niedermayer schrieb:
> >> On Mon, Jul 07, 2008 at 09:02:28PM +0200, Victor Pollex wrote:
> >>
> >>> Michael Niedermayer schrieb:
> >>>
> >>>> On Thu, Jul 03, 2008 at 02:51:18PM +0200, Victor Pollex wrote:
> >>>>
> >> [...]
> >>
> >>>>> +/*
> >>>>> + precondition:
> >>>>> + for all values v in r0, r1, r2, r3: -3971 <= v <= 3971
> >>>>> +
> >>>>> + postcondition:
> >>>>> + r3 = ((17 * (r0 + r2) + (22 * r1 + 10 * r3) + c) >> 3)
> >>>>> + r4 = ((17 * (r0 - r2) - (10 * r1 - 22 * r3) + c) >> 3)
> >>>>> + r1 = ((17 * (r0 - r2) + (10 * r1 - 22 * r3) + c) >> 3)
> >>>>> + r2 = ((17 * (r0 + r2) - (22 * r1 + 10 * r3) + c) >> 3)
> >>>>> + r0 undefined
> >>>>> + r5 undefined
> >>>>> + r6 undefined
> >>>>> + r7 undefined
> >>>>> +*/
> >>>>> +#define TRANSFORM_4X4_ROW(r0,r1,r2,r3,r4,r5,r6,r7,c)\
> >>>>> + TRANSPOSE4(r0,r1,r2,r3,r4)\
> >>>>> + TRANSFORM_4X4_COMMON(r0,r3,r4,r2,r1,r5,r6,r7,c)\
> >>>>> + "paddw "#r4", "#r4"\n\t" /* 2 * (r0 + r2) */\
> >>>>> + SUMSUB_BA(r3,r4)\
> >>>>> + "paddw "#r1", "#r3"\n\t"\
> >>>>> + "paddw "#r7", "#r4"\n\t"\
> >>>>> + "paddw "#r0", "#r0"\n\t" /* 2 * (r0 - r2) */\
> >>>>> + SUMSUB_BA(r2,r0)\
> >>>>> + "paddw "#r5", "#r0"\n\t"\
> >>>>> + "paddw "#r6", "#r2"\n\t"\
> >>>>> + TRANSPOSE4(r3,r0,r2,r4,r1)
> >>>>>
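The postcondition above can be checked against a plain C version of the same 4-point row transform. This is a sketch, not FFmpeg code. `vc1_row4_ref` is a hypothetical name, the rounding constant `c` is left as a parameter (the test assumes c = 4), and `>>` on negative values is assumed to be an arithmetic shift. With every input at the bound 3971 and c = 4, the largest result is 32761, which still fits in a signed 16-bit word.

```c
#include <assert.h>

/*
 * Scalar sketch of the 4-point row transform stated in the postcondition.
 * out[0..3] are the results in natural order; in the macro's register
 * naming they correspond to r3, r1, r4 and r2 respectively.
 * Note: >> on a negative int is implementation-defined in C; mainstream
 * compilers implement it as an arithmetic shift, which is assumed here.
 */
static void vc1_row4_ref(const int in[4], int out[4], int c)
{
    for (int i = 0; i < 4; i++)
        assert(in[i] >= -3971 && in[i] <= 3971); /* the stated precondition */

    int even0 = 17 * (in[0] + in[2]);
    int even1 = 17 * (in[0] - in[2]);
    int odd0  = 22 * in[1] + 10 * in[3];
    int odd1  = 10 * in[1] - 22 * in[3];

    out[0] = (even0 + odd0 + c) >> 3; /* r3 */
    out[1] = (even1 + odd1 + c) >> 3; /* r1 */
    out[2] = (even1 - odd1 + c) >> 3; /* r4 */
    out[3] = (even0 - odd0 + c) >> 3; /* r2 */
}
```

A DC-only row spreads evenly: an input of {8, 0, 0, 0} with c = 4 gives 17 in all four outputs.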
> >>>> It should be possible to merge one transpose into the scantable (the
> >>>> MPEG-1/2/4 decoder does that too).
> >>>>
> >>>>
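Here is a minimal sketch of what merging a transpose into the scantable means. It assumes a row-major 8x8 block where position = row * 8 + col. If each scan entry is passed through a transpose permutation when the table is built, the decoded coefficients land already transposed, and one TRANSPOSE4 in the transform could then be dropped. The helper names are hypothetical, and the scan order used in the test is arbitrary rather than a real VC-1 zigzag table.

```c
/* (row, col) sits at row * 8 + col; transposing swaps the two 3-bit fields. */
static unsigned char transpose_pos(int pos)
{
    return (unsigned char)(((pos & 7) << 3) | (pos >> 3));
}

/* Hypothetical helper: fold a transpose into a scan order, so that
 * coefficient n is written to permutated[n] instead of scan[n]. */
static void build_transposed_scan(const unsigned char scan[64],
                                  unsigned char permutated[64])
{
    for (int n = 0; n < 64; n++)
        permutated[n] = transpose_pos(scan[n]);
}

/* Scatter `count` decoded coefficients into block[] along a scan order. */
static void place_coeffs(const short *coeffs, int count,
                         const unsigned char scan[64], short block[64])
{
    for (int n = 0; n < count; n++)
        block[scan[n]] = coeffs[n];
}
```

Decoding the same coefficients once with `scan` and once with the transposed table yields two blocks that are exact transposes of each other, whatever the scan order is. The extra cost is paid once, when the table is built, instead of once per block.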
> >>> I'm not sure this should be done, as I found the following lines in
> >>> decode_sequence_header() in vc1.c:
> >>> if (!v->res_fasttx)
> >>> {
> >>>     v->s.dsp.vc1_inv_trans_8x8 = ff_simple_idct;
> >>>     v->s.dsp.vc1_inv_trans_8x4 = ff_simple_idct84_add;
> >>>     v->s.dsp.vc1_inv_trans_4x8 = ff_simple_idct48_add;
> >>>     v->s.dsp.vc1_inv_trans_4x4 = ff_simple_idct44_add;
> >>> }
> >>>
> >>
> >> The permutation used should of course depend on the IDCT used.
> >>
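One way to keep the permutation tied to the selected transform is to choose the permutation table and the transform pointer in the same place, keyed on `res_fasttx`. This sketch uses hypothetical types and stand-in functions. It assumes the fast VC-1 path wants transposed input and that the simple IDCT fallback expects natural order. Neither assumption is taken from vc1.c.

```c
typedef void (*inv_trans_fn)(short block[64]);

/* Stand-ins for the real transforms (hypothetical; they do nothing here). */
static void fast_vc1_trans(short block[64])    { (void)block; }
static void simple_idct_trans(short block[64]) { (void)block; }

typedef struct {
    inv_trans_fn  trans;    /* inverse transform to call */
    unsigned char perm[64]; /* coefficient permutation that transform expects */
} TransformSetup;

/* Choose transform and permutation together so they cannot disagree. */
static void setup_transform(TransformSetup *ts, int res_fasttx)
{
    for (int i = 0; i < 64; i++) {
        unsigned char transposed = (unsigned char)(((i & 7) << 3) | (i >> 3));
        ts->perm[i] = res_fasttx ? transposed : (unsigned char)i;
    }
    ts->trans = res_fasttx ? fast_vc1_trans : simple_idct_trans;
}
```

Because both fields are set in one call, switching the IDCT can never leave a stale permutation behind.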
> >
> > OK, changed it. I could also try to do the same with the scantables for the
> > 8x4 and 4x8 transforms, if desired.
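This sketch covers the 8x4 and 4x8 cases. It assumes both sub-blocks live in the same 64-entry buffer with the 8x8 stride: the 8x4 halves at offsets 0 and 32, and the 4x8 halves at offsets 0 and 4. These offsets are assumptions. Under that layout the 8x8 transpose permutation can be reused, because a transposed 8x4 half becomes a 4x8 region and vice versa. `build_sub_scan` is a hypothetical helper, and the raster scans in the test are placeholders for the real VC-1 scan tables.

```c
/* (row, col) sits at row * 8 + col; transposing swaps the two 3-bit fields. */
static unsigned char transpose_pos(int pos)
{
    return (unsigned char)(((pos & 7) << 3) | (pos >> 3));
}

/* Hypothetical helper: permuted scan for one 8x4 or 4x8 sub-block.
 * scan[] holds positions inside the sub-block using the 8x8 stride,
 * and `off` is the sub-block's offset inside the 64-entry buffer. */
static void build_sub_scan(const unsigned char *scan, int len, int off,
                           unsigned char *permutated)
{
    for (int n = 0; n < len; n++)
        permutated[n] = transpose_pos(off + scan[n]);
}
```

For example, the bottom 8x4 half (offset 32) maps entirely into columns 4..7 of the transposed buffer, and the right 4x8 half (offset 4) maps into rows 4..7.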
>
> Yes, please do. Review and approval of the changes to the non-asm VC-1
> code is left to Kostya; he is the maintainer of that.
I have to test how it performs with AC prediction (and whether it breaks it).
I may also have to add support for permutation tables in block decoding
before applying this patch.
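This hedged sketch shows roughly what permutation support in block decoding could mean for AC prediction. The predicted first row or first column is addressed in natural order and then mapped through the permutation, so the same code serves both stored layouts. `apply_ac_pred` and its `from_left` flag are hypothetical. Presumably the same mapping would also be needed wherever a block's first row or column is saved for predicting later neighbours.

```c
/* Hypothetical sketch: add predicted AC coefficients to the first row
 * (from_left == 0, prediction from the block above) or the first column
 * (from_left == 1, prediction from the block to the left). Positions are
 * computed in natural order and then mapped through perm[], so the same
 * code works whether the block is stored as-is or transposed. */
static void apply_ac_pred(short block[64], const unsigned char perm[64],
                          const short ac_pred[7], int from_left)
{
    for (int k = 1; k < 8; k++) {
        int pos = from_left ? k * 8 : k; /* natural-order position */
        block[perm[pos]] += ac_pred[k - 1];
    }
}
```

With an identity permutation the top prediction fills block[1..7]. With a transpose permutation the same call fills block[8], block[16], ..., block[56], which is the first row in transposed storage.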
[...]
>
> [...]
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> When you are offended at any man's fault, turn to yourself and study your
> own failings. Then you will forget your anger. -- Epictetus