[FFmpeg-devel] [PATCH] MMX implementation of VC-1 inverse transforms
Michael Niedermayer
michaelni
Tue Jan 22 20:24:13 CET 2008
On Sun, Jan 20, 2008 at 12:37:21PM +0100, Christophe GISQUET wrote:
> Hi,
>
> Michael Niedermayer wrote:
> > I think the following is safe:
> >
> > t1 = src[0] + src[2];
> > t2 = src[0] - src[2];
> > t1= 8*t1 + (t1>>1);
> > t2= 8*t2 + (t2>>1);
> >
> > t3 = 11 * src[1] + 5 * src[3];
> > t4 = 11 * src[3] - 5 * src[1];
> >
> > dst[0] = (t1 + t3 + 2) >> 2;
> > dst[1] = (t2 - t4 + 2) >> 2;
> > dst[2] = (t2 + t4 + 2) >> 2;
> > dst[3] = (t1 - t3 + 2) >> 2;
> [...]
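One way to check the claim that this halved form is "safe" is to compare it exhaustively against the straightforward row pass over a small input range. The sketch below is illustrative only. It assumes that the reference uses the VC-1 4-point coefficients 17, 22 and 10 with rounding (x + 4) >> 3, and that >> on a negative int is an arithmetic shift (implementation-defined in C, but what GCC and Clang do). All function names are hypothetical.

```c
#include <assert.h>

/* Straightforward row pass (assumed reference): coefficients 17, 22, 10,
 * rounded with (x + 4) >> 3. */
void idct4_row_ref(const int src[4], int dst[4])
{
    int t1 = 17 * (src[0] + src[2]) + 4;
    int t2 = 17 * (src[0] - src[2]) + 4;
    int t3 = 22 * src[1] + 10 * src[3];
    int t4 = 22 * src[3] - 10 * src[1];

    dst[0] = (t1 + t3) >> 3;
    dst[1] = (t2 - t4) >> 3;
    dst[2] = (t2 + t4) >> 3;
    dst[3] = (t1 - t3) >> 3;
}

/* Halved form quoted above: 17/2 becomes 8*t + (t >> 1), 22 and 10
 * become 11 and 5, and the rounding becomes (x + 2) >> 2. */
void idct4_row_halved(const int src[4], int dst[4])
{
    int t1 = src[0] + src[2];
    int t2 = src[0] - src[2];
    t1 = 8 * t1 + (t1 >> 1);
    t2 = 8 * t2 + (t2 >> 1);

    int t3 = 11 * src[1] + 5 * src[3];
    int t4 = 11 * src[3] - 5 * src[1];

    dst[0] = (t1 + t3 + 2) >> 2;
    dst[1] = (t2 - t4 + 2) >> 2;
    dst[2] = (t2 + t4 + 2) >> 2;
    dst[3] = (t1 - t3 + 2) >> 2;
}

/* Counts outputs over all inputs in [lo, hi]^4 where the two forms differ. */
long count_row_mismatches(int lo, int hi)
{
    long bad = 0;
    assert(lo <= hi);
    for (int a = lo; a <= hi; a++)
        for (int b = lo; b <= hi; b++)
            for (int c = lo; c <= hi; c++)
                for (int d = lo; d <= hi; d++) {
                    int src[4] = { a, b, c, d }, r[4], h[4];
                    idct4_row_ref(src, r);
                    idct4_row_halved(src, h);
                    for (int i = 0; i < 4; i++)
                        bad += r[i] != h[i];
                }
    return bad;
}
```

The two forms agree because 17t = 2(8t + (t >> 1)) plus 0 or 1. The rest of the reference numerator is even, so that extra 1 can never cross a multiple of 8 before the final >> 3.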
>
> OK, I've implemented that. I also tried to decompose t3 and t4 as:
> t3 = 5(2s1+s3) + s1
> t4 = 5(2s3-s1) + s3
> (trading one constant load from memory and 2 multiplies for 2 shifts
> and 2 additions).
>
> But this is slower. In fact, I can keep the multiply constants in
> registers (loading the bias from memory instead), which widens the
> speed difference further.
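The decomposition above can be checked with a small scalar sketch. It models only the arithmetic, not the MMX scheduling that made this variant slower, and the function names are hypothetical.

```c
#include <assert.h>

/* Direct form: four multiplies (by 11 and 5). */
void odd_part_mul(int s1, int s3, int *t3, int *t4)
{
    *t3 = 11 * s1 + 5 * s3;
    *t4 = 11 * s3 - 5 * s1;
}

/* Decomposed form: t3 = 5(2*s1 + s3) + s1, t4 = 5(2*s3 - s1) + s3,
 * i.e. two multiplies by 5 only. The doubling is written as x + x
 * (a psllw $1 in SIMD); a C left shift of a negative int would be
 * undefined behaviour. */
void odd_part_decomposed(int s1, int s3, int *t3, int *t4)
{
    *t3 = 5 * (s1 + s1 + s3) + s1;
    *t4 = 5 * (s3 + s3 - s1) + s3;
}

/* Asserts that both forms agree for all s1, s3 in [-range, range]. */
void check_odd_part(int range)
{
    for (int s1 = -range; s1 <= range; s1++)
        for (int s3 = -range; s3 <= range; s3++) {
            int m3, m4, d3, d4;
            odd_part_mul(s1, s3, &m3, &m4);
            odd_part_decomposed(s1, s3, &d3, &d4);
            assert(m3 == d3 && m4 == d4);
        }
}
```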
>
> 1D2 ~ 1080 dezicycles
> 1D3 ~ 1120
>
> Anyway, this is mostly for reference, since it was shown that the 4x4 DCT
> is not relevant speed-wise, and the code for transposing the zigzag
> scantables is not provided.
>
> Best regards,
> --
> Christophe GISQUET
> Index: libavcodec/i386/vc1dsp_mmx.c
> ===================================================================
> --- libavcodec/i386/vc1dsp_mmx.c (revision 11559)
> +++ libavcodec/i386/vc1dsp_mmx.c (working copy)
> @@ -467,6 +467,121 @@
> DECLARE_FUNCTION(3, 2)
> DECLARE_FUNCTION(3, 3)
>
> +/* out: d0=R1 d1=R0 d2=R3 d3=R2 */
> +#define IDCT4_1D2(R0, R1, R2, R3, TMP0, TMP1, ADD, SHIFT) \
> + SUMSUB_BA(R2, R0) /* R2=s0+s2 R0=s0-s2 */ \
> + "movq "#R0", "#TMP0" \n\t" \
> + "movq "#R2", "#TMP1" \n\t" \
> + "psllw $3, "#R0" \n\t" \
> + "psllw $3, "#R2" \n\t" \
> + "paddw "MANGLE(ADD)", "#TMP0" \n\t" \
> + "paddw "MANGLE(ADD)", "#TMP1" \n\t" \
maybe the following is faster:
movq MANGLE(ADD), TMP0
movq MANGLE(ADD), TMP1
paddw R0, TMP0
paddw R2, TMP1
psllw $3, R0
psllw $3, R2
> + "psraw $1, "#TMP0" \n\t" \
> + "psraw $1, "#TMP1" \n\t" \
> + "paddw "#TMP0", "#R0" \n\t" \
> + "paddw "#TMP1", "#R2" \n\t" \
> + "movq "#R1", "#TMP0" \n\t" \
> + "movq "#R3", "#TMP1" \n\t" \
> + "pmullw %%mm7, "#R1" \n\t" \
> + "pmullw %%mm7, "#R3" \n\t" \
> + "pmullw %%mm6, "#TMP0" \n\t" \
> + "pmullw %%mm6, "#TMP1" \n\t" \
> + "psubw "#TMP0", "#R3" \n\t" \
> + "paddw "#TMP1", "#R1" \n\t" \
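For reference, here is a hypothetical per-lane C model of the quoted IDCT4_1D2 macro. The quoted patch does not show the constants, so this assumes that ADD holds 4 in every word and that %%mm7 and %%mm6 hold 11 and 5 (matching t3 and t4 above). 16-bit wraparound of paddw/pmullw is not modelled. With ADD = 4 the even part folds the rounding term into the halving, because 8t + ((t + 4) >> 1) equals 8t + (t >> 1) + 2.

```c
#include <assert.h>

/* Register contents of one 16-bit lane after the quoted macro body. */
typedef struct { int r0, r1, r2, r3; } lane4;

/* Assumes ADD = 4, %%mm7 = 11, %%mm6 = 5 (not shown in the patch). */
lane4 idct4_1d2_lane(int s0, int s1, int s2, int s3)
{
    const int add = 4;          /* assumed content of ADD */
    lane4 r;
    int tmp0, tmp1;

    /* SUMSUB_BA(R2, R0): R2 = s0 + s2, R0 = s0 - s2 */
    r.r2 = s0 + s2;
    r.r0 = s0 - s2;

    /* Even part: TMP = (R + ADD) >> 1, then R = 8*R + TMP */
    tmp0 = (r.r0 + add) >> 1;   /* movq; paddw ADD; psraw $1 */
    tmp1 = (r.r2 + add) >> 1;
    r.r0 = r.r0 * 8 + tmp0;     /* psllw $3; paddw TMP0, R0 */
    r.r2 = r.r2 * 8 + tmp1;     /* psllw $3; paddw TMP1, R2 */

    /* Odd part: R1 = 11*s1 + 5*s3 (t3), R3 = 11*s3 - 5*s1 (t4) */
    tmp0 = s1 * 5;              /* pmullw %%mm6, TMP0 */
    tmp1 = s3 * 5;              /* pmullw %%mm6, TMP1 */
    r.r1 = s1 * 11 + tmp1;      /* pmullw %%mm7, R1; paddw TMP1, R1 */
    r.r3 = s3 * 11 - tmp0;      /* pmullw %%mm7, R3; psubw TMP0, R3 */
    return r;
}

/* Asserts the folded bias matches the "+ 2" rounding term of the
 * halved C form, for t in [-range, range]. */
void check_even_bias_fold(int range)
{
    for (int t = -range; t <= range; t++)
        assert(8 * t + ((t + 4) >> 1) == 8 * t + (t >> 1) + 2);
}
```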
t= 5(A+B)
X= t+ 6A
Y= t-16B
movq R1, TMP0
paddw R3, R1
psllw $4, R3
pmullw mm7(=6), TMP0
pmullw mm6(=5), R1
paddw R1, TMP0
psubw R3, R1
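The suggested sequence relies on 11A + 5B = 5(A+B) + 6A and 5A - 11B = 5(A+B) - 16B, so it needs only two pmullw instead of four. Y comes out as -t4 rather than t4. Since t4 only appears in t2 - t4 and t2 + t4, the sign can be absorbed by swapping those two outputs. Here is a scalar sketch of one lane, assuming %%mm7 = 6 and %%mm6 = 5 as annotated above (names hypothetical, 16-bit wraparound not modelled):

```c
#include <assert.h>

/* Per-lane model of the suggested sequence. AT&T operand order:
 * "op src, dst" updates dst. A = s1 in R1, B = s3 in R3.
 * Assumes mm7 = 6 and mm6 = 5 in every word. */
void odd_part_factored(int a, int b, int *x, int *y)
{
    int r1 = a, r3 = b, tmp0;

    tmp0 = r1;          /* movq   R1, TMP0   : TMP0 = A          */
    r1 += r3;           /* paddw  R3, R1     : R1 = A + B        */
    r3 *= 16;           /* psllw  $4, R3     : R3 = 16B          */
    tmp0 *= 6;          /* pmullw mm7, TMP0  : TMP0 = 6A         */
    r1 *= 5;            /* pmullw mm6, R1    : R1 = t = 5(A + B) */
    tmp0 += r1;         /* paddw  R1, TMP0   : X = t + 6A        */
    r1 -= r3;           /* psubw  R3, R1     : Y = t - 16B       */

    *x = tmp0;          /* 11A + 5B = t3  */
    *y = r1;            /* 5A - 11B = -t4 */
}

/* Asserts X == t3 and Y == -t4 for all A, B in [-range, range]. */
void check_odd_part_factored(int range)
{
    for (int a = -range; a <= range; a++)
        for (int b = -range; b <= range; b++) {
            int x, y;
            odd_part_factored(a, b, &x, &y);
            assert(x == 11 * a + 5 * b);
            assert(y == -(11 * b - 5 * a));
        }
}
```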
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Concerning the gods, I have no means of knowing whether they exist or not
or of what sort they may be, because of the obscurity of the subject, and
the brevity of human life -- Protagoras