[FFmpeg-devel] [PATCH] MMX implementation of VC-1 inverse transforms
Michael Niedermayer
michaelni
Tue Jan 22 15:33:27 CET 2008
On Sun, Jan 20, 2008 at 01:27:04PM +0100, Christophe GISQUET wrote:
[...]
> > you should at least do
> >> + "movq (%0,"OFF"), %%mm0 \n\t" \
> >> + "psubw %%mm0, %%mm1 \n\t"
> >> + "psubw %%mm0, %%mm4 \n\t"
> >> + "psllw $2, %%mm0 \n\t"
> >> + "psubw %%mm0, %%mm2 \n\t"
> >> + "paddw %%mm0, %%mm0 \n\t"
> >> + "psubw %%mm0, %%mm4 \n\t"
> >> + "paddw %%mm0, %%mm0 \n\t"
> >> + "psubw %%mm0, %%mm3 \n\t"
> >> + "paddw %%mm0, %%mm1 \n\t"
> >
> > 2 fewer instructions, 3 fewer registers, no multiply, no constants read
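
For readers following the register arithmetic, here is a minimal scalar sketch of what the quoted sequence does to one 16-bit lane. It assumes mm0 holds one input coefficient x and mm1..mm4 are running accumulators, which is how the quoted lines read; the function names and the uint16_t wraparound model are illustrative and are not taken from the patch. The net effect is mm1 += 15x, mm2 -= 4x, mm3 -= 16x, mm4 -= 9x, which appears to match the 15, -4, -16, -9 weights of the VC-1 8-point transform.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the quoted MMX sequence for a single 16-bit lane.
 * x stands in for mm0, acc[0..3] for mm1..mm4 (an assumption based on
 * the quoted lines). uint16_t models the wraparound of paddw/psubw. */
void odd_term_shift_add(uint16_t x, uint16_t acc[4])
{
    acc[0] = (uint16_t)(acc[0] - x);  /* psubw mm0, mm1 : mm1 -= x          */
    acc[3] = (uint16_t)(acc[3] - x);  /* psubw mm0, mm4 : mm4 -= x          */
    x      = (uint16_t)(x << 2);      /* psllw $2, mm0  : mm0  = 4x         */
    acc[1] = (uint16_t)(acc[1] - x);  /* psubw mm0, mm2 : mm2 -= 4x         */
    x      = (uint16_t)(x + x);       /* paddw mm0, mm0 : mm0  = 8x         */
    acc[3] = (uint16_t)(acc[3] - x);  /* psubw mm0, mm4 : mm4 -= 8x (-9x)   */
    x      = (uint16_t)(x + x);       /* paddw mm0, mm0 : mm0  = 16x        */
    acc[2] = (uint16_t)(acc[2] - x);  /* psubw mm0, mm3 : mm3 -= 16x        */
    acc[0] = (uint16_t)(acc[0] + x);  /* paddw mm0, mm1 : mm1 += 16x (+15x) */
}

/* Reference version using explicit multiplies by the same weights. */
void odd_term_multiply(uint16_t x, uint16_t acc[4])
{
    acc[0] = (uint16_t)(acc[0] + 15u * x);
    acc[1] = (uint16_t)(acc[1] -  4u * x);
    acc[2] = (uint16_t)(acc[2] - 16u * x);
    acc[3] = (uint16_t)(acc[3] -  9u * x);
}

/* Exhaustively compare both versions over every 16-bit input value. */
void check_odd_term_equivalence(void)
{
    for (unsigned x = 0; x <= 0xFFFFu; x++) {
        uint16_t a[4] = { 1, 2, 3, 4 };
        uint16_t b[4] = { 1, 2, 3, 4 };
        odd_term_shift_add((uint16_t)x, a);
        odd_term_multiply((uint16_t)x, b);
        for (int i = 0; i < 4; i++)
            assert(a[i] == b[i]);
    }
}
```

Calling check_odd_term_equivalence() confirms that the shift/add chain and the multiply form agree modulo 2^16 for all inputs, so no multiplier constants need to be loaded.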
>
> Merging with the needed preshift, it's akin to writing (for instance):
> t1 = 8 * src[1] + 8 * src[3] + 4 * src[5] + 2 * src[7]
> + (src[5] - src[3]) >> 1;
t1= src[1] + src[3];
t1+=t1 + src[5];
t1+=t1 + src[7];
t1+=t1 + ((src[5] - src[3]) >> 1);
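
A small check that the doubling chain above computes the same value as the quoted direct expression. This is only a sketch: the function names are made up for illustration, plain int is used instead of packed 16-bit words, and the direct form is written with explicit parentheses around the shifted difference, since in C `a + b >> 1` would otherwise parse as `(a + b) >> 1`.

```c
#include <assert.h>

/* Direct form of the quoted expression, with the shift applied only to
 * the difference (src[5] - src[3]). Right-shifting a negative int is
 * implementation-defined in C; both functions use the same expression,
 * so they agree either way. */
int t1_direct(const int *src)
{
    return 8 * src[1] + 8 * src[3] + 4 * src[5] + 2 * src[7]
         + ((src[5] - src[3]) >> 1);
}

/* Doubling chain from the reply: each "t1 += t1 + v" doubles the running
 * sum and adds the next term, so the first terms end up scaled by 8, 8,
 * 4 and 2 without any multiply. */
int t1_chain(const int *src)
{
    int t1 = src[1] + src[3];               /*  s1 +  s3               */
    t1 += t1 + src[5];                      /* 2s1 + 2s3 +  s5         */
    t1 += t1 + src[7];                      /* 4s1 + 4s3 + 2s5 +  s7   */
    t1 += t1 + ((src[5] - src[3]) >> 1);    /* 8s1 + 8s3 + 4s5 + 2s7 + */
    return t1;                              /*   ((s5 - s3) >> 1)      */
}

/* Compare both forms on a couple of sample inputs. */
void check_t1_equivalence(void)
{
    const int a[8] = { 0, 1, 0, 2, 0, 3, 0, 4 };
    const int b[8] = { 0, 5, 0, -7, 0, 11, 0, -2 };
    assert(t1_direct(a) == t1_chain(a));
    assert(t1_direct(b) == t1_chain(b));
}
```

In the chain form every step is an add or a shift, which maps directly onto paddw/psubw/psraw without loading multiplier constants.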
>
> >> + : "r"(off), "r"(3*off), "r"(5*off), "r"(7*off),
> >
> > this needlessly wastes 4 registers to load constants
> > and results in more complex and slower addressing
>
> This I'm not sure how to handle. My goal was to make a function of the
> 1d dct8, and 'off' depends on what transform (8x8, 8x4, 4x8) uses that
> function.
Then fix the code so that off is the same for all of them; it's maybe just a
matter of changing the scantables (and idct) ...
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
I know you won't believe me, but the highest form of Human Excellence is
to question oneself and others. -- Socrates