[FFmpeg-devel] [PATCH] MMX implementation of VC-1 inverse transforms

Michael Niedermayer michaelni
Tue Jan 22 15:33:27 CET 2008


On Sun, Jan 20, 2008 at 01:27:04PM +0100, Christophe GISQUET wrote:
[...]
> > you should at least do
> >> +        "movq   (%0,"OFF"), %%mm0 \n\t"         \
> >> +        "psubw  %%mm0, %%mm1  \n\t"
> >> +        "psubw  %%mm0, %%mm4  \n\t"
> >> +        "psllw  $2, %%mm0 \n\t"
> >> +        "psubw  %%mm0, %%mm2  \n\t"
> >> +        "paddw  %%mm0, %%mm0  \n\t"
> >> +        "psubw  %%mm0, %%mm4  \n\t"
> >> +        "paddw  %%mm0, %%mm0  \n\t"
> >> +        "psubw  %%mm0, %%mm3  \n\t"
> >> +        "paddw  %%mm0, %%mm1  \n\t"
> > 
> > 2 instructions less, 3 registers less, no multiply, no constants read
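
For readers following the register arithmetic, here is a scalar C model of the sequence quoted above. It is only a sketch: the function names are made up for illustration, acc[0..3] stand in for mm1, mm2, mm3 and mm4, and the 16-bit wraparound of the real psubw/paddw is not modelled. It shows that the shift/add chain has the same net effect as multiplying the loaded value by 15, 4, 16 and 9.

```c
/* Scalar model of the quoted MMX sequence (hypothetical helper names).
 * acc[0], acc[1], acc[2], acc[3] stand in for mm1, mm2, mm3, mm4;
 * x stands in for one 16-bit lane of the value loaded into mm0. */
static void odd_term_shift_add(int acc[4], int x)
{
    int t = x;      /* movq  (%0,OFF), %%mm0                  */
    acc[0] -= t;    /* psubw %%mm0, %%mm1   mm1 -= x          */
    acc[3] -= t;    /* psubw %%mm0, %%mm4   mm4 -= x          */
    t *= 4;         /* psllw $2, %%mm0      mm0  = 4x         */
    acc[1] -= t;    /* psubw %%mm0, %%mm2   mm2 -= 4x         */
    t += t;         /* paddw %%mm0, %%mm0   mm0  = 8x         */
    acc[3] -= t;    /* psubw %%mm0, %%mm4   mm4 -= 8x         */
    t += t;         /* paddw %%mm0, %%mm0   mm0  = 16x        */
    acc[2] -= t;    /* psubw %%mm0, %%mm3   mm3 -= 16x        */
    acc[0] += t;    /* paddw %%mm0, %%mm1   mm1 += 16x        */
}

/* The same net update written with explicit multiplies. */
static void odd_term_multiply(int acc[4], int x)
{
    acc[0] += 15 * x;
    acc[1] -=  4 * x;
    acc[2] -= 16 * x;
    acc[3] -=  9 * x;
}
```

Both functions leave acc in the same state for any x that does not overflow, which is where the "no multiply, no constants read" comes from.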
> 
> Merging with the needed preshift, it's akin to writing (for instance):
> t1 = 8 * src[1] + 8 * src[3] +  4 * src[5] +  2 * src[7]
>    + ((src[5] - src[3]) >> 1);

t1= src[1] + src[3];
t1+=t1 + src[5];
t1+=t1 + src[7];
t1+=t1 + ((src[5] - src[3]) >> 1);
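
As a quick check that the four lines above compute the same value as the expanded sum, here is a small self-contained C sketch. The function names are hypothetical, src is taken to be a plain int array, and the shift is parenthesised as in the reply, since in C a bare "a + b >> 1" would shift the whole sum.

```c
/* Expanded form: every coefficient written out explicitly. */
static int t1_direct(const int *src)
{
    return 8 * src[1] + 8 * src[3] + 4 * src[5] + 2 * src[7]
         + ((src[5] - src[3]) >> 1);
}

/* Doubling form: each "t1 += t1 + x" computes t1 = 2*t1 + x, so the
 * earlier terms pick up factors of 8, 8, 4 and 2 without any multiply. */
static int t1_doubling(const int *src)
{
    int t1 = src[1] + src[3];              /*   s1 +   s3                */
    t1 += t1 + src[5];                     /* 2*s1 + 2*s3 +   s5         */
    t1 += t1 + src[7];                     /* 4*s1 + 4*s3 + 2*s5 +   s7  */
    t1 += t1 + ((src[5] - src[3]) >> 1);   /* 8*s1 + 8*s3 + 4*s5 + 2*s7
                                              + ((s5 - s3) >> 1)         */
    return t1;
}
```

With src[1] = 3, src[3] = -5, src[5] = 7 and src[7] = 2, both forms give 22.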


> 
> >> +        : "r"(off), "r"(3*off), "r"(5*off), "r"(7*off),
> > 
> > unneeded wasting of 4 registers to load a constant
> > and resulting more complex and slower addressing
> 
> This I'm not sure how to handle. My goal was to make a function of the
> 1d dct8, and 'off' depends on what transform (8x8, 8x4, 4x8) uses that
> function.

Then fix the code so that off is the same for all of them; it's maybe just a
matter of changing the scantables (and idct) ...
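
One possible reading of this suggestion, sketched in C under several assumptions: the block is stored in an 8-wide int16_t buffer, the 1D pass always walks a column with that fixed stride, and the row direction is handled by transposing between passes (or, as hinted above, by having the scantable deliver the coefficients already transposed). The names are hypothetical and the 1D pass is a toy running sum, not the VC-1 transform; only the 8x8 case is shown.

```c
#include <stdint.h>

enum { STRIDE = 8 };  /* assumed: every block lives in an 8-wide buffer */

/* Toy 1D pass over one column. Every offset is a multiple of the
 * constant STRIDE, so it can be encoded as an immediate displacement
 * instead of occupying a register. */
static void pass_fixed(int16_t *col)
{
    for (int i = 1; i < 8; i++)
        col[i * STRIDE] += col[(i - 1) * STRIDE];
}

/* The same pass with the element distance supplied at run time,
 * comparable to passing off, 3*off, 5*off, 7*off as asm operands. */
static void pass_runtime(int16_t *p, int off)
{
    for (int i = 1; i < 8; i++)
        p[i * off] += p[(i - 1) * off];
}

/* In-place 8x8 transpose. */
static void transpose8(int16_t *b)
{
    for (int y = 0; y < 8; y++)
        for (int x = y + 1; x < 8; x++) {
            int16_t t = b[y * STRIDE + x];
            b[y * STRIDE + x] = b[x * STRIDE + y];
            b[x * STRIDE + y] = t;
        }
}

/* 2D pass that only ever uses the fixed-stride 1D routine. */
static void transform2d_fixed(int16_t *b)
{
    for (int x = 0; x < 8; x++) pass_fixed(b + x);
    transpose8(b);
    for (int x = 0; x < 8; x++) pass_fixed(b + x);
    transpose8(b);
}

/* Reference: columns with stride 8, rows with stride 1. */
static void transform2d_runtime(int16_t *b)
{
    for (int x = 0; x < 8; x++) pass_runtime(b + x, STRIDE);
    for (int y = 0; y < 8; y++) pass_runtime(b + y * STRIDE, 1);
}
```

Both 2D versions produce the same block. The fixed-stride one trades the runtime offsets for a transpose, and that transpose could in principle be folded into the scan order so it costs nothing at decode time.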

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I know you won't believe me, but the highest form of Human Excellence is
to question oneself and others. -- Socrates