[FFmpeg-devel] [PATCH] MMX implementation of VC-1 inverse transforms

Christophe GISQUET christophe.gisquet
Sun Jan 20 12:37:21 CET 2008


Hi,

Michael Niedermayer wrote:
> I think the following is safe
> 
>         t1 = src[0] + src[2];
>         t2 = src[0] - src[2];
>         t1 = 8*t1 + (t1>>1);
>         t2 = 8*t2 + (t2>>1);
> 
>         t3 = 11 * src[1] + 5 * src[3];
>         t4 = 11 * src[3] - 5 * src[1];
> 
>         dst[0] = (t1 + t3 + 2) >> 2;
>         dst[1] = (t2 - t4 + 2) >> 2;
>         dst[2] = (t2 + t4 + 2) >> 2;
>         dst[3] = (t1 - t3 + 2) >> 2;
[...]
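
For readers following along, the quoted 1-D pass can be written as a
standalone scalar C function. This is only a sketch: the names
inv_trans_4pt and inv_trans_4pt_check and the plain int types are
assumptions for illustration, not the code from the attached patch.

```c
#include <assert.h>

/* Hypothetical helper: one 1-D pass of the 4-point transform quoted above.
 * Right-shifting a negative int is implementation-defined in C; this sketch
 * assumes an arithmetic shift, as common compilers provide. */
static void inv_trans_4pt(const int src[4], int dst[4])
{
    /* Even part: 8*t + (t>>1) as written in the quoted code. */
    int t1 = src[0] + src[2];
    int t2 = src[0] - src[2];
    t1 = 8 * t1 + (t1 >> 1);
    t2 = 8 * t2 + (t2 >> 1);

    /* Odd part: the two constants 11 and 5. */
    int t3 = 11 * src[1] + 5 * src[3];
    int t4 = 11 * src[3] - 5 * src[1];

    /* Butterfly with rounding bias 2 and a shift by 2. */
    dst[0] = (t1 + t3 + 2) >> 2;
    dst[1] = (t2 - t4 + 2) >> 2;
    dst[2] = (t2 + t4 + 2) >> 2;
    dst[3] = (t1 - t3 + 2) >> 2;
}

/* Hypothetical sanity check: a DC-only input gives a flat output row. */
static void inv_trans_4pt_check(void)
{
    const int src[4] = { 4, 0, 0, 0 };
    int dst[4];
    inv_trans_4pt(src, dst);
    /* t1 = t2 = 8*4 + (4>>1) = 34, so every output is (34 + 2) >> 2 = 9. */
    assert(dst[0] == 9 && dst[1] == 9 && dst[2] == 9 && dst[3] == 9);
}
```

Calling inv_trans_4pt_check() exercises the DC case only; it says nothing
about the speed of an MMX version.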

OK, I've implemented that. I also tried decomposing t3 and t4
(with s1 = src[1] and s3 = src[3]) as:
t3 = 5*(2*s1 + s3) + s1
t4 = 5*(2*s3 - s1) + s3
(trading one constant load from memory and 2 multiplies for 2 shifts
and 2 additions)

But this is slower. In fact, with the direct form I can keep the multiply
constants in registers (by loading the bias from memory instead), which
further increases the speed difference.
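
As a quick check of the algebra behind the decomposition of t3 and t4, the
two forms can be compared in scalar C. This is a sketch under assumptions:
the function names are made up for illustration, and the doubling is
written as 2*x because left-shifting a negative int is undefined in C (an
MMX version would use a shift or an add instead).

```c
#include <assert.h>

/* Direct form: four multiplies, using the constants 11 and 5. */
static void odd_direct(int s1, int s3, int *t3, int *t4)
{
    *t3 = 11 * s1 + 5 * s3;
    *t4 = 11 * s3 - 5 * s1;
}

/* Decomposed form: two multiplies by 5, two doublings, two extra adds. */
static void odd_decomposed(int s1, int s3, int *t3, int *t4)
{
    *t3 = 5 * (2 * s1 + s3) + s1;   /* 10*s1 + 5*s3 + s1 */
    *t4 = 5 * (2 * s3 - s1) + s3;   /* 10*s3 - 5*s1 + s3 */
}

/* Hypothetical exhaustive check over a small coefficient range [lo, hi]. */
static void check_forms_agree(int lo, int hi)
{
    for (int s1 = lo; s1 <= hi; s1++) {
        for (int s3 = lo; s3 <= hi; s3++) {
            int a3, a4, b3, b4;
            odd_direct(s1, s3, &a3, &a4);
            odd_decomposed(s1, s3, &b3, &b4);
            assert(a3 == b3 && a4 == b4);
        }
    }
}
```

Both forms give identical results, so the choice between them is purely a
matter of instruction cost.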

1D2 ~ 1080 dezicycles
1D3 ~ 1120

Anyway, this is mostly for reference: it has been shown that the 4x4 DCT
is not relevant speed-wise, and the code for transposing the zigzag (zz)
scantables is not provided.

Best regards,
-- 
Christophe GISQUET
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vc1-dct-4x4.diff
Type: text/x-patch
Size: 6428 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080120/ef89269c/attachment.bin>


