[FFmpeg-devel] [PATCH] MMX implementation of VC-1 inverse transforms
Christophe GISQUET
christophe.gisquet
Sun Jan 20 12:37:21 CET 2008
Hi,
Michael Niedermayer wrote:
> i think, the following is safe
>
> t1 = src[0] + src[2];
> t2 = src[0] - src[2];
> t1= 8*t1 + (t1>>1);
> t2= 8*t2 + (t2>>1);
>
> t3 = 11 * src[1] + 5 * src[3];
> t4 = 11 * src[3] - 5 * src[1];
>
> dst[0] = (t1 + t3 + 2) >> 2;
> dst[1] = (t2 - t4 + 2) >> 2;
> dst[2] = (t2 + t4 + 2) >> 2;
> dst[3] = (t1 - t3 + 2) >> 2;
[...]
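For reference, the quoted row pass can be checked against the unscaled form. That form uses coefficients 17, 22 and 10 (exactly twice the quoted ones) and a final (x + 4) >> 3, and this sketch assumes it is the reference row transform. The function names are hypothetical, and `>>` on negative values is assumed to be an arithmetic shift. That holds on mainstream compilers but is implementation-defined in C.

```c
#include <assert.h>

/* Row pass with the halved constants quoted above.
   Assumes >> on negative ints is an arithmetic shift (implementation-defined in C). */
static void row_halved(const int src[4], int dst[4])
{
    int t1 = src[0] + src[2];
    int t2 = src[0] - src[2];
    t1 = 8 * t1 + (t1 >> 1);            /* floor(17 * t1 / 2) */
    t2 = 8 * t2 + (t2 >> 1);            /* floor(17 * t2 / 2) */

    int t3 = 11 * src[1] + 5 * src[3];  /* (22*s1 + 10*s3) / 2 */
    int t4 = 11 * src[3] - 5 * src[1];  /* (22*s3 - 10*s1) / 2 */

    dst[0] = (t1 + t3 + 2) >> 2;
    dst[1] = (t2 - t4 + 2) >> 2;
    dst[2] = (t2 + t4 + 2) >> 2;
    dst[3] = (t1 - t3 + 2) >> 2;
}

/* Assumed reference: full 17/22/10 coefficients, +4 bias, then >> 3. */
static void row_ref(const int src[4], int dst[4])
{
    int t1 = 17 * (src[0] + src[2]) + 4;
    int t2 = 17 * (src[0] - src[2]) + 4;
    int t3 = 22 * src[1] + 10 * src[3];
    int t4 = 22 * src[3] - 10 * src[1];

    dst[0] = (t1 + t3) >> 3;
    dst[1] = (t2 - t4) >> 3;
    dst[2] = (t2 + t4) >> 3;
    dst[3] = (t1 - t3) >> 3;
}

/* Exhaustively compares both versions for every input in [-range, range]^4;
   returns the number of inputs whose outputs differ. */
static long count_mismatches(int range)
{
    assert(range >= 0);
    long bad = 0;
    int s[4], a[4], b[4];
    for (s[0] = -range; s[0] <= range; s[0]++)
        for (s[1] = -range; s[1] <= range; s[1]++)
            for (s[2] = -range; s[2] <= range; s[2]++)
                for (s[3] = -range; s[3] <= range; s[3]++) {
                    row_halved(s, a);
                    row_ref(s, b);
                    for (int i = 0; i < 4; i++)
                        if (a[i] != b[i]) { bad++; break; }
                }
    return bad;
}
```

Calling count_mismatches(20) should return 0, and a parity argument explains why. When s0 + s2 is even, each halved sum is exactly half of the full one. When it is odd, the full sum is odd too, so the bit dropped by t1 >> 1 cannot change the result of the final rounding shift. The same holds for t2, which has the same parity.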
OK, I've implemented that. I also tried to decompose t3 and t4 (with s1 = src[1] and s3 = src[3]) as:
t3 = 5(2s1+s3) + s1
t4 = 5(2s3-s1) + s3
(trading one constant load from memory and 2 multiplies for 2 shifts
and 2 additions).
But this is slower. In fact, I can keep the multiply constants in
registers (by loading the bias from memory instead), which widens
the speed difference further.
1D2 ~ 1080 dezicycles
1D3 ~ 1120 dezicycles
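The decomposition of t3 and t4 is exact, since 5(2s1+s3)+s1 = 11s1+5s3 and 5(2s3-s1)+s3 = 11s3-5s1. The plain C sketch below checks this. The helper names are hypothetical, and the sketch says nothing about the MMX instruction counts or the timings reported here.

```c
#include <assert.h>

/* s1 and s3 stand for src[1] and src[3]. */

/* Direct form: two constant multiplies per output. */
static int t3_direct(int s1, int s3) { return 11 * s1 + 5 * s3; }
static int t4_direct(int s1, int s3) { return 11 * s3 - 5 * s1; }

/* Decomposed form. In SIMD code, 2*x would presumably be a shift; it is
   written as a multiply here because left-shifting a negative int is UB in C. */
static int t3_decomposed(int s1, int s3) { return 5 * (2 * s1 + s3) + s1; }
static int t4_decomposed(int s1, int s3) { return 5 * (2 * s3 - s1) + s3; }

/* Returns 1 if both forms agree for all s1, s3 in [-range, range], else 0. */
static int decomposition_matches(int range)
{
    assert(range >= 0);
    for (int s1 = -range; s1 <= range; s1++)
        for (int s3 = -range; s3 <= range; s3++)
            if (t3_direct(s1, s3) != t3_decomposed(s1, s3) ||
                t4_direct(s1, s3) != t4_decomposed(s1, s3))
                return 0;
    return 1;
}
```

For example, with s1 = 3 and s3 = -2, both forms give t3 = 23 and t4 = -37.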
Anyway, this is mostly for reference: as has been shown, the 4x4 DCT
does not matter speed-wise, and the code for transposing the zigzag (zz)
scantables is not provided.
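Transposing a scan table remaps every coefficient position (row, col) to (col, row), so the table can index a block stored in transposed order. Below is a minimal sketch for a 4x4 table stored as raster indices 0..15. The function name is hypothetical, and the example table is the common 4x4 zigzag order. It is not necessarily one of the VC-1 tables.

```c
#include <assert.h>
#include <stdint.h>

/* Maps each raster index r = row*4 + col to col*4 + row. */
static void transpose_scantable_4x4(const uint8_t src[16], uint8_t dst[16])
{
    for (int i = 0; i < 16; i++) {
        assert(src[i] < 16);            /* must be a valid 4x4 position */
        int row = src[i] >> 2;          /* src[i] / 4 */
        int col = src[i] & 3;           /* src[i] % 4 */
        dst[i] = (uint8_t)(col * 4 + row);
    }
}

/* Example input: a common 4x4 zigzag order (illustrative only). */
static const uint8_t zigzag_4x4[16] = {
     0,  1,  4,  8,
     5,  2,  3,  6,
     9, 12, 13, 10,
     7, 11, 14, 15,
};
```

Applied to this example table, the function yields 0, 4, 1, 2, 5, 8, 12, 9, 6, 3, 7, 10, 13, 14, 11, 15. Transposing twice gives back the original table.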
Best regards,
--
Christophe GISQUET
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vc1-dct-4x4.diff
Type: text/x-patch
Size: 6428 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080120/ef89269c/attachment.bin>