[FFmpeg-devel] [PATCH] Merge some computations in C code for VC-1 inverse transforms
Michael Niedermayer
michaelni
Wed Jan 16 22:39:13 CET 2008
On Wed, Jan 16, 2008 at 09:42:46PM +0100, Christophe GISQUET wrote:
> Hi,
>
> M. Niedermayer highlighted some sub-optimality in mail
> http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2008-January/040591.html
>
> These sub-optimalities can also be seen in the C version of those
> inverse transforms. Testing on IA32, it seems that:
> - performing the butterfly a,b->a-b,a+b with the no-additional-register
> trick never works
I suspect that's because it can be compiled to
lea (a,b), c
sub b,a
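
For illustration, here is a minimal C sketch of the two butterfly forms being compared. The function names are hypothetical, and the in-place variant is only one possible reading of the "no-additional-register trick"; the patch itself may spell it differently. The point is that the version with a temporary already maps onto the lea/sub pair above, so the in-place form has little left to save.

```c
#include <assert.h>

/* Butterfly with an explicit temporary: (a, b) -> (a - b, a + b).
 * A compiler may turn this into "lea (a,b), c" followed by "sub b, a". */
static void butterfly_tmp(int *a, int *b)
{
    int sum = *a + *b;
    *a -= *b;
    *b  = sum;
}

/* In-place variant with no temporary in the source (assumed form of the trick). */
static void butterfly_inplace(int *a, int *b)
{
    *a -= *b;   /* a = a - b             */
    *b += *b;   /* b = 2b                */
    *b += *a;   /* b = 2b + (a - b) = a + b */
}

/* Exhaustive check over a small range that both forms agree. */
static void check_butterflies(void)
{
    for (int a = -64; a <= 64; a++) {
        for (int b = -64; b <= 64; b++) {
            int a1 = a, b1 = b, a2 = a, b2 = b;
            butterfly_tmp(&a1, &b1);
            butterfly_inplace(&a2, &b2);
            assert(a1 == a - b && b1 == a + b);
            assert(a1 == a2 && b1 == b2);
        }
    }
}
```

Both forms cost three arithmetic operations in the source, so any difference comes down to what the compiler and register allocator make of them.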
> - adding the bias constant at the first butterflies hardly ever works
> (unless there's register pressure from the looks of where it works best)
Reducing the number of operations only helps if the compiler does not
compensate by adding nonsense ;)
> - merging some computation (in the 4x1 or 1x4 parts) is always a win
>
> None of them works with any stage of the 8x8 inverse transform.
>
> Here are some benches with *_TIMER:
>        before   after
> 4x4:     2329    2100
> 4x8:     4872    4448
> 8x4:     4926    4267
>
> So very roughly, a 10% improvement (YMMV).
[...]
> + t3 = 22 * src[ 8] + 10 * src[24];
> + t4 = 22 * src[24] - 10 * src[ 8];
maybe
t3= 10*(src[ 8] + src[24]);
t4= 32*src[24] - t3;
t3+= 12*src[ 8];
is faster?
it's 3 add, 2 mul, 1 shift vs. 2 add, 4 mul
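
A small C sketch to check that the factored form computes the same values as the form in the patch. The function names and the s8/s24 parameters (standing in for src[ 8] and src[24]) are hypothetical. It only confirms equivalence; whether it is actually faster depends on the compiler and CPU and would need a *_TIMER run.

```c
#include <assert.h>

/* Form from the patch: 4 multiplies, 2 adds. */
static void rot_direct(int s8, int s24, int *t3, int *t4)
{
    *t3 = 22 * s8  + 10 * s24;
    *t4 = 22 * s24 - 10 * s8;
}

/* Factored form: 2 multiplies, 3 adds/subs, and 32*x which can be a shift. */
static void rot_factored(int s8, int s24, int *t3, int *t4)
{
    int t = 10 * (s8 + s24);   /* 10*s8 + 10*s24                   */
    *t4 = 32 * s24 - t;        /* 32*s24 - 10*s24 - 10*s8 = 22*s24 - 10*s8 */
    *t3 = t + 12 * s8;         /* 10*s8 + 10*s24 + 12*s8 = 22*s8 + 10*s24  */
}

/* Compare both forms over a range of small coefficient values. */
static void check_rotation(void)
{
    for (int s8 = -512; s8 <= 512; s8++) {
        for (int s24 = -512; s24 <= 512; s24++) {
            int a3, a4, b3, b4;
            rot_direct(s8, s24, &a3, &a4);
            rot_factored(s8, s24, &b3, &b4);
            assert(a3 == b3 && a4 == b4);
        }
    }
}
```

The multiply by 32 is written as a multiplication rather than a left shift because shifting a negative int is undefined in C; a compiler is free to emit the shift itself.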
anyway, patch approval is Kostya's territory ...
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Those who are best at talking realize last or never when they are wrong.