[FFmpeg-devel] [PATCH] Merge some computations in C code for VC-1 inverse transforms
Christophe GISQUET
christophe.gisquet
Thu Jan 17 20:52:53 CET 2008
Hi,
Michael Niedermayer wrote:
>> - adding the bias constant at the first butterflies hardly ever works
>> (unless there's register pressure from the looks of where it works best)
>
> reducing the number of operations only helps with compilers not compensating
> by adding nonsense ;)
Redoing the benchmarks, I now notice that adding this constant never
improves speed. To make sure, I switched back and forth several times on
the 8x8 and 8x4 functions.
For some reason, the previous test showed a tiny improvement, but it
doesn't seem to come from this change.
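To make the "bias at the first butterflies" idea concrete, here is a minimal sketch, assuming the standard VC-1 4-point coefficients (17, 22, 10) and the first-pass rounding of +4 before >> 3. The function names are hypothetical and this is a plain C illustration, not the code from the attached patch. Because t1 and t2 each feed two outputs, adding the bias to them once saves two additions compared to adding it to every output, and the results are bit-identical either way, so the choice is purely a speed question.

```c
#include <assert.h>

/* Hypothetical helper: VC-1 4-point row transform with the rounding
 * bias (+4) added to every output just before the >> 3. */
static void row4_bias_late(const int src[4], int dst[4])
{
    int t1 = 17 * (src[0] + src[2]);
    int t2 = 17 * (src[0] - src[2]);
    int t3 = 22 * src[1] + 10 * src[3];
    int t4 = 22 * src[3] - 10 * src[1];

    dst[0] = (t1 + t3 + 4) >> 3;
    dst[1] = (t2 - t4 + 4) >> 3;
    dst[2] = (t2 + t4 + 4) >> 3;
    dst[3] = (t1 - t3 + 4) >> 3;
}

/* Same transform with the bias folded into the first (even) butterflies:
 * 2 bias additions instead of 4. */
static void row4_bias_early(const int src[4], int dst[4])
{
    int t1 = 17 * (src[0] + src[2]) + 4;
    int t2 = 17 * (src[0] - src[2]) + 4;
    int t3 = 22 * src[1] + 10 * src[3];
    int t4 = 22 * src[3] - 10 * src[1];

    dst[0] = (t1 + t3) >> 3;
    dst[1] = (t2 - t4) >> 3;
    dst[2] = (t2 + t4) >> 3;
    dst[3] = (t1 - t3) >> 3;
}

/* Exhaustively compares both placements for inputs in [-range, range]. */
static void check_equivalent(int range)
{
    int s[4], a[4], b[4];
    for (s[0] = -range; s[0] <= range; s[0]++)
        for (s[1] = -range; s[1] <= range; s[1]++)
            for (s[2] = -range; s[2] <= range; s[2]++)
                for (s[3] = -range; s[3] <= range; s[3]++) {
                    row4_bias_late(s, a);
                    row4_bias_early(s, b);
                    for (int i = 0; i < 4; i++)
                        assert(a[i] == b[i]);
                }
}
```

Whether the early placement is actually faster depends on what the compiler does with the freed-up additions, which is what the benchmarks above are measuring.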
The previous benchmark:
>> 8x4: 4926 4267
now measures about 4500, while with the current patch it's 4400...
Anyway, the improvement I was measuring yesterday was only about 20 dezicycles...
> t3= 10*(src[ 8] + src[24]);
> t4= 32*src[24] - t3;
> t3+= 12*src[ 8];
>
> is faster?
> it's 3 add, 2 mul, 1 shift vs. 2 add, 4 mul
It should have been, at first glance, but it seems to cost 10-30
dezicycles more per loop.
Again, this could perhaps be explained by checking the generated asm code, but
another CPU might show a different result with the same code...
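For reference, the two forms of the odd part compared above can be checked side by side. This is a sketch, assuming the unmerged form is t3 = 22*src[8] + 10*src[24] and t4 = 22*src[24] - 10*src[8], which is what the merged snippet expands to, and assuming a column stride of 8 as in the quoted indices. The function names are hypothetical. Both forms are exact integer identities, so the difference is only in instruction count and in what the compiler emits.

```c
#include <assert.h>

/* Direct form of the odd part of a 4-point column pass (stride 8):
 * 4 multiplications, 2 additions/subtractions. */
static void odd_part_direct(const int *src, int *t3, int *t4)
{
    *t3 = 22 * src[8]  + 10 * src[24];
    *t4 = 22 * src[24] - 10 * src[8];
}

/* Merged form from the quoted snippet: 2 multiplications,
 * 1 multiplication by 32 (a shift), 3 additions/subtractions. */
static void odd_part_merged(const int *src, int *t3, int *t4)
{
    int t = 10 * (src[8] + src[24]);  /* shared term 10*(a + b)        */
    *t4 = 32 * src[24] - t;           /* 32b - 10a - 10b = 22b - 10a  */
    *t3 = t + 12 * src[8];            /* 10a + 10b + 12a = 22a + 10b  */
}

/* Compares both forms for src[8], src[24] in [-range, range]. */
static void check_odd_part(int range)
{
    int blk[32] = {0};
    for (int a = -range; a <= range; a++)
        for (int b = -range; b <= range; b++) {
            int d3, d4, m3, m4;
            blk[8]  = a;
            blk[24] = b;
            odd_part_direct(blk, &d3, &d4);
            odd_part_merged(blk, &m3, &m4);
            assert(d3 == m3 && d4 == m4);
        }
}
```

Fewer multiplications do not guarantee fewer cycles: the merged form adds a dependency on the shared term t, which may explain why it measured slower here.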
Best regards,
Christophe GISQUET
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vc1-dct-4x4.diff
Type: text/x-patch
Size: 4928 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080117/46c8ea81/attachment.bin>