[FFmpeg-devel] [PATCH] Merge some computations in C code for VC-1 inverse transforms
Christophe GISQUET
christophe.gisquet
Wed Jan 16 21:42:46 CET 2008
Hi,
M. Niedermayer pointed out some suboptimal code in this mail:
http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2008-January/040591.html
The same inefficiencies are also present in the C versions of those
inverse transforms. From testing on IA32, it seems that:
- performing the butterfly a,b->a-b,a+b with the no-additional-register
  trick is never a win
- adding the bias constant at the first butterflies is hardly ever a win
  (unless there is register pressure, judging from where it works best)
- merging some computations (in the 4x1 or 1x4 parts) is always a win
None of them helps in any stage of the 8x8 inverse transform.
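
For reference, here is a minimal sketch of the butterfly from the first
item, written both ways. It is not taken from the patch; the function
names are made up for illustration, and the in-place form assumes the
values are small enough that doubling b cannot overflow an int.

```c
/* Butterfly a,b -> a-b,a+b using one temporary. */
static void butterfly_tmp(int *a, int *b)
{
    int t = *a - *b;  /* difference kept in a temporary */
    *b = *a + *b;     /* sum overwrites b */
    *a = t;           /* difference overwrites a */
}

/* Same butterfly without a temporary ("no-additional-register" form). */
static void butterfly_inplace(int *a, int *b)
{
    *a -= *b;         /* a = a - b */
    *b += *b;         /* b = 2*b */
    *b += *a;         /* b = 2*b + (a - b) = a + b */
}
```

The in-place form saves a register but turns two independent operations
into a chain of three dependent ones, which may be why it did not help
in the tests above.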
Here are some benchmarks with *_TIMER:
       before   after
4x4:     2329    2100
4x8:     4872    4448
8x4:     4926    4267
So very roughly, a 10% improvement (YMMV).
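
For readers without the attachment, the kind of merge meant for the 4x1
and 1x4 parts can be sketched as follows. This is only an illustration
and not the patch itself: the function names are hypothetical, the
17/22/10 coefficients and the >>3 with bias 4 are assumed from the VC-1
4-point inverse transform, and the bias is deliberately left at the
same place in both versions so that only the merge differs.

```c
#include <stdint.h>

/* Unmerged form: six separate products, combined only at the outputs. */
static void inv4_separate(const int16_t src[4], int dst[4])
{
    int t1 = 17 * (src[0] + src[2]);
    int t2 = 17 * (src[0] - src[2]);
    int t3 = 22 * src[1];
    int t4 = 22 * src[3];
    int t5 = 10 * src[1];
    int t6 = 10 * src[3];

    dst[0] = (t1 + t3 + t6 + 4) >> 3;
    dst[1] = (t2 - t4 + t5 + 4) >> 3;
    dst[2] = (t2 + t4 - t5 + 4) >> 3;
    dst[3] = (t1 - t3 - t6 + 4) >> 3;
}

/* Merged form: the odd part is folded into two temporaries, so each
 * output needs one add or subtract less. */
static void inv4_merged(const int16_t src[4], int dst[4])
{
    int t1 = 17 * (src[0] + src[2]);
    int t2 = 17 * (src[0] - src[2]);
    int t3 = 22 * src[1] + 10 * src[3];  /* old t3 + t6 */
    int t4 = 22 * src[3] - 10 * src[1];  /* old t4 - t5 */

    dst[0] = (t1 + t3 + 4) >> 3;
    dst[1] = (t2 - t4 + 4) >> 3;
    dst[2] = (t2 + t4 + 4) >> 3;
    dst[3] = (t1 - t3 + 4) >> 3;
}
```

Both functions compute the same sums before the shift, so they should
produce identical output for any input; only the number of operations
and live temporaries changes.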
Best regards,
Christophe GISQUET
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vc1-dct-c.diff
Type: text/x-patch
Size: 4961 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080116/297dca81/attachment.bin>