[FFmpeg-devel] [PATCH] VP3 DC-only IDCT

David Conrad lessen42
Sat Mar 13 07:36:20 CET 2010


Hi,

This gives 2-4% faster overall decode for normal files.

Some thoughts:
I can't think of any shortcuts that could make the IDCT faster with 128-bit SIMD that don't rely on knowing the index of the last non-zero coefficient.

If you know that index before calling the IDCT, you could do a slightly faster IDCT that assumes the right and bottom of the block are all 0. This seems to be significantly faster only for MMX; for SSE2 it's nearly a wash between the added check and the time saved.

For an average video, around a third of all IDCTs are DC-only, another third could be done with that shortcut (i.e. last_nnz is under 10), and the rest require a full IDCT.
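
To make the three-way split above concrete, here is a minimal sketch of how a decoder might pick an IDCT variant per block. It is not taken from the attached patch: the names idct_kind, last_nonzero and pick_idct are hypothetical, and it assumes last_nnz is the index of the last non-zero coefficient in scan (zigzag) order, with 0 meaning only the DC coefficient is set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the three cases described in the text. */
enum idct_kind { IDCT_DC_ONLY, IDCT_FIRST_10, IDCT_FULL };

/* Index of the last non-zero coefficient, assuming coeffs[] is stored in
 * scan (zigzag) order. Returns -1 for an all-zero block. */
static int last_nonzero(const int16_t coeffs[64])
{
    for (int i = 63; i >= 0; i--)
        if (coeffs[i])
            return i;
    return -1;
}

/* Choose the cheapest variant that is still exact for this block.
 * An all-zero block (-1) is mapped to the DC-only case here; a real
 * decoder would more likely skip the IDCT altogether. */
static enum idct_kind pick_idct(int last_nnz)
{
    assert(last_nnz >= -1 && last_nnz < 64);
    if (last_nnz <= 0)
        return IDCT_DC_ONLY;   /* only coefficient 0 can be non-zero */
    if (last_nnz < 10)
        return IDCT_FIRST_10;  /* right and bottom of the block are 0 */
    return IDCT_FULL;
}
```

In practice the decoder already tracks last_nnz while unpacking coefficients, so the backwards scan in last_nonzero is only there to keep the sketch self-contained.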

libtheora only does the 10-element shortcut, not DC-only. It also only has an MMX IDCT.

I also haven't really looked at whether a DC-only IDCT is beneficial for mpeg codecs, thus the vp3-specific dsputil function.
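
For readers without the attachment, a DC-only IDCT-and-add can be sketched in plain C roughly as follows. This is an illustration, not the patch itself: the function name idct_dc_add_sketch is hypothetical, and the rounding and scaling, (block[0] + 15) >> 5, are an assumption about how the DC term is normalized rather than something stated in this mail.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an int to the 0..255 range of an 8-bit pixel. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* When only the DC coefficient is non-zero, the inverse transform of the
 * block is a constant, so the whole IDCT collapses to adding one value to
 * each of the 8x8 destination pixels.
 * Assumption: the constant is (block[0] + 15) >> 5. The shift relies on
 * arithmetic right shift for negative values, which mainstream compilers
 * provide but the C standard leaves implementation-defined. */
static void idct_dc_add_sketch(uint8_t *dest, int stride, const int16_t *block)
{
    assert(stride >= 8);
    int dc = (block[0] + 15) >> 5;

    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            dest[x] = clip_uint8(dest[x] + dc);
        dest += stride;
    }
}
```

A SIMD version would typically broadcast the constant across a register and use saturating adds or subtracts per row, which is where the speedup over a full IDCT comes from.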

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: textmate stdin BrdUyT.txt
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100313/37fd36f6/attachment.txt>


