[Ffmpeg-devel] [PATCH] Some MMX optimizations for Chinese AVS
Fri Jul 28 19:57:41 CEST 2006
Attached is a patch which implements the inverse transform and a few sub-pel
motion compensation functions in MMX assembler. Because AVS is so similar to
H.264, I took a lot of ideas from the H.264 MMX code. While I have checked
that the code is functionally correct, I am not sure how to assess the
efficiency of the MMX implementation. Is there some way to find bottlenecks
in it, along the lines of "execution of instruction foo stalls due to a
dependency on bar", which would suggest reordering some instructions? Or can
I only try some changes, benchmark the whole function/block and retry? I am
afraid it is the latter, as the differences between the MMX implementations
of different processor models are probably also quite large.
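
For the "benchmark and retry" approach, a minimal timing harness could look
roughly like the sketch below. It is only an illustration: the names block_fn,
dummy_transform and bench_min_ns are made up for this example and are not part
of the patch, and the placeholder routine merely stands in for a real inverse
transform. It takes the fastest of several samples, which tends to filter out
interrupts and cache warm-up, and it assumes a POSIX system with
clock_gettime().

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical signature for the routine under test (not from the patch). */
typedef void (*block_fn)(int16_t *block);

/* Keeps the compiler from discarding the benchmarked work. */
static volatile int16_t sink;

/* Placeholder workload standing in for a real 8x8 inverse transform. */
static void dummy_transform(int16_t *block)
{
    for (int i = 0; i < 64; i++)
        block[i] = (int16_t)((block[i] * 3 + 4) >> 3);
}

/* Monotonic wall-clock time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Call fn `inner` times per sample, take `samples` samples and
 * return the duration of the fastest sample in nanoseconds. */
static uint64_t bench_min_ns(block_fn fn, int samples, int inner)
{
    int16_t block[64];
    uint64_t best = UINT64_MAX;

    assert(samples > 0 && inner > 0);
    for (int s = 0; s < samples; s++) {
        /* Refill the input so every sample sees the same data. */
        for (int i = 0; i < 64; i++)
            block[i] = (int16_t)(i * 7 - 100);

        uint64_t t0 = now_ns();
        for (int k = 0; k < inner; k++)
            fn(block);
        uint64_t t1 = now_ns();

        sink = block[0];
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

Calling bench_min_ns() once with the C version and once with the MMX version
of the same routine, with identical samples and inner counts, gives two
numbers that can be compared directly. The result only holds for the machine
it was measured on.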
Because the motion compensation is implemented only in part, I am not sure
if this is worth a commit. On the other hand, committing it would make it
easier for others to contribute ideas for improvement, and it already gives
an overall speedup of more than 20% on my machine.
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 26382 bytes
Desc: not available