[FFmpeg-devel] [PATCH] h264pred16x16 plane sse2/ssse3 optimizations
Ronald S. Bultje
Wed Sep 29 04:31:51 CEST 2010
This showed up high in my profile of the cathedral sample, so I'm tackling
it first. I can/will do the 8x8 and 4x4 + mmx/mmx2 functions later as well
(Jason tells me x264 uses 8x8/4x4 plane mode a lot), but that's quite a bit
of testing, so I thought I'd ask for review of this piece already. Jason
also tells me there's code in x264 that I should look at, but it looks
completely different/incompatible, so I'm not sure I'm looking at the
right place/version...
make fate-h264, fate-svq3 and fate-real-rv40 all pass with this patch
(h264 tested both with and without ssse3 enabled).
Numbers (Core i7, x86-64, OSX 10.6.4, cathedral sample):
before: 6719 dezicycles in pred16x16_plane, 262062 runs, 82 skips
after: 1170 dezicycles in pred16x16_plane, 262128 runs, 16 skips
(avg 8.067, ~3.6% faster)
I didn't profile svq3/rv40; the speedup is of course sample-dependent. And
Diego owes me beer now (5%!).
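
For anyone reviewing without the scalar version in their head, here is a minimal C sketch of what 16x16 plane prediction computes, i.e. what the SIMD code has to reproduce bit-exactly. It is written from the formula in the H.264 spec, not copied from h264pred.c; the names pred16x16_plane_ref and clip_u8 are made up for illustration. It only covers the plain H.264 case, not the SVQ3/RV40 variants, which as far as I know scale the gradients differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit sample range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Hypothetical scalar reference for H.264 Intra 16x16 plane prediction.
 * src points at the top-left sample of the block. The row above
 * (src[-stride + x]), the column to the left (src[y * stride - 1]) and
 * the corner src[-stride - 1] must already hold reconstructed samples.
 * Assumes >> on negative ints is an arithmetic shift, as on the usual
 * compilers.
 */
static void pred16x16_plane_ref(uint8_t *src, ptrdiff_t stride)
{
    const uint8_t *top = src - stride;
    int H = 0, V = 0;

    /* Weighted differences across the top row (H) and left column (V). */
    for (int i = 0; i < 8; i++) {
        H += (i + 1) * (top[8 + i] - top[6 - i]);
        V += (i + 1) * (src[(8 + i) * stride - 1] - src[(6 - i) * stride - 1]);
    }

    int a = 16 * (src[15 * stride - 1] + top[15]);
    int b = (5 * H + 32) >> 6; /* horizontal slope, scaled by 32 */
    int c = (5 * V + 32) >> 6; /* vertical slope, scaled by 32 */

    /* Evaluate the plane at every sample and clamp to 0..255. */
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            src[y * stride + x] =
                clip_u8((a + b * (x - 7) + c * (y - 7) + 16) >> 5);
}
```

The two sums in the first loop and the 256-sample evaluation in the second are the parts that map naturally onto multiply-add and packed-add instructions.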