[FFmpeg-devel] [PATCH] h264pred16x16 plane sse2/ssse3 optimizations

Ronald S. Bultje rsbultje
Wed Sep 29 04:31:51 CEST 2010


Hi,

this showed up high in my profile of the cathedral sample, so I'm
tackling it first. I can/will do 8x8 and 4x4 + mmx/mmx2 functions later
as well (Jason tells me x264 uses 8x8/4x4 plane mode a lot), but that's
quite a bit of testing, so I thought I'd ask for review of this piece
already. Jason also tells me there's code in x264 that I should look
at, but it looks completely different/incompatible, so I'm not sure
I'm looking at the right place/version...

make fate-h264, fate-svq3 and fate-real-rv40 pass with this patch
(tested h264 both with and without ssse3 enabled).

Numbers (Core i7, x86-64, OSX 10.6.4, cathedral sample):

before: 6719 dezicycles in pred16x16_plane, 262062 runs, 82 skips
after: 1170 dezicycles in pred16x16_plane, 262128 runs, 16 skips
(about 83% fewer cycles, i.e. roughly 5.7x faster)

time before:
8.398
8.382
8.309
(avg 8.363)

after:
8.000
8.072
8.130
(avg 8.067, ~3.6% faster)
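
In case anyone wants to redo the arithmetic, a small sketch of how the percentages can be derived from the raw numbers. The helper names (mean3, percent_less, speedup_ratio) are made up for this illustration and are not part of the patch.

```c
/* Average of three timing runs. */
static double mean3(double a, double b, double c)
{
    return (a + b + c) / 3.0;
}

/* Percentage by which "after" is smaller than "before". */
static double percent_less(double before, double after)
{
    return 100.0 * (before - after) / before;
}

/* How many times faster "after" is than "before". */
static double speedup_ratio(double before, double after)
{
    return before / after;
}
```

With the figures above, percent_less(6719, 1170) is about 82.6 and speedup_ratio(6719, 1170) about 5.74 for the function itself. For the whole decode, the three-run averages give about 3.5% less time, or equivalently about 3.7% higher throughput, which brackets the ~3.6% quoted.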

I didn't profile svq3/rv40; the speedup is of course sample-dependent.
And Diego owes me beer now (5%!).

Ronald
-------------- next part --------------
A non-text attachment was scrubbed...
Name: h264pred_pred16x16planecompat_simd.patch
Type: application/octet-stream
Size: 8500 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100928/e23e8011/attachment.obj>


