[FFmpeg-devel] [PATCH] h264pred16x16 plane sse2/ssse3 optimizations
Ronald S. Bultje
Wed Sep 29 13:15:10 CEST 2010
On Wed, Sep 29, 2010 at 12:13 AM, Loren Merritt <lorenm at u.washington.edu> wrote:
> On Tue, 28 Sep 2010, Ronald S. Bultje wrote:
>> this appeared high on my cathedral profiling, so I'm tackling this one
>> first. Can/will do 8x8 and 4x4 + mmx/mmx2 functions later also (Jason
>> tells me x264 uses 8x8/4x4 plane-mode a lot),
> There is no 4x4 plane-mode. Jason said that x264 doesn't use i16x16 as much
> as the cathedral sample does.
I'll test some x264-generated content (or other random content) to see
if 8x8plane ops are worth it.
> But what x264 uses instead isn't a version of this function.
I tested x264's version from predict-c.c (might not be 100% optimal, I just
changed every occurrence of FDEC_STRIDE to stride), which looks a lot
more like what this function does (as opposed to predict-a.asm's
half-function) and it was almost 10 cycles slower than this version.
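
For reference, this is roughly what a full 16x16 plane predictor with a
runtime stride computes. It is a minimal C sketch written from the plane
formula in the H.264 spec, not the code from this patch or from x264; the
names pred16x16_plane_sketch and clip_u8 are made up for illustration, and
it assumes the top row, left column and top-left neighbours are all
available and that >> on negative ints is an arithmetic shift.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit pixel range. */
static inline uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v);
}

/* Hypothetical reference for H.264 Intra16x16 plane prediction.
 * src points at the top-left pixel of the 16x16 block; the row above
 * (src - stride) and the column to the left (src[-1]) must be valid. */
static void pred16x16_plane_sketch(uint8_t *src, ptrdiff_t stride)
{
    const uint8_t *top = src - stride;
    int H = 0, V = 0;

    /* Weighted horizontal and vertical gradients of the neighbours.
     * For i == 8 both terms reach the top-left corner pixel. */
    for (int i = 1; i <= 8; i++) {
        H += i * (top[7 + i] - top[7 - i]);
        V += i * (src[(7 + i) * stride - 1] - src[(7 - i) * stride - 1]);
    }

    int a = 16 * (src[15 * stride - 1] + top[15]);
    int b = (5 * H + 32) >> 6;   /* horizontal slope */
    int c = (5 * V + 32) >> 6;   /* vertical slope   */

    /* Fill the block with the fitted plane, rounded and clipped. */
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++)
            src[x] = clip_u8((a + b * (x - 7) + c * (y - 7) + 16) >> 5);
        src += stride;
    }
}
```

The gradient sums and the final fill are the two halves any SIMD version
has to cover, and the stride here is a function argument rather than a
compile-time constant like FDEC_STRIDE.
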
One potential reason why the cycle-count in your profiling was lower
(Jason said yours is 73 cycles on ssse3) is exactly because of