[FFmpeg-devel] [HACK] 50% faster H.264 decoding

Fri Aug 20 01:30:24 CEST 2010

Hi,

On Thu, Aug 19, 2010 at 7:00 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> can you show benchmarks of w=2 without limiting to mx/my=0
> we know the 00 case will be faster if its optimized by adding a special
> case but we dont know if the additional branch mispredictions and code
> cache pressure will be less than that gain so i think this should be
> tested

After (my local tree + all patches):
757 dezicycles in w=2, 8191 runs, 1 skips
731 dezicycles in w=2, 16383 runs, 1 skips
735 dezicycles in w=2, 32767 runs, 1 skips
723 dezicycles in w=2, 65535 runs, 1 skips
722 dezicycles in w=2, 131068 runs, 4 skips
718 dezicycles in w=2, 262136 runs, 8 skips
717 dezicycles in w=2, 524272 runs, 16 skips

Before (i.e. current SVN):
537 dezicycles in w=2, 8192 runs, 0 skips
521 dezicycles in w=2, 16384 runs, 0 skips
518 dezicycles in w=2, 32767 runs, 1 skips
509 dezicycles in w=2, 65535 runs, 1 skips
506 dezicycles in w=2, 131068 runs, 4 skips
507 dezicycles in w=2, 262140 runs, 4 skips
505 dezicycles in w=2, 524279 runs, 9 skips

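Hm... That's weird: with the patches applied, w=2 is about 40% slower
(717 vs. 505 dezicycles at ~524k runs). How's that possible? Would
adding more specialized paths for the 1D cases fix this, or is the gain
simply too small compared to the cost of the added complexity (branch
mispredictions and so on)?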
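
To make the question concrete, here is a minimal sketch of what
"specialized paths" could look like. It assumes w=2 is the 2-pixel-wide
chroma motion compensation and uses the standard H.264 bilinear chroma
weights; the function names are hypothetical and this is not the code
from the patch. The 0D branch is a plain copy, the 1D branch collapses
the four taps to two, and everything else goes through the general 2D
path. Each extra branch is a potential misprediction, which is the
trade-off being measured.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* General 2D path: H.264 bilinear chroma interpolation,
 * mx/my are eighth-pel fractions in 0..7. Hypothetical name. */
void chroma_mc2_2d(uint8_t *dst, const uint8_t *src, int stride,
                   int h, int mx, int my)
{
    const int A = (8 - mx) * (8 - my);
    const int B = mx * (8 - my);
    const int C = (8 - mx) * my;
    const int D = mx * my;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 2; x++)
            dst[x] = (A * src[x]          + B * src[x + 1] +
                      C * src[x + stride] + D * src[x + stride + 1] + 32) >> 6;
        dst += stride;
        src += stride;
    }
}

/* Same result, but with dedicated 0D and 1D branches. Hypothetical name. */
void chroma_mc2_special(uint8_t *dst, const uint8_t *src, int stride,
                        int h, int mx, int my)
{
    assert(mx >= 0 && mx < 8 && my >= 0 && my < 8);

    if (!(mx | my)) {
        /* 0D: full-pel position, plain copy. */
        for (int y = 0; y < h; y++) {
            dst[0] = src[0];
            dst[1] = src[1];
            dst += stride;
            src += stride;
        }
    } else if (!mx || !my) {
        /* 1D: one of mx/my is zero, so only two taps remain. */
        const int step = mx ? 1 : stride;  /* horizontal or vertical neighbour */
        const int E    = 8 * (mx + my);    /* weight of the neighbour sample */
        const int A    = 64 - E;           /* weight of the current sample */

        for (int y = 0; y < h; y++) {
            for (int x = 0; x < 2; x++)
                dst[x] = (A * src[x] + E * src[x + step] + 32) >> 6;
            dst += stride;
            src += stride;
        }
    } else {
        chroma_mc2_2d(dst, src, stride, h, mx, my);
    }
}

/* Returns 1 if both variants agree for all 64 (mx, my) positions. */
int chroma_paths_agree(void)
{
    enum { STRIDE = 8, H = 4 };
    uint8_t src[STRIDE * (H + 1)], ref[STRIDE * H], out[STRIDE * H];

    for (int i = 0; i < (int)sizeof(src); i++)
        src[i] = (uint8_t)(i * 37 + 11);   /* arbitrary test pattern */

    for (int my = 0; my < 8; my++) {
        for (int mx = 0; mx < 8; mx++) {
            memset(ref, 0, sizeof(ref));
            memset(out, 0, sizeof(out));
            chroma_mc2_2d(ref, src, STRIDE, H, mx, my);
            chroma_mc2_special(out, src, STRIDE, H, mx, my);
            if (memcmp(ref, out, sizeof(ref)))
                return 0;
        }
    }
    return 1;
}
```

Both variants produce identical output, so any difference in the
benchmark comes purely from branching and code size, not from the
arithmetic.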

I'll also benchmark mc4 (if that doesn't improve, the whole patch is
pointless ;-) ) and mc8 (should stay the same, otherwise again the
patch is pointless)...
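
For anyone wanting to reproduce numbers in the format above, the
following is a rough stand-alone sketch of how such a harness can
accumulate samples. It is modelled loosely on the START_TIMER /
STOP_TIMER style of output and is not the actual libavutil macro: the
struct, the function names and the 8x outlier threshold are
assumptions. The reported value is the mean cost in tenths of a cycle,
and "skips" counts samples rejected as outliers (e.g. interrupted by a
context switch).

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical accumulator for per-call cycle counts. */
typedef struct BenchStat {
    uint64_t sum;   /* total cycles over accepted samples */
    int      runs;  /* number of accepted samples */
    int      skips; /* number of rejected outliers */
} BenchStat;

/* Add one measurement; reject it if it is far above the running mean.
 * The first two samples are always accepted to seed the mean. */
void bench_add(BenchStat *s, uint64_t cycles)
{
    if (s->runs < 2 || cycles < 8 * s->sum / s->runs) {
        s->sum += cycles;
        s->runs++;
    } else {
        s->skips++;
    }
}

/* Mean cost in tenths of a cycle, 0 if nothing was recorded. */
uint64_t bench_decicycles(const BenchStat *s)
{
    return s->runs ? s->sum * 10 / s->runs : 0;
}

/* Print one line in the same shape as the output quoted above. */
void bench_report(const BenchStat *s, const char *id)
{
    printf("%" PRIu64 " dezicycles in %s, %d runs, %d skips\n",
           bench_decicycles(s), id, s->runs, s->skips);
}
```

In a real run, the `cycles` argument would come from a cycle counter
read before and after the call under test; the same accumulator could
then be reused with ids such as "w=4" and "w=8" for the mc4 and mc8
measurements.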

Ronald