[FFmpeg-devel] [HACK] 50% faster H.264 decoding

Jason Garrett-Glaser darkshikari
Fri Aug 20 02:10:02 CEST 2010


On Thu, Aug 19, 2010 at 5:05 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> Hi,
>
> On Thu, Aug 19, 2010 at 7:30 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
>> I'll also benchmark mc4 (if that doesn't improve, the whole patch is
>> pointless ;-) ) and mc8 (should stay the same, otherwise again the
>> patch is pointless)...
>
> After looking through these even weirder-looking ones, I decided that
> something was wrong and noticed one of my patches had HAVE_MMX
> commented out, hence causing odd numbers from the C versions... Don't
> believe my previous mc2 numbers either; now they're about identical
> before vs. after ;-). For reference, here are the mc4/mc8 numbers as well.
>
> mc8 after:
> 811 dezicycles in w=2, 65527 runs, 9 skips
> 796 dezicycles in w=2, 131061 runs, 11 skips
>
> mc8 before:
> 827 dezicycles in w=2, 65530 runs, 6 skips
> 807 dezicycles in w=2, 131065 runs, 7 skips
>
> So that's about the same.
>
> mc4 after:
> 504 dezicycles in w=2, 262140 runs, 4 skips
> 501 dezicycles in w=2, 524275 runs, 13 skips
> 497 dezicycles in w=2, 1048553 runs, 23 skips
>
> mc4 before:
> 503 dezicycles in w=2, 131066 runs, 6 skips
> 499 dezicycles in w=2, 262135 runs, 9 skips
> 496 dezicycles in w=2, 524272 runs, 16 skips
> 499 dezicycles in w=2, 1048543 runs, 33 skips
>
> Also about the same, which is probably because x=0, y=0 doesn't occur
> very much statistically in a random distribution (only 1 in 64).

Except that:

a) <0,0> is an extraordinarily common motion vector.
b) fullpel is more common than subpel (all else being equal).  Of
course, there are 4 luma fullpel positions per chroma fullpel position.
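
To make the counting concrete, here is a minimal sketch that tallies the
chroma cases over one period of luma motion vectors. It assumes 4:2:0
frame-mode prediction, where the quarter-pel luma MV is reused as a 1/8-pel
chroma MV, so the chroma fraction is the low 3 bits. The struct and function
names are hypothetical, not taken from FFmpeg. A uniform tally like this
gives the "1 in 64" and "14 in 64" figures; real streams are not uniform,
which is the point of (a) and (b).

```c
/* Counts of chroma interpolation cases over one 8x8 period of luma MVs. */
struct mv_tally {
    int total;                  /* all (mvx, mvy) pairs examined          */
    int chroma_0d;              /* chroma x == 0 and y == 0 (plain copy)  */
    int chroma_1d;              /* exactly one of chroma x, y is nonzero  */
    int chroma_2d;              /* both chroma x and y are nonzero        */
    int luma_fullpel;           /* luma MV is fullpel in both directions  */
    int luma_fullpel_chroma_0d; /* ...and chroma is fullpel as well       */
};

struct mv_tally tally_chroma_cases(void)
{
    struct mv_tally t = {0};

    for (int mvy = 0; mvy < 8; mvy++) {
        for (int mvx = 0; mvx < 8; mvx++) {
            /* Assumption: 4:2:0, frame mode; chroma fraction = mv & 7. */
            int cx = mvx & 7, cy = mvy & 7;
            /* Luma is quarter-pel, so fullpel means the low 2 bits are 0. */
            int luma_full = !(mvx & 3) && !(mvy & 3);

            t.total++;
            if (!cx && !cy)
                t.chroma_0d++;
            else if (!cx || !cy)
                t.chroma_1d++;
            else
                t.chroma_2d++;

            if (luma_full) {
                t.luma_fullpel++;
                if (!cx && !cy)
                    t.luma_fullpel_chroma_0d++;
            }
        }
    }
    return t;
}
```

Under these assumptions, only one of every four luma fullpel positions lands
on a chroma fullpel position; the other three hit chroma half-pel offsets.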

> That's a little disappointing, I would've expected to see them more,
> but then again I'm testing reference stream samples for now...
>
> I should probably write 1D versions also, which occur more often (14
> in 64) and then re-measure as I did just now, to be able to see any
> benefit at all. Does that make sense?

I can confirm from CoreAVC that 1D does help.  One can then add a 0D
branch from the 1D code.
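
For illustration, a rough C sketch of what such a 1D path with a 0D branch
could look like. This is not the patch under discussion and not CoreAVC's
code; the function names are made up, and it assumes the standard H.264
bilinear chroma filter with x and y in 1/8-pel units (0..7). When one of
x, y is zero, the 2D formula reduces exactly to a two-tap filter, and when
both are zero it reduces to a copy.

```c
#include <stdint.h>
#include <string.h>

/* Reference 2D bilinear chroma filter; x, y in 0..7 (1/8-pel units). */
static void chroma_mc_2d(uint8_t *dst, const uint8_t *src, int stride,
                         int w, int h, int x, int y)
{
    const int A = (8 - x) * (8 - y), B = x * (8 - y);
    const int C = (8 - x) * y,       D = x * y;

    for (int i = 0; i < h; i++) {
        for (int j = 0; j < w; j++)
            dst[j] = (A * src[j] + B * src[j + 1] +
                      C * src[j + stride] + D * src[j + stride + 1] + 32) >> 6;
        dst += stride;
        src += stride;
    }
}

/* Hypothetical dispatcher: 2D only when needed, else 1D, with 0D inside. */
void chroma_mc_fast(uint8_t *dst, const uint8_t *src, int stride,
                    int w, int h, int x, int y)
{
    if (x && y) {
        chroma_mc_2d(dst, src, stride, w, h, x, y);
        return;
    }

    const int f = x + y;             /* at most one of x, y is nonzero */
    if (f == 0) {                    /* 0D branch: plain block copy    */
        for (int i = 0; i < h; i++)
            memcpy(dst + i * stride, src + i * stride, (size_t)w);
        return;
    }

    const int step = y ? stride : 1; /* second tap: below or to the right */
    for (int i = 0; i < h; i++) {
        for (int j = 0; j < w; j++)
            dst[j] = ((8 - f) * src[j] + f * src[j + step] + 4) >> 3;
        dst += stride;
        src += stride;
    }
}

/* Self-check: compare the fast path to the 2D reference for all 64 (x, y). */
int chroma_mc_count_mismatches(void)
{
    enum { STRIDE = 16, W = 4, H = 4 };
    uint8_t src[STRIDE * (H + 2)], ref[STRIDE * H], out[STRIDE * H];
    unsigned seed = 1;
    int bad = 0;

    for (size_t i = 0; i < sizeof(src); i++) {  /* deterministic test data */
        seed = seed * 1103515245u + 12345u;
        src[i] = (uint8_t)(seed >> 16);
    }
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            memset(ref, 0, sizeof(ref));
            memset(out, 0, sizeof(out));
            chroma_mc_2d(ref, src, STRIDE, W, H, x, y);
            chroma_mc_fast(out, src, STRIDE, W, H, x, y);
            bad += memcmp(ref, out, sizeof(ref)) != 0;
        }
    }
    return bad;
}
```

The 1D form is exact rather than approximate: with y=0 every weight in the
2D formula is a multiple of 8, so (8*k + 32) >> 6 equals (k + 4) >> 3.
Whether the extra branches pay off in practice depends on the MV
distribution of the streams being decoded.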

Dark Shikari


