[FFmpeg-devel] [HACK] 50% faster H.264 decoding

Fri Aug 20 16:57:24 CEST 2010

Hi,

On Thu, Aug 19, 2010 at 8:10 PM, Jason Garrett-Glaser
<darkshikari at gmail.com> wrote:
> On Thu, Aug 19, 2010 at 5:05 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
>> On Thu, Aug 19, 2010 at 7:30 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
>>> I'll also benchmark mc4 (if that doesn't improve, the whole patch is
>>> pointless ;-) ) and mc8 (should stay the same, otherwise again the
>>> patch is pointless)...
>>
>> After looking through these even-weirder looking ones, I decided that
>> something was wrong and noticed one of my patches had HAVE_MMX
>> commented out, hence causing odd numbers from the C-versions... Don't
>> believe my previous mc2 numbers either; now they're about identical
>> before vs. after. ;-) For reference, here are the mc4/mc8 numbers as well.
>>
>> mc8 after:
>> 811 dezicycles in w=2, 65527 runs, 9 skips
>> 796 dezicycles in w=2, 131061 runs, 11 skips
>>
>> mc8 before:
>> 827 dezicycles in w=2, 65530 runs, 6 skips
>> 807 dezicycles in w=2, 131065 runs, 7 skips
>>
>> So that's about the same.
>>
>> mc4 after:
>> 504 dezicycles in w=2, 262140 runs, 4 skips
>> 501 dezicycles in w=2, 524275 runs, 13 skips
>> 497 dezicycles in w=2, 1048553 runs, 23 skips
>>
>> mc4 before:
>> 503 dezicycles in w=2, 131066 runs, 6 skips
>> 499 dezicycles in w=2, 262135 runs, 9 skips
>> 496 dezicycles in w=2, 524272 runs, 16 skips
>> 499 dezicycles in w=2, 1048543 runs, 33 skips
>>
>> Also about the same, which is probably because x=0, y=0 doesn't occur
>> very often statistically in a random distribution (only 1 in 64).
>
> Except that:
>
> a) <0,0> is an extraordinarily common motion vector.
> b) fullpel is more common than subpel (all else being equal). Of
> course there are 4 fullpel positions per chroma fullpel position.
>
>> That's a little disappointing, I would've expected to see them more,
>> but then again I'm testing reference stream samples for now...
>>
>> I should probably write 1D versions also, which occur more often (14
>> in 64) and then re-measure as I did just now, to be able to see any
>> benefit at all. Does that make sense?
>
> I can confirm from CoreAVC that 1D does help. One can then add a 0D
> branch from the 1D code.
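
For reference, the "1 in 64" and "14 in 64" figures quoted above can be checked by enumerating the fractional chroma positions. This is only a sketch, assuming 1/8-pel chroma fractions (x and y each in 0..7) and a uniform distribution; the struct and function names are made up for illustration and are not FFmpeg code.

```c
#include <assert.h>

/* Hypothetical helper: classify the 8x8 fractional chroma MV positions. */
struct frac_counts {
    int zero_d; /* x == 0 and y == 0: no interpolation needed */
    int one_d;  /* exactly one of x, y is 0: 1D filter suffices */
    int two_d;  /* both nonzero: full 2D bilinear filter */
};

static struct frac_counts count_chroma_fractions(void)
{
    struct frac_counts c = { 0, 0, 0 };

    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            if (!x && !y)
                c.zero_d++;
            else if (!x || !y)
                c.one_d++;
            else
                c.two_d++;
        }
    }
    /* Sanity check: every one of the 64 positions falls in one class. */
    assert(c.zero_d + c.one_d + c.two_d == 64);
    return c;
}
```

Calling count_chroma_fractions() gives 1, 14 and 49 for the 0D, 1D and 2D cases. As Jason points out, real streams are not uniform, so these ratios only bound what a random distribution would show.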

OK, I'm working on some basic 1D code to see if that helps any. Will
update here once it works...
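
To make the idea concrete, here is a rough C sketch of what a 1D path with a 0D branch could look like next to the usual 2D bilinear chroma filter. It is not the patch itself: the function names, the plain-int stride and the loop structure are assumptions for illustration only, and the real code would be MMX/SSE. The 1D form follows from the 2D formula: with y == 0 the weights collapse to 8*(8-x) and 8*x, so (8*k + 32) >> 6 equals (k + 4) >> 3.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reference 2D bilinear chroma filter, x and y are 1/8-pel fractions. */
static void chroma_mc_2d(uint8_t *dst, const uint8_t *src, int stride,
                         int w, int h, int x, int y)
{
    for (int j = 0; j < h; j++, dst += stride, src += stride)
        for (int i = 0; i < w; i++)
            dst[i] = ((8 - x) * (8 - y) * src[i] +
                      x       * (8 - y) * src[i + 1] +
                      (8 - x) * y       * src[i + stride] +
                      x       * y       * src[i + stride + 1] + 32) >> 6;
}

/* Hypothetical dispatcher: 0D copy, 1D filter, or fall back to 2D. */
static void chroma_mc_fast(uint8_t *dst, const uint8_t *src, int stride,
                           int w, int h, int x, int y)
{
    assert(x >= 0 && x < 8 && y >= 0 && y < 8);

    if (x && y) {                      /* both fractions set: full 2D */
        chroma_mc_2d(dst, src, stride, w, h, x, y);
        return;
    }
    if (!(x | y)) {                    /* 0D: fullpel, plain row copy */
        for (int j = 0; j < h; j++)
            memcpy(dst + j * stride, src + j * stride, w);
        return;
    }

    /* 1D: only one fraction is nonzero, so two taps are enough. */
    const int frac = x | y;            /* the single nonzero fraction */
    const int step = x ? 1 : stride;   /* neighbour to the right or below */

    for (int j = 0; j < h; j++, dst += stride, src += stride)
        for (int i = 0; i < w; i++)
            dst[i] = ((8 - frac) * src[i] + frac * src[i + step] + 4) >> 3;
}
```

The fast path should be bit-exact with the 2D reference for every (x, y), so any gain would come purely from doing fewer multiplies and loads in the 0D and 1D cases.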

Ronald