[FFmpeg-devel] [HACK] 50% faster H.264 decoding

Ronald S. Bultje rsbultje
Fri Aug 20 16:57:24 CEST 2010


Hi,

On Thu, Aug 19, 2010 at 8:10 PM, Jason Garrett-Glaser
<darkshikari at gmail.com> wrote:
> On Thu, Aug 19, 2010 at 5:05 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
>> On Thu, Aug 19, 2010 at 7:30 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
>>> I'll also benchmark mc4 (if that doesn't improve, the whole patch is
>>> pointless ;-) ) and mc8 (should stay the same, otherwise again the
>>> patch is pointless)...
>>
>> After looking through these even-weirder looking ones, I decided that
>> something was wrong and noticed one of my patches had HAVE_MMX
>> commented out, hence causing odd numbers from the C-versions... Don't
>> believe my previous mc2 numbers also, now they're about identical
>> before vs. after. ;-). For reference, here's mc4/mc8 numbers also.
>>
>> mc8 after:
>> 811 dezicycles in w=2, 65527 runs, 9 skips
>> 796 dezicycles in w=2, 131061 runs, 11 skips
>>
>> mc8 before:
>> 827 dezicycles in w=2, 65530 runs, 6 skips
>> 807 dezicycles in w=2, 131065 runs, 7 skips
>>
>> So that's about the same.
>>
>> mc4 after:
>> 504 dezicycles in w=2, 262140 runs, 4 skips
>> 501 dezicycles in w=2, 524275 runs, 13 skips
>> 497 dezicycles in w=2, 1048553 runs, 23 skips
>>
>> mc4 before:
>> 503 dezicycles in w=2, 131066 runs, 6 skips
>> 499 dezicycles in w=2, 262135 runs, 9 skips
>> 496 dezicycles in w=2, 524272 runs, 16 skips
>> 499 dezicycles in w=2, 1048543 runs, 33 skips
>>
>> Also about the same, which is probably because x=0, y=0 doesn't occur
>> very much statistically in a random distribution (only 1 in 64).
>
> Except that:
>
> a) <0,0> is an extraordinarily common motion vector.
> b) fullpel is more common than subpel (all else being equal). Of
> course there are 4 fullpel positions per chroma fullpel position.
>
>> That's a little disappointing, I would've expected to see them more,
>> but then again I'm testing reference stream samples for now...
>>
>> I should probably write 1D versions also, which occur more often (14
>> in 64) and then re-measure as I did just now, to be able to see any
>> benefit at all. Does that make sense?
>
> I can confirm from CoreAVC that 1D does help. One can then add a 0D
> branch from the 1D code.

OK, I'm working on some basic 1D code to see if that helps any. Will
update here once it works...
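
For reference, a rough sketch of the 0D/1D/2D split discussed above,
using the standard H.264 bilinear chroma weights on eighth-pel
fractions x, y in 0..7. All function names here are made up for
illustration and are not the actual FFmpeg functions. The counting
helper assumes a uniform distribution over the 64 positions, which, as
noted above, real streams do not follow.

```c
#include <stdint.h>

/* Hypothetical names for illustration only; not FFmpeg's real API. */

/* Full 2D bilinear chroma interpolation of one sample (H.264 weights). */
static int chroma_2d(const uint8_t *src, int stride, int x, int y)
{
    int a = (8 - x) * (8 - y);
    int b = x * (8 - y);
    int c = (8 - x) * y;
    int d = x * y;
    return (a * src[0] + b * src[1] +
            c * src[stride] + d * src[stride + 1] + 32) >> 6;
}

/* 1D case: only one fraction is non-zero, so two taps are enough.
 * step is 1 for horizontal (x != 0) or stride for vertical (y != 0). */
static int chroma_1d(const uint8_t *src, int step, int frac)
{
    return ((8 - frac) * src[0] + frac * src[step] + 4) >> 3;
}

/* Dispatch on the fractions: 2D, 1D horizontal, 1D vertical, or 0D copy. */
static int chroma_dispatch(const uint8_t *src, int stride, int x, int y)
{
    if (x && y)
        return chroma_2d(src, stride, x, y);
    if (x)
        return chroma_1d(src, 1, x);
    if (y)
        return chroma_1d(src, stride, y);
    return src[0]; /* 0D: full-pel position, plain copy */
}

/* Count how many of the 64 (x, y) positions hit each path:
 * counts[0] = 0D, counts[1] = 1D, counts[2] = 2D. */
static void count_cases(int counts[3])
{
    counts[0] = counts[1] = counts[2] = 0;
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            counts[(x != 0) + (y != 0)]++;
}
```

With y = 0 the 2D formula reduces to (8*((8-x)*s0 + x*s1) + 32) >> 6,
which equals the 1D form ((8-x)*s0 + x*s1 + 4) >> 3, so the shortcut
paths give the same results as the general one. The counts work out to
1, 14 and 49 of 64 for 0D, 1D and 2D.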

Ronald


