[FFmpeg-devel] [PATCH] H264 MC8 SSSE3 minor speedups

Ronald S. Bultje rsbultje
Fri Dec 24 18:31:44 CET 2010


Hi,

On Fri, Dec 17, 2010 at 9:50 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Fri, Dec 17, 2010 at 08:28:55PM -0500, Ronald S. Bultje wrote:
>> Hi,
>>
>> On Sat, Aug 21, 2010 at 1:18 PM, Loren Merritt <lorenm at u.washington.edu> wrote:
>> > On Sat, 21 Aug 2010, Ronald S. Bultje wrote:
>> >
>> >> 604 dezicycles in w=8, 65535 runs, 1 skips
>> >> 603 dezicycles in w=8, 131067 runs, 5 skips
>> >> 606 dezicycles in w=8, 262137 runs, 7 skips
>> >> 606 dezicycles in w=8, 524275 runs, 13 skips
>> >> 605 dezicycles in w=8, 1048552 runs, 24 skips
>> >
>> > Bad benchmark technique. You should report only the last dezicycle line
>> > (i.e. the one with the highest # of runs, which includes all the previous
>> > data). But run the whole program multiple times, and report the last line
>> > from each.
>>
>> Late...
>>
>> first change (movq+movlhps -> movdqa), before:
>> 532 dezicycles in mc8, 524271 runs, 17 skips
>> 532 dezicycles in mc8, 524273 runs, 15 skips
>> 539 dezicycles in mc8, 524267 runs, 21 skips
>> 537 dezicycles in mc8, 524272 runs, 16 skips
>> 532 dezicycles in mc8, 524274 runs, 14 skips
>> 538 dezicycles in mc8, 524274 runs, 14 skips
>> after
>> 533 dezicycles in mc8, 524278 runs, 10 skips
>> 528 dezicycles in mc8, 524267 runs, 21 skips
>> 527 dezicycles in mc8, 524272 runs, 16 skips
>> 525 dezicycles in mc8, 524269 runs, 19 skips
>> 525 dezicycles in mc8, 524274 runs, 14 skips
>> 530 dezicycles in mc8, 524276 runs, 12 skips
>>
>> So a little (~1 cycle) faster.
>>
>> Then the other change (remove movdqa), before (with above change included):
>> 1004 dezicycles in mc8, 131070 runs, 2 skips
>> 1008 dezicycles in mc8, 131066 runs, 6 skips
>> 996 dezicycles in mc8, 131068 runs, 4 skips
>> 1000 dezicycles in mc8, 131068 runs, 4 skips
>> 1055 dezicycles in mc8, 131065 runs, 7 skips
>> 1006 dezicycles in mc8, 131069 runs, 3 skips
>> after:
>> 1007 dezicycles in mc8, 131070 runs, 2 skips
>> 1005 dezicycles in mc8, 131067 runs, 5 skips
>> 1017 dezicycles in mc8, 131068 runs, 4 skips
>> 1008 dezicycles in mc8, 131064 runs, 8 skips
>> 990 dezicycles in mc8, 131070 runs, 2 skips
>> 1014 dezicycles in mc8, 131067 runs, 5 skips
>>
>> So confusingly, the 2nd change appears to not be faster. Also binary
>> size is the same (probably b/c of alignment further down).
>
>
>> What to do?
>
> random ideas:
> 1. find something else to optimize

Yeah, probably... I applied the first half since it is faster.
Consider the second half dropped then.

Ronald
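
The timer figures quoted above can be compared with a short script. This is only a sketch for reading the numbers, not part of the patch or of FFmpeg's timer code. It assumes that 10 dezicycles equal 1 cycle, and the names (summarize, FIRST_BEFORE, and so on) are made up for illustration. The lists hold the final timer line of each run, as copied from the thread.

```python
from statistics import mean, median

# Final "dezicycles" figure from each run, copied from the thread.
FIRST_BEFORE = [532, 532, 539, 537, 532, 538]    # movq+movlhps
FIRST_AFTER = [533, 528, 527, 525, 525, 530]     # movdqa
SECOND_BEFORE = [1004, 1008, 996, 1000, 1055, 1006]
SECOND_AFTER = [1007, 1005, 1017, 1008, 990, 1014]


def summarize(before, after):
    """Return the after-minus-before change in mean and median (dezicycles).

    Negative values mean the "after" build was faster.
    """
    return {
        "mean_delta": mean(after) - mean(before),
        "median_delta": median(after) - median(before),
    }


if __name__ == "__main__":
    for name, before, after in [
        ("first change", FIRST_BEFORE, FIRST_AFTER),
        ("second change", SECOND_BEFORE, SECOND_AFTER),
    ]:
        s = summarize(before, after)
        # Assumption: 10 dezicycles = 1 cycle.
        print(f"{name}: mean {s['mean_delta'] / 10:+.2f} cycles, "
              f"median {s['median_delta'] / 10:+.2f} cycles")
```

For the first change both the mean and the median drop by 7 dezicycles, which matches the "~1 cycle" estimate. For the second change the mean drops slightly while the median rises, because of the single 1055 run in the "before" set, so the data does not show a speedup.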


