[FFmpeg-devel] [PATCH] Add x86-optimized versions of exponent_min().

Måns Rullgård mans
Fri Feb 4 03:30:39 CET 2011


Justin Ruggles <justin.ruggles at gmail.com> writes:

> On 02/03/2011 07:13 PM, Justin Ruggles wrote:
>
>> On 02/03/2011 06:47 PM, Loren Merritt wrote:
>> 
>>> On Thu, 3 Feb 2011, Justin Ruggles wrote:
>>>> So should we just accept what is an obvious bad case on one 
>>>> configuration because there is a chance that fixing it is worse 
>>>> in another?
>>>
>>> I expect this fix to have a positive effect on the performance of the 
>>> configurations you haven't benchmarked. If you don't want to 
>>> benchmark them, I won't reject this patch on those grounds.
>>>
>>> I am merely saying that as long as you haven't identified the actual 
>>> cause of the slowdowns, as long as performance is still random unto you, 
>>> making decisions based on a thorough benchmark of only one compiler 
>>> configuration is generalizing from one data point.
>>>
>>>> Even the worst case versions are 80-90% faster than the C version in the 
>>>> tested configuration (x86_64 unix). Is it likely that the worst case 
>>>> will be much slower in another?
>>>
>>> Not more than 40% slower. (Some confidence since on this question your 
>>> benchmark counts as 24 data points, not 1.)
>> 
>> 
>> I can recompile with "--extra-cflags=-m32 --extra-ldflags=-m32" and add
>> 24 more data points if you think this would be useful.
>
> Results for x86_32:
>
> LOOP1/LOOP2   MMX   MMX2   SSE2
> -------------------------------
> NONE/NONE :  5150   4640   2735
>    NONE/8 :  5240   3716   2343
>   NONE/16 :  5270   3713*  2360
>    8/NONE :  5123   3765   2899
>       8/8 :  4970   5295   2793
>      8/16 :  5911   4361   2469
>   16/NONE :  4902*  4860   2696
>      16/8 :  5381   3922   2228
>     16/16 :  5382   3954   2226*
>
> And again, the results for x86_64:
>
> LOOP1/LOOP2   MMX   MMX2   SSE2
> -------------------------------
> NONE/NONE :  5270   5283   2757
>    NONE/8 :  5200   5077   2644
>   NONE/16 :  5723   3961   2161
>    8/NONE :  5214   5339   2787
>       8/8 :  5198*  5083   2722
>      8/16 :  5936   3902   2128
>   16/NONE :  6613   4788   2580
>      16/8 :  5490   3702   2020
>     16/16 :  5474   3680*  2000*
>
> So this is definitely not conclusive. :(
>
> One thing that is consistent is that no matter what the alignment of the
> first loop is, increasing the alignment for the 2nd loop gives better
> results for mmx2 and sse2.
>
> I would be ok with doing nothing for mmx, since it is wildly inconsistent,
> and for mmx2 and sse2 either aligning only the 2nd loop or aligning both
> loops.

All x86_64 CPUs have SSE2, so the MMX(2) performance there doesn't
really matter.
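
To make that concrete, here is a rough sketch of the usual "best
available version wins" selection. The enum and function names are made
up for illustration and are not taken from the patch or from FFmpeg's
actual init code; the only fact it relies on is that SSE2 is part of the
x86_64 baseline.

```c
/* Hypothetical names, for illustration only (not from the patch). */
enum simd_level { SIMD_C, SIMD_MMX, SIMD_MMX2, SIMD_SSE2 };

/* Pick the most capable version the CPU supports, highest first. */
static enum simd_level pick_simd_level(int has_mmx, int has_mmx2, int has_sse2)
{
    if (has_sse2)
        return SIMD_SSE2;
    if (has_mmx2)
        return SIMD_MMX2;
    if (has_mmx)
        return SIMD_MMX;
    return SIMD_C;
}

/* On x86_64 the SSE2 flag is always set, so the MMX/MMX2 branches
 * above can never be reached there, whatever the other flags say. */
static enum simd_level pick_for_x86_64(int has_mmx, int has_mmx2)
{
    return pick_simd_level(has_mmx, has_mmx2, 1);
}
```

With a selection like this, the MMX and MMX2 columns of the x86_64 table
describe code that would never be chosen at run time; they only matter
for x86_32 CPUs without SSE2.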

-- 
Måns Rullgård
mans at mansr.com


