[FFmpeg-devel] [PATCH] Add x86-optimized versions of exponent_min().
Justin Ruggles
justin.ruggles
Fri Feb 4 02:59:48 CET 2011
On 02/03/2011 07:13 PM, Justin Ruggles wrote:
> On 02/03/2011 06:47 PM, Loren Merritt wrote:
>
>> On Thu, 3 Feb 2011, Justin Ruggles wrote:
>>> So should we just accept what is an obvious bad case on one
>>> configuration because there is a chance that fixing it is worse
>>> in another?
>>
>> My expectation of the effect of this fix on the performance of the
>> configurations you haven't benchmarked, is positive. If you don't want to
>> benchmark them, I won't reject this patch on those grounds.
>>
>> I am merely saying that as long as you haven't identified the actual
>> cause of the slowdowns, as long as performance is still random unto you,
>> making decisions based on a thorough benchmark of only one compiler
>> configuration is generalizing from one data point.
>>
>>> Even the worst case versions are 80-90% faster than the C version in the
>>> tested configuration (x86_64 unix). Is it likely that the worst case
>>> will be much slower in another?
>>
>> Not more than 40% slower. (Some confidence since on this question your
>> benchmark counts as 24 data points, not 1.)
>
>
> I can recompile with "--extra-cflags=-m32 --extra-ldflags=-m32" and add
> 24 more data points if you think this would be useful.

Results for x86_32 (* marks the lowest value in each column):

LOOP1/LOOP2    MMX     MMX2    SSE2
-----------------------------------
NONE/NONE  :   5150    4640    2735
NONE/8     :   5240    3716    2343
NONE/16    :   5270    3713*   2360
8/NONE     :   5123    3765    2899
8/8        :   4970    5295    2793
8/16       :   5911    4361    2469
16/NONE    :   4902*   4860    2696
16/8       :   5381    3922    2228
16/16      :   5382    3954    2226*

And again, the results for x86_64:

LOOP1/LOOP2    MMX     MMX2    SSE2
-----------------------------------
NONE/NONE  :   5270    5283    2757
NONE/8     :   5200    5077    2644
NONE/16    :   5723    3961    2161
8/NONE     :   5214    5339    2787
8/8        :   5198*   5083    2722
8/16       :   5936    3902    2128
16/NONE    :   6613    4788    2580
16/8       :   5490    3702    2020
16/16      :   5474    3680*   2000*

So this is definitely not conclusive. :(
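One thing that is mostly consistent is that, no matter what the
alignment of the first loop is, increasing the alignment of the 2nd loop
gives better results for mmx2 and sse2. It holds for every row on
x86_64; the one large exception is mmx2 on x86_32 with the first loop
aligned to 8, where 8/8 and 8/16 are both slower than 8/NONE.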
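
The observation above can be checked mechanically against the two
tables. The following is only a sketch and not part of the patch: it
assumes that lower numbers are faster, and the names RESULTS, COLUMNS,
STEPS and loop2_regressions are made up for illustration. It lists every
case where raising the LOOP2 alignment by one step made a given LOOP1
setting slower.

```python
# Benchmark numbers copied from the tables in this mail.
# Assumption: lower is faster.
# Layout: RESULTS[arch][(loop1, loop2)] = (MMX, MMX2, SSE2)
RESULTS = {
    "x86_32": {
        ("NONE", "NONE"): (5150, 4640, 2735),
        ("NONE", "8"): (5240, 3716, 2343),
        ("NONE", "16"): (5270, 3713, 2360),
        ("8", "NONE"): (5123, 3765, 2899),
        ("8", "8"): (4970, 5295, 2793),
        ("8", "16"): (5911, 4361, 2469),
        ("16", "NONE"): (4902, 4860, 2696),
        ("16", "8"): (5381, 3922, 2228),
        ("16", "16"): (5382, 3954, 2226),
    },
    "x86_64": {
        ("NONE", "NONE"): (5270, 5283, 2757),
        ("NONE", "8"): (5200, 5077, 2644),
        ("NONE", "16"): (5723, 3961, 2161),
        ("8", "NONE"): (5214, 5339, 2787),
        ("8", "8"): (5198, 5083, 2722),
        ("8", "16"): (5936, 3902, 2128),
        ("16", "NONE"): (6613, 4788, 2580),
        ("16", "8"): (5490, 3702, 2020),
        ("16", "16"): (5474, 3680, 2000),
    },
}

COLUMNS = ("MMX", "MMX2", "SSE2")
# Each step raises the LOOP2 alignment by one level.
STEPS = (("NONE", "8"), ("8", "16"))


def loop2_regressions(results, versions=("MMX2", "SSE2")):
    """Return (arch, version, loop1, loop2_from, loop2_to, before, after)
    for every step where more LOOP2 alignment gave a higher number."""
    found = []
    for arch, table in results.items():
        for loop1 in ("NONE", "8", "16"):
            for lo, hi in STEPS:
                for version in versions:
                    col = COLUMNS.index(version)
                    before = table[(loop1, lo)][col]
                    after = table[(loop1, hi)][col]
                    if after > before:
                        found.append(
                            (arch, version, loop1, lo, hi, before, after)
                        )
    return found


for row in loop2_regressions(RESULTS):
    print(row)
```

On these numbers it reports three slowdowns, all on x86_32. Two of them
are under 1%; the third is mmx2 going from 8/NONE to 8/8.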
I would be ok with doing nothing for mmx, since it is wildly
inconsistent, and for mmx2 and sse2 either aligning only the 2nd loop or
aligning both loops.
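
To put rough numbers on those two options, the sketch below compares
the NONE/16 rows (only the 2nd loop aligned) with the 16/16 rows (both
loops aligned) for mmx2 and sse2. It is an illustration only, assuming
lower is faster; CANDIDATES, percent_change and compare_options are
hypothetical names, and the values are copied from the tables in this
mail.

```python
# Rows relevant to the two options. Assumption: lower is faster.
# Layout: CANDIDATES[arch][row] = (MMX2, SSE2)
CANDIDATES = {
    "x86_32": {"NONE/16": (3713, 2360), "16/16": (3954, 2226)},
    "x86_64": {"NONE/16": (3961, 2161), "16/16": (3680, 2000)},
}


def percent_change(baseline, other):
    """Change of `other` relative to `baseline` in percent.
    Negative means `other` is lower, i.e. assumed faster."""
    return 100.0 * (other - baseline) / baseline


def compare_options(candidates):
    """Map (arch, version) to the change of 16/16 relative to NONE/16."""
    out = {}
    for arch, rows in candidates.items():
        for col, version in enumerate(("MMX2", "SSE2")):
            out[(arch, version)] = percent_change(
                rows["NONE/16"][col], rows["16/16"][col]
            )
    return out


for key, change in compare_options(CANDIDATES).items():
    print(key, f"{change:+.1f}%")
```

With these numbers, aligning both loops comes out lower in three of the
four cases; only mmx2 on x86_32 is higher with 16/16 than with NONE/16.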
-Justin