[FFmpeg-devel] [PATCH] SIMD-optimized float_to_int32_fmul_scalar()

Justin Ruggles justin.ruggles
Fri Jan 7 21:50:56 CET 2011


On 01/07/2011 03:38 PM, Jason Garrett-Glaser wrote:

> On Fri, Jan 7, 2011 at 3:38 PM, Justin Ruggles <justin.ruggles at gmail.com> wrote:
>> On 01/07/2011 01:52 PM, Justin Ruggles wrote:
>>
>>> On 01/07/2011 01:31 PM, Michael Niedermayer wrote:
>>>> also some of these can be unrolled to gain a bit more speed
>>>
>>>
>>> unrolling didn't give me any benefit in testing, but that was just on
>>> Athlon.  I'll do more tests and try it on Atom as well.
>>
>>
>> Dang. Well, apparently I didn't test very thoroughly before.
>>
>> AMD Athlon
>> loop2 3DNow: 51221
>> loop4 3DNow: 49101
>> loop8 3DNow: 43870
>> loop4   SSE: 50267
>> loop8   SSE: 51038
>> loop4  SSE2: 53008
>> loop8  SSE2: 50139
>>
>> Intel Atom
>> loop4   SSE: 149126
>> loop8   SSE: 107183
>> loop4  SSE2: 148860
>> loop8  SSE2: 104592
>>
>> Based on this data it seems my best option would be to loop over 8
>> values for all versions and set function pointers like so:
>>
>> if(mm_flags & AV_CPU_FLAG_SSE){
>>    c->float_to_int32_fmul_scalar = float_to_int32_fmul_scalar_sse;
>> }
>> if(mm_flags & AV_CPU_FLAG_SSE2){
>>    c->float_to_int32_fmul_scalar = float_to_int32_fmul_scalar_sse2;
>> }
>> if((mm_flags & AV_CPU_FLAG_3DNOW) && !(avctx->flags & CODEC_FLAG_BITEXACT)){
>>    // faster than sse2
>>    c->float_to_int32_fmul_scalar = float_to_int32_fmul_scalar_3dnow;
>> }
> 
> Have you forgotten about the existence of the Phenom, a much more
> commonly used CPU than the Athlon 64?


Ah, ok.  So Phenom has 3DNow but also has fast SSE2, which means 3DNow
support alone must not override the SSE2 version.  We still need that
Athlon check for slower SSE2.
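
Roughly, the selection order could look like the sketch below.  This is
only an illustration, not the patch: the flag bits (SK_FLAG_*), the
"slow SSE2" bit, the impl_t enum and pick_impl() are all hypothetical
names made up for this example, and it returns a tag instead of setting
c->float_to_int32_fmul_scalar so it can be compiled on its own.

```c
/* Hypothetical CPU feature bits for this sketch only.
 * These are NOT the real AV_CPU_FLAG_* values. */
enum {
    SK_FLAG_3DNOW     = 1 << 0,
    SK_FLAG_SSE       = 1 << 1,
    SK_FLAG_SSE2      = 1 << 2,
    /* Assumption: set together with SK_FLAG_SSE2 on CPUs where SSE2
     * exists but is slow (the Athlon case from the benchmarks). */
    SK_FLAG_SSE2_SLOW = 1 << 3
};

/* Which implementation would be installed in the function pointer. */
typedef enum { IMPL_C, IMPL_SSE, IMPL_SSE2, IMPL_3DNOW } impl_t;

impl_t pick_impl(int cpu_flags, int bitexact)
{
    impl_t impl = IMPL_C;

    if (cpu_flags & SK_FLAG_SSE)
        impl = IMPL_SSE;
    if (cpu_flags & SK_FLAG_SSE2)
        impl = IMPL_SSE2;

    /* Let 3DNow win only when bit-exact output is not requested and
     * SSE2 is either missing or flagged as slow.  A CPU with 3DNow and
     * fast SSE2 (the Phenom case) keeps the SSE2 version. */
    if ((cpu_flags & SK_FLAG_3DNOW) && !bitexact &&
        (!(cpu_flags & SK_FLAG_SSE2) || (cpu_flags & SK_FLAG_SSE2_SLOW)))
        impl = IMPL_3DNOW;

    return impl;
}
```

With these made-up flags, a Phenom-like set (3DNOW | SSE | SSE2) picks
the SSE2 version, while adding the slow-SSE2 bit picks 3DNow unless
bit-exact output is requested.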

-Justin


