[FFmpeg-devel] [PATCH] SIMD-optimized float_to_int32_fmul_scalar()
Justin Ruggles
justin.ruggles
Fri Jan 7 21:38:05 CET 2011
On 01/07/2011 01:52 PM, Justin Ruggles wrote:
> On 01/07/2011 01:31 PM, Michael Niedermayer wrote:
>> also some of these can be unrolled to gain a bit more speed
>
>
> unrolling didn't give me any benefit in testing, but that was just on
> Athlon. I'll do more tests and try it on Atom as well.
Dang. Well, apparently I didn't test very thoroughly before.

AMD Athlon
    loop2 3DNow:  51221
    loop4 3DNow:  49101
    loop8 3DNow:  43870
    loop4 SSE:    50267
    loop8 SSE:    51038
    loop4 SSE2:   53008
    loop8 SSE2:   50139

Intel Atom
    loop4 SSE:   149126
    loop8 SSE:   107183
    loop4 SSE2:  148860
    loop8 SSE2:  104592

Based on this data it seems my best option would be to loop over 8
values for all versions and set function pointers like so:

    if (mm_flags & AV_CPU_FLAG_SSE) {
        c->float_to_int32_fmul_scalar = float_to_int32_fmul_scalar_sse;
    }
    if (mm_flags & AV_CPU_FLAG_SSE2) {
        c->float_to_int32_fmul_scalar = float_to_int32_fmul_scalar_sse2;
    }
    if ((mm_flags & AV_CPU_FLAG_3DNOW) && !(avctx->flags & CODEC_FLAG_BITEXACT)) {
        // faster than sse2
        c->float_to_int32_fmul_scalar = float_to_int32_fmul_scalar_3dnow;
    }

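For readers unfamiliar with the "loopN" labels: they refer to how many values each loop iteration handles. The following is only a rough plain-C sketch of that idea, not the SIMD code from the patch. The function names, the signature (dst, src, mul, len), and the round_to_int32() helper are assumptions made for illustration; real code would more likely round with lrintf() or a SIMD conversion instruction, which treat ties differently from this helper.

```c
#include <stdint.h>

/* Hypothetical stand-in for the float-to-int conversion: rounds to the
 * nearest integer, with ties going away from zero. */
static int32_t round_to_int32(float x)
{
    return (int32_t)(x >= 0.0f ? x + 0.5f : x - 0.5f);
}

/* Straightforward reference: one value per iteration. */
static void f2i_fmul_scalar_ref(int32_t *dst, const float *src,
                                float mul, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = round_to_int32(src[i] * mul);
}

/* "loop8" variant: eight values per iteration.
 * Assumes len is a multiple of 8. */
static void f2i_fmul_scalar_loop8(int32_t *dst, const float *src,
                                  float mul, int len)
{
    for (int i = 0; i < len; i += 8) {
        dst[i + 0] = round_to_int32(src[i + 0] * mul);
        dst[i + 1] = round_to_int32(src[i + 1] * mul);
        dst[i + 2] = round_to_int32(src[i + 2] * mul);
        dst[i + 3] = round_to_int32(src[i + 3] * mul);
        dst[i + 4] = round_to_int32(src[i + 4] * mul);
        dst[i + 5] = round_to_int32(src[i + 5] * mul);
        dst[i + 6] = round_to_int32(src[i + 6] * mul);
        dst[i + 7] = round_to_int32(src[i + 7] * mul);
    }
}
```

Both variants produce the same output; the unrolled one only reduces loop overhead per value, which would be consistent with the larger gain measured on the in-order Atom.
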
-Justin