[FFmpeg-devel] [PATCH] SIMD-optimized float_to_int32_fmul_scalar()

Fri Jan 7 20:19:21 CET 2011

On 01/07/2011 01:49 PM, Loren Merritt wrote:

> On Fri, 7 Jan 2011, Justin Ruggles wrote:
> 
>> This patch implements float_to_int32_fmul_scalar() for 3dnow, sse, and
>> sse2 and uses it in the AC3 encoder.
> 
>> @@ -2303,6 +2303,65 @@ static void int32_to_float_fmul_scalar_sse2(float *dst, const int *src, float mu
>>     );
>> }
>>
>> +static void float_to_int32_fmul_scalar_3dnow(int32_t *dst, const float *src, float mul, int len)
>> +{
>> +    /* note: pf2id conversion uses truncation, not round-to-nearest */
>> +    x86_reg i = (len-4)*4;
>> +    __asm__ volatile(
>> +        "movq          %3,   %%mm1      \n\t"
> 
> movd

Thanks for catching that. mul is a single 32-bit float, so movd is the
right load there.
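
For anyone following along, here is a rough plain-C sketch of what the
function is meant to compute, and of how the truncation mentioned in the
3dnow comment differs from round-to-nearest. The function names below are
made up for illustration and are not in the patch; the rounding variant
rounds halves away from zero as a simple stand-in, which may not match what
the real C or SSE code does on exact ties.

```c
#include <stdint.h>

/* Hypothetical reference, not the patch's code: scale each float by mul
 * and round to the nearest integer (halves rounded away from zero).
 * Assumes src[i] * mul fits in int32_t; out-of-range values are undefined. */
static void float_to_int32_fmul_scalar_round(int32_t *dst, const float *src,
                                             float mul, int len)
{
    for (int i = 0; i < len; i++) {
        float x = src[i] * mul;
        dst[i] = (int32_t)(x >= 0.0f ? x + 0.5f : x - 0.5f);
    }
}

/* Hypothetical truncating variant: a C float-to-int cast drops the
 * fractional part, which is the behaviour the 3dnow comment describes. */
static void float_to_int32_fmul_scalar_trunc(int32_t *dst, const float *src,
                                             float mul, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = (int32_t)(src[i] * mul);
}
```

With src = { 0.875f, -0.875f } and mul = 2.0f the products are 1.75 and
-1.75, so the rounding variant stores 2 and -2 while the truncating one
stores 1 and -1.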

> 
>> @@ -2910,6 +2971,8 @@ void dsputil_init_mmx(DSPContext* c, AVCodecContext *avctx)
>>             c->vector_fmul_add = vector_fmul_add_3dnow; // faster than sse
>>         if(mm_flags & AV_CPU_FLAG_SSE2){
>>             c->int32_to_float_fmul_scalar = int32_to_float_fmul_scalar_sse2;
>> +            if (!(mm_flags & AV_CPU_FLAG_SSE2SLOW))
>> +                c->float_to_int32_fmul_scalar = float_to_int32_fmul_scalar_sse2;
> 
> AV_CPU_FLAG_SSE2SLOW is an alternative to AV_CPU_FLAG_SSE2. They won't 
> both be set at once. It means "pentium-m's SSE2 is so slow that by default 
> we pretend it doesn't exist, and only make an exception if specifically 
> tested".
> If you intended it to detect athlon64, then you picked the wrong flag, and 
> there isn't a right one yet.

OK. I don't know enough about CPU detection to implement an athlon64
check myself. I could port the detection from x264, but that code is GPL.
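
To check my understanding of the flag semantics, here is a small
self-contained sketch. The FLAG_* values and both helper functions are
stand-ins I made up, not the real AV_CPU_FLAG_* definitions or the actual
detection code. It only models the rule as described: detection reports
either the SSE2 flag or the SSE2SLOW flag, never both.

```c
/* Stand-in bit values, not the real AV_CPU_FLAG_* constants. */
enum {
    FLAG_SSE2     = 1 << 0,
    FLAG_SSE2SLOW = 1 << 1
};

/* Hypothetical model of detection as described in the review: a CPU with
 * slow SSE2 gets only the SLOW flag, so the two bits are never both set. */
static int report_sse2_flags(int has_sse2, int sse2_is_slow)
{
    if (!has_sse2)
        return 0;
    return sse2_is_slow ? FLAG_SSE2SLOW : FLAG_SSE2;
}

/* The shape of the check from the patch: returns 1 if the sse2 version
 * of the function would be selected. */
static int selects_sse2_version(int mm_flags)
{
    if (mm_flags & FLAG_SSE2) {
        /* Under the model above this inner test can never fail, because
         * FLAG_SSE2SLOW is not set whenever FLAG_SSE2 is. */
        if (!(mm_flags & FLAG_SSE2SLOW))
            return 1;
    }
    return 0;
}
```

So under that model the extra SSE2SLOW test in my patch is dead code: the
result is the same as testing the SSE2 flag alone, and it says nothing
about athlon64.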

-Justin