[FFmpeg-devel] [PATCH] SIMD-optimized float_to_int32_fmul_scalar()
Justin Ruggles
justin.ruggles
Fri Jan 7 20:19:21 CET 2011
On 01/07/2011 01:49 PM, Loren Merritt wrote:
> On Fri, 7 Jan 2011, Justin Ruggles wrote:
>
>> This patch implements float_to_int32_fmul_scalar() for 3dnow, sse, and
>> sse2 and uses it in the AC3 encoder.
>
>> @@ -2303,6 +2303,65 @@ static void int32_to_float_fmul_scalar_sse2(float *dst, const int *src, float mu
>> );
>> }
>>
>> +static void float_to_int32_fmul_scalar_3dnow(int32_t *dst, const float *src, float mul, int len)
>> +{
>> + /* note: pf2id conversion uses truncation, not round-to-nearest */
>> + x86_reg i = (len-4)*4;
>> + __asm__ volatile(
>> + "movq %3, %%mm1 \n\t"
>
> movd
Thanks for catching that.
>
>> @@ -2910,6 +2971,8 @@ void dsputil_init_mmx(DSPContext* c, AVCodecContext *avctx)
>> c->vector_fmul_add = vector_fmul_add_3dnow; // faster than sse
>> if(mm_flags & AV_CPU_FLAG_SSE2){
>> c->int32_to_float_fmul_scalar = int32_to_float_fmul_scalar_sse2;
>> + if (!(mm_flags & AV_CPU_FLAG_SSE2SLOW))
>> + c->float_to_int32_fmul_scalar = float_to_int32_fmul_scalar_sse2;
>
> AV_CPU_FLAG_SSE2SLOW is an alternative to AV_CPU_FLAG_SSE2. They won't
> both be set at once. It means "pentium-m's SSE2 is so slow that by default
> we pretend it doesn't exist, and only make an exception if specifically
> tested".
> If you intended it to detect athlon64, then you picked the wrong flag, and
> there isn't a right one yet.
OK. I don't know enough about CPU detection to implement it. I could
port it from x264, but x264 is GPL.
-Justin
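
To illustrate Loren's point about the flags, here is a small sketch assuming, as he describes, that the detection code sets either the SSE2 flag or the SSE2SLOW flag but never both. The DEMO_* constants and helper functions are hypothetical stand-ins, not the real AV_CPU_FLAG_* values or any FFmpeg API:

```c
/* Hypothetical flag bits for illustration only; these are NOT the real
 * AV_CPU_FLAG_* values. */
enum {
    DEMO_FLAG_SSE2     = 1 << 0,
    DEMO_FLAG_SSE2SLOW = 1 << 1
};

/* Assumed detection behaviour, following the description in the review:
 * a CPU with slow SSE2 gets SSE2SLOW *instead of* SSE2, never both. */
static int demo_cpu_flags(int has_sse2, int sse2_is_slow)
{
    if (!has_sse2)
        return 0;
    return sse2_is_slow ? DEMO_FLAG_SSE2SLOW : DEMO_FLAG_SSE2;
}

/* Mirrors the structure of the check in the patch. */
static int patch_selects_sse2(int flags)
{
    if (flags & DEMO_FLAG_SSE2) {
        if (!(flags & DEMO_FLAG_SSE2SLOW))
            return 1;
    }
    return 0;
}

/* The same decision without the inner test. */
static int simplified_selects_sse2(int flags)
{
    return (flags & DEMO_FLAG_SSE2) != 0;
}
```

Under that assumption the two selection functions agree for every possible flag set, so the inner SSE2SLOW test in the patch never changes the outcome.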