[Ffmpeg-cvslog] r5975 - in trunk/libavcodec: dsputil.c dsputil.h i386/dsputil_mmx.c vorbis.c vorbis.h

Loren Merritt lorenm
Fri Aug 18 20:38:20 CEST 2006


On Fri, 18 Aug 2006, Rich Felker wrote:
> On Thu, Aug 10, 2006 at 09:06:26PM +0200, lorenm wrote:
>> +static void vector_fmul_3dnow(float *dst, const float *src, int len){
>> +    long i;
>> +    len >>= 1;
>> +    for(i=0; i<len; i++) {
>> +        asm volatile(
>> +            "movq  %0, %%mm0 \n\t"
>> +            "pfmul %1, %%mm0 \n\t"
>> +            "movq  %%mm0, %0 \n\t"
>> +            :"+m"(dst[i*2])
>> +            :"m"(src[i*2])
>> +            :"memory"
>> +        );
>> +    }
>> +    asm volatile("femms");
>> +}
>
> Have you read the asm gcc generates? I would guess (have not tested
> however) that writing the loop in asm would be faster than gcc's for
> loops... Writing the loop yourself also allows unrolling the loop
> slightly and interleaving paired iterations, or if nothing else just
> interleaving the pointer increment ops with the 3dnow ops.

Already done, r5983.
Unrolling helped. Just switching the loop to asm made no difference. My
Athlon 64 is sufficiently out-of-order that it doesn't matter where in the
loop I put the increment op.
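
For anyone reading without the r5983 diff at hand, here is a rough plain-C
sketch of what "unrolling" means for this loop. It is not the committed 3DNow!
code: the function names are made up for illustration, and the unrolled
version assumes len is a multiple of 4.

```c
/* Reference version: one multiply and one loop-counter update per element. */
static void vector_fmul_ref(float *dst, const float *src, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] *= src[i];
}

/* Hypothetical unrolled version: four independent multiplies per
 * iteration, so the counter update and branch are paid once per four
 * elements instead of once per element.
 * Assumption: len is a multiple of 4 (no tail handling here). */
static void vector_fmul_unrolled4(float *dst, const float *src, int len)
{
    for (int i = 0; i < len; i += 4) {
        dst[i]     *= src[i];
        dst[i + 1] *= src[i + 1];
        dst[i + 2] *= src[i + 2];
        dst[i + 3] *= src[i + 3];
    }
}
```

Both functions should produce identical results for any len that is a multiple
of 4. Whether the unrolled form is actually faster depends on the CPU and the
compiler, so it is worth benchmarking rather than assuming.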

--Loren Merritt



