[Ffmpeg-cvslog] r5975 - in trunk/libavcodec: dsputil.c dsputil.h i386/dsputil_mmx.c vorbis.c vorbis.h
Loren Merritt
lorenm
Fri Aug 18 20:38:20 CEST 2006
On Fri, 18 Aug 2006, Rich Felker wrote:
> On Thu, Aug 10, 2006 at 09:06:26PM +0200, lorenm wrote:
>> +static void vector_fmul_3dnow(float *dst, const float *src, int len){
>> +    long i;
>> +    len >>= 1;
>> +    for(i=0; i<len; i++) {
>> +        asm volatile(
>> +            "movq %0, %%mm0 \n\t"
>> +            "pfmul %1, %%mm0 \n\t"
>> +            "movq %%mm0, %0 \n\t"
>> +            :"+m"(dst[i*2])
>> +            :"m"(src[i*2])
>> +            :"memory"
>> +        );
>> +    }
>> +    asm volatile("femms");
>> +}
>
> Have you read the asm gcc generates? I would guess (have not tested
> however) that writing the loop in asm would be faster than gcc's for
> loops... Writing the loop yourself also allows unrolling the loop
> slightly and interleaving paired iterations, or if nothing else just
> interleaving the pointer increment ops with the 3dnow ops.
Already done, r5983.
Unrolling helped. Just switching the loop to asm made no difference. My
athlon64 is sufficiently out-of-order that it doesn't matter where in the
loop I put the increment op.
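
For illustration only, here is roughly what unrolling means for this loop, as a portable C sketch. This is not the 3DNow! asm committed in r5983; the function name vector_fmul_unrolled, the unroll factor of four, and the scalar tail loop are assumptions made for the example. It computes the same dst[i] *= src[i] as the quoted patch.

```c
#include <assert.h>

/* Hypothetical portable C sketch, not the code from r5983:
 * dst[i] *= src[i], with the loop body unrolled four times. */
static void vector_fmul_unrolled(float *dst, const float *src, int len)
{
    int i = 0;

    assert(len >= 0);

    /* Main loop: four independent multiplies per iteration, so the
     * index update and the loop branch are paid once per four elements. */
    for (; i + 4 <= len; i += 4) {
        dst[i + 0] *= src[i + 0];
        dst[i + 1] *= src[i + 1];
        dst[i + 2] *= src[i + 2];
        dst[i + 3] *= src[i + 3];
    }

    /* Tail: leftover elements when len is not a multiple of four. */
    for (; i < len; i++)
        dst[i] *= src[i];
}
```

The four multiplies in the main loop do not depend on each other, so an out-of-order core can overlap them freely. That fits the observation above that the position of the increment inside the loop did not matter.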
--Loren Merritt