[Mplayer-users] Question: Fastmemcpy and 3dnow
nickols_k at mail.ru
Fri Apr 20 14:48:46 CEST 2001
>> > With 3DNow: v1-v2=740241.96
>> >Without 3DNow: v1-v2=781859.25
>> >I'd like to see what numbers the K6-3 gives.
>with 3dnow: 704465 (ran once), 406xxx average of 5000 runs
>without 3dnow: 761410 (ran once), 438803 average of 5000 runs
>I rewrote the fastmemcpy code; my new code copies 128 bytes in one cycle. The
>old code copies 64 bytes in one cycle. The old code is slower than the
Indeed, I have tested the original 3DNow code. It is effective only when the data is aligned on at least an 8-byte
boundary. Linux works with page-aligned data: in the Linux source this function is named fast_pagecopy,
not fast copy. A page is 4K in size and aligned on a 4K boundary. You can get a more effective memcpy
if you experiment with code that computes the shifts needed to align source and dest with each other. For example:
/* First: align the source. Can be skipped on CPUs which have prefetch insns */
for(i=0;i<z;i++) dest[i] = src[i];
/* Second: compute the necessary alignment for dest */
dest = &dest[y];
mov src, reg1
mov src, reg2
shld(shrd) reg1, reg2, X
mov reg1, dest
mov reg2, dest
/* Third: do tails */
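
The three steps above could be sketched in portable C roughly as follows. This is only an illustration under stated assumptions, not the fastmemcpy code: the names shift_copy, load64 and store64 are hypothetical, a little-endian CPU such as x86 is assumed, and plain 64-bit shifts stand in for the shld/shrd pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: a fixed-size memcpy keeps the word access free of
 * aliasing problems; compilers usually reduce it to a single move. */
static uint64_t load64(const unsigned char *p)
{
    uint64_t w;
    memcpy(&w, p, 8);
    return w;
}

static void store64(unsigned char *p, uint64_t w)
{
    memcpy(p, &w, 8);
}

/* Sketch only. Assumes a little-endian CPU (x86 is). */
void *shift_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    /* First: byte-copy until the source is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)s & 7) != 0) { *d++ = *s++; n--; }

    /* Second: byte-copy until the destination is 8-byte aligned.
     * The source is then `off` bytes past an 8-byte boundary. */
    while (n > 0 && ((uintptr_t)d & 7) != 0) { *d++ = *s++; n--; }

    unsigned off = (unsigned)((uintptr_t)s & 7);
    if (off == 0) {
        /* Both pointers aligned: plain word copy. */
        for (; n >= 8; n -= 8, d += 8, s += 8)
            store64(d, load64(s));
    } else if (n >= 16) {
        /* Read aligned source words and merge each neighbouring pair
         * with shifts, so that every store is an aligned full word. */
        const unsigned char *a = s - off;   /* aligned, inside the buffer */
        uint64_t lo = load64(a);
        while (n >= 16) {                   /* keeps the `hi` read in bounds */
            uint64_t hi = load64(a + 8);
            store64(d, (lo >> (8 * off)) | (hi << (64 - 8 * off)));
            lo = hi;
            a += 8;
            d += 8;
            n -= 8;
        }
        s = a + off;
    }

    /* Third: copy the tail byte by byte. */
    while (n > 0) { *d++ = *s++; n--; }
    return dst;
}
```

A call looks like an ordinary memcpy call, e.g. shift_copy(dest + 3, src + 5, 1000). Whether this beats a plain byte or word loop depends on the CPU, so it should be measured rather than assumed.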
>>Here are the averages that I get over 100 runs on my K6-2:
>> With 3DNow: v1-v2=740241.96
>>Without 3DNow: v1-v2=781859.25
>>I'd like to see what numbers the K6-3 gives.
>Did you use a loop and divide by 100? I guess you shouldn't!
>The array of 100000 Bytes fits (at least) into L2-Cache, so you measure
>L2-speed starting with the 2nd iteration. But video data destination will
>probably not be L2, so the (higher) number you get in a first run would be a
>better value for comparison. Am I right?
No. Modern CPUs have MTRR registers which allow write-combining writes.
So the data goes both into L2 and into video memory simultaneously.
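
To see both numbers from the question side by side, the first run can be timed separately from the average over many runs. Below is a minimal sketch, assuming the standard clock() timer and plain memcpy; the name time_copies is hypothetical, and none of this is taken from the original test program.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copies `size` bytes `runs` times.
 * *first_s receives the time of the first copy, which also pays for
 * page faults and cache misses on the destination; *avg_s receives the
 * mean over all runs. Both are in seconds of CPU time. clock() is
 * coarse, so a single small copy may well read as 0.
 * Returns 0 on success, -1 on bad arguments or allocation failure. */
int time_copies(size_t size, int runs, double *first_s, double *avg_s)
{
    /* Calling through a volatile pointer keeps the compiler from
     * dropping the repeated, otherwise redundant copies. */
    void *(*volatile copy)(void *, const void *, size_t) = memcpy;

    assert(first_s != NULL && avg_s != NULL);
    if (size == 0 || runs <= 0)
        return -1;

    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1;
    }
    memset(src, 0x5A, size);

    clock_t t0 = clock();
    copy(dst, src, size);                 /* first run */
    clock_t t1 = clock();
    for (int r = 1; r < runs; r++)        /* remaining runs */
        copy(dst, src, size);
    clock_t t2 = clock();

    *first_s = (double)(t1 - t0) / CLOCKS_PER_SEC;
    *avg_s = (double)(t2 - t0) / CLOCKS_PER_SEC / runs;

    int ok = memcmp(dst, src, size) == 0;
    free(src);
    free(dst);
    return ok ? 0 : -1;
}
```

For the 100000-byte array mentioned above, time_copies(100000, 5000, &first, &avg) gives the two values to compare. This copies between ordinary malloc'd buffers, not into video memory, so it only illustrates the first-run versus average effect.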
Best regards! Nick
P.S.: I'm ready to improve the SSE part of fastmemcpy. Maybe I'll send it to mplayer.