[Mplayer-users] [mplayer PATCH] fastmemcpy alignment for any cpu
nickols_k at mail.ru
Tue Apr 24 11:55:35 CEST 2001
On Mon, 23 Apr 2001 16:33:04 +0200, Felix Buenemann wrote:
Sorry for the delay; I've been busy with commercial work ;)
>Well, I've tested the new code and it has the adverse effect on a PIII: MMX2
>instructions slow down a lot with alignment, even more if you align to 16 bytes for
>MMX2 instead of 8 bytes.
>Please see attached perlbench log.
It seems that your tests are very similar to your first benchmarks, where there was no
difference between the optimized and non-optimized memcpy ;)
First: that cannot be right, because it contradicts the principles of CPU design. The fact that the SSE code does
not raise any exceptions on your P3 shows that the data is aligned on a 16-byte boundary and that the logic of the code is correct.
(Finally, I have found out that MOVNTPS raises an exception on the P3 when the destination is not aligned on a 16-byte
boundary; only on the P4 does it work without exceptions.)
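To make the alignment point concrete, here is a minimal C sketch. It is not the actual fastmemcpy code, and the name `aligned_dest_memcpy` is hypothetical. It copies a few head bytes until the destination reaches a 16-byte boundary, then moves 16-byte blocks with `_mm_loadu_ps` (MOVUPS, an unaligned load) and `_mm_stream_ps` (MOVNTPS, a non-temporal store that needs an aligned destination), and finally copies the 0..15 tail bytes. Without SSE it falls back to plain `memcpy` per block.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#ifdef __SSE__
#include <xmmintrin.h>
#endif

/* Hypothetical helper: copy n bytes, aligning the DESTINATION to 16 bytes
 * before the bulk loop so that MOVNTPS (_mm_stream_ps) never sees a
 * misaligned address. The source may stay unaligned. */
void *aligned_dest_memcpy(void *to, const void *from, size_t n)
{
    unsigned char *d = to;
    const unsigned char *s = from;

    /* 1. Head: bytes needed to reach the next 16-byte boundary of d. */
    size_t head = (16 - ((uintptr_t)d & 15)) & 15;
    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head;
    s += head;
    n -= head;

    /* 2. Body: d is now 16-byte aligned (or n is 0). */
    size_t blocks = n / 16;
    for (size_t i = 0; i < blocks; i++) {
#ifdef __SSE__
        __m128 v = _mm_loadu_ps((const float *)(const void *)s); /* MOVUPS */
        _mm_stream_ps((float *)(void *)d, v);                   /* MOVNTPS */
#else
        memcpy(d, s, 16);
#endif
        d += 16;
        s += 16;
    }
#ifdef __SSE__
    _mm_sfence(); /* make the non-temporal stores globally visible in order */
#endif

    /* 3. Tail: remaining 0..15 bytes. */
    memcpy(d, s, n % 16);
    return to;
}
```

Only the destination is aligned because MOVNTPS's memory operand is the store target; the source side uses an unaligned load, so it tolerates any source offset.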
Second: in the results you sent me, the spread of values is too large. You are probably running the tests on your old
Linux-2.2.x kernel. To get realistic results, IMHO you should use Linux-2.4 and reduce the number of running
processes to a minimum.
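One way to tell whether a test machine is quiet enough is to repeat each copy many times and look at the spread of the timings. The sketch below is hypothetical (`time_copy` and `timing_stats` are illustrative names, not perlbench or MPlayer code) and assumes a POSIX system providing `clock_gettime(CLOCK_MONOTONIC)`.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Any memcpy-compatible copy routine. */
typedef void *(*copy_fn)(void *, const void *, size_t);

typedef struct { double min, median, max; } timing_stats;

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Hypothetical harness: time `runs` copies of n bytes and report min, median
 * and max in seconds. Returns all zeros if runs <= 0 or allocation fails. */
timing_stats time_copy(copy_fn fn, void *dst, const void *src, size_t n, int runs)
{
    timing_stats st = {0.0, 0.0, 0.0};
    if (runs <= 0)
        return st;
    double *t = malloc((size_t)runs * sizeof *t);
    if (t == NULL)
        return st;
    for (int i = 0; i < runs; i++) {
        double t0 = now_sec();
        fn(dst, src, n);
        t[i] = now_sec() - t0;
    }
    qsort(t, (size_t)runs, sizeof *t, cmp_double);
    st.min = t[0];
    st.median = t[runs / 2];
    st.max = t[runs - 1];
    free(t);
    return st;
}
```

For example, `time_copy(memcpy, dst, src, size, 50)` followed by a `printf` of the three fields. A max far above the min usually means other processes or interrupts interfered with the run; the minimum is typically the most repeatable number to compare between copy implementations.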
IMHO this code must speed up fastmemcpy anyway.
Best regards! Nick
Mplayer-users mailing list
Mplayer-users at lists.sourceforge.net