[Mplayer-users] [mplayer PATCH] fastmemcpy alignment for any cpu

Nick Kurshev nickols_k at mail.ru
Tue Apr 24 11:55:35 CEST 2001

Hello, Felix!

On Mon, 23 Apr 2001 16:33:04 +0200, Felix Buenemann wrote:

Sorry for the delay - I've been busy with commercial work ;)

>Well, I've tested the new code and it has an adverse effect on the PIII: MMX2 
>instructions slow down a lot with aligning, even more if you align to 16 bytes 
>for MMX2 instead of 8 bytes.
>Please see the attached perlbench log.
Your tests look very similar to your first benchmarks, where there was no 
difference between the optimized and non-optimized memcpy ;)
First: that result cannot be right, because it contradicts the principles of CPU design. The fact that the SSE 
code does not cause any exceptions on your P3 shows that the data is aligned on a 16-byte boundary and that the 
logic of the code is correct.
(I finally found out that MOVNTPS causes an exception on the P3 when dest is not aligned on a 16-byte 
boundary; only on the P4 does it work without exceptions.)
Second: the range of values in the results you sent me is too wide. You are probably running the tests on your 
old Linux-2.2.x kernel. To get realistic results, IMHO you should use Linux-2.4 and reduce the number of running 
processes to a minimum.
IMHO this code must speed up fastmemcpy anyway.
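
To illustrate the alignment idea discussed above, here is a minimal sketch in plain C. It is not the actual fastmemcpy patch: the names head_bytes and aligned_copy_sketch and the 64-byte block size are assumptions made for illustration, and plain memcpy stands in for the MMX2/SSE block copy. It only shows how dest can be brought to a 16-byte boundary before the main loop, which is what a store like MOVNTPS expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ALIGN_TO 16  /* boundary expected by 16-byte SSE stores */
#define BLOCK    64  /* illustrative block size (an assumption) */

/* Hypothetical helper: bytes to copy before dst reaches the boundary. */
static size_t head_bytes(const void *dst)
{
    uintptr_t addr = (uintptr_t)dst;
    return (size_t)((ALIGN_TO - (addr & (ALIGN_TO - 1))) & (ALIGN_TO - 1));
}

/* Hypothetical sketch, not the MPlayer fastmemcpy implementation. */
static void *aligned_copy_sketch(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t head = head_bytes(d);

    /* Step 1: copy the unaligned head so that d lands on the boundary. */
    if (head > len)
        head = len;
    memcpy(d, s, head);
    d += head; s += head; len -= head;

    /* If anything is left, d is now 16-byte aligned. */
    assert(len == 0 || ((uintptr_t)d & (ALIGN_TO - 1)) == 0);

    /* Step 2: main loop; the real code would use MMX2/SSE stores here. */
    while (len >= BLOCK) {
        memcpy(d, s, BLOCK);
        d += BLOCK; s += BLOCK; len -= BLOCK;
    }

    /* Step 3: copy the remaining tail bytes. */
    memcpy(d, s, len);
    return dst;
}
```

Note that only dest is aligned in this sketch; src may still be unaligned, so a real SSE loop would need unaligned loads on the source side.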

Best regards! Nick
