Thank you very much for your reply!

> anyone got a nicer derivation/proof?
> or even a faster implementation?

A 2x speedup could be achieved with the very same algorithm by using the SSE2 integer instructions, which operate on 128-bit registers instead of the 64-bit MMX registers. I am working on it at the moment.

Thomas
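
P.S. To illustrate where the factor of two would come from, here is a minimal sketch in C with SSE2 intrinsics. It is not the algorithm discussed in this thread: the kernel (a wrapping add on 16-bit lanes) and the function name `add_u16_sse2` are hypothetical stand-ins chosen only to show the register width. One 128-bit XMM register holds eight 16-bit lanes, where a 64-bit MMX register holds four, so the vector loop needs half as many iterations for the same data.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in kernel: dst[i] = a[i] + b[i], wrapping modulo 2^16. */
void add_u16_sse2(uint16_t *dst, const uint16_t *a, const uint16_t *b, size_t n)
{
    size_t i = 0;

    /* Vector part: 8 x 16-bit lanes per 128-bit register
       (the MMX equivalent would handle 4 lanes per 64-bit register). */
    for (; i + 8 <= n; i += 8) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi16(va, vb));
    }

    /* Scalar tail for the remaining n % 8 elements. */
    for (; i < n; ++i)
        dst[i] = (uint16_t)(a[i] + b[i]);
}
```

The factor of two is a best case: it assumes the loop is limited by arithmetic rather than by memory bandwidth. Unlike MMX code, the XMM registers do not alias the x87 floating-point stack, so no EMMS instruction is needed afterwards.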