On Tue, 1 Dec 2015 19:27:49 -0500, Ganesh Ajjanagadde wrote:
> The slowness comes from the generated code, e.g. llrint compiles down to
> a single asm instruction, cvttsd2si, while floor(x + 0.5) needs to do
> multiple things.

How much slower is it?