[Libav-user] a little performance/optimisation headbreaker :)
"René J.V. Bertin"
rjvbertin at gmail.com
Fri Feb 15 16:48:13 CET 2013
Thanks, Claudio!
On Feb 15, 2013, at 16:33, Claudio Freire wrote:
> gcc 4.7 is clever enough to generate SSE code by itself. Maybe that's
> what you're experiencing. I guess compiler flags do matter too.
I haven't compiled with -ftree-vectorize (rather, I tried with and without it, and it made no difference), but you're right: -fno-tree-vectorize gets me back to the 2x faster performance of the hand-coded SSE version. Amazing, I never really saw a lot of benefit from the tree-vectoriser before!
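To make the flag comparison concrete, this is the kind of plain C loop the tree-vectoriser can turn into SSE code by itself. It is only a made-up sketch: the function name saxpy and the build lines in the comment are assumptions for illustration, not the code or the flags I've actually been timing. Adding -ftree-vectorize explicitly presumably makes no difference because -O3 already switches the vectoriser on.

```c
#include <stddef.h>

/*
 * Hypothetical kernel (not from the code under discussion).
 * Two builds to compare, assuming gcc on x86:
 *   gcc -std=c99 -O3 -msse2 -c saxpy.c
 *       (-O3 enables -ftree-vectorize)
 *   gcc -std=c99 -O3 -msse2 -fno-tree-vectorize -c saxpy.c
 *       (same optimisation level, vectoriser switched off)
 */
void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
{
    size_t i;

    /* restrict promises the compiler that x and y do not overlap,
     * which makes the loop an easy candidate for vectorisation. */
    for (i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Timing both builds of the same source should show how much of the difference comes from the vectoriser alone.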
If it wasn't clear, I didn't hand code the SSE version myself, so comparing the versions will be like looking for the relative differences between the works of 2 post-modern art schools ;)
I've run the code through Shark, though, and that showed a clear load difference in disfavour of the SSE version.
> gcc, which tends to inhibit many of its other optimizations. Why don't
> you try gcc's vector primitives instead?
Which ones? As in the few lines with intrinsics for MSVC, which also compile under gcc but show no speed advantage or disadvantage there?
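If you mean gcc's generic vector extensions (the vector_size attribute) rather than the SSE intrinsics, I suppose it would look roughly like the sketch below. The names v4sf and madd4 are mine and purely illustrative; this is not the code I've been timing.

```c
#include <string.h>

/* gcc vector extension: four packed floats, 16 bytes (one SSE register). */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical helper: out[i] = a[i] * b[i] + c[i] for i in [0, n). */
void madd4(float *out, const float *a, const float *b, const float *c, int n)
{
    int i;

    /* Main loop: four elements per iteration. */
    for (i = 0; i + 4 <= n; i += 4) {
        v4sf va, vb, vc, vr;

        /* memcpy keeps the loads valid for unaligned pointers. */
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        memcpy(&vc, c + i, sizeof vc);

        /* Element-wise arithmetic on whole vectors, no intrinsics. */
        vr = va * vb + vc;

        memcpy(out + i, &vr, sizeof vr);
    }

    /* Scalar tail for the remaining 0 to 3 elements. */
    for (; i < n; i++)
        out[i] = a[i] * b[i] + c[i];
}
```

The arithmetic is written with ordinary operators on the vector type, so the compiler still sees normal expressions and can keep applying its other optimisations.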
BTW, this does raise the question of why ffmpeg's build process uses -fno-tree-vectorize ... maybe that's no longer required with today's compilers?
R.