[FFmpeg-devel] [flamefest-start] A little something on MMX/SSE intrinsics
Thu Feb 28 09:21:48 CET 2008
2008/2/28, Thorsten Jordan <tjordan at macrosystem.de>:
> Måns Rullgård wrote:
> > Michael Niedermayer <michaelni at gmx.at> writes:
> >> On Wed, Feb 27, 2008 at 09:33:09PM +0000, Måns Rullgård wrote:
> >>> Michael Niedermayer <michaelni at gmx.at> writes:
> >> Also one can always write asm code that is as fast as intrinsic
> >> code, it's not necessarily possible to write intrinsics code that is
> >> as fast as asm.
> > One can write assembler that is as fast as intrinsics for *one* CPU
> > variant. Even a moderately clever compiler may well compile the
> > intrinsics into code outperforming code that was hand-tuned for the
> > wrong CPU.
> And not to forget: x86-64 has twice as many SSE registers, so the
> compiler can in theory make use of that and generate faster code for
> x86-64 using the same source as for x86-32. And in fact it did happen
> (I once used SSE intrinsics in a project for x86-32 and x86-64).
> Plain asm was 5% faster than intrinsics on x86-32, but on x86-64 the
> speed was just amazing. I would have needed to rewrite the plain asm for
> that, but got it for free out of gcc (4.1 that was). The particular code
> was rather long, so it saved tons of coding and debugging time...
And IA-64 understands MMX and SSE intrinsics too, although I've no idea
whether ffmpeg has ever been run on that platform.
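
To make the quoted point about one source serving both x86-32 and
x86-64 concrete, here is a minimal sketch in C. The function name
sad_bytes is hypothetical and not taken from ffmpeg. It assumes a
compiler that ships <emmintrin.h> and defines __SSE2__ when SSE2 is
enabled (gcc does), and it falls back to plain C otherwise. Register
allocation is left entirely to the compiler, so whether the x86-64
build really ends up faster is something to measure, not something
this sketch demonstrates.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper: sum of absolute differences over n bytes.
 * Nothing here names a register; the compiler picks them, so it may
 * use xmm8..xmm15 on x86-64 without any change to the source. */
static unsigned sad_bytes(const uint8_t *a, const uint8_t *b, size_t n)
{
    unsigned sum = 0;
    size_t i = 0;
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    for (; i + 16 <= n; i += 16) {
        /* unaligned loads, so callers need no special alignment */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* psadbw yields one partial sum in each 64-bit half */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(va, vb));
    }
    /* fold the two halves into a scalar */
    sum = (unsigned)_mm_cvtsi128_si32(acc)
        + (unsigned)_mm_cvtsi128_si32(_mm_srli_si128(acc, 8));
#endif
    /* scalar tail; also the whole loop when SSE2 is unavailable */
    for (; i < n; i++)
        sum += a[i] > b[i] ? (unsigned)(a[i] - b[i])
                           : (unsigned)(b[i] - a[i]);
    return sum;
}
```

With gcc the same file should build unchanged for either target, e.g.
with -m32 -msse2 or with -m64; comparing the two assembly listings
(gcc -S) shows what the compiler made of the extra registers.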
> Maybe gcc developers could optimize code generation for intrinsics
> better if they got used more often - chicken & egg?
I badly hope so :-) Has anybody ever checked how icc performs with intrinsics?
Beauty is truth,
While truth is beauty.
PGP KeyID: E8555ED6