[FFmpeg-devel] [flamefest-start] A little something on MMX/SSE intrinsics

Michael Niedermayer michaelni
Thu Feb 28 12:19:08 CET 2008

On Thu, Feb 28, 2008 at 09:18:32AM +0100, Thorsten Jordan wrote:
> Måns Rullgård wrote:
> > Michael Niedermayer <michaelni at gmx.at> writes:
> > 
> >> On Wed, Feb 27, 2008 at 09:33:09PM +0000, Måns Rullgård wrote:
> >>> Michael Niedermayer <michaelni at gmx.at> writes:
> [...]
> >> Also, one can always write asm code that is as fast as intrinsics
> >> code; it's not necessarily possible to write intrinsics code that is
> >> as fast as asm.
> > 
> > One can write assembler that is as fast as intrinsics for *one* CPU
> > variant.  Even a moderately clever compiler may well compile the
> > intrinsics into code outperforming code that was hand-tuned for the
> > wrong CPU.
> [...]
> > 
> And not to forget: x86-64 has twice as many SSE registers (16 instead of
> 8), so the compiler can in theory make use of that and generate faster code
> for x86-64 from the same source as for x86-32. And in fact it did happen
> (I once used SSE intrinsics in a project for x86-32 and x86-64).
> Plain asm was 5% faster than intrinsics on x86-32, but on x86-64 the
> speed was just amazing. I would have needed to rewrite the plain asm for
> that, but got it for free out of gcc (4.1, that was). The particular code
> was rather long, so it saved tons of coding and debugging time...

If a 5% speed loss and having random loops silently messed up (depending on
the gcc version) are acceptable to you ...
They certainly are not for ffmpeg, nor for many other projects.
Also, for serious projects any speed loss after upgrading gcc would have to be
debugged and fixed, which adds considerable maintenance time. IMO that
outweighs the time one might spend reassigning registers by hand when porting
x86-32 code to x86-64.
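
To make the trade-off under discussion concrete, here is a minimal sketch in
C. It is not code from ffmpeg or from the project mentioned above; the
function name sad_bytes and the choice of a sum-of-absolute-differences loop
are assumptions made purely for illustration. With intrinsics the source names
no registers, so the compiler does the allocation for whichever target it
builds (8 XMM registers on x86-32, 16 on x86-64). Whether the resulting code
is any good is exactly what is disputed in this thread. A scalar fallback is
included so the sketch also builds where SSE2 is not enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper (not from ffmpeg): sum of absolute differences
 * between two byte buffers. n must be a multiple of 16, and small enough
 * that the total fits in an unsigned int. */
static unsigned sad_bytes(const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(n % 16 == 0);
#if defined(__SSE2__)
    /* No register is named here; the compiler picks XMM registers itself. */
    __m128i acc = _mm_setzero_si128();
    for (size_t i = 0; i < n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* _mm_sad_epu8 yields one partial sum per 64-bit half. */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(va, vb));
    }
    /* Fold the upper half onto the lower half and extract the result. */
    acc = _mm_add_epi64(acc, _mm_srli_si128(acc, 8));
    return (unsigned)_mm_cvtsi128_si32(acc);
#else
    /* Scalar fallback for targets built without SSE2. */
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] > b[i] ? (unsigned)(a[i] - b[i]) : (unsigned)(b[i] - a[i]);
    return sum;
#endif
}
```

Building the same file with gcc -m32 -msse2 -S and with gcc -m64 -S and
comparing the two assembly listings is one way to see how the register
allocation differs between the targets, and to check it again after a
compiler upgrade.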

> Maybe gcc developers could optimize intrinsic generator code better if
> it would get used more often - chicken & egg?

No. If you had read the posts of the last 2 days, as well as the stuff
linked, you would know that all compilers generate crappy code from
intrinsics. There is also no lack of examples where gcc messes up; it's not as
if gcc optimized all existing code correctly and new code were needed.
It just seems that some people, like you, plainly don't want to accept that
intrinsics are fundamentally flawed.

Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I am the wisest man alive, for I know one thing, and that is that I know
nothing. -- Socrates