[FFmpeg-devel] [flamefest-start] A little something on MMX/SSE intrinsics

Michael Niedermayer michaelni
Wed Feb 27 23:23:41 CET 2008

On Wed, Feb 27, 2008 at 09:33:09PM +0000, Måns Rullgård wrote:
> Michael Niedermayer <michaelni at gmx.at> writes:
> > On Wed, Feb 27, 2008 at 03:29:56PM -0500, Alexander Strange wrote:
> >> I don't think anyone can get Altivec asm to work better than
> >> intrinsics on more than one CPU - PPC is really, really
> >> scheduling-sensitive, especially the G5 and Cell.
> >
> > Until I see benchmarks, I'd guess gcc+intrinsics will be slower than
> > unscheduled, naively written asm().
> That depends on the CPU.  Some CPUs are quite particular about
> instruction scheduling.

That is true, but can gcc schedule instructions properly on these CPUs?

Also, the real question is whether gcc can beat a human at instruction scheduling ;)

> >> I guess you can always try, though, but don't do anything to
> >> discourage people who know altivec from adding more. There's still a
> >> lot missing from H.264.
> >
> > Code is either well written or should be rejected.
> > Intrinsics != well written.
> That's where you're wrong.  Code using intrinsics can be well-written.

If the compiler generated optimal code to begin with, there would be no need
for asm/intrinsics. OTOH, if it does not, using intrinsics is not that smart.
So IMO, well-written intrinsics code is like a well-written Java program using XML.

> The problem is not the code, but the compiler.
> I agree that if the most commonly used compilers can't compile
> intrinsics properly, plain assembler should be used.  I have no idea
> whether this is the case for Altivec, and neither do you.

I do know that gcc does quite stupid things on x86, be it when compiling C
code or intrinsics. And I know that gcc is generally better at compiling
x86 code than code for other, less common architectures. Taken together,
this strongly suggests that the gap between intrinsics and asm will be
bigger on PPC than on x86, not smaller.
Of course you are correct that I do not strictly "know" it. It's just VERY
likely.
Also, one can always write asm code that is as fast as intrinsics code, but
it is not necessarily possible to write intrinsics code that is as fast as asm.
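For readers unfamiliar with the two styles being argued about, here is a minimal sketch of the same operation (a rounding average of two 16-byte blocks, the kind of thing FFmpeg's pixel routines do) written once with SSE2 intrinsics and once as GCC-style inline asm. It assumes x86-64 with SSE2 and GCC or Clang; the function names `avg16_intrin` and `avg16_asm` are hypothetical and not taken from FFmpeg. The point is only where the control sits: with intrinsics the compiler chooses registers and instruction order, with asm the programmer does.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics (assumes x86-64, where SSE2 is baseline) */

/* Intrinsics version: the compiler allocates registers and schedules
 * the loads, the PAVGB and the store. */
static void avg16_intrin(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* per-byte unsigned (a + b + 1) >> 1 */
    _mm_storeu_si128((__m128i *)dst, _mm_avg_epu8(va, vb));
}

/* Inline asm version (AT&T syntax): the programmer fixes the registers
 * and the exact instruction order. */
static void avg16_asm(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    __asm__ volatile(
        "movdqu (%1), %%xmm0   \n\t"  /* load 16 bytes of a         */
        "movdqu (%2), %%xmm1   \n\t"  /* load 16 bytes of b         */
        "pavgb  %%xmm1, %%xmm0 \n\t"  /* xmm0 = rounded avg(a, b)   */
        "movdqu %%xmm0, (%0)   \n\t"  /* store result to dst        */
        :
        : "r"(dst), "r"(a), "r"(b)
        : "xmm0", "xmm1", "memory");
}
```

Both functions should produce identical bytes; this sketch says nothing about which is faster on a given CPU, which is exactly the benchmark question raised in this thread.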

Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The worst form of inequality is to try to make unequal things equal.
-- Aristotle