[FFmpeg-devel] [flamefest-start] A little something on MMX/SSE intrinsics
Thu Feb 28 23:53:41 CET 2008
On Thu, Feb 28, 2008 at 10:15:35PM +0100, Luca Barbato wrote:
> Michael Niedermayer wrote:
> > I feel like I am talking to brick walls. The point is that intrinsics
> > are flawed because they are unpredictable: gcc could generate efficient
> > code from them, but it can just as well (and in current versions on x86
> > does) generate completely dismal code. This does not go away if gcc
> > becomes better at generating code.
> gcc isn't predictable even at handling asm blocks, as we have experienced
> on the register-constrained architectures... (yes, x86 again)
As I said at some other point in the thread, I prefer a compilation failure,
which I can fix, over a silent pessimization of code that I do not even know
about.
> > We write asm/intrinsics because gcc did NOT compile the C code to something
> > efficient in at least some cases. Asm is optimized once and will then always
> > be efficient for the CPU class for which it has been optimized. That is, it's
> > a write-once-and-forget thing. Intrinsics, OTOH, are at the mercy of the
> > current compiler version and require constant maintenance to ensure that
> > they don't get miscompiled to something inefficient.
> I cannot agree more; in fact, having a set of asm routines for G3, G5,
> CELL and pa-semi would be great. The same could be said for asm for P4 and
> amd64, since they are _quite_ different in the end.
> Sparing some pain and using intrinsics to get quite similar results for
> the whole PPC/PPC64 or x86/x86_64 families wouldn't be bad as a starting
> point.
If you plan to ever write the asm(), your efforts with intrinsics were wasted.
If you don't plan to ever write asm(), it's of course a different story ...
> > But the key advantage asm() has, IMO, is that the compiler can NOT
> > second-guess what the programmer wanted, it can NOT reorder the
> > instructions behind the programmer's back, and it can NOT silently put
> > unneeded loads and stores between instructions.
> The main issue with intrinsics is that they are more often than not ugly
> and do not deliver what they are supposed to, but that is just an
> implementation detail that could be ironed out with a little cooperation
> between users and developers.
> You can get silent load+store or, even better, have the whole outer loop
> pessimized due to bogus constraints in the asm block...
No, you cannot. Proper asm looks like:
Which is called through a function pointer. There's no outer loop that
knows what is done inside the function.
Also, the whole inner loop is inside a single asm(); there is no way gcc
could mess it up.
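
A minimal sketch of the pattern described here, not the code from the
original message: one asm() statement holds the entire inner loop, and callers
reach it only through a function pointer. Everything named below (DSPSketch,
add_bytes, dsp_sketch_init, the choice of a byte-wise add, and SSE2 on x86_64)
is a hypothetical illustration, and it assumes GCC-style extended asm (GCC or
Clang). On other targets it falls back to plain C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DSP context: callers only ever see this function pointer. */
typedef struct DSPSketch {
    void (*add_bytes)(uint8_t *dst, const uint8_t *src, size_t len);
} DSPSketch;

/* Portable reference version; also used for the tail of the asm version. */
static void add_bytes_c(uint8_t *dst, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}

#if defined(__GNUC__) && defined(__x86_64__)
/* The whole inner loop lives in one asm statement, so the compiler cannot
 * reorder it or insert extra loads/stores between the instructions. */
static void add_bytes_sse2(uint8_t *dst, const uint8_t *src, size_t len)
{
    size_t vec_len = len & ~(size_t)15;  /* bytes handled 16 at a time */
    size_t i = 0;

    if (vec_len) {
        __asm__ volatile(
            "1:                            \n\t"
            "movdqu (%[src],%[i]), %%xmm0  \n\t"  /* load 16 source bytes  */
            "movdqu (%[dst],%[i]), %%xmm1  \n\t"  /* load 16 dest bytes    */
            "paddb  %%xmm0, %%xmm1         \n\t"  /* wrapping byte add     */
            "movdqu %%xmm1, (%[dst],%[i])  \n\t"  /* store the result      */
            "add    $16, %[i]              \n\t"
            "cmp    %[n], %[i]             \n\t"
            "jb     1b                     \n\t"  /* loop while i < n      */
            : [i] "+r"(i)
            : [src] "r"(src), [dst] "r"(dst), [n] "r"(vec_len)
            : "xmm0", "xmm1", "memory", "cc");
    }
    add_bytes_c(dst + i, src + i, len - i);  /* remaining 0..15 bytes */
}
#endif

/* Pick an implementation once; the caller's loop knows nothing about it. */
static void dsp_sketch_init(DSPSketch *dsp)
{
    assert(dsp != NULL);
    dsp->add_bytes = add_bytes_c;
#if defined(__GNUC__) && defined(__x86_64__)
    dsp->add_bytes = add_bytes_sse2;  /* SSE2 is baseline on x86_64 */
#endif
}
```

A caller would run dsp_sketch_init(&dsp) once and then invoke
dsp.add_bytes(dst, src, len). Because the call goes through a pointer, the
compiler has no surrounding loop to merge the asm into, and the constraints
only have to describe three pointers/counters and two clobbered registers.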
> > And code quality standards in ffmpeg are high; writing 5% slower code is
> > plainly unacceptable.
> I could say that the x86 asm routines that happen to work by hack on
> x86_64 are in that range; still, better that than plain C, isn't it?
I do not think we have much hacked x86 -> x86_64 code that would be slower
than the equivalent intrinsics on x86_64.
If you find some, please report it!
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
I am the wisest man alive, for I know one thing, and that is that I know
nothing. -- Socrates