[FFmpeg-devel] [flamefest-start] A little something on MMX/SSE intrinsics

Thu Feb 28 23:53:41 CET 2008

On Thu, Feb 28, 2008 at 10:15:35PM +0100, Luca Barbato wrote:
> Michael Niedermayer wrote:
> > 
> > I feel like I am talking to brick walls. The point is that intrinsics
> > are flawed because they are unpredictable: gcc could generate efficient
> > code from them, but it can just as well (and, in current versions on x86,
> > does) generate completely dismal code. This unpredictability does not go
> > away even if gcc becomes better at generating code.
> 
> gcc isn't predictable even at managing asm blocks, as we have experienced
> with the register-constrained architectures... (yes, x86 again)

As I said at some other point in the thread, I prefer a compilation failure
which I can fix over a silent pessimization of code I do not even know
about.

> 
> 
> > We write asm/intrinsics because gcc did NOT compile the C code to something
> > efficient in at least some cases. Asm is optimized once and will then always
> > be efficient for the CPU class for which it has been optimized. That is, it's
> > a write-once-and-forget thing. Intrinsics, OTOH, are at the mercy of the
> > current compiler version and require constant maintenance to ensure that they
> > don't get miscompiled into something inefficient.
> 
> I couldn't agree more; in fact, having a set of asm routines for G3, G5,
> CELL and pa-semi would be great. The same goes for asm for P4 and
> amd64, since they are _quite_ different in the end.
> 
> Sparing some pain by using intrinsics to get quite similar results for
> the whole PPC/PPC64 or x86/x86_64 families wouldn't be bad as a starting
> point.

If you ever plan to write the asm(), your efforts with intrinsics were wasted.
If you don't ever plan to write asm(), it's of course a different story...
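To make the two approaches being argued about concrete, here is a minimal sketch (not code from FFmpeg; the names add_sat_u8_c, add_sat_u8_intrin and add_sat_u8_asm are hypothetical) of the same unsigned saturating byte add written once with SSE2 intrinsics and once as GNU inline asm. In the intrinsic version gcc chooses the registers, the instruction order and any extra loads/stores; in the asm() version the instruction sequence is fixed. It assumes an x86/x86_64 target with SSE2 and falls back to plain C elsewhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain C reference: unsigned saturating add, also used for the tail. */
static void add_sat_u8_c(uint8_t *dst, const uint8_t *a,
                         const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned s = (unsigned)a[i] + b[i];
        dst[i] = s > 255 ? 255 : (uint8_t)s;
    }
}

#if defined(__SSE2__) && (defined(__x86_64__) || defined(__i386__))
#include <emmintrin.h>

/* Intrinsics: gcc picks the registers, the instruction order and
 * whether to add extra loads/stores around the paddusb. */
static void add_sat_u8_intrin(uint8_t *dst, const uint8_t *a,
                              const uint8_t *b, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
    add_sat_u8_c(dst + i, a + i, b + i, n - i);
}

/* Inline asm: the four instructions are emitted exactly as written;
 * gcc only substitutes the pointer operands. */
static void add_sat_u8_asm(uint8_t *dst, const uint8_t *a,
                           const uint8_t *b, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __asm__ volatile(
            "movdqu  (%1), %%xmm0     \n\t"
            "movdqu  (%2), %%xmm1     \n\t"
            "paddusb %%xmm1, %%xmm0   \n\t"
            "movdqu  %%xmm0, (%0)     \n\t"
            :
            : "r"(dst + i), "r"(a + i), "r"(b + i)
            : "xmm0", "xmm1", "memory");
    }
    add_sat_u8_c(dst + i, a + i, b + i, n - i);
}
#else
/* Non-x86 or no SSE2: both variants fall back to the C version. */
#define add_sat_u8_intrin add_sat_u8_c
#define add_sat_u8_asm    add_sat_u8_c
#endif
```

Comparing the `gcc -O2 -S` output of the two SSE2 functions across compiler versions is one way to see the point being made: only the intrinsic version can change underneath you.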

[...]
> > But the key advantage asm() has IMO is that the compiler can NOT second-guess
> > what the programmer wanted, it can NOT reorder the instructions behind the
> > programmer's back and it can NOT silently put unneeded load+stores between
> > instructions.
> 
> The main issue with intrinsics is that they are more often than not ugly
> and do not deliver what they are supposed to, but that's just an
> implementation detail that could be ironed out with a little cooperation
> between users and developers.
> 
> You can get silent load+stores, or even better, have the whole outer loop
> pessimized due to bogus constraints in the asm block...

No, you cannot. Proper asm looks like:

void function_mmx(void){
    asm(
        ...
    );
}

This is called through a function pointer. There's no outer loop that
knows what is done inside the function.
Also, the whole inner loop is inside a single asm(); there is no way gcc
could mess it up.
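A minimal sketch of the pattern described here, assuming GCC-style inline asm on x86 with SSE2 (the thread is about MMX, but the structure is the same). The DSPFuncs table, dsp_init and the add_bytes_* functions are hypothetical names, and GCC's __builtin_cpu_supports stands in for whatever CPU detection the real code uses. The entire loop sits inside one asm() block, and callers only ever see a function pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical function table, filled in once at init time. */
typedef struct DSPFuncs {
    void (*add_bytes)(uint8_t *dst, const uint8_t *src, size_t w);
} DSPFuncs;

/* Portable fallback: dst[i] += src[i], wrapping modulo 256. */
static void add_bytes_c(uint8_t *dst, const uint8_t *src, size_t w)
{
    for (size_t i = 0; i < w; i++)
        dst[i] += src[i];
}

#if defined(__SSE2__) && (defined(__x86_64__) || defined(__i386__))
/* The whole inner loop lives in one asm() block, so gcc cannot
 * reorder it or insert loads/stores inside it. */
static void add_bytes_sse2(uint8_t *dst, const uint8_t *src, size_t w)
{
    size_t i = 0;
    size_t end = w & ~(size_t)15;      /* bytes covered by 16-byte steps */
    if (end) {
        __asm__ volatile(
            "1:                         \n\t"
            "movdqu  (%1,%0), %%xmm0    \n\t"
            "movdqu  (%2,%0), %%xmm1    \n\t"
            "paddb   %%xmm1, %%xmm0     \n\t"
            "movdqu  %%xmm0, (%1,%0)    \n\t"
            "add     $16, %0            \n\t"
            "cmp     %3, %0             \n\t"
            "jb      1b                 \n\t"
            : "+r"(i)
            : "r"(dst), "r"(src), "r"(end)
            : "xmm0", "xmm1", "cc", "memory");
    }
    for (; i < w; i++)                  /* leftover tail bytes in C */
        dst[i] += src[i];
}
#endif

/* Pick the best implementation for the running CPU. */
static void dsp_init(DSPFuncs *c)
{
    c->add_bytes = add_bytes_c;
#if defined(__SSE2__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("sse2"))
        c->add_bytes = add_bytes_sse2;
#endif
}
```

Because the compile-time guard already implies SSE2 in this sketch, the runtime check only marks where detection would go. Callers just run dsp_init(&c) once and then call c.add_bytes(dst, src, w).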

[...]
> > And code quality standards in ffmpeg are high; writing code that is 5%
> > slower is plain unacceptable.
> 
> I could say that the x86 asm routines that happen to work on x86_64
> through hacks are in that range; still better than plain C, isn't it?

I do not think we have much hacked x86 -> x86_64 code that would be slower
than the equivalent intrinsics on x86_64.
If you find some, please report it!
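One way existing 32-bit asm can be carried over to x86_64 is to parameterize register names and operand widths. A minimal sketch, assuming GCC inline asm; the macro REG_a, the type x86_reg and the function sum_bytes are names chosen for illustration here, not quoted from the FFmpeg tree. The same asm() string assembles as 32-bit code on x86 and as 64-bit code on x86_64, with a plain C fallback elsewhere.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__)
#  define REG_a "rax"          /* native-width accumulator name */
typedef int64_t x86_reg;       /* integer as wide as a GPR */
#elif defined(__i386__)
#  define REG_a "eax"
typedef int32_t x86_reg;
#else
typedef long x86_reg;          /* non-x86: only the C path is used */
#endif

/* Sum n bytes starting at p (n must be non-negative). */
static x86_reg sum_bytes(const uint8_t *p, x86_reg n)
{
    x86_reg sum = 0;
#if defined(__x86_64__) || defined(__i386__)
    x86_reg tmp;
    __asm__(
        "xor    %%"REG_a", %%"REG_a"   \n\t"  /* index = 0          */
        "1:                             \n\t"
        "cmp    %3, %%"REG_a"           \n\t"  /* index >= n: done   */
        "jae    2f                      \n\t"
        "movzbl (%2,%%"REG_a"), %k1     \n\t"  /* tmp = p[index]     */
        "add    %1, %0                  \n\t"  /* sum += tmp         */
        "inc    %%"REG_a"               \n\t"
        "jmp    1b                      \n\t"
        "2:                             \n\t"
        : "+r"(sum), "=&r"(tmp)
        : "r"(p), "r"(n)
        : REG_a, "cc", "memory");
#else
    for (x86_reg i = 0; i < n; i++)
        sum += p[i];
#endif
    return sum;
}
```

The `%k1` operand modifier prints the 32-bit name of the temporary register; on x86_64 writing a 32-bit register zero-extends it, so the following full-width add is correct on both targets.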

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I am the wisest man alive, for I know one thing, and that is that I know
nothing. -- Socrates