[FFmpeg-devel] [flamefest-start] A little something on MMX/SSE intrinsics

Uoti Urpala uoti.urpala
Thu Feb 28 21:59:18 CET 2008

On Thu, 2008-02-28 at 20:37 +0100, Michael Niedermayer wrote:
> We write asm/intrinsics because gcc did NOT compile the C code to something
> efficient in at least some cases. Asm is optimized once and will then always
> be efficient for the cpu class for which it has been optimized.

It may be efficient for one CPU class if you got it exactly right. It
may not be close to optimal for slightly different CPUs.

>  That is, it's a write-once-and-forget thing.

Or write once and improve later, unless you got it right on the first
try (which is less likely with asm than with intrinsics), or until
someone tests it on a different or newer CPU.

>  Intrinsics OTOH are at the mercy of the
> current compiler version and require constant maintenance to ensure that they
> don't get miscompiled to something inefficient.

As is most of the C code.

> But the key advantage asm() has IMO is that the compiler can NOT second guess
> what the programmer wanted, it can NOT reorder the instructions behind the
> programmer's back and it can NOT silently put unneeded load+stores between
> instructions.
> It's a fundamental difference, not something which will go away as gcc becomes
> better at compiling intrinsics (if that ever will happen ...).

There's a reason why we code most things in C, not asm. Even if you need
to help the compiler by using intrinsics that doesn't mean you should go
to as low a level of programming as possible. Handwritten asm is
something that should be used if you can't get the effect any other way,
not something to be preferred as "fundamentally more reliable".
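
To make the levels concrete, here is a minimal sketch (not taken from
FFmpeg; the function names add_sat_u8_c and add_sat_u8_simd are
hypothetical) of a saturating byte add, once in plain C and once with
SSE2 intrinsics. The intrinsic version chooses the instruction (PADDUSB,
via _mm_adds_epu8) but leaves register allocation and scheduling to the
compiler. It assumes unaligned buffers and falls back to scalar code
when SSE2 is not available.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Plain C reference: add unsigned bytes, clamping the result to 255. */
static void add_sat_u8_c(uint8_t *dst, const uint8_t *a,
                         const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)a[i] + b[i];
        dst[i] = sum > 255 ? 255 : (uint8_t)sum;
    }
}

/* Same operation with SSE2 intrinsics, 16 bytes per iteration.
 * The leftover tail (and everything, without SSE2) is done in scalar C. */
static void add_sat_u8_simd(uint8_t *dst, const uint8_t *a,
                            const uint8_t *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        /* Unaligned loads/stores: no alignment assumed for the buffers. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
#endif
    for (; i < n; i++) {
        unsigned sum = (unsigned)a[i] + b[i];
        dst[i] = sum > 255 ? 255 : (uint8_t)sum;
    }
}
```

Whether the compiler turns the intrinsic loop into something as tight as
hand-scheduled asm is exactly what is being argued about here; the
sketch only shows how much lower-level detail each approach asks of the
programmer.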

> As far as I can see the only people supporting intrinsics either
> A. can't code asm
> B. never properly compared asm and intrinsics

Or C. given enough time to write everything in asm can do more
productive things during that time instead (such as optimizing C code,
converting more C to use intrinsics, fixing bugs, or adding new

> If I am wrong, please show me an example with altivec asm which you hand
> tuned (instructions optimally selected and ordered by hand based on read and
> understood datasheets for the target cpu and the final instruction ordering
> selected by benchmark trial and error) and benchmark results against the
> equivalent intrinsic code.

This comparison is fundamentally flawed. You'd be comparing intrinsics
with code that took excessive effort to write, something that tries to
be perfect no matter what the cost. That is not the way to develop a
practical program. There is a lot in FFmpeg that is obviously far from
perfect, both in areas of performance and features. Development efforts
are best directed in areas where you can achieve the most with the least
effort. The right comparison is whether the effort to convert intrinsics
to asm could achieve more benefit than spending equal effort to improve
any alternative area.

> It seems our disagreement is not about intrinsics vs. asm being better but
> about the minimum quality and performance of the code. 5% speedloss is not
> acceptable! Even much smaller speedlosses need some justification.
> Yes asm is harder to write, but for that you get 5% more speed.
> And code quality standards in ffmpeg are high, writing 5% slower code is
> plain unacceptable.

You're kidding yourself if you think you're not accepting a 5% speedloss
in many features even on x86. I wonder if there's any nontrivial feature
in FFmpeg that IS within 5% of optimal...
