[FFmpeg-devel] [flamefest-start] A little something on MMX/SSE intrinsics

Michael Niedermayer michaelni
Thu Feb 28 23:37:17 CET 2008

On Thu, Feb 28, 2008 at 10:59:18PM +0200, Uoti Urpala wrote:
> On Thu, 2008-02-28 at 20:37 +0100, Michael Niedermayer wrote:
> > We write asm/intrinsics because gcc did NOT compile the C code to something
> > efficient in at least some cases. Asm is optimized once and will then always
> > be efficient for the cpu class for which it has been optimized.
> It may be efficient for one CPU class if you got it exactly right. 

It is efficient if it's exactly right. And this doesn't mean that it's
inefficient if it's not exactly right.

> It
> may not be close to optimal for slightly different CPUs.

The emphasis is on "may" here.
Code which is optimal for one cpu class will not contain unneeded
instructions like random load/stores. Gcc does generate such code sometimes.
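
To make the comparison concrete, here is a minimal sketch (not code from
FFmpeg) of the kind of function being argued about: a rounded byte average of
the sort used in motion compensation, written once in plain C and once with
SSE2 intrinsics. The names avg_bytes_c, avg_bytes_intrin and
check_avg_against_reference are made up for illustration. With the intrinsic
version the compiler still picks registers and instruction order, so whether
stray load/stores show up can only be judged by reading the generated asm
(e.g. gcc -O2 -S).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Plain C reference: rounded average of two byte arrays. */
static void avg_bytes_c(uint8_t *dst, const uint8_t *a,
                        const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}

/* Intrinsics version: 16 bytes per step where SSE2 is available,
 * scalar code for the tail (and for everything on non-SSE2 builds).
 * Register allocation and scheduling are left to the compiler. */
static void avg_bytes_intrin(uint8_t *dst, const uint8_t *a,
                             const uint8_t *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* _mm_avg_epu8 computes (x + y + 1) >> 1 per unsigned byte */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_avg_epu8(va, vb));
    }
#endif
    for (; i < n; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}

/* Hypothetical self-check: both versions must agree on every byte.
 * Returns the number of bytes compared. */
static int check_avg_against_reference(void)
{
    uint8_t a[37], b[37], ref[37], out[37];
    for (size_t i = 0; i < sizeof(a); i++) {
        a[i] = (uint8_t)(i * 7 + 3);
        b[i] = (uint8_t)(255 - i * 5);
    }
    avg_bytes_c(ref, a, b, sizeof(a));
    avg_bytes_intrin(out, a, b, sizeof(a));
    for (size_t i = 0; i < sizeof(a); i++)
        assert(ref[i] == out[i]);
    return (int)sizeof(a);
}
```

This only shows that the two forms compute the same thing. It says nothing
about which one is faster; that needs the generated code and a benchmark.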

> > But the key advantage asm() has IMO is that the compiler can NOT second guess
> > what the programmer wanted, it can NOT reorder the instructions behind the
> > programmer's back and it can NOT silently put unneeded load+stores between
> > instructions.
> > It's a fundamental difference, not something which will go away as gcc becomes
> > better at compiling intrinsics (if that ever will happen ...).
> There's a reason why we code most things in C, not asm. Even if you need
> to help the compiler by using intrinsics that doesn't mean you should go
> to as low a level of programming as possible. 

True in principle.

> Handwritten asm is
> something that should be used if you can't get the effect any other way,
> not something to be preferred as "fundamentally more reliable".

The problem here is that trying intrinsics for every function you want to
write in asm takes time, and if you end up changing most of them to asm
later it was wasted time.

> > As far as I can see the only people supporting intrinsics either
> > A. can't code asm
> > B. never properly compared asm and intrinsics
> Or C. given enough time to write everything in asm can do more
> productive things during that time instead (such as optimizing C code,
> converting more C to use intrinsics, fixing bugs, or adding new
> features).

No one stops people from optimizing C code, fixing bugs and adding new
features if they prefer these over asm coding.
Also, even your own patches used asm and not intrinsics in the past. Are you
arguing that you chose the inefficient approach?

> > If I am wrong, please show me an example with altivec asm which you hand
> > tuned (instructions optimally selected and ordered by hand based on read and
> > understood datasheets for the target cpu and the final instruction ordering
> > selected by benchmark trial and error) and benchmark results against the
> > equivalent intrinsic code.
> This comparison is fundamentally flawed. You'd be comparing intrinsics
> with code that took excessive effort to write, something that tries to
> be perfect no matter what the cost. That is not the way to develop a
> practical program. 

The truth is, we do have some asm code, and actually not that little, that was
developed in such a way. And all that code is AFAIK still near optimal even
though the CPUs have changed quite a bit since it was written ...
Also, I remember kabi did spend some time optimizing our MC functions, and
IIRC he tested and optimized them against various CPUs.
If one wants to have the fastest program, one has to spend the time to
optimize the code. If one doesn't care, one can of course use intrinsics.
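
For what "selected by benchmark trial and error" can look like in practice,
here is a rough sketch of a best-of-N timing loop. It is an assumption-laden
illustration, not FFmpeg's own timing code: avg_fn, avg_candidate and
best_time are hypothetical names, clock() is coarse, and a serious comparison
would use a cycle counter and many more runs on each target CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef void (*avg_fn)(uint8_t *dst, const uint8_t *a,
                       const uint8_t *b, size_t n);

/* Hypothetical candidate under test; an asm or intrinsics variant with
 * the same signature would be timed in exactly the same way. */
static void avg_candidate(uint8_t *dst, const uint8_t *a,
                          const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}

/* Returns the smallest CPU time, in seconds, that any of `runs` runs
 * needed for `iters` calls of fn on a 4 KiB buffer. Taking the minimum
 * filters out runs disturbed by interrupts or cache misses. */
static double best_time(avg_fn fn, int runs, int iters)
{
    static uint8_t a[4096], b[4096], dst[4096];
    volatile uint8_t sink = 0;   /* discourages removal of the work */
    double best = -1.0;

    assert(runs > 0 && iters > 0);
    for (size_t i = 0; i < sizeof(a); i++) {
        a[i] = (uint8_t)(i * 7);
        b[i] = (uint8_t)(i * 13);
    }
    for (int r = 0; r < runs; r++) {
        clock_t t0 = clock();
        for (int k = 0; k < iters; k++) {
            fn(dst, a, b, sizeof(a));
            sink = dst[(size_t)k % sizeof(dst)];
        }
        double t = (double)(clock() - t0) / CLOCKS_PER_SEC;
        if (best < 0.0 || t < best)
            best = t;
    }
    (void)sink;
    return best;
}
```

Calling best_time(avg_candidate, 5, 100000) for each variant and comparing
the results gives a first impression; differences of a few percent need
repeated measurements before they mean anything.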

> There is a lot in FFmpeg that is obviously far from
> perfect, both in areas of performance and features. Development efforts
> are best directed in areas where you can achieve the most with the least
> effort. The right comparison is whether the effort to convert intrinsics
> to asm could achieve more benefit than spending equal effort to improve
> any alternative area.

I agree here, but one must also consider that such a comparison is hard to
do in practice, and it could easily take longer to answer the question of
"which way to go" than to follow both paths to the end and back.

> > It seems our disagreement is not about intrinsics vs. asm being better but
> > about the minimum quality and performance of the code. 5% speedloss is not
> > acceptable! Even much smaller speedlosses need some justification.
> > Yes asm is harder to write, but for that you get 5% more speed.
> > And code quality standards in ffmpeg are high, writing 5% slower code is
> > plain unacceptable.
> You're kidding yourself if you think you're not accepting a 5% speedloss
> in many features even on x86. I wonder if there's any nontrivial feature
> in FFmpeg that IS within 5% of optimal...

You are misunderstanding me. I do not accept something if I think (or know)
that it can be done 5% faster.
The things you speak about are >5% away from a global optimum, and we know
neither where it is nor how to reach it. The difference is that the first is
constructive (this is bad, do that instead as it's 5% faster) while the second
is destructive (this is bad, it's rejected, and we don't know how to improve it).

Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Republics decline into democracies and democracies degenerate into
despotisms. -- Aristotle