[FFmpeg-devel] looking for comparison of intrinsics vs hand written asm

Michael Niedermayer michaelni
Fri Oct 30 19:56:13 CET 2009


On Fri, Oct 30, 2009 at 04:34:04PM +0100, Attila Kinali wrote:
> Moin,
> 
> I had a discussion with some prof about code optimization
> and it came to a point where we disagreed on how big
> the difference between intrinsics and hand written asm
> is. I was quite sure that I've seen such a comparison somewhere,
> done by someone with a clue (i.e. both intrinsics and asm were
> of good quality), but cannot find anything anymore.
> 
> Hence, I'd like to ask if someone has such a comparison.
> If possible also against intrinsics compiled with icc.

I don't have a comparison, sadly, but I can think of / remember a few things
that could be interesting in that context ...
numbered for easy reference in replies

1. One of the biggest issues I have with intrinsics is that they are
   unpredictable. Developers write code and benchmark it after each change
   when trying to make it as fast as possible. With asm you know the code
   will stay exactly as it is, but with intrinsics the code could be
   running at half the speed with gcc 4.5.6 compared to 4.5.5. It is purely
   luck if the best you could achieve with 4.5.5 has anything to do
   with the best for 4.5.6.
2. I remember the VirtualDub author ranting about intrinsics ages ago;
   I don't know if he did a comparison ...
3. swscale's horizontal fast bilinear scaler generates code at runtime
   depending on the source / destination width; try this with intrinsics.
4. Looking at the disassembly of some decoder (no, I don't remember which),
   I saw MMX code where each instruction was preceded by memory reads to
   load its input arguments and followed by memory writes to store its
   output. I wonder if that was generated from intrinsics by a compiler?
   The author probably wasn't aware of it, and how should he be? The
   compiler doesn't tell you when it decides to mess up.
5. You could look at gcc bugs about intrinsics and code pessimization
6. Accessing globals in PIC code requires indirection through various
   tables (and one of course needs a register pointing to these tables).
   Well, it doesn't really require indirection, but some document (the ELF
   spec?) requires it so that all globals can be overridden. It can be
   avoided with visibility attributes, but this is not trivial in reality;
   asm can just access things directly ...
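
A rough sketch of point 6, assuming GCC or Clang on an ELF target. The
variable and function names are made up for illustration and are not taken
from any real codebase:

```c
/* Hypothetical names throughout; assumes GCC or Clang on an ELF target. */

/* Default visibility: in a shared library built with -fPIC this symbol
   may be overridden by another module, so accesses typically go through
   the global offset table (GOT). */
int counter_default = 0;

/* Hidden visibility: the symbol cannot be overridden from outside the
   shared object, so the compiler is free to address it directly
   (for example RIP-relative on x86-64). */
__attribute__((visibility("hidden"))) int counter_hidden = 0;

/* Increment and return the default-visibility counter. */
int bump_default(void) { return ++counter_default; }

/* Increment and return the hidden-visibility counter. */
int bump_hidden(void) { return ++counter_hidden; }
```

Compiling this with something like "gcc -O2 -fPIC -S" and comparing the two
functions in the generated assembly should show the extra GOT load for the
default-visibility global. The exact output depends on compiler version and
target, so treat this as something to try rather than a guaranteed result.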


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Why not whip the teacher when the pupil misbehaves? -- Diogenes of Sinope