[FFmpeg-devel] looking for comparison of intrinsics vs hand written asm
Michael Niedermayer
michaelni
Fri Oct 30 19:56:13 CET 2009
On Fri, Oct 30, 2009 at 04:34:04PM +0100, Attila Kinali wrote:
> Moin,
>
> I had a discussion with some prof about code optimization
> and it came to a point where we disagreed on how big
> the difference between intrinsics and hand written asm
> is. I was quite sure that i've seen somewhere such a comparison,
> done by someone with a clue (ie both intrinsics and asm had
> a good quality), but cannot find anything anymore.
>
> Hence, i'd like to ask if someone has such a comparison.
> If possible also against intrinsics compiled with icc.
I don't have a comparison, sadly, but I can think of a few things that
could be interesting in that context (numbered for easy reference in
replies):
1. One of the biggest issues I have with intrinsics is that they are
unpredictable. Developers write code and benchmark it after each change
when trying to make it as fast as possible. With asm you know the code
will stay exactly as it is, but with intrinsics the code could be
running at half the speed with gcc 4.5.6 compared to 4.5.5. It's purely
luck whether the best you could achieve with 4.5.5 has anything to do
with the best for 4.5.6.
2. I remember the VirtualDub author ranting about intrinsics ages ago;
dunno if he did a comparison ...
3. swscale's horizontal fast bilinear scaler generates code at runtime
depending on the source / destination width; try this with intrinsics.
4. Looking at the disassembly of some decoder (no, I don't remember which),
I saw MMX code where each instruction was preceded by memory reads to
load its input arguments and followed by memory writes to store its
output. I wonder if that was generated from intrinsics by a compiler?
The author probably wasn't aware of it, and how should he be? The compiler
doesn't tell you if it decides to mess up.
5. You could look at gcc bugs about intrinsics and code pessimization
6. Accessing globals in PIC code requires indirection through various
tables (and one of course needs a register pointing to these tables).
Well, it doesn't really require indirection, but some document (the ELF
spec?) requires it so that all globals can be overridden. It can be avoided
with visibility attributes, but this is not trivial in reality; asm
can just access things directly ...
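
To illustrate point 6, here is a minimal sketch of the visibility attribute,
assuming GCC or Clang targeting ELF. The names (coeff_default, coeff_hidden,
sum_default, sum_hidden) are made up for this example and are not from
FFmpeg. It only shows the mechanism and says nothing about how hard it is to
apply across a real code base.

```c
/* Sketch only: assumes GCC or Clang targeting ELF; all names are made up. */

/* Default visibility: when this file is built with -fPIC into a shared
 * object, the symbol may be overridden at load time, so the compiler
 * typically reaches it through the GOT. */
int coeff_default[4] = { 1, 2, 3, 4 };

/* Hidden visibility: the symbol cannot be overridden from outside the
 * shared object, so the compiler is free to address it directly. */
__attribute__((visibility("hidden")))
int coeff_hidden[4] = { 1, 2, 3, 4 };

/* Sums the default-visibility table. */
int sum_default(void)
{
    int i, sum = 0;
    for (i = 0; i < 4; i++)
        sum += coeff_default[i];
    return sum;
}

/* Sums the hidden-visibility table; same logic, different symbol. */
int sum_hidden(void)
{
    int i, sum = 0;
    for (i = 0; i < 4; i++)
        sum += coeff_hidden[i];
    return sum;
}
```

Compiling this with something like "gcc -O2 -fPIC -S" and comparing the two
functions should show the extra table lookup for the default-visibility
array on x86-64, though the exact output depends on the compiler, version
and target.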
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Why not whip the teacher when the pupil misbehaves? -- Diogenes of Sinope