[FFmpeg-devel] Inline ASM vs. Intrinsics
Fri May 11 09:25:38 CEST 2007
On 5/11/07, Zuxy Meng <zuxy.meng at gmail.com> wrote:
> 2007/5/9, Thorsten Jordan <tjordan at macrosystem.de>:
> > Hello developers,
> > I am wondering what you think about the MMX/SSE intrinsics that can
> > be used at least with gcc (and maybe also with Intel's compiler).
Yes, the Intel compiler also supports MMX/SSE intrinsics (not to be
confused with the built-ins introduced by GCC).
> > My question is if they are not used because of performance or if they
> > are a big NoNo because of some other reason.
> > I know that by using inline asm one has most control over what is going
> > on. However with intrinsics the code is sometimes shorter and easier to
> > read,
That's true for Altivec intrinsics, but x86 intrinsics are really
horrible IMHO. They encode the data type in the intrinsic name rather
than in the type of the vector operands.
That means that with Altivec, you have vec_add() and vec_adds() to
do vector add and vector saturated add respectively, whereas on x86
you'd have _mm_add8(), _mm_add16(), _mm_add32(), _mm_add64(),
_mm_adds8(), _mm_adds16(), _mm_adds32(), _mm_adds64().
I think that this certainly isn't more readable, and that it's rather
ugly to have a "typeless" extension to the C language, which is a
strongly typed language.
Of course, when you have a SIMD ISA that evolves with each new CPU
model, it is harder to keep things as clean as with Altivec.
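
To make the naming point concrete, here is a minimal sketch using SSE2
intrinsics from <emmintrin.h>. The names in the post read like shorthand;
the real SSE2 spellings are of the form _mm_add_epi8() (wrapping add on
bytes) and _mm_adds_epu8() (saturating add on unsigned bytes). The helper
names add_wrap_u8() and add_sat_u8() are made up for this illustration.
Note that both operations take the same __m128i type, so the element
width and signedness are visible only in the intrinsic name.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: wrapping add of 16 bytes.
 * The element width (8 bits) is encoded in the name "epi8",
 * not in the operand type, which is the generic __m128i. */
static void add_wrap_u8(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi8(va, vb));
}

/* Hypothetical helper: saturating add of 16 unsigned bytes.
 * Same __m128i operands as above; only the name "adds_epu8"
 * says "saturating, unsigned, 8-bit". */
static void add_sat_u8(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_adds_epu8(va, vb));
}
```

With Altivec the equivalent calls would be vec_add() and vec_adds() on a
typed operand such as "vector unsigned char", so the compiler picks the
element width from the operand type instead.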
> > although recent compilers are rather good at code generation. The
> > intrinsics have one big advantage: you can use the very same code for
> > x86-32 and x86-64 and on the latter the 8 extra registers are used
> > automatically.
> I agree, and since gcc knows more about what intrinsics do than inline
> assembly, gcc may optimize better with different march/mtune settings.
> However, gcc's register allocation algorithm sometimes does stupid
> things, spilling registers when unnecessary. So, even if you write
> everything in intrinsics, you should use 'gcc -S' to double check.
I've experimented a bit with ICC-9.1 (not with GCC though), and
analysed the quality of the code generation. I'm pleased to say that
it generates really good code in general, but in some cases, it does
some stupid things that a human who has a tiny bit of ASM expertise
would never write.
But in general, ICC did a really good job at generating code out of
intrinsics.
I don't know about GCC, but I read a paper some months ago where the
bleeding edge versions of GCC were able to beat ICC on synthetic
benchmarks. I expect that on code that has a rather large data set,
GCC will screw up its register allocation, where ICC should do better.
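
As a small test case for the 'gcc -S' check suggested earlier in the
thread, something like the following could be used. It is only a sketch:
the function name add_sat_s16() and the choice of operation are
assumptions made for illustration, not code from FFmpeg. The same source
should build unchanged for x86-32 and x86-64, which is the portability
advantage mentioned above.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst[i] = saturate(a[i] + b[i]) for n int16_t values. */
void add_sat_s16(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    size_t i = 0;

    /* Vector part: 8 signed 16-bit elements per iteration, unaligned access. */
    for (; i + 8 <= n; i += 8) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epi16(va, vb));
    }

    /* Scalar tail: same saturating behaviour for the remaining elements. */
    for (; i < n; i++) {
        int32_t s = (int32_t)a[i] + (int32_t)b[i];
        if (s > INT16_MAX) s = INT16_MAX;
        if (s < INT16_MIN) s = INT16_MIN;
        dst[i] = (int16_t)s;
    }
}
```

Assuming the file is saved as add_sat.c (a made-up name), running
"gcc -O2 -msse2 -S add_sat.c" writes the generated assembly to add_sat.s,
where any unnecessary register spills would show up as extra moves to and
from the stack. On a multilib toolchain, adding -m32 gives the 32-bit
output for comparison.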
Rich, you're forgetting one thing here: *everybody* except you is