[FFmpeg-devel] [flamefest-start] A little something on MMX/SSE intrinsics

Fri Feb 29 07:35:58 CET 2008

Michael Niedermayer wrote:
>> gcc isn't predictable even at managing asm blocks as we could experience 
>> with the register constrained architectures... (yes x86 again)
> 
> As I said at some other point in the thread, I prefer a compilation failure
> which I can fix over a silent pessimization of code I do not even know
> about.
> 

That is fine as long as you don't care whether the compilation failure 
happens because of PIC or other features you may not deem worth 
attention (see the issue with the flac routines).

>> Sparing some pain and using intrinsics to get quite similar results for 
>> the whole PPC/PPC64 or x86/x86_64 families wouldn't be bad as a starting 
>> point.
> 
> If you plan to ever write the asm(), your efforts with intrinsics were wasted.
> If you don't plan to ever write asm(), it's of course a different story ...

NO, I can spend less time getting a loop vectorized using intrinsics, 
and the result is fine even for architectures I cannot touch right now. 
Once the logic is sound I could spend some time tuning it by hand and 
see if I do better than gcc. Still, you are right: we should try, 
benchmark, and see how much we lose/gain with the different approaches 
instead of arguing that much.
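
To make the "get the logic sound first" point concrete, here is a minimal 
sketch of a saturating byte add written with SSE2 intrinsics next to its 
plain C reference. The function names (add_sat_u8_c, add_sat_u8_simd) are 
made up for illustration and are not existing FFmpeg routines; on targets 
without SSE2 the "simd" version simply falls back to the scalar loop.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Plain C reference: dst[i] = min(dst[i] + src[i], 255). */
static void add_sat_u8_c(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned v = (unsigned)dst[i] + src[i];
        dst[i] = v > 255 ? 255 : (uint8_t)v;
    }
}

/* Hypothetical intrinsics version: 16 bytes per iteration where SSE2 is
 * available, scalar code for the tail (and for every other target). */
static void add_sat_u8_simd(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(dst + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(src + i));
        /* unsigned saturating add on 16 bytes at once */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(a, b));
    }
#endif
    for (; i < n; i++) {
        unsigned v = (unsigned)dst[i] + src[i];
        dst[i] = v > 255 ? 255 : (uint8_t)v;
    }
}
```

Register allocation and scheduling are left entirely to the compiler here, 
which is exactly the part that would need checking against hand-written 
asm before drawing conclusions.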

> No you cannot, proper asm looks like:
> 
> function_mmx(){
>     asm(
>         ...
>     );
> }

I _think_ that won't change the fact that gcc may do something dumb like 
reg->memory->reg, depending on the constraints and on how bad the ABI of 
the arch in use is.
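
A small sketch of what I mean by the constraints mattering, assuming GCC 
(or Clang) extended asm on x86; add_reg and add_mem are hypothetical names 
used only here. Both functions do the same add, but the second one asks 
for the accumulator as a memory operand, so a value that arrived in a 
register has to be stored before the asm and loaded again after it. On 
other targets the sketch falls back to plain C.

```c
#include <stdint.h>

#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__)
#define SKETCH_X86_ASM 1
#else
#define SKETCH_X86_ASM 0
#endif

/* "+r": the accumulator may stay in a register across the asm block. */
static uint32_t add_reg(uint32_t a, uint32_t b)
{
#if SKETCH_X86_ASM
    __asm__ ("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
#else
    a += b;
#endif
    return a;
}

/* "+m": the accumulator must live in memory, so the compiler has to
 * spill it first and reload it for the return value. */
static uint32_t add_mem(uint32_t a, uint32_t b)
{
#if SKETCH_X86_ASM
    __asm__ ("addl %1, %0" : "+m"(a) : "r"(b) : "cc");
#else
    a += b;
#endif
    return a;
}
```

Comparing the output of gcc -S for the two functions is the quickest way 
to see what the compiler does around the block on a given ABI.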

> Which is called through a function pointer. There's no outer loop which
> knows of what is done inside the function.
> Also the whole inner loop is all inside a single asm(), no way gcc could
> mess it up.
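
For readers following along, the layout described in the quote can be 
sketched roughly as below. This is only an illustration: DSPSketch, 
dsp_sketch_init and both add_bytes_* functions are invented names, and the 
"optimized" variant is a plain C stand-in for a function whose body would 
be a single asm() block.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*add_bytes_fn)(uint8_t *dst, const uint8_t *src, size_t n);

/* Reference implementation (bytes wrap modulo 256). */
static void add_bytes_c(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}

/* Stand-in for the optimized routine: the whole inner loop lives inside
 * this one function, so the caller knows nothing about its internals. */
static void add_bytes_opt(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     = (uint8_t)(dst[i]     + src[i]);
        dst[i + 1] = (uint8_t)(dst[i + 1] + src[i + 1]);
        dst[i + 2] = (uint8_t)(dst[i + 2] + src[i + 2]);
        dst[i + 3] = (uint8_t)(dst[i + 3] + src[i + 3]);
    }
    for (; i < n; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}

/* Hypothetical context struct holding the selected routines. */
typedef struct DSPSketch {
    add_bytes_fn add_bytes;
} DSPSketch;

/* Pick the implementation once, at init time. */
static void dsp_sketch_init(DSPSketch *c, int have_opt)
{
    c->add_bytes = have_opt ? add_bytes_opt : add_bytes_c;
}
```

Callers only ever do c.add_bytes(dst, src, n), so the call is opaque to 
the compiler either way.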

> 
> 
> [...]
>>> And code quality standards in ffmpeg are high, writing 5% slower code is
>>> plain unacceptable.
>> I could say that the x86 asm routines that happen to work by hack on 
>> x86_64 are in that range; still, better that than plain C, isn't it?
> 
> I do not think we have much hacked x86 -> x86_64 code that would be slower
> than the equivalent intrinsics on x86_64.
> If you find some report it please!

Probably I'll get an x86_64 sooner than I'd like to and I'll check by 
myself. Apart from that, do you agree with my proposal to set 
qualification tasks on benchmarking and comparing a little more?
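
As a starting point for such a task, a comparison harness could be as 
small as the sketch below. It assumes nothing beyond standard C: clock() 
is coarse, the kernels (kernel_plain, kernel_unrolled) are placeholders 
for whatever C, intrinsics, or asm versions are being compared, and 
time_kernel is a made-up helper, not an existing FFmpeg facility.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef void (*kernel_fn)(uint8_t *dst, const uint8_t *src, size_t n);

/* Placeholder kernel A: straightforward loop. */
static void kernel_plain(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}

/* Placeholder kernel B: same result, manually unrolled by two. */
static void kernel_unrolled(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        dst[i]     = (uint8_t)(dst[i]     + src[i]);
        dst[i + 1] = (uint8_t)(dst[i + 1] + src[i + 1]);
    }
    for (; i < n; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}

/* Run fn `runs` times and return the best (smallest) time in seconds.
 * Taking the minimum reduces noise from scheduling and cold caches. */
static double time_kernel(kernel_fn fn, uint8_t *dst, const uint8_t *src,
                          size_t n, int runs)
{
    double best = -1.0;
    for (int r = 0; r < runs; r++) {
        clock_t t0 = clock();
        fn(dst, src, n);
        clock_t t1 = clock();
        double s = (double)(t1 - t0) / CLOCKS_PER_SEC;
        if (best < 0.0 || s < best)
            best = s;
    }
    return best;
}
```

A real task would want a cycle counter and realistic buffer sizes, and 
would check that both kernels produce identical output before comparing 
any numbers.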

lu

-- 

Luca Barbato
Gentoo Council Member
Gentoo/linux Gentoo/PPC
http://dev.gentoo.org/~lu_zero