[FFmpeg-devel] [flamefest-start] A little something on MMX/SSE intrinsics
Fri Feb 29 07:35:58 CET 2008
Michael Niedermayer wrote:
>> gcc isn't predictable even at managing asm blocks as we could experience
>> with the register constrained architectures... (yes x86 again)
> As i said at some other point in the thread i prefer a compilation failure
> which i can fix over a silent pessimization of code i do not even know
That is fine as long as you don't mind the compilation failure being
triggered by PIC or other features you may not deem worth attention (see
the issue with the flac routines).
>> Sparing some pain and using intrinsics to get quite similar results for
>> the whole PPC/PPC64 or x86/x86_64 families wouldn't be bad as starting
> If you plan to ever write the asm() your efforts with intrinsics were wasted.
> If you don't plan to ever write asm() it's of course a different story ...
No, I can spend less time getting a loop vectorized using intrinsics,
and it works even for architectures I cannot touch right now. Once the
logic is sound I could spend some time tuning it by hand and see if I
can beat gcc. Still, you are right: we should benchmark and see how much
we lose or gain with the different approaches instead of arguing.
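
To make the intrinsics side of the argument concrete, here is a minimal
sketch of a loop vectorized with SSE2 intrinsics, with a scalar path that
handles the tail and any target without SSE2. The function name
add_sat_int16 is hypothetical and is not an FFmpeg routine; it only
illustrates how the same logic can be written once and left to the
compiler to schedule.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical example, not an FFmpeg function:
 * dst[i] = saturate_int16(dst[i] + src[i]) for i in [0, n). */
static void add_sat_int16(int16_t *dst, const int16_t *src, size_t n)
{
    size_t i = 0;

#if defined(__SSE2__)
    /* Vector path: 8 signed 16-bit lanes per iteration, unaligned access. */
    for (; i + 8 <= n; i += 8) {
        __m128i a = _mm_loadu_si128((const __m128i *)(dst + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epi16(a, b));
    }
#endif

    /* Scalar path: the tail, or the whole array when SSE2 is unavailable. */
    for (; i < n; i++) {
        int v = dst[i] + src[i];
        if (v >  32767) v =  32767;
        if (v < -32768) v = -32768;
        dst[i] = (int16_t)v;
    }
}
```

Whether gcc turns the vector loop into code as tight as hand-written asm
is exactly what would have to be benchmarked.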
> No you cannot, proper asm looks like:
I _think_ that won't change the fact that gcc may do something dumb like
reg->memory->reg, depending on the constraints and on how bad the ABI of
the arch in use is.
> Which is called through a function pointer. There's no outer loop which
> knows of what is done inside the function.
> Also the whole inner loop is all inside a single asm() no way gcc could
> mess it up.
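
For reference, the arrangement described in the quote above can be
sketched roughly as follows. All names here (ExampleDSP,
example_dsp_init, add_bytes_opt) are hypothetical, and the "optimized"
routine is plain C standing in for what would be a single asm() block
holding the whole inner loop; the point is only that callers reach it
through a pointer chosen at init time and see nothing of its body.

```c
#include <stddef.h>
#include <stdint.h>

/* Signature shared by every implementation of the routine. */
typedef void (*add_bytes_fn)(uint8_t *dst, const uint8_t *src, size_t n);

/* Hypothetical context holding the selected implementation. */
typedef struct ExampleDSP {
    add_bytes_fn add_bytes;
} ExampleDSP;

/* Reference C version: dst[i] += src[i], wrapping modulo 256. */
static void add_bytes_c(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}

/* Stand-in for the hand-tuned version. In the scheme quoted above this
 * body would be one asm() statement containing the entire loop. */
static void add_bytes_opt(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i + 0] = (uint8_t)(dst[i + 0] + src[i + 0]);
        dst[i + 1] = (uint8_t)(dst[i + 1] + src[i + 1]);
        dst[i + 2] = (uint8_t)(dst[i + 2] + src[i + 2]);
        dst[i + 3] = (uint8_t)(dst[i + 3] + src[i + 3]);
    }
    for (; i < n; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}

/* Pick an implementation once; use_opt is an assumed capability flag. */
static void example_dsp_init(ExampleDSP *c, int use_opt)
{
    c->add_bytes = use_opt ? add_bytes_opt : add_bytes_c;
}
```

A caller would do example_dsp_init(&c, flag) once and then call
c.add_bytes(dst, src, n), so the compiler has no outer loop to merge the
routine into.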
>>> And code quality standards in ffmpeg are high, writing 5% slower code is
>>> plain unacceptable.
>> I could say that the x86 asm routines that happen to work by hack
>> on x86_64 are in that range; still, better that than plain C, isn't it?
> I do not think we have much hacked x86 -> x86_64 code that would be slower
> than the equivalent intrinsics on x86_64.
> If you find some report it please!
I'll probably get an x86_64 machine sooner than I'd like and will check
for myself. Otherwise, do you agree with my proposal to set
qualification tasks on benchmarking and comparing the approaches a
little more?
Gentoo Council Member