[Ffmpeg-devel] [PATCH] SSE counterpart of ff_imdct_calc_3dn2
Thu Aug 24 19:28:33 CEST 2006
On Thu, Aug 24, 2006 at 07:47:12PM +0300, Uoti Urpala wrote:
> On Thu, 2006-08-24 at 12:15 -0400, Rich Felker wrote:
> > > Also, ICC is able to process these intrinsics, whereas it has a hard
> > > time with inline asm.
> > Supporting ICC would be nice, but you can always compile with asm
> > disabled. Any viable compiler for high-performance needs to have full
> > inline asm available, not just a limited set of intrinsics for vector
> > ops.
> Not necessarily, and certainly not gcc-compatible inline asm. How many
> asm routines are there in FFmpeg or MPlayer that could not achieve
> comparable speed with intrinsics only?
s/comparable/same or better/. 1-5% slowdown is not acceptable. And
with this correction I suspect the answer is _NONE_.
> > > Rich, you should really consider that some ppl aren't willing to spend
> > > their youth on writing killer hand-tuned asm code.
> > It takes maybe 5-10 minutes more to write the obvious handwritten asm
> > than to write the code with intrinsics, and performance should be same
> > or better.
> It takes much more, at least if you don't already have a lot of
> experience writing general asm. If you don't do much asm programming
> otherwise, practicing it just for FFmpeg/MPlayer usage doesn't pay off.
If you don't have this experience you're probably not qualified for
performance coding anyway.
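
For reference, the two styles being compared look roughly like the
sketch below. It assumes GCC-style extended asm on x86 with SSE, and
falls back to plain C elsewhere. mul_intrinsics and mul_inline_asm are
hypothetical names made up for illustration; they are not FFmpeg
functions and nothing here is taken from the patch under discussion.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical example: dst[i] = a[i] * b[i].
 * Assumes n is a multiple of 4; uses unaligned loads and stores. */
void mul_intrinsics(float *dst, const float *a, const float *b, size_t n)
{
#if defined(__SSE__)
    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        /* Register allocation and scheduling are left to the compiler. */
        _mm_storeu_ps(dst + i, _mm_mul_ps(va, vb));
    }
#else
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] * b[i];
#endif
}

/* Same operation with GCC extended inline asm (AT&T syntax).
 * Registers are picked by hand and declared as clobbers. */
void mul_inline_asm(float *dst, const float *a, const float *b, size_t n)
{
#if defined(__GNUC__) && defined(__SSE__) && \
    (defined(__i386__) || defined(__x86_64__))
    for (size_t i = 0; i < n; i += 4) {
        __asm__ volatile(
            "movups (%1), %%xmm0   \n\t"  /* xmm0 = a[i..i+3]  */
            "movups (%2), %%xmm1   \n\t"  /* xmm1 = b[i..i+3]  */
            "mulps  %%xmm1, %%xmm0 \n\t"  /* xmm0 *= xmm1      */
            "movups %%xmm0, (%0)   \n\t"  /* dst[i..i+3] = xmm0 */
            :
            : "r"(dst + i), "r"(a + i), "r"(b + i)
            : "xmm0", "xmm1", "memory");
    }
#else
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] * b[i];
#endif
}
```

Both functions compute the same result. Whether the generated code is
equally fast depends on the compiler and its version, which is the
point under dispute in this thread.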