[Ffmpeg-devel] [PATCH] SSE counterpart of ff_imdct_calc_3dn2

Thu Aug 24 07:06:23 CEST 2006

2006/8/24, Rich Felker <dalias at aerifal.cx>:
> On Thu, Aug 24, 2006 at 09:53:05AM +0800, Zuxy Meng wrote:
> > I still insist that intrinsics help produce better code, at least on gcc4.
>
> And I still insist that this statement is fundamentally false. Better
> than what? Whatever code gcc generates with the intrinsics, you can
> always generate the same or better code if you just write it yourself.
> Intrinsics are also gcc4-specific and have the problem that
> performance is subject to the whims (and bugs) of gcc, whose record is
> very bad...
>

Better than hardcoded inline asm when:

1. gcc knows better about what the code does: for example, gcc can
schedule the instructions that update the loop counter and/or calculate
addresses in between the intrinsics, which helps hide latency. With
inline asm you can only achieve this by writing the whole loop in asm
(quite rare in practice).

2. gcc knows better about where the code will run: gcc optimizes
intrinsics for the target machine, reordering and replacing some
instructions to minimize latency.
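
To illustrate point 1, here is a minimal sketch of a loop written with
SSE intrinsics. It is not taken from the patch: vec_mul is a
hypothetical helper, and it assumes n is a multiple of 4. The loop
counter and the address arithmetic stay in C, so the compiler is free to
schedule them around the loads, the multiply and the store. A scalar
fallback is included so that it also builds where SSE is unavailable.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: _mm_loadu_ps, _mm_mul_ps, ... */
#endif

/* Hypothetical helper (not from FFmpeg): dst[i] = a[i] * b[i].
 * Assumption: n is a multiple of 4. */
static void vec_mul(float *dst, const float *a, const float *b, size_t n)
{
    size_t i;
#if defined(__SSE__)
    for (i = 0; i < n; i += 4) {
        /* Unaligned loads, so the caller need not align the buffers. */
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        /* The increment of i and the a+i / b+i / dst+i address
         * calculations are visible to the compiler, which may
         * interleave them with the SSE instructions. */
        _mm_storeu_ps(dst + i, _mm_mul_ps(va, vb));
    }
#else
    /* Plain C fallback for targets without SSE. */
    for (i = 0; i < n; i++)
        dst[i] = a[i] * b[i];
#endif
}
```

With inline asm, each asm statement is opaque to gcc, so the same
interleaving would need the whole loop to be written by hand.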

Nevertheless, the performance gap between hardcoded inline asm and
intrinsics is not huge, and when sending patches I'll just follow
tradition.

-- 
Zuxy
Beauty is truth,
While truth is beauty.
PGP KeyID: E8555ED6