[FFmpeg-devel] Fwd: Fixpoint FFT optimization, with MDCT and IMDCT wrappers for audio optimization

Loren Merritt lorenm
Mon Aug 27 12:19:16 CEST 2007


On Sun, 26 Aug 2007, Mike Giacomelli wrote:

>> As opposed to recent x86 chips, where 32x32 mul is 9 times slower than add?
>
> Modern x86 chips have pipelined adders and multipliers, so the add and
> multiply rate is the same (at least assuming they have equal numbers
> of each).  I believe Intel has been doing this since the Pentium Pro
> in the mid 90s, and AMD since the K7 in the late 90s.

But they don't have equal numbers of each.
Sorry, I screwed up my throughput test: mul is only 3x slower than add, not 9x.

Both K8 and Core2 have (latency in cycles, throughput in instructions per cycle):
add: latency 1, throughput 3.
32x32->32 mul: latency 3, throughput 1.
64x64->64 mul: latency 4, throughput 1.
32x32->64 mul: latency 3, and it can't be pipelined because it uses
implicit registers (the result always lands in EDX:EAX).
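
To make the latency vs. throughput distinction concrete, here is a minimal C sketch, not the test I actually ran. The function names are hypothetical. The first loop is one dependent chain of 32x32->32 multiplies, so each mul has to wait for the previous result (latency-bound). The second keeps three independent chains in flight, which a pipelined multiplier can overlap (throughput-bound). Both return the same value, since unsigned arithmetic wraps modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Latency-bound: every multiply depends on the previous result,
 * so the loop cannot go faster than one mul per mul-latency. */
uint32_t mul_chain_dependent(uint32_t seed, uint32_t k, unsigned n)
{
    uint32_t x = seed;
    for (unsigned i = 0; i < n; i++)
        x *= k;
    return x;
}

/* Throughput-bound: three independent chains per iteration.
 * The multiplies within one iteration do not depend on each other,
 * so a pipelined multiplier can start a new one before the previous
 * one finishes. Assumes n is a multiple of 3 to keep the sketch short. */
uint32_t mul_chain_independent(uint32_t seed, uint32_t k, unsigned n)
{
    assert(n % 3 == 0);
    uint32_t a = seed, b = 1, c = 1;
    for (unsigned i = 0; i < n; i += 3) {
        a *= k;
        b *= k;
        c *= k;
    }
    /* seed * k^(n/3) * k^(n/3) * k^(n/3) == seed * k^n (mod 2^32) */
    return a * b * c;
}
```

Timing each function over a large n (with rdtsc or clock(), for example) should show something close to a 3x gap on a core with the figures above, provided the compiler leaves the loops as written. Check the generated assembly before trusting any number.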

--Loren Merritt



