[Ffmpeg-devel] Fixed vs. Floating Point AAC

Thu Mar 9 02:21:09 CET 2006

Hi

On Wed, Mar 08, 2006 at 07:38:22PM -0500, Rich Felker wrote:
> On Thu, Mar 09, 2006 at 12:37:05AM +0100, Michael Niedermayer wrote:
> > but what about the dynamic range? if all samples are 1.0 (max) then a
> > dc component would have a value of N^0.5 which for lets say N=1024 would
> > be 32, so we would need 21 bits. where's the problem? now, 21bits * 21bits =
> > 42bits and that doesn't fit in 32bits so no fast 32*32->32bit multiplies
> 
> A 32*32 multiply gives a 64bit result. This is fast. If a cpu sucks
> too much to give the full result, that's the particular platform's
> problem and users who insist on using a broken cpu arch will have to
> deal with it being somewhat slower. x86 does it correctly, and has
> done so ever since the 8088...

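to spell out the bit counting from the quoted part, here is a small sketch
in C. it assumes 16-bit input samples and an orthonormal N-point transform
(so a DC term can grow by sqrt(N)); the function names are made up for this
mail and are not from any codec source.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to hold the unsigned value v (0 needs 0 bits). */
static int bits_needed(uint64_t v)
{
    int n = 0;
    while (v) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Integer square root by linear search; fine for the small N used here. */
static uint32_t isqrt(uint32_t n)
{
    uint32_t r = 0;
    while ((uint64_t)(r + 1) * (r + 1) <= n)
        r++;
    return r;
}

/* Extra bits of headroom for a DC gain of sqrt(n), i.e. floor(log2(sqrt(n))). */
static int dc_growth_bits(uint32_t n)
{
    assert(n > 0);
    return bits_needed(isqrt(n)) - 1;
}

/* Bits per coefficient: assumed sample width plus the DC growth. */
static int coeff_bits(int sample_bits, uint32_t n)
{
    return sample_bits + dc_growth_bits(n);
}
```

with sample_bits=16 and n=1024 this gives 16 + 5 = 21 bits per coefficient,
and a product of two such values needs 2 * 21 = 42 bits, which is more than
a 32*32->32 multiply keeps.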
the x86 can output the 64 bits only in a single fixed register pair (EDX:EAX)
and also needs one of the inputs to be in a specific register (EAX).
this is a nasty restriction which doesn't help the compiler generate fast
code, and the instruction timings don't look favorable for this either
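as a rough illustration of the two kinds of multiply being argued about, a
sketch in C. FRAC_BITS and both function names are hypothetical, picked only
for this example; the narrow version also relies on the usual
two's-complement conversion and arithmetic right shift, which C leaves
implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-point format: 16 fractional bits in a 32-bit word. */
#define FRAC_BITS 16

/* Fixed-point multiply keeping the full 64-bit product before the shift.
 * On 32-bit x86 this needs the widening multiply discussed above. */
static int32_t mul_wide(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b;
    int64_t r = p >> FRAC_BITS;
    assert(r >= INT32_MIN && r <= INT32_MAX); /* result must fit again */
    return (int32_t)r;
}

/* Same multiply with a 32*32->32 product: only the low 32 bits survive. */
static int32_t mul_narrow(int32_t a, int32_t b)
{
    /* unsigned arithmetic wraps modulo 2^32, so this is well-defined */
    uint32_t p = (uint32_t)a * (uint32_t)b;
    return (int32_t)p >> FRAC_BITS;
}
```

for two 21-bit inputs such as 1 << 20 the true product is 2^40, so mul_wide
returns 1 << 24 while mul_narrow returns 0 because the high bits are gone.
for small inputs such as 1 << 10 both return 16.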

on the P4, throughput for 32*32->64 is 1/8 (one per 8 cycles), for
32*32->32 it's 1/4.5, and for floating point FMUL it's 1/2.
i am ignoring latency here but the order is the same

for the athlon the timings aren't clear from the docs i have, only that
32*32->32 seems to be 1/4, 32*32->64 worse than 1/6 if the high value is
used, and FMUL >= 1/4. also note that fmul is direct path while imul is
vector path, so imul cannot execute together with anything else while fmul can

so i think i provided enough "proof". your only argument seems to be that
low precision integer tremor is faster than libvorbis. now AFAIK these are
2 different implementations, i don't see how a comparison between them has
any meaning. i can also compare libavcodec's mp3 decoder, which uses integers
mostly, against the one in mplayer, which is mostly floats, you know
which is faster ...

[...]
-- 
Michael