[Ffmpeg-devel] Fixed vs. Floating Point AAC

Thu Mar 9 02:27:46 CET 2006

Michael Niedermayer <michaelni at gmx.at> writes:

> Hi
>
> On Wed, Mar 08, 2006 at 07:38:22PM -0500, Rich Felker wrote:
>> On Thu, Mar 09, 2006 at 12:37:05AM +0100, Michael Niedermayer wrote:
>> > but what about the dynamic range? if all samples are 1.0 (max) then a
>> > dc component would have a value of N^0.5, which for let's say N=1024 would
>> > be 32, so we would need 21 bits. where's the problem? now 21 bits * 21 bits =
>> > 42 bits and that doesn't fit in 32 bits, so no fast 32*32->32 bit multiplies
>> 
>> A 32*32 multiply gives a 64bit result. This is fast. If a cpu sucks
>> too much to give the full result, that's the particular platform's
>> problem and users who insist on using a broken cpu arch will have to
>> deal with it being somewhat slower. x86 does it correctly, and has
>> done so ever since the 8088...
>
> the x86 can output the 64 bits only in a single register pair
> and needs one of the inputs also to be in a specific register
> this is a nasty restriction which doesn't help the compiler generate fast
> code, and the instruction timings don't look favorable for this either
>
> throughput for 32*32->64 on P4 is 1/8, for 32*32->32 it's 1/4.5 and for
> floating point FMUL it's 1/2
> i am ignoring latency here but the order is the same
>
> for the athlon the timings aren't clear from the docs i have, only that
> 32*32->32 seems 1/4 and 32*32->64 worse than 1/6 if the high value is used
> and FMUL >=1/4, also note fmul is direct path while imul is vector path, so
> imul cannot execute together with anything else while fmul can

How about doing a 64-bit version for true 64-bit CPUs?  Low-end
embedded systems often run 64-bit MIPS without floating-point.
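
To make the arithmetic in this thread concrete, here is a rough sketch in
C. The helper names (dc_bits, product_bits, mul_q31) are made up for
illustration and are not FFmpeg functions, and the 16-bit sample width is
my assumption for where the 21-bit figure comes from. The multiply uses
the 32*32->64 form discussed above; on a true 64-bit CPU the 64-bit
product should typically fit in one general-purpose register, so the
register-pair restriction would not apply there.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bits needed for the DC term of an orthonormal
 * N-point transform. A full-scale DC input grows by sqrt(N), i.e. by
 * log2(N)/2 bits (assumes log2_n is even, e.g. N = 1024 -> 10). */
static int dc_bits(int sample_bits, int log2_n)
{
    assert(log2_n % 2 == 0);
    return sample_bits + log2_n / 2;
}

/* Hypothetical helper: width of the full product of two fixed-point
 * operands. Anything above 32 rules out a plain 32*32->32 multiply. */
static int product_bits(int a_bits, int b_bits)
{
    return a_bits + b_bits;
}

/* Hypothetical helper: Q31 fixed-point multiply using a widening
 * 32*32->64 product. Right-shifting a negative value is
 * implementation-defined in C; this assumes an arithmetic shift, which
 * is what common compilers do. No saturation or rounding is done. */
static int32_t mul_q31(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a * b;   /* full 64-bit product */
    return (int32_t)(wide >> 31);    /* back to Q31 */
}
```

For example, dc_bits(16, 10) gives 21 and product_bits(21, 21) gives 42,
matching the figures quoted above, and mul_q31(0x40000000, 0x40000000)
(0.5 * 0.5 in Q31) gives 0x20000000 (0.25).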

-- 
Måns Rullgård
mru at inprovide.com