[FFmpeg-devel] [PATCH] split-radix FFT

Mike . giac2000
Fri Aug 8 05:12:57 CEST 2008

> Date: Tue, 5 Aug 2008 19:51:56 +0200
> From: castet.matthieu at free.fr
> To: ffmpeg-devel at mplayerhq.hu
> Subject: Re: [FFmpeg-devel] [PATCH] split-radix FFT
> Siarhei Siamashka wrote:
>> On Mon, Aug 4, 2008 at 10:27 PM, matthieu castet
>>  wrote:
>>> Måns Rullgård wrote:
>>>> Loren Merritt  writes:
>> Modern ARM cores have a hardware FPU which is reasonably fast, so it is
>> quite questionable whether a fixed-point decoder would be better for such
>> cores. The same happened for x86 in the past, and floating-point audio
>> decoders are now better for modern x86 cores.
> Do you have some numbers for fixed-point vs. FPU?
> I believed that ARM FPUs were quite slow, but maybe new hardware is
> better.

Fixed-point multiplies on ARM require a 32x32=64-bit multiply, a shift, and then an add. On targets with pipelined multipliers, this works out to 3 or 4 clock cycles (depending on whether the 64-bit multiply completes in a single cycle). Fixed-point adds obviously take no extra time, since no renormalization is needed.
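As a rough illustration (a sketch, not code from FFmpeg), the multiply/shift/add sequence above looks like this in portable C for Q31 values. The helper name `q31_mac` is hypothetical; on ARM the 64-bit product can be mapped by the compiler to a single SMULL.

```c
#include <stdint.h>

/* Hypothetical helper: acc + a*b, with a and b in Q31 fixed point.
 * Step 1: 32x32 -> 64-bit multiply (the product is in Q62).
 * Step 2: shift right by 31 to renormalize back to Q31.
 *         Right-shifting a negative int64_t is implementation-defined
 *         in C, but is an arithmetic shift on GCC/Clang for ARM.
 *         a == b == INT32_MIN (-1.0 * -1.0) does not fit in Q31.
 * Step 3: ordinary 32-bit add (caller must avoid overflowing acc). */
static int32_t q31_mac(int32_t acc, int32_t a, int32_t b)
{
    int64_t prod = (int64_t)a * (int64_t)b;  /* 32x32 = 64-bit multiply */
    int32_t scaled = (int32_t)(prod >> 31);  /* renormalize Q62 -> Q31 */
    return acc + scaled;                     /* plain add, no renormalization */
}
```

With both inputs at 0.5 in Q31 (`1 << 30`), the result is 0.25 (`1 << 29`).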

For targets without pipelined multipliers (ARM7 and some ARM9), the situation is even worse, since a 32x32=64 multiply can stall for between 1 and 4 cycles (depending on how many significant bits the second operand of the multiply has). There are tricks to get around some of this delay, mostly prescaling values so they need not be renormalized at runtime, and performing several consecutive multiplies without renormalizing in between, but much of the time they are quite difficult to use without unacceptable rounding error or risk of overflow. In either case, I would expect an FP version to be much faster, particularly if you can use a vector FPU, although at the additional cost of powering up the FPU.
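The "several consecutive muls without renormalizing" trick can be sketched as a Q31 dot product that keeps a 64-bit accumulator and shifts only once at the end. `q31_dot_deferred` is a hypothetical helper, and it assumes the inputs were prescaled so nothing overflows: for example, with every |x| < 2^30 each product is below 2^60 in magnitude, so up to 8 terms fit in a signed 64-bit sum.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: dot product of two Q31 vectors.
 * Products are summed in Q62 inside a 64-bit accumulator and
 * renormalized to Q31 with a single shift at the end, instead of
 * shifting after every multiply.
 * Assumption: the caller has prescaled a[] and b[] so the 64-bit sum
 * cannot overflow and the final result fits in Q31; nothing here
 * checks for it, which is exactly the overflow risk of this trick. */
static int32_t q31_dot_deferred(const int32_t *a, const int32_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int64_t)a[i] * b[i];   /* Q62 products, no per-step shift */
    return (int32_t)(acc >> 31);       /* one renormalization, Q62 -> Q31 */
}
```

Deferring the shift also changes rounding: for a = {1, 1} and b = {1 << 30, 1 << 30}, shifting each product separately truncates both to 0, while the deferred version returns 1.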

That said, I'm sort of curious what application requires such high performance for audio. Modern IMDCT codecs (Ogg Vorbis, WMA, AAC) can quite easily be made to decode at well under 30MHz on the slowest ARM7 cores. Cores with a VFPU should likely come in around the 20-25MHz mark without undue effort. Since these devices typically run at many hundreds of MHz, I'm not really sure what use single-digit MHz savings are. I would expect the difference between fixed and floating point (in either battery life or video decode frame rate) to be immeasurably small on high-end ARM hardware.
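To put the MHz figures in perspective, here is the load arithmetic for a hypothetical 500MHz core decoding at 25MHz (the top of the VFPU estimate above), with an assumed 5MHz saving from switching between fixed and floating point:

```latex
\text{load} = \frac{25\,\text{MHz}}{500\,\text{MHz}} = 5\%,
\qquad
\text{saving} = \frac{5\,\text{MHz}}{500\,\text{MHz}} = 1\%\ \text{of the core}
```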
