[FFmpeg-devel] Fwd: Fixpoint FFT optimization, with MDCT and IMDCT wrappers for audio optimization

Marc Hoffman mmh
Thu Aug 23 04:30:49 CEST 2007


On Aug 22, 2007, at 10:03 PM, Michael Niedermayer wrote:

> Hi
>
> On Wed, Aug 22, 2007 at 08:31:12PM -0400, Marc Hoffman wrote:
>> On 8/21/07, Michael Niedermayer <michaelni at gmx.at> wrote:
>>> Hi
>>>
>>> On Tue, Aug 21, 2007 at 05:02:18PM -0400, Marc Hoffman wrote:
>>>> Hi,
>>>>
>>>> On 8/19/07, Justin Ruggles <justinruggles at bellsouth.net> wrote:
>>>>>
>>>>>                      256    512   1024   2048    4096
>>>>> -------------------------------------------------------
>>>>> (1) fft32           46.1   101.0  205.0  494.4  1169.3
>>>>> (2) fftr2            7.6    16.0   34.1  101.3   258.5
>>>>> (3) fft ffmpeg       7.4    16.4   33.8   82.4   187.7
>>>>> (4) fftr4/2          6.7    13.6   28.2   74.0   169.8
>>>>>
>>>>> (1) fixedpoint 32bit simple RAD2 fft
>>>>> (2) simple rad2 first stage rad4
>>>>> (3) current lavc RAD2, first 2 stages, and middle pulled
>>>>> (4) rad4/2 with last stage rad2 if needed
>>>>
>>>> I'm adding a split radix to the list.  Its a lot harder to optimize
>>>> because of the extra pointer manipulation and control  
>>>> structures.  I
>>>> did manage to unroll the W0 multipliers out but getting a clean
>>>> iteration space has not happened yet.
>>>
>>> could you also add the split radix fft from
>>> liba52-cvs/a52dec/liba52/imdct.c
>>> as comparission
>>>
>>> also djbfft and fftw2 would be interresting
>>> IIRC they all use split radix ffts
>>>
>>> and looking at them might help you with optimizing your code
>>
>> Your right if we do those techniques we would have to implement a
>> different optimized FFT for every possible data size we need. Unless
>> I'm missing something please advise.
>
> each of these implementations needs exactly 4 lines of code (not  
> counting {}
> in liba52)
>
> now all the ffts are n^2 based so if we assume that the largest we  
> want
> to support is 1mb then we would need 80 lines of code max
>
> and if you want to support all sizes up to 2^64 thats still just  
> 256 lines
> and if you use a macro well its just 64+4 lines then
>
> that said iam not saying this aproach should be taken but if you  
> cant find
> a faster and simpler solution then it might be a good choice

Michael,

This is a waste of our time, my judgement is to keep this structure  
regular and simple.  If we make a mess with macros its just difficult  
for us to debug and maintain.  I suggest we use the RAD4/RAD2 fft we  
just developed its a good improvement over the existing code and much  
more regular.  Our bigger issue is to hook the fft through  
DSPContext, and not struggle with the high level FFT we use IMHO.

The thing I have a much better appreciation for thanks to you, is how  
much extra overhead a fixedpoint FFT has over a floating point one on  
a floating point machine.   All the normalization code is just  
expensive, do you have a good idea of how we might move in the  
direction of fixedpoint but still maintain floating point performance  
on floating point machines?

Marc




More information about the ffmpeg-devel mailing list