[FFmpeg-devel] [PATCH] split-radix FFT

Siarhei Siamashka siarhei.siamashka
Mon Aug 4 10:27:27 CEST 2008


On Sun, Aug 3, 2008 at 11:25 PM, Måns Rullgård <mans at mansr.com> wrote:
> Loren Merritt <lorenm at u.washington.edu> writes:
>
>> $subject, vaguely based on djbfft.
>> Changed from djb:
>> * added simd.
>> * removed the hand-scheduled pentium-pro code. gcc's output from
>> simple C is better on all cpus I have access to.
>> * removed the distinction between fft and ifft. they're just
>> permutations of each other, so the difference belongs in revtab[] and
>> not in the code.
>> * removed the distinction between pass() and pass_big(). C can always
>> use the memory-efficient version, and simd never does because the
>> shuffles are too costly.
>> * made an entirely different pass_big(), to avoid store->load aliasing.
>
> Any progress on this?  IMDCT is taking 84% time decoding Vorbis on
> ARM, and SIMD-optimising the FFT in svn seems silly.

As far as I understand it, the work is already done, or at least its
first stage is. The SIMD optimizations are not perfect yet and can be
improved, but they already provide a performance improvement over the
current code in SVN. With an overly perfectionist attitude, though, the
old slow FFT could stay in SVN for a very long time ;)

Still, I would like to have a look at that tangent FFT variant even if
it does not provide a performance improvement on x86; maybe something
can still be done to speed it up, or its performance could be better on
other platforms. Also, as was already discussed on this mailing list
long ago, MPlayer SVN has a djbfft-based split-radix FFT
implementation for sizes 64 and 128 in liba52, with 3DNow! SIMD
optimizations. It is GPL licensed, so I don't know whether it is a
good idea for Loren to take a close look at it and become 'tainted',
but it would be nice to run some benchmarks. A fast 64-point FFT is also
useful for decoding typical Vorbis audio files.

Also, I wonder whether it would be possible (or difficult) to implement
a non-power-of-two FFT using the same djbfft-based code layout (for
example, the 320-point FFT that is needed for some other codecs, IIRC). If
non-power-of-two FFTs were supported, some change to the API would be
required to accommodate them.


