[FFmpeg-devel] [PATCH] libavutil: add an FFT & MDCT implementation

Mon May 6 03:23:26 EEST 2019

May 5, 2019, 1:52 PM by dev at lynne.ee:

> May 4, 2019, 10:00 PM by dev at lynne.ee:
>
>> May 4, 2019, 8:10 PM by michael at niedermayer.cc:
>>
>>> On Fri, May 03, 2019 at 09:08:57PM +0200, Lynne wrote:
>>>
>>>> This commit adds a new API to libavutil to allow for arbitrary transformations
>>>> on various types of data.
>>>>
>>> breaks build on mips
>>>
>>> CC	libavutil/fft.o
>>> src/libavutil/fft.c:47: error: redefinition of typedef ‘AVFFTContext’
>>> src/libavutil/fft.h:25: note: previous declaration of ‘AVFFTContext’ was here
>>> make: *** [libavutil/fft.o] Error 1
>>>
>>> [...]
>>>
>>
>> Fixed, v2 attached. Changes:
>> -Stride really is in bytes now.
>> -Corrected some comments (stride supported by all (i)mdcts, not just compound
>>  ones, some clarifications regarding the scale).
>>
>> Also, that 28-point FFT comparison was a typo; it's 128.
>>
>
> Managed to further optimize the 15-point transform by rewriting it as an exptab-less
> compound 3x5 transform and embedding its input map into the parent transform's map.
> Updated comparisons to libfftw3f:
> 120:
>   22353 decicycles in     fftwf_execute,     1024 runs,      0 skips
>   21836 decicycles in compound_fft_15x8,     1024 runs,      0 skips
>
> 480:
>   103998 decicycles in       fftwf_execute,    1024 runs,      0 skips
>   102747 decicycles in compound_fft_15x32,    1024 runs,      0 skips
>
> 960:
>   186210 decicycles in      fftwf_execute,    1024 runs,      0 skips
>   215256 decicycles in compound_fft_15x64,    1024 runs,      0 skips
>

Attached is v4 of the patch, which adjusts the transform direction by reordering
the coefficients, like the power-of-two transforms do. This allows the exptabs
to be computed just once on startup and stored in a global array.
I hadn't even considered that this was possible for odd-sized transforms, and
especially for compound 5x3 transforms, but after some experimentation I found
that the key was to perform that reordering permutation before the second
permutation, the one that embeds the 5x3's input map.

I don't think there are any more feasible ways to improve the code, short of
having 15 different versions of all power-of-two transforms with the output
reindexing hardcoded, so I'd like to get some feedback on the API.
The old SIMD from lavc is unusable, especially the power-of-two part,
so it would be nice to get started on rewriting that soon.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-libavutil-add-an-FFT-MDCT-implementation.patch
Type: text/x-diff
Size: 39732 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20190506/02127c4b/attachment.patch>