[FFmpeg-devel] [PATCH] libavutil: add an FFT & MDCT implementation
Paul B Mahol
onemda at gmail.com
Mon May 6 14:35:23 EEST 2019
On 5/6/19, Lynne <dev at lynne.ee> wrote:
> May 5, 2019, 1:52 PM by dev at lynne.ee:
>
>> May 4, 2019, 10:00 PM by dev at lynne.ee:
>>
>>> May 4, 2019, 8:10 PM by michael at niedermayer.cc:
>>>
>>>> On Fri, May 03, 2019 at 09:08:57PM +0200, Lynne wrote:
>>>>
>>>>> This commit adds a new API to libavutil to allow for arbitrary transformations
>>>>> on various types of data.
>>>>>
>>>> breaks build on mips
>>>>
>>>> CC libavutil/fft.o
>>>> src/libavutil/fft.c:47: error: redefinition of typedef ‘AVFFTContext’
>>>> src/libavutil/fft.h:25: note: previous declaration of ‘AVFFTContext’ was here
>>>> make: *** [libavutil/fft.o] Error 1
>>>>
>>>> [...]
>>>>
>>>
>>> Fixed, v2 attached. Changes:
>>> -Stride really is in bytes now.
>>> -Corrected some comments (stride supported by all (i)mdcts, not just compound
>>> ones, some clarifications regarding the scale).
>>>
>>> Also that 28-point FFT comparison was a typo, it's 128.
>>>
>>
>> Managed to further optimize the 15-point transform by rewriting it as an exptab-less
>> compound 3x5 transform and embedding its input map into the parent
>> transform's map.
>> Updated comparisons to libfftw3f:
>> 120:
>> 22353 decicycles in fftwf_execute, 1024 runs, 0 skips
>> 21836 decicycles in compound_fft_15x8, 1024 runs, 0 skips
>>
>> 480:
>> 103998 decicycles in fftwf_execute, 1024 runs, 0 skips
>> 102747 decicycles in compound_fft_15x32, 1024 runs, 0 skips
>>
>> 960:
>> 186210 decicycles in fftwf_execute, 1024 runs, 0 skips
>> 215256 decicycles in compound_fft_15x64, 1024 runs, 0 skips
>>
>
> Attached a v4 of the patch which adjusts transform direction by reordering the
> coefficients like the power of two transforms do. This allowed for the exptabs
> to be computed just once on startup and stored in a global array.
> Didn't even consider it was possible to do so for odd-sized transforms and
> especially for compound 5x3 transforms but after some experimentation I found
> the key was to perform the permutation before the second permutation to
> embed the 5x3's input map in.
>
> I don't think there are any more feasible ways to improve the code, short of
> having 15 different versions for all power of two transforms by hardcoding
> the output reindexing, so I'd like to get some feedback on the API.
> The old SIMD from lavc is unusable, especially the power of two part,
> so it would be nice to get started on rewriting that soon.
>
API looks fine to me.