[FFmpeg-devel] [PATCH] lavu/tx: support in-place FFT transforms
Lynne
dev at lynne.ee
Sun Feb 21 18:07:32 EET 2021
Feb 21, 2021, 00:43 by dev at lynne.ee:
> Feb 10, 2021, 21:31 by dev at lynne.ee:
>
>> Feb 10, 2021, 18:15 by dev at lynne.ee:
>>
>>> This commit adds support for in-place FFT transforms. Since our
>>> internal transforms were all in-place anyway, this only changes
>>> the permutation on the input.
>>>
>>> Unfortunately, research papers were of no help here. All focused
>>> on dry hardware implementations, where permutes are free, or on
>>> software implementations where binary bloat is of no concern, so
>>> storing a dozen copies of the transforms, one for each permutation
>>> and version, is not considered bad practice.
>>> Still, for a pure C implementation, it's only around 28% slower
>>> than the multi-megabyte FFTW3 in unaligned mode.
>>>
>>> Unlike a closed permutation such as PFA's, split-radix FFT bit-reversals
>>> contain multiple NOPs, multiple simple swaps, and a few chained swaps,
>>> so regular single-loop single-state permute loops were not possible.
>>> Instead, we filter out parts of the input indices which are redundant.
>>> This allows for a single branch, and with some clever AVX512 asm,
>>> could possibly be SIMD'd without refactoring.
>>>
>>> The inplace_idx array is guaranteed to never be larger than the
>>> revtab array, and in practice only requires around log2(len) entries.
>>>
>>> The power-of-two MDCTs can be done in-place as well. It's also
>>> possible to eliminate a copy in the compound MDCTs, but that
>>> would be slower than doing them out of place, and we'd need to dirty
>>> the input array.
>>>
>>> Patch attached.
>>>
>>
>> Locally added APIchanges and lavu minor bump.
>> Also got rid of the set-but-unused temporary variables when permuting.
>>
>
> Will push this tomorrow if there are no objections.
>
Pushed.
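
For readers following the thread, here is a minimal sketch of the general idea described above: an in-place permutation driven by a short list of cycle start indices, so NOPs (fixed points) are skipped and each simple or chained swap is walked exactly once. This is not the code from the patch. The names gen_cycle_leaders and permute_inplace are hypothetical, the data type is plain float rather than a complex type, and the permutation table is assumed to be any valid permutation of 0..len-1 (the actual split-radix revtab is generated differently).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: record one start index ("leader") per cycle of
 * length > 1 in `perm`, where perm[i] is the destination of element i.
 * Fixed points are skipped, and each cycle is represented only by its
 * smallest index. `leaders` must have room for `len` entries.
 * Returns the number of leaders written. */
static size_t gen_cycle_leaders(const int *perm, int len, int *leaders)
{
    size_t n = 0;
    for (int src = 0; src < len; src++) {
        int dst = perm[src];
        int is_leader = dst != src;      /* fixed points need no work */
        /* Walk the cycle. If any member is smaller than src, this cycle
         * was already recorded when that member was visited. */
        while (is_leader && dst != src) {
            assert(dst >= 0 && dst < len);
            if (dst < src)
                is_leader = 0;
            dst = perm[dst];
        }
        if (is_leader)
            leaders[n++] = src;
    }
    return n;
}

/* Apply the permutation in place: afterwards data[perm[i]] holds the value
 * data[i] held before the call. Only one temporary is live per cycle. */
static void permute_inplace(float *data, const int *perm,
                            const int *leaders, size_t nb_leaders)
{
    for (size_t i = 0; i < nb_leaders; i++) {
        int src = leaders[i];
        float tmp = data[src];
        int dst = perm[src];
        do {
            float t = data[dst];         /* save the displaced value */
            data[dst] = tmp;             /* drop the carried value in place */
            tmp = t;
            dst = perm[dst];
        } while (dst != src);
        data[src] = tmp;                 /* close the cycle */
    }
}
```

The leader list is built once at init time and reused for every transform, so the per-call loop only has the single "cycle closed?" branch. In this sketch the cost of finding leaders is paid up front by walking each cycle, which is fine for a one-time setup step.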