[FFmpeg-devel] [PATCH 2/2] x86/tx_float: implement inverse MDCT AVX2 assembly
Lynne
dev at lynne.ee
Sun Sep 4 00:35:24 EEST 2022
Sep 3, 2022, 22:55 by michael at niedermayer.cc:
> On Sat, Sep 03, 2022 at 03:42:36AM +0200, Lynne wrote:
>
>> This commit implements an iMDCT in pure assembly.
>>
>> This is capable of processing any mod-8 transforms, rather than just
>> power of two, but since power of two is all we have assembly for
>> currently, that's what's supported.
>> It would really benefit if we could somehow use the C code to decide
>> which function to jump into, but exposing function labels from assembly
>> into C is anything but easy.
>> The post-transform loop could probably be improved.
>>
>> This was somewhat annoying to write, as we must support arbitrary
>> strides at runtime. There's a fast branch for stride == 4 bytes
>> and a slower one which uses vgatherdps.
>>
>> Zen 3 benchmarks for stride == 4 for old (av_imdct_half) vs new (av_tx):
>>
>> 128pt:
>> 2811 decicycles in av_tx (imdct),16775916 runs, 1300 skips
>> 3082 decicycles in av_imdct_half,16776751 runs, 465 skips
>>
>> 256pt:
>> 4920 decicycles in av_tx (imdct),16775820 runs, 1396 skips
>> 5378 decicycles in av_imdct_half,16776411 runs, 805 skips
>>
>> 512pt:
>> 9668 decicycles in av_tx (imdct),16775774 runs, 1442 skips
>> 10626 decicycles in av_imdct_half,16775647 runs, 1569 skips
>>
>> 1024pt:
>> 19812 decicycles in av_tx (imdct),16777144 runs, 72 skips
>> 23036 decicycles in av_imdct_half,16777167 runs, 49 skips
>>
>> Patch attached.
>>
>
> x86-32 doesn't digest this very well
>
Thanks for checking. I've ifdef'd it out of 32-bit builds and also fixed
a small issue with asm functions being picked for non-asm calls.
Updated patches attached.
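
For readers unfamiliar with the stride handling mentioned in the commit
message, here is a minimal scalar sketch in C of the same two-branch idea:
a fast path when the input stride is exactly 4 bytes (contiguous floats)
and a generic path for any other stride. This is not code from the patch.
The name load_strided and its signature are hypothetical, and the scalar
loop only stands in for what the assembly does with vgatherdps.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from the patch: copy n floats from src,
 * where consecutive input elements are `stride` bytes apart, into the
 * contiguous buffer dst. */
static void load_strided(float *dst, const void *src,
                         ptrdiff_t stride, size_t n)
{
    if (stride == (ptrdiff_t)sizeof(float)) {
        /* Fast path: stride == 4 bytes, so the input is already
         * contiguous and can be copied in one go. */
        memcpy(dst, src, n * sizeof(float));
        return;
    }

    /* Slow path: arbitrary byte stride. Each element is fetched from its
     * own address, which is the scalar equivalent of a gather. */
    const unsigned char *p = src;
    for (size_t i = 0; i < n; i++)
        memcpy(&dst[i], p + (ptrdiff_t)i * stride, sizeof(float));
}
```

Checking the stride once, outside the loop, keeps the common contiguous
case free of per-element address arithmetic, which is presumably why the
assembly has a dedicated branch for it.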
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-x86-tx_float-add-support-for-calling-assembly-functi.patch
Type: text/x-diff
Size: 14023 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20220903/9b5a0c75/attachment.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-x86-tx_float-implement-inverse-MDCT-AVX2-assembly.patch
Type: text/x-diff
Size: 12620 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20220903/9b5a0c75/attachment-0001.patch>