[FFmpeg-devel] [PATCH v2] x86/tx_float: implement inverse MDCT AVX2 assembly
Lynne
dev at lynne.ee
Fri Sep 2 08:49:31 EEST 2022
Version 2 notes: halved the number of loads and loop iterations in the
pre-transform loop by exploiting its symmetry.
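One way such symmetry can halve the work, sketched in C rather than the patch's AVX2 assembly: the iMDCT pre-transform pairs coefficients from opposite ends of the input, so output k and its mirror n/2-1-k need the adjacent pairs in[2k], in[2k+1] and in[n-2-2k], in[n-1-2k]. Handling both outputs in one iteration reads each end of the array as one contiguous pair and halves the trip count. The function names, the twiddle array tw and the exact (in[n-1-2k] + i*in[2k]) pairing are assumptions for illustration, not taken from the patch.

```c
#include <complex.h>

/* Straightforward pre-rotation: one complex output per iteration, built from
 * one coefficient at each end of the n-coefficient input. */
static void prerot_simple(float complex *z, const float *in,
                          const float complex *tw, int n)
{
    for (int k = 0; k < n / 2; k++)
        z[k] = (in[n - 1 - 2 * k] + I * in[2 * k]) * tw[k];
}

/* Same outputs, two per iteration. Output k and its mirror m = n/2 - 1 - k
 * use the adjacent pairs in[2k], in[2k+1] and in[n-2-2k], in[n-1-2k], so
 * each end of the array is read as one contiguous pair and the loop runs
 * n/4 times instead of n/2. Assumes n is a multiple of 4 (true for any
 * multiple-of-8 length). */
static void prerot_paired(float complex *z, const float *in,
                          const float complex *tw, int n)
{
    for (int k = 0; k < n / 4; k++) {
        const int m = n / 2 - 1 - k;          /* mirror output index */
        const float lo0 = in[2 * k], lo1 = in[2 * k + 1];
        const float hi0 = in[n - 2 - 2 * k], hi1 = in[n - 1 - 2 * k];

        z[k] = (hi1 + I * lo0) * tw[k];       /* in[n-1-2k], in[2k] */
        z[m] = (lo1 + I * hi0) * tw[m];       /* in[n-1-2m], in[2m] */
    }
}
```

Both functions produce the same values; the paired form simply trades two scattered reads per output for one contiguous pair per array end.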
This commit implements an iMDCT in pure assembly.
It can process any transform whose length is a multiple of 8, not just
powers of two, but since power-of-two lengths are all we currently have
assembly for, that's all that is supported.
It would benefit greatly if we could somehow use the C code to decide
which function to jump into, but exposing function labels from assembly
to C is anything but easy.
The post-transform loop could probably be improved.
This was somewhat annoying to write, as we must support arbitrary
strides at runtime. There's a fast branch for stride == 4 bytes
and a slower one that uses vgatherdps.
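A scalar C sketch of the two stride paths, assuming (as for the C inverse MDCT in libavutil) that the stride is the byte distance between input coefficients and a positive multiple of sizeof(float). The packed case is a plain copy; any other stride reads each element at its own byte offset, which the assembly does eight lanes at a time with vgatherdps. load_coeffs is a hypothetical name, and the real code reads from both ends of the array rather than front to back.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: load n float coefficients that sit `stride` bytes
 * apart (assumed to be a positive multiple of sizeof(float)). */
static void load_coeffs(float *dst, const void *src, ptrdiff_t stride, int n)
{
    const unsigned char *p = src;

    if (stride == (ptrdiff_t)sizeof(float)) {
        /* Fast path: packed input, one contiguous copy
         * (plain vector loads in SIMD code). */
        memcpy(dst, src, (size_t)n * sizeof(float));
        return;
    }

    /* General path: every element lives at its own byte offset, the job
     * a gather such as vgatherdps does for 8 lanes at once. memcpy avoids
     * alignment and aliasing problems with the byte pointer. */
    for (int i = 0; i < n; i++)
        memcpy(&dst[i], p + (ptrdiff_t)i * stride, sizeof(float));
}
```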
Zen 3 benchmarks with stride == 4, old (av_imdct_half) vs. new (av_tx):
128pt:
2815 decicycles in av_tx (imdct),16776766 runs, 450 skips
3097 decicycles in av_imdct_half,16776745 runs, 471 skips
256pt:
4931 decicycles in av_tx (imdct), 4193127 runs, 1177 skips
5401 decicycles in av_imdct_half, 2097058 runs, 94 skips
512pt:
9764 decicycles in av_tx (imdct), 4193929 runs, 375 skips
10690 decicycles in av_imdct_half, 2096948 runs, 204 skips
1024pt:
20113 decicycles in av_tx (imdct), 4194202 runs, 102 skips
21258 decicycles in av_imdct_half, 2097147 runs, 5 skips
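The table is easier to compare as ratios. This snippet only re-derives figures from the numbers above (old decicycles / new decicycles, and the fraction of cycles saved); it adds no new measurements.

```c
#include <stdio.h>

/* Decicycle figures copied from the Zen 3 results above (stride == 4). */
static const struct {
    int points;
    double new_dc;   /* av_tx (imdct) */
    double old_dc;   /* av_imdct_half */
} bench[] = {
    {  128,  2815.0,  3097.0 },
    {  256,  4931.0,  5401.0 },
    {  512,  9764.0, 10690.0 },
    { 1024, 20113.0, 21258.0 },
};

/* Fraction of the old cycle count saved by the new code. */
static double cycles_saved(double old_dc, double new_dc)
{
    return (old_dc - new_dc) / old_dc;
}

static void print_summary(void)
{
    for (size_t i = 0; i < sizeof(bench) / sizeof(bench[0]); i++)
        printf("%5dpt: %.3fx faster, %.1f%% fewer cycles\n", bench[i].points,
               bench[i].old_dc / bench[i].new_dc,
               100.0 * cycles_saved(bench[i].old_dc, bench[i].new_dc));
}
```

For these four sizes that works out to roughly 5% to 9% fewer cycles, with the smallest gain at 1024pt.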
Patch attached.
Attachment: v2-0001-x86-tx_float-implement-inverse-MDCT-AVX2-assembly.patch (text/x-diff, 12881 bytes)
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20220902/034fa486/attachment.patch>