[FFmpeg-devel] [PATCH 2/2] x86/tx_float: implement inverse MDCT AVX2 assembly
Michael Niedermayer
michael at niedermayer.cc
Sat Sep 3 23:55:38 EEST 2022
On Sat, Sep 03, 2022 at 03:42:36AM +0200, Lynne wrote:
> This commit implements an iMDCT in pure assembly.
>
> This is capable of processing any transform whose length is a multiple
> of 8, rather than just powers of two, but since power of two is all we
> have assembly for currently, that's what's supported.
> It would really help if we could somehow use the C code to decide
> which function to jump into, but exposing function labels from assembly
> into C is anything but easy.
> The post-transform loop could probably be improved.
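The length restriction described above (loops that work for any multiple of 8, but only power-of-two sub-FFTs available) and the idea of letting C choose the sub-transform once at init can be sketched in C. This is a minimal illustration under stated assumptions: `ImdctCtx`, `imdct_ctx_init` and `fft_pow2_stub` are hypothetical names, not av_tx internals; the stub stands in for an optimized FFT; and the "8 floats per step" rationale for the mod-8 rule is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sub-transform signature (not the real av_tx one). */
typedef void (*fft_fn)(float *data, size_t n);

/* Stand-in for an optimized power-of-two FFT; body intentionally omitted. */
static void fft_pow2_stub(float *data, size_t n)
{
    (void)data;
    (void)n;
}

static bool is_mod8(size_t len) { return len > 0 && len % 8 == 0; }
static bool is_pow2(size_t len) { return len > 0 && (len & (len - 1)) == 0; }

typedef struct ImdctCtx {
    size_t len;
    fft_fn sub_fft; /* chosen once in C at init, called on every transform */
} ImdctCtx;

/* Returns 0 on success, -1 if the length is unsupported. */
static int imdct_ctx_init(ImdctCtx *ctx, size_t len)
{
    assert(ctx != NULL);
    if (!is_mod8(len))
        return -1; /* assumed: the loops step 8 floats (one ymm) at a time */
    if (!is_pow2(len))
        return -1; /* a multiple of 8, but no assembly sub-FFT exists for it */
    ctx->len = len;
    ctx->sub_fft = fft_pow2_stub;
    return 0;
}
```

A caller would run `imdct_ctx_init` once and then call `ctx.sub_fft` per frame. A length such as 24 passes the mod-8 check but is still rejected, which mirrors the "power of two is all we have assembly for" limitation.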
>
> This was somewhat annoying to write, as we must support arbitrary
> strides at runtime. There's a fast branch for stride == 4 bytes
> and a slower one which uses vgatherdps.
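The two stride paths can be modelled in scalar C: a plain contiguous copy when the stride is 4 bytes (exactly one float), and otherwise one load per element at `base + i * stride`, which is the addressing `vgatherdps` performs for eight lanes at once from a vector of 32-bit offsets. This is a minimal sketch, not the patch's code: `load_strided` is a hypothetical helper, it assumes a positive stride that is a multiple of `sizeof(float)`, and it ignores the index reordering the real iMDCT pre-rotation applies. For av_tx inverse MDCTs, the stride is (per the tx.h documentation) the byte spacing between input samples.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy len floats spaced stride bytes apart in src into contiguous dst.
 * Assumes stride > 0 and stride is a multiple of sizeof(float). */
static void load_strided(float *dst, const void *src, size_t len,
                         ptrdiff_t stride)
{
    assert(stride > 0 && stride % (ptrdiff_t)sizeof(float) == 0);

    if (stride == (ptrdiff_t)sizeof(float)) {
        /* Fast branch: the input is already contiguous. */
        memcpy(dst, src, len * sizeof(float));
        return;
    }

    /* Generic branch: one load per element at base + i * stride,
     * the scalar equivalent of an 8-lane vgatherdps. */
    const char *base = src;
    for (size_t i = 0; i < len; i++)
        memcpy(&dst[i], base + (ptrdiff_t)i * stride, sizeof(float));
}
```

Using `memcpy` for each element keeps the generic branch free of alignment and aliasing assumptions; the contiguous branch is where most of the speed comes from, which matches the patch having a dedicated fast path for stride == 4.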
>
> Zen 3 benchmarks for stride == 4 for old (av_imdct_half) vs new (av_tx):
>
> 128pt:
> 2811 decicycles in av_tx (imdct),16775916 runs, 1300 skips
> 3082 decicycles in av_imdct_half,16776751 runs, 465 skips
>
> 256pt:
> 4920 decicycles in av_tx (imdct),16775820 runs, 1396 skips
> 5378 decicycles in av_imdct_half,16776411 runs, 805 skips
>
> 512pt:
> 9668 decicycles in av_tx (imdct),16775774 runs, 1442 skips
> 10626 decicycles in av_imdct_half,16775647 runs, 1569 skips
>
> 1024pt:
> 19812 decicycles in av_tx (imdct),16777144 runs, 72 skips
> 23036 decicycles in av_imdct_half,16777167 runs, 49 skips
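The quoted Zen 3 numbers can be turned into relative gains with a little arithmetic. FFmpeg's timer reports decicycles (tenths of a cycle), so the unit cancels in any ratio. The sketch below only restates the figures above; `Bench`, `cycles_saved`, `speedup` and `print_bench_summary` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Zen 3 decicycle counts quoted above (stride == 4 bytes). */
typedef struct {
    int pts;       /* transform size as labelled in the benchmark */
    double new_dc; /* av_tx (imdct) */
    double old_dc; /* av_imdct_half */
} Bench;

static const Bench benches[] = {
    {  128,  2811,  3082 },
    {  256,  4920,  5378 },
    {  512,  9668, 10626 },
    { 1024, 19812, 23036 },
};

/* Fraction of cycles saved by the new code: (old - new) / old. */
double cycles_saved(const Bench *b)
{
    assert(b->old_dc > 0);
    return (b->old_dc - b->new_dc) / b->old_dc;
}

/* Speedup factor old / new; the decicycle unit cancels out. */
double speedup(const Bench *b)
{
    assert(b->new_dc > 0);
    return b->old_dc / b->new_dc;
}

void print_bench_summary(void)
{
    for (size_t i = 0; i < sizeof(benches) / sizeof(benches[0]); i++)
        printf("%4dpt: %4.1f%% fewer cycles, %.3fx\n", benches[i].pts,
               100.0 * cycles_saved(&benches[i]), speedup(&benches[i]));
}
```

Worked through, the new code saves about 8.5% to 9% of cycles at 128 to 512 points and about 14% (a 1.163x speedup) at 1024 points.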
>
> Patch attached.
>
x86-32 doesn't digest this very well
src/libavutil/x86/tx_float.asm:1540: error: (ASSERT:2) assertion ``8 <= 7'' failed
src/libavutil/x86/tx_float.asm:1361: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:721: ... from macro `cglobal' defined here
src//libavutil/x86/x86inc.asm:756: ... from macro `cglobal_internal' defined here
src//libavutil/x86/x86inc.asm:618: ... from macro `PROLOGUE' defined here
src//libavutil/x86/x86inc.asm:304: ... from macro `ASSERT' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7' undefined
src/libavutil/x86/tx_float.asm:1361: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:721: ... from macro `cglobal' defined here
src//libavutil/x86/x86inc.asm:756: ... from macro `cglobal_internal' defined here
src//libavutil/x86/x86inc.asm:620: ... from macro `PROLOGUE' defined here
src//libavutil/x86/x86inc.asm:382: ... from macro `ALLOC_STACK' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7' undefined
src/libavutil/x86/tx_float.asm:1361: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:721: ... from macro `cglobal' defined here
src//libavutil/x86/x86inc.asm:756: ... from macro `cglobal_internal' defined here
src//libavutil/x86/x86inc.asm:621: ... from macro `PROLOGUE' defined here
src//libavutil/x86/x86inc.asm:273: ... from macro `LOAD_IF_USED' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7' undefined
src/libavutil/x86/tx_float.asm:1361: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:721: ... from macro `cglobal' defined here
src//libavutil/x86/x86inc.asm:756: ... from macro `cglobal_internal' defined here
src//libavutil/x86/x86inc.asm:621: ... from macro `PROLOGUE' defined here
src//libavutil/x86/x86inc.asm:273: ... from macro `LOAD_IF_USED' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7' undefined
src/libavutil/x86/tx_float.asm:1361: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:721: ... from macro `cglobal' defined here
src//libavutil/x86/x86inc.asm:756: ... from macro `cglobal_internal' defined here
src//libavutil/x86/x86inc.asm:621: ... from macro `PROLOGUE' defined here
src//libavutil/x86/x86inc.asm:273: ... from macro `LOAD_IF_USED' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7' undefined
src/libavutil/x86/tx_float.asm:1361: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:721: ... from macro `cglobal' defined here
src//libavutil/x86/x86inc.asm:756: ... from macro `cglobal_internal' defined here
src//libavutil/x86/x86inc.asm:621: ... from macro `PROLOGUE' defined here
src//libavutil/x86/x86inc.asm:273: ... from macro `LOAD_IF_USED' defined here
src/libavutil/x86/tx_float.asm:1540: error: invalid combination of opcode and operands
src/libavutil/x86/tx_float.asm:1362: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7d' undefined
src/libavutil/x86/tx_float.asm:1365: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7d' undefined
src/libavutil/x86/tx_float.asm:1366: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r11q' undefined
src/libavutil/x86/tx_float.asm:1369: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7d' undefined
src/libavutil/x86/tx_float.asm:1380: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1396: ... from macro `movd' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1383: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m15' undefined
src/libavutil/x86/tx_float.asm:1384: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1459: ... from macro `pcmpeqd' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m15' undefined
src/libavutil/x86/tx_float.asm:1388: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1395: ... from macro `movaps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m15' undefined
src/libavutil/x86/tx_float.asm:1390: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1395: ... from macro `movaps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m8' undefined
src/libavutil/x86/tx_float.asm:1394: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1395: ... from macro `movaps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m9' undefined
src/libavutil/x86/tx_float.asm:1395: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1395: ... from macro `movaps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m10' undefined
src/libavutil/x86/tx_float.asm:1403: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1414: ... from macro `movshdup' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m11' undefined
src/libavutil/x86/tx_float.asm:1404: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1414: ... from macro `movshdup' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m12' undefined
src/libavutil/x86/tx_float.asm:1405: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1415: ... from macro `movsldup' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m13' undefined
src/libavutil/x86/tx_float.asm:1406: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1415: ... from macro `movsldup' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m10' undefined
src/libavutil/x86/tx_float.asm:1408: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1421: ... from macro `mulps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m11' undefined
src/libavutil/x86/tx_float.asm:1409: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1421: ... from macro `mulps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m10' undefined
src/libavutil/x86/tx_float.asm:1411: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1562: ... from macro `shufps' defined here
src//libavutil/x86/x86inc.asm:1262: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m11' undefined
src/libavutil/x86/tx_float.asm:1412: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1562: ... from macro `shufps' defined here
src//libavutil/x86/x86inc.asm:1262: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m10' undefined
src/libavutil/x86/tx_float.asm:1414: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1668: ... from macro `fmaddsubps' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m11' undefined
src/libavutil/x86/tx_float.asm:1415: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1668: ... from macro `fmaddsubps' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1417: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1398: ... from macro `movdqa' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src//libavutil/x86/x86inc.asm:1716: ... from macro `vmovdqa' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1418: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1398: ... from macro `movdqa' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src//libavutil/x86/x86inc.asm:1716: ... from macro `vmovdqa' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1422: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1140: ... from macro `add' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7q' undefined
src/libavutil/x86/tx_float.asm:1430: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7q' undefined
src/libavutil/x86/tx_float.asm:1431: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1432: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7q' undefined
src/libavutil/x86/tx_float.asm:1436: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1395: ... from macro `movaps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1444: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1395: ... from macro `movaps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m8' undefined
src/libavutil/x86/tx_float.asm:1449: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1564: ... from macro `shufps' defined here
src//libavutil/x86/x86inc.asm:1260: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `m8' undefined
src/libavutil/x86/tx_float.asm:1452: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1421: ... from macro `mulps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: invalid combination of opcode and operands
src/libavutil/x86/tx_float.asm:1461: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: invalid combination of opcode and operands
src/libavutil/x86/tx_float.asm:1462: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r9q' undefined
src/libavutil/x86/tx_float.asm:1463: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r10q' undefined
src/libavutil/x86/tx_float.asm:1464: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r9q' undefined
src/libavutil/x86/tx_float.asm:1468: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1405: ... from macro `movlps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r10q' undefined
src/libavutil/x86/tx_float.asm:1469: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1402: ... from macro `movhps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1471: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1472: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r9q' undefined
src/libavutil/x86/tx_float.asm:1473: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r10q' undefined
src/libavutil/x86/tx_float.asm:1474: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r9q' undefined
src/libavutil/x86/tx_float.asm:1478: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1405: ... from macro `movlps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r10q' undefined
src/libavutil/x86/tx_float.asm:1479: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1402: ... from macro `movhps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7q' undefined
src/libavutil/x86/tx_float.asm:1484: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1152: ... from macro `sub' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1485: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1152: ... from macro `sub' defined here
src/libavutil/x86/tx_float.asm:1540: error: invalid combination of opcode and operands
src/libavutil/x86/tx_float.asm:1489: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1490: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r11q' undefined
src/libavutil/x86/tx_float.asm:1492: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1118: ... from macro `call' defined here
src//libavutil/x86/x86inc.asm:1130: ... from macro `call_internal' defined here
src/libavutil/x86/tx_float.asm:1540: error: invalid combination of opcode and operands
src/libavutil/x86/tx_float.asm:1495: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7q' undefined
src/libavutil/x86/tx_float.asm:1499: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1500: ... from macro `IMDCT_FN' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7q' undefined
src/libavutil/x86/tx_float.asm:1505: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1395: ... from macro `movaps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1506: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1395: ... from macro `movaps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7q' undefined
src/libavutil/x86/tx_float.asm:1507: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1395: ... from macro `movaps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1508: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1395: ... from macro `movaps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1530: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1395: ... from macro `movaps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7q' undefined
src/libavutil/x86/tx_float.asm:1531: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1395: ... from macro `movaps' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7q' undefined
src/libavutil/x86/tx_float.asm:1533: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1140: ... from macro `add' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r8q' undefined
src/libavutil/x86/tx_float.asm:1534: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:1152: ... from macro `sub' defined here
src/libavutil/x86/tx_float.asm:1540: error: symbol `r7' undefined
src/libavutil/x86/tx_float.asm:1537: ... from macro `IMDCT_FN' defined here
src//libavutil/x86/x86inc.asm:638: ... from macro `RET' defined here
src/ffbuild/common.mak:103: recipe for target 'libavutil/x86/tx_float.o' failed
make: *** [libavutil/x86/tx_float.o] Error 1
make: *** Waiting for unfinished jobs....
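The first error in this log is the root cause. The `cglobal` line in `IMDCT_FN` requests 8 general-purpose registers, while x86inc's `PROLOGUE` asserts that x86-32 provides at most 7 (`r0`-`r6`). The remaining errors follow from the same limit: `r7`-`r11` and `m8`-`m15` exist only in 64-bit mode, because 32-bit x86 has just 8 XMM/YMM registers. A usual fix is to assemble the function only for x86-64 (for example, by wrapping its instantiation in `%if ARCH_X86_64`) and to register it only there on the C side. The C half can be sketched as follows. `imdct_fn`, `imdct_c`, `imdct_avx2` and `select_imdct` are hypothetical names, the compiler macros `__x86_64__`/`_M_X64` stand in for FFmpeg's `ARCH_X86_64`, and both function bodies are placeholders rather than a real iMDCT.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*imdct_fn)(float *out, const float *in, size_t len);

/* Portable fallback; the copy is a placeholder, not a real iMDCT. */
static void imdct_c(float *out, const float *in, size_t len)
{
    assert(len == 0 || (out != NULL && in != NULL));
    memcpy(out, in, len * sizeof(*out));
}

#if defined(__x86_64__) || defined(_M_X64)
/* Stand-in for the external AVX2 assembly symbol, which would only be
 * assembled when targeting x86-64. */
static void imdct_avx2(float *out, const float *in, size_t len)
{
    imdct_c(out, in, len);
}
#endif

/* Pick an implementation. The AVX2 path needs r7 and above plus
 * ymm8-ymm15, so it is never offered on 32-bit x86. */
static imdct_fn select_imdct(int cpu_has_avx2)
{
#if defined(__x86_64__) || defined(_M_X64)
    if (cpu_has_avx2)
        return imdct_avx2;
#else
    (void)cpu_has_avx2;
#endif
    return imdct_c;
}
```

Guarding both the definition and the selection keeps 32-bit builds from referencing a symbol that was never assembled, while x86-64 builds still pick the AVX2 path when the CPU supports it.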
thx
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Everything should be made as simple as possible, but not simpler.
-- Albert Einstein