[FFmpeg-devel] [PATCH v2] x86/tx_float: implement inverse MDCT AVX2 assembly

Henrik Gramner henrik at gramner.com
Fri Sep 2 17:03:21 EEST 2022


On Fri, Sep 2, 2022 at 7:55 AM Lynne <dev at lynne.ee> wrote:
> +    movd xmm4, strided
> +    neg t2d
> +    movd xmm5, t2d
> +    SPLATD xmm4
> +    SPLATD xmm5
> +    vperm2f128 m4, m4, m4, 0x00      ; +stride splatted
> +    vperm2f128 m5, m5, m5, 0x00      ; -stride splatted

movd xm4, strided
pxor m5, m5
vpbroadcastd m4, xm4

> +    mova m2, [lutq]                  ; load LUT indices
> +    pcmpeqd m0, m0                   ; zero out a register
> +    pmulld m3, m2, m4                ; multiply by +stride
> +    pmulld m2, m5                    ; multiply by -stride
> +    movaps m1, m0
> +    vgatherdps m6, [inq + 2*m3], m0  ; im
> +    vgatherdps m7, [t1q + 2*m2], m1  ; re

pmulld m2, m4, [lutq]
pcmpeqd m0, m0
mova m1, m0
vgatherdps m6, [inq + 2*m2], m0
psubd m2, m5, m2
vgatherdps m7, [t1q + 2*m2], m1
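
For readers who want to convince themselves that the suggested sequence computes the same gather offsets, here is a minimal scalar sketch in C. It only models the 32-bit lane arithmetic (pmulld keeps the low 32 bits, psubd wraps); the function names, the sample LUT values and the stride of 24 are made up for illustration and do not come from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8 /* one 256-bit register holds eight 32-bit lanes */

/* Patch version: negate the stride, then multiply each LUT entry by it. */
void offsets_by_neg_mul(const uint32_t lut[LANES], uint32_t stride,
                        uint32_t out[LANES])
{
    uint32_t neg_stride = 0u - stride;   /* models: neg t2d */
    for (int i = 0; i < LANES; i++)
        out[i] = lut[i] * neg_stride;    /* models: pmulld by -stride */
}

/* Suggested version: multiply by +stride once, then subtract from zero. */
void offsets_by_sub(const uint32_t lut[LANES], uint32_t stride,
                    uint32_t out[LANES])
{
    for (int i = 0; i < LANES; i++) {
        uint32_t pos = lut[i] * stride;  /* models: pmulld m2, m4, [lutq] */
        out[i] = 0u - pos;               /* models: psubd m2, m5, m2 (m5 = 0) */
    }
}

/* Returns 1 if both variants agree on every lane of a sample input. */
int negated_offsets_agree(void)
{
    /* Hypothetical LUT values, including one that overflows 32 bits. */
    const uint32_t lut[LANES] = { 0, 1, 2, 3, 7, 100, 65535, 0x7fffffffu };
    uint32_t a[LANES], b[LANES];

    offsets_by_neg_mul(lut, 24, a);
    offsets_by_sub(lut, 24, b);
    for (int i = 0; i < LANES; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}

/* Usage: assert(negated_offsets_agree()); */
```

The two variants agree because multiplication and negation commute modulo 2^32, so the second pmulld and the negated splat can be dropped in favour of a zeroed register and one psubd.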

The comment on pcmpeqd is also wrong: the instruction sets all bits to 1,
not 0. It could also be moved outside the loop, leaving only a cheaper
register-register move inside the loop (the gather clears its mask
register, so the mask still has to be refreshed on every iteration).
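
The reason a move is still needed inside the loop can be sketched with a small scalar model of the gather in C. This is an illustration under simplifying assumptions: indices are in float elements (the scale factor and byte strides are left out), and the table contents and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 8

static const float   table[LANES] = { 10, 11, 12, 13, 14, 15, 16, 17 };
static const int32_t idx[LANES]   = { 7, 6, 5, 4, 3, 2, 1, 0 };

/* Scalar model of vgatherdps: a lane is loaded only if the top bit of its
 * mask element is set, and the mask is zeroed once the gather is done. */
void gather_ps_model(float dst[LANES], const float *base,
                     const int32_t index[LANES], uint32_t mask[LANES])
{
    for (int i = 0; i < LANES; i++) {
        if (mask[i] & 0x80000000u)
            dst[i] = base[index[i]];
        mask[i] = 0;
    }
}

/* All-ones mask built once (the hoisted pcmpeqd), copied per iteration
 * (the register-register move). Returns how many lanes were loaded. */
int gather_loop_lanes_loaded(int iters)
{
    uint32_t all_ones[LANES];
    int loaded = 0;

    memset(all_ones, 0xff, sizeof(all_ones));  /* outside the loop */
    for (int n = 0; n < iters; n++) {
        uint32_t mask[LANES];
        float dst[LANES];

        memcpy(mask, all_ones, sizeof(mask));  /* cheap copy inside the loop */
        for (int i = 0; i < LANES; i++)
            dst[i] = -1.0f;                    /* sentinel: "not loaded" */
        gather_ps_model(dst, table, idx, mask);
        for (int i = 0; i < LANES; i++)
            loaded += (dst[i] == table[idx[i]]);
    }
    return loaded;
}

/* Returns 1 if reusing a mask after a gather loads nothing at all. */
int stale_mask_loads_nothing(void)
{
    uint32_t mask[LANES];
    float dst[LANES];

    memset(mask, 0xff, sizeof(mask));
    gather_ps_model(dst, table, idx, mask);    /* clears the mask */
    for (int i = 0; i < LANES; i++)
        dst[i] = -1.0f;
    gather_ps_model(dst, table, idx, mask);    /* mask is all zero now */
    for (int i = 0; i < LANES; i++)
        if (dst[i] != -1.0f)
            return 0;
    return 1;
}
```

In the assembly this would cost one extra register to keep the all-ones value alive across iterations, which is the trade-off to weigh against re-running pcmpeqd each time.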

> +    vperm2f128   m0, m0, 0x01        ; flip
> +    vperm2f128   m4, m4, 0x01        ; flip (2)
> +    shufpd       m0, m0, 101b
> +    shufpd       m4, m4, 101b

vpermpd m0, m0, q0123
vpermpd m4, m4, q0123
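
As a sanity check on the replacement, the following scalar C sketch models both sequences on four 64-bit elements and compares the results. It assumes x86inc's qXYZW encoding, under which q0123 is the immediate 0x1B; the function names are made up for the example.

```c
#include <assert.h>
#include <string.h>

/* vperm2f128 with immediate 0x01 and both sources equal:
 * swap the two 128-bit halves of the register. */
void swap_halves(double v[4])
{
    double t0 = v[0], t1 = v[1];
    v[0] = v[2]; v[1] = v[3];
    v[2] = t0;   v[3] = t1;
}

/* shufpd with immediate 0101b and both sources equal:
 * swap the two 64-bit elements inside each 128-bit half. */
void swap_within_halves(double v[4])
{
    double t = v[0]; v[0] = v[1]; v[1] = t;
    t = v[2];        v[2] = v[3]; v[3] = t;
}

/* vpermpd: destination element i takes source element (imm >> 2*i) & 3. */
void permpd_model(double dst[4], const double src[4], unsigned imm)
{
    for (int i = 0; i < 4; i++)
        dst[i] = src[(imm >> (2 * i)) & 3];
}

/* Returns 1 if the two-instruction sequence and the single vpermpd
 * both produce the fully reversed vector. */
int reverse_sequences_agree(void)
{
    const double src[4] = { 0.0, 1.0, 2.0, 3.0 };
    double a[4], b[4];

    memcpy(a, src, sizeof(a));
    swap_halves(a);                 /* vperm2f128 ..., 0x01 */
    swap_within_halves(a);          /* shufpd ..., 101b */

    permpd_model(b, src, 0x1B);     /* vpermpd ..., q0123 (assumed 0x1B) */

    return memcmp(a, b, sizeof(a)) == 0 && a[0] == 3.0 && a[3] == 0.0;
}

/* Usage: assert(reverse_sequences_agree()); */
```

Both paths reverse the order of the four 64-bit elements, so one cross-lane vpermpd per register can stand in for the vperm2f128 + shufpd pair.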

