[FFmpeg-devel] [PATCH v2] x86/tx_float: implement inverse MDCT AVX2 assembly
Henrik Gramner
henrik at gramner.com
Fri Sep 2 17:03:21 EEST 2022
On Fri, Sep 2, 2022 at 7:55 AM Lynne <dev at lynne.ee> wrote:
> + movd xmm4, strided
> + neg t2d
> + movd xmm5, t2d
> + SPLATD xmm4
> + SPLATD xmm5
> + vperm2f128 m4, m4, m4, 0x00 ; +stride splatted
> + vperm2f128 m5, m5, m5, 0x00 ; -stride splatted
movd xm4, strided
pxor m5, m5
vpbroadcastd m4, xm4
> + mova m2, [lutq] ; load LUT indices
> + pcmpeqd m0, m0 ; zero out a register
> + pmulld m3, m2, m4 ; multiply by +stride
> + pmulld m2, m5 ; multiply by -stride
> + movaps m1, m0
> + vgatherdps m6, [inq + 2*m3], m0 ; im
> + vgatherdps m7, [t1q + 2*m2], m1 ; re
pmulld m2, m4, [lutq]
pcmpeqd m0, m0
mova m1, m0
vgatherdps m6, [inq + 2*m2], m0
psubd m2, m5, m2
vgatherdps m7, [t1q + 2*m2], m1
The comment for pcmpeqd is also wrong as bits are set to 1, not 0.
That instruction could also be moved outside the loop and replaced
with a cheaper register-register move inside the loop.
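
For anyone who wants to sanity-check the index rewrite above, here is a rough scalar model in Python. It is only a sketch and not part of the patch: the stride and LUT values are made up, and the helper functions just mimic the per-lane 32-bit behaviour of pmulld, psubd and pcmpeqd. It compares the two-multiply version (+stride and -stride) with the suggested one (a single multiply, then 0 - x), and shows that pcmpeqd of a register with itself gives all-ones lanes.

```python
MASK32 = 0xFFFFFFFF  # lanes are modelled as unsigned 32-bit values


def pmulld(a, b):
    """Per-lane multiply, keeping the low 32 bits of each product."""
    return [(x * y) & MASK32 for x, y in zip(a, b)]


def psubd(a, b):
    """Per-lane 32-bit subtraction with wraparound."""
    return [(x - y) & MASK32 for x, y in zip(a, b)]


def pcmpeqd(a, b):
    """Per-lane compare: all bits set where equal, all bits clear otherwise."""
    return [MASK32 if x == y else 0 for x, y in zip(a, b)]


LANES = 8
stride = 12                      # hypothetical stride value
lut = [0, 5, 2, 7, 1, 4, 3, 6]   # hypothetical LUT indices

pos = [stride] * LANES             # +stride splatted
neg = [(-stride) & MASK32] * LANES # -stride splatted
zero = [0] * LANES                 # what pxor m5, m5 produces

# Quoted patch: two multiplies, one per sign.
orig_pos = pmulld(lut, pos)
orig_neg = pmulld(lut, neg)

# Suggested version: one multiply, then negate with 0 - x.
new_pos = pmulld(pos, lut)
new_neg = psubd(zero, new_pos)

# A register compared with itself is equal in every lane.
all_ones = pcmpeqd(zero, zero)

print(new_pos == orig_pos, new_neg == orig_neg, all_ones == [MASK32] * LANES)
```

The all-ones value is what vgatherdps wants as a mask. Since the gather clears its mask operand, the mask has to be restored on every iteration, and copying from a register that was set once outside the loop is enough for that.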
> + vperm2f128 m0, m0, 0x01 ; flip
> + vperm2f128 m4, m4, 0x01 ; flip (2)
> + shufpd m0, m0, 101b
> + shufpd m4, m4, 101b
vpermpd m0, m0, q0123
vpermpd m4, m4, q0123
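
A similar scalar model for the permutation, again only a sketch with made-up input values. The helpers follow my reading of how vperm2f128, vshufpd and vpermpd select 64-bit elements from their immediates, and Q0123 spells out the value I believe the x86inc q0123 constant expands to (0x1B). The point is that the quoted two-instruction sequence and a single vpermpd both reverse the four 64-bit elements.

```python
def vperm2f128(a, b, imm):
    """Pick each 128-bit half of the result from a/b according to imm."""
    halves = [a[0:2], a[2:4], b[0:2], b[2:4]]

    def pick(ctrl):
        if ctrl & 0x8:               # zeroing bit (imm[3] or imm[7])
            return [0, 0]
        return list(halves[ctrl & 0x3])

    return pick(imm & 0xF) + pick((imm >> 4) & 0xF)


def vshufpd(a, b, imm):
    """256-bit shufpd: each imm bit picks the low or high element of a lane."""
    return [
        a[1] if imm & 1 else a[0],
        b[1] if imm & 2 else b[0],
        a[3] if imm & 4 else a[2],
        b[3] if imm & 8 else b[2],
    ]


def vpermpd(a, imm):
    """Each 2-bit field of imm selects the source element for one result slot."""
    return [a[(imm >> (2 * i)) & 3] for i in range(4)]


Q0123 = (0 << 6) | (1 << 4) | (2 << 2) | 3  # assumed expansion of q0123

x = [10.0, 11.0, 12.0, 13.0]                # made-up input elements

# Quoted patch: swap the 128-bit halves, then swap within each half.
two_step = vperm2f128(x, x, 0x01)
two_step = vshufpd(two_step, two_step, 0b0101)

# Suggested replacement: a single cross-lane permute.
one_step = vpermpd(x, Q0123)

print(two_step, one_step)
```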