[FFmpeg-devel] [PATCH v3 1/2][GSoC 2024] libavcodec/x86/vvc: Add AVX2 DMVR SAD functions for VVC
Ronald S. Bultje
rsbultje at gmail.com
Sat May 18 18:33:14 EEST 2024
Hi,
On Tue, May 14, 2024 at 4:40 PM Stone Chen <chen.stonechen at gmail.com> wrote:
> + vvc_sad_8:
> + .loop_height:
> + movu xm0, [src1q]
> + movu xm1, [src2q]
> + MIN_MAX_SAD xm2, xm0, xm1
> + vpmovzxwd m1, xm1
> + vpaddd m3, m1
>
[..]
> + vvc_sad_16_128:
> + .loop_height:
>
[..]
> + .loop_width:
> + movu xm0, [src1q]
> + movu xm1, [src2q]
> + MIN_MAX_SAD xm2, xm0, xm1
> + vpmovzxwd m1, xm1
> + vpaddd m3, m1
>
Wouldn't it be more efficient if the main loops processed a full register's
worth (16 words in a ymm register) at a time, instead of half a register per
iteration? Something like:
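    vpbroadcastd  m4, [pw_1]       ; m4 = 16 words of 1
.loop:
    movu          m0, [src1q]      ; 16 words from src1
    movu          m1, [src2q]      ; 16 words from src2
    MIN_MAX_SAD   m2, m0, m1       ; leaves |src1 - src2| in m1, as in the patch
    pmaddwd       m1, m4           ; add adjacent word pairs into 8 dwords
    paddd         m3, m1           ; accumulate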
(And then for w8, load 2 rows per iteration using movu xmN, [row0] and
vinserti128 mN, [row1], 1.)
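A scalar C model may make the suggestion easier to check. This is a hypothetical sketch, not FFmpeg code: the names sad_model, pmaddwd_ones and load_absdiff are invented for illustration, and it assumes absolute differences stay below 32768 (e.g. 10-bit samples), because pmaddwd treats its word inputs as signed. Multiplying by a register of ones makes pmaddwd sum adjacent word pairs into dwords, so the accumulated total equals what vpmovzxwd + paddd produces, while consuming all 16 words of the register. For w8, lanes 0..7 come from one row and lanes 8..15 from the next, mirroring movu xmN + vinserti128.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16 /* 16-bit words per 256-bit ymm register */

/* Scalar stand-in for pmaddwd against a register of all-ones words:
 * dword lane i receives w[2i] + w[2i+1] (words treated as signed). */
static void pmaddwd_ones(const int16_t w[LANES], int32_t d[LANES / 2])
{
    for (int i = 0; i < LANES / 2; i++)
        d[i] = (int32_t)w[2 * i] + (int32_t)w[2 * i + 1];
}

/* Fill one "register" with absolute differences. For w == 8, lanes 0..7 come
 * from row 0 and lanes 8..15 from row 1 (the movu xmN + vinserti128 idea). */
static void load_absdiff(const uint16_t *s1, const uint16_t *s2,
                         ptrdiff_t stride, int w, int16_t out[LANES])
{
    for (int i = 0; i < LANES; i++) {
        ptrdiff_t off = (w == 8) ? (i / 8) * stride + (i % 8) : i;
        uint16_t a = s1[off], b = s2[off];
        out[i] = (int16_t)(a > b ? a - b : b - a);
    }
}

/* SAD over a w x h block: w == 8 with even h, or w a multiple of 16.
 * stride is in elements, not bytes. */
static uint32_t sad_model(const uint16_t *s1, const uint16_t *s2,
                          ptrdiff_t stride, int w, int h)
{
    int32_t acc[LANES / 2] = {0};          /* plays the role of m3 */
    int rows_per_iter = (w == 8) ? 2 : 1;

    for (int y = 0; y < h; y += rows_per_iter) {
        for (int x = 0; x < w; x += LANES) { /* runs once when w == 8 */
            int16_t diff[LANES];
            int32_t d[LANES / 2];
            load_absdiff(s1 + x, s2 + x, stride, w, diff);
            pmaddwd_ones(diff, d);
            for (int i = 0; i < LANES / 2; i++)
                acc[i] += d[i];
        }
        s1 += rows_per_iter * stride;
        s2 += rows_per_iter * stride;
    }

    /* Final horizontal reduction of the dword accumulator. */
    uint32_t sum = 0;
    for (int i = 0; i < LANES / 2; i++)
        sum += (uint32_t)acc[i];
    return sum;
}
```

Calling sad_model(src1, src2, stride, 16, h) should match a plain per-pixel SAD, and sad_model(src1, src2, stride, 8, h) should match it over the first 8 columns, with half as many loop iterations as a one-row-per-iteration w8 loop.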
Ronald