[FFmpeg-cvslog] lavc/aacpsdsp: unroll RISC-V V add_squares
Rémi Denis-Courmont
git at videolan.org
Wed Jul 19 19:30:39 EEST 2023
ffmpeg | branch: master | Rémi Denis-Courmont <remi at remlab.net> | Sat Jul 15 23:30:59 2023 +0300| [2eb55157aab49076eb581c99227ee81ef5d06b6e] | committer: Rémi Denis-Courmont
lavc/aacpsdsp: unroll RISC-V V add_squares
This slightly improves performance with the Device Under Test.
> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=2eb55157aab49076eb581c99227ee81ef5d06b6e
---
libavcodec/riscv/aacpsdsp_rvv.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/libavcodec/riscv/aacpsdsp_rvv.S b/libavcodec/riscv/aacpsdsp_rvv.S
index 80bd19f6ad..b581383f77 100644
--- a/libavcodec/riscv/aacpsdsp_rvv.S
+++ b/libavcodec/riscv/aacpsdsp_rvv.S
@@ -22,13 +22,13 @@
 
 func ff_ps_add_squares_rvv, zve32f
 1:
-        vsetvli t0, a2, e32, m1, ta, ma
+        vsetvli t0, a2, e32, m4, ta, ma
         vlseg2e32.v v24, (a1)
         sub a2, a2, t0
         vle32.v v16, (a0)
         sh3add a1, t0, a1
         vfmacc.vv v16, v24, v24
-        vfmacc.vv v16, v25, v25
+        vfmacc.vv v16, v28, v28
         vse32.v v16, (a0)
         sh2add a0, t0, a0
         bnez a2, 1b
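
For readers following the hunk above: with LMUL=4, the two-field segment load into v24 places the first field of each pair in v24-v27 and the second in v28-v31, which is why the second vfmacc.vv operand moves from v25 to v28. The scalar C model below is only a sketch of what the loop computes, not code from the FFmpeg tree. The function name add_squares_ref and the vlmax parameter are hypothetical, and vsetvli is simplified to min(n, vlmax), although the RVV specification allows other choices near the tail. Rounding may also differ, since vfmacc.vv is fused and the C expression need not be.

```c
#include <stddef.h>

/*
 * Hypothetical scalar model of the strip-mined loop in the hunk above.
 * dst   ~ a0 (n floats, accumulated in place)
 * src   ~ a1 (n interleaved pairs of floats)
 * n     ~ a2 (remaining element count)
 * vlmax ~ elements per iteration; must be nonzero. Raising LMUL from
 *         m1 to m4 quadruples it for a given hardware vector length.
 */
static void add_squares_ref(float *dst, const float (*src)[2],
                            size_t n, size_t vlmax)
{
    while (n > 0) {
        /* vsetvli t0, a2, ...: simplified here as min(n, vlmax) */
        size_t vl = n < vlmax ? n : vlmax;

        for (size_t i = 0; i < vl; i++) {
            float acc = dst[i];            /* vle32.v  v16, (a0)     */
            acc += src[i][0] * src[i][0];  /* vfmacc.vv v16, v24, v24 */
            acc += src[i][1] * src[i][1];  /* vfmacc.vv v16, v28, v28 */
            dst[i] = acc;                  /* vse32.v  v16, (a0)     */
        }

        n   -= vl;  /* sub a2, a2, t0                       */
        src += vl;  /* sh3add a1, t0, a1 (8 bytes per pair) */
        dst += vl;  /* sh2add a0, t0, a0 (4 bytes per float) */
    }
}
```

The result does not depend on vlmax; a larger value only means fewer trips through the outer loop, which is the effect the m1 to m4 change is after.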