[FFmpeg-cvslog] lavc/aacpsdsp: unroll RISC-V V add_squares

Rémi Denis-Courmont git at videolan.org
Wed Jul 19 19:30:39 EEST 2023


ffmpeg | branch: master | Rémi Denis-Courmont <remi at remlab.net> | Sat Jul 15 23:30:59 2023 +0300| [2eb55157aab49076eb581c99227ee81ef5d06b6e] | committer: Rémi Denis-Courmont

lavc/aacpsdsp: unroll RISC-V V add_squares

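Raise LMUL from 1 to 4 so that each vector instruction operates on a group of four registers and each loop iteration handles up to four times as many elements. With LMUL=4, the second field of the segment load starts at v28 instead of v25. This slightly improves performance on the device under test.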

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=2eb55157aab49076eb581c99227ee81ef5d06b6e
---

 libavcodec/riscv/aacpsdsp_rvv.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavcodec/riscv/aacpsdsp_rvv.S b/libavcodec/riscv/aacpsdsp_rvv.S
index 80bd19f6ad..b581383f77 100644
--- a/libavcodec/riscv/aacpsdsp_rvv.S
+++ b/libavcodec/riscv/aacpsdsp_rvv.S
@@ -22,13 +22,13 @@
 
 func ff_ps_add_squares_rvv, zve32f
 1:
-        vsetvli     t0, a2, e32, m1, ta, ma
+        vsetvli     t0, a2, e32, m4, ta, ma
         vlseg2e32.v v24, (a1)
         sub         a2, a2, t0
         vle32.v     v16, (a0)
         sh3add      a1, t0, a1
         vfmacc.vv   v16, v24, v24
-        vfmacc.vv   v16, v25, v25
+        vfmacc.vv   v16, v28, v28
         vse32.v     v16, (a0)
         sh2add      a0, t0, a0
         bnez        a2, 1b


