[FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add
Rémi Denis-Courmont
remi at remlab.net
Mon Nov 13 17:35:35 EET 2023
Hi,
On Monday, 13 November 2023 at 11:43:01 EET, flow gg wrote:
> Sorry for the long delay in responding.
No problem. Working with T-Head C910 (or C920?) cores is very tedious. I have
given up on that and switched over to the Kendryte K230 (based on the C908).
> How is the modified patch now?
It looks better, but a few minor improvements are still possible.
> no longer using register stride(learn from your code) and have switched to
> shNadd instead.
>
> (using m4 and m2 as they are slightly faster than m8 and m4)
>
> benchmark:
> fcmul_add_c: 2179
> fcmul_add_rvv_f32: 1652
> diff --git a/libavfilter/af_afirdsp.h b/libavfilter/af_afirdsp.h
> index 4208501393..d2d1e909c1 100644
> --- a/libavfilter/af_afirdsp.h
> +++ b/libavfilter/af_afirdsp.h
> @@ -34,6 +34,7 @@ typedef struct AudioFIRDSPContext {
> } AudioFIRDSPContext;
>
> void ff_afir_init_x86(AudioFIRDSPContext *s);
> +void ff_afir_init_riscv(AudioFIRDSPContext *s);
Nit: please stick to alphabetical order like most similar code.
>
> static void fcmul_add_c(float *sum, const float *t, const float *c,
> ptrdiff_t len)
> {
> @@ -76,6 +77,8 @@ static av_unused void ff_afir_init(AudioFIRDSPContext *dsp)
>
> #if ARCH_X86
> ff_afir_init_x86(dsp);
> +#elif ARCH_RISCV
> + ff_afir_init_riscv(dsp);
Ditto.
> #endif
> }
>
> diff --git a/libavfilter/riscv/Makefile b/libavfilter/riscv/Makefile
> new file mode 100644
> index 0000000000..0b968a9c0d
> --- /dev/null
> +++ b/libavfilter/riscv/Makefile
> @@ -0,0 +1,2 @@
> +OBJS += riscv/af_afir_init.o
> +RVV-OBJS += riscv/af_afir_rvv.o
> diff --git a/libavfilter/riscv/af_afir_init.c b/libavfilter/riscv/af_afir_init.c
> new file mode 100644
> index 0000000000..13df8341e7
> --- /dev/null
> +++ b/libavfilter/riscv/af_afir_init.c
> @@ -0,0 +1,39 @@
> +/*
> + * Copyright (c) 2023 Institue of Software Chinese Academy of Sciences
> (ISCAS).
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301
> USA
> + */
> +
> +#include <stdint.h>
> +
> +#include "config.h"
> +#include "libavutil/attributes.h"
> +#include "libavutil/cpu.h"
> +#include "libavfilter/af_afirdsp.h"
> +
> +void ff_fcmul_add_rvv(float *sum, const float *t, const float *c,
> + ptrdiff_t len);
> +
> +av_cold void ff_afir_init_riscv(AudioFIRDSPContext *s)
> +{
> +#if HAVE_RVV
> + int flags = av_get_cpu_flags();
> +
> + if (flags & AV_CPU_FLAG_RVV_F32)
You need to check for Zba as well here. I doubt that we'll see hardware with V
and without Zba in real life, but for the sake of correctness...
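A minimal sketch of the combined check, assuming Zba is the extension that
provides the shNadd instructions used in the assembly and that FFmpeg reports
it through AV_CPU_FLAG_RVB_ADDR. The flag values and the helper name below are
stand-ins so that the snippet compiles on its own; real code would use the
definitions from libavutil/cpu.h.

```c
/* Stand-in flag values so this compiles on its own; these are assumptions,
 * real code uses the definitions from libavutil/cpu.h. */
#define SKETCH_FLAG_RVV_F32  (1 << 0)  /* vectors of single-precision floats */
#define SKETCH_FLAG_RVB_ADDR (1 << 1)  /* Zba address generation (shNadd)    */

/* Hypothetical helper: returns nonzero only if every extension the
 * assembly relies on is present. */
int sketch_can_use_fcmul_add_rvv(int flags)
{
    const int needed = SKETCH_FLAG_RVV_F32 | SKETCH_FLAG_RVB_ADDR;

    return (flags & needed) == needed;
}
```

Testing against a combined mask keeps the condition in one place if further
extensions are required later.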
> + s->fcmul_add = ff_fcmul_add_rvv;
> +#endif
> +}
> diff --git a/libavfilter/riscv/af_afir_rvv.S b/libavfilter/riscv/af_afir_rvv.S
> new file mode 100644
> index 0000000000..078cac8e7e
> --- /dev/null
> +++ b/libavfilter/riscv/af_afir_rvv.S
> @@ -0,0 +1,61 @@
> +/*
> + * Copyright (c) 2023 Institue of Software Chinese Academy of Sciences
> (ISCAS).
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301
> USA
> + */
> +
> +#include "libavutil/riscv/asm.S"
> +
> +// void ff_fcmul_add(float *sum, const float *t, const float *c, int len)
> +func ff_fcmul_add_rvv, zve32f
> + li t1, 32
> +1:
> + vsetvli t0, a3, e64, m4, ta, ma
You can set SEW=32 and corresponding LMUL here. Then you can remove all other
VSETVLI instances below. (Note that this will NOT work on draft 0.7.1
hardware, but it does work on conformant hardware.)
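To illustrate why one VSETVLI can be enough: in RVV 1.0, a load such as
vle64.v carries its own element width (EEW), and its register group size
follows EMUL = (EEW / SEW) * LMUL. The sketch below only models that
arithmetic; the helper names are hypothetical and fractional LMUL is ignored
for brevity.

```c
/* Effective LMUL of a vector load/store with an explicit element width,
 * following the RVV 1.0 rule EMUL = (EEW / SEW) * LMUL.
 * Hypothetical helper; integer LMUL only. */
unsigned sketch_emul(unsigned eew_bits, unsigned sew_bits, unsigned lmul)
{
    return eew_bits * lmul / sew_bits;
}

/* Maximum vector length, in elements, for a given VLEN in bits. */
unsigned sketch_vlmax(unsigned vlen_bits, unsigned sew_bits, unsigned lmul)
{
    return vlen_bits * lmul / sew_bits;
}
```

With SEW=32 and LMUL=2, a vle64.v gets EMUL=4, the same register group as in
the quoted code, and VLMAX is identical for e32/m2 and e64/m4, so the loop
processes the same number of elements per iteration.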
> + vle64.v v12, (a0)
This requires 64-bit alignment. I don't know if this is correct for this
specific filter, so I leave it to other people to comment here.
> + sub a3, a3, t0
> + vsetvli zero, zero, e32, m2, ta, ma
> + vnsrl.vx v8, v12, zero
> + vnsrl.vx v10, v12, t1
> + vsetvli zero, zero, e64, m4, ta, ma
> + vle64.v v12, (a1)
> + sh3add a1, t0, a1
> + vsetvli zero, zero, e32, m2, ta, ma
> + vnsrl.vx v0, v12, zero
> + vnsrl.vx v2, v12, t1
> + vsetvli zero, zero, e64, m4, ta, ma
> + vle64.v v12, (a2)
> + sh3add a2, t0, a2
> + vsetvli zero, zero, e32, m2, ta, ma
> + vnsrl.vx v4, v12, zero
> + vnsrl.vx v6, v12, t1
> + vfmacc.vv v8, v0, v4
> + vfnmsac.vv v8, v2, v6
> + vfmacc.vv v10, v0, v6
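Swap the two instructions above (the vfnmsac.vv into v8 and the first
vfmacc.vv into v10) so that consecutive instructions do not write the same
register group. That gives better pipeline utilisation on in-order CPUs.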
> + vfmacc.vv v10, v2, v4
> + vsseg2e32.v v8, (a0)
> + sh3add a0, t0, a0
> + bgtz a3, 1b
> +
> + flw fa0, 0(a1)
> + flw fa1, 0(a2)
> + flw fa2, 0(a0)
> + fmul.s fa0, fa0, fa1
> + fadd.s fa2, fa2, fa0
It won't make much difference, but you can use a fused multiply-add here.
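For reference, the arithmetic performed by the quoted loop and its scalar
tail can be sketched in C as follows. This is reconstructed from the assembly
above rather than copied from the patch, and the function name is
hypothetical. The last statement is the multiply followed by an add that a
single fused multiply-add instruction (fmadd.s) could replace.

```c
#include <stddef.h>

/* Sketch of what the assembly computes: sum += t * c over len complex
 * floats stored as interleaved (re, im) pairs, plus one real tail element. */
void sketch_fcmul_add(float *sum, const float *t, const float *c,
                      ptrdiff_t len)
{
    ptrdiff_t n;

    for (n = 0; n < len; n++) {
        const float tre = t[2 * n], tim = t[2 * n + 1];
        const float cre = c[2 * n], cim = c[2 * n + 1];

        sum[2 * n]     += tre * cre - tim * cim; /* vfmacc + vfnmsac */
        sum[2 * n + 1] += tre * cim + tim * cre; /* vfmacc + vfmacc  */
    }

    /* Scalar tail: fmul.s + fadd.s in the patch; fmadd.s would fuse them. */
    sum[2 * n] += t[2 * n] * c[2 * n];
}
```

Note that each buffer must hold 2 * len + 1 floats, since the tail reads and
writes one element past the last complex pair.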
> + fsw fa2, 0(a0)
> +
> + ret
> +endfunc
While you're at it, this looks like it could easily be adapted for the double
precision version. In fact, it will be simpler, since you will have to use
vlseg2e64 rather than vle128.v+vnsrl.vx+vnsrl.vx. But if you decide to
implement that too, please keep it a separate patch.
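The load strategy used for single precision can be modelled in C: each
complex number is fetched as one 64-bit element and then split with two
narrowing shifts, by 0 and by 32. The sketch below models that split for one
element, assuming a little-endian layout as on RISC-V; the helper name is
hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Model of vle64.v followed by vnsrl by 0 and vnsrl by 32 for one complex
 * float: load the (re, im) pair as one 64-bit value, then take each 32-bit
 * half. Assumes a little-endian host, where re occupies the low 32 bits. */
void sketch_split_complex(const float pair[2], float *re, float *im)
{
    uint64_t wide;
    uint32_t lo, hi;

    memcpy(&wide, pair, sizeof(wide));   /* the 64-bit element load */
    lo = (uint32_t)(wide >> 0);          /* narrowing shift by 0    */
    hi = (uint32_t)(wide >> 32);         /* narrowing shift by 32   */
    memcpy(re, &lo, sizeof(lo));
    memcpy(im, &hi, sizeof(hi));
}
```

For doubles the same trick would need 128-bit elements, which is why a
segmented load that delivers the real and imaginary parts into separate
register groups is the natural choice there.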
--
レミ・デニ-クールモン
http://www.remlab.net/