[FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add
Paul B Mahol
onemda at gmail.com
Mon Nov 13 18:01:18 EET 2023
On Mon, Nov 13, 2023 at 4:35 PM Rémi Denis-Courmont <remi at remlab.net> wrote:
> Hi,
>
> On Monday, 13 November 2023, 11:43:01 EET, flow gg wrote:
> > Sorry for the long delay in responding.
>
> No problem. Working with T-Head C910 (or C920?) cores is very tedious. I gave
> up on that and switched over to Kendryte K230 (based on C908) now.
>
> > How is the modified patch now?
>
> It looks better, but some minute improvements are still possible.
>
> > no longer using register stride (learned from your code) and have switched
> > to shNadd instead.
> >
> > (using m4 and m2 as they are slightly faster than m8 and m4)
> >
> > benchmark:
> > fcmul_add_c: 2179
> > fcmul_add_rvv_f32: 1652
>
> > diff --git a/libavfilter/af_afirdsp.h b/libavfilter/af_afirdsp.h
> > index 4208501393..d2d1e909c1 100644
> > --- a/libavfilter/af_afirdsp.h
> > +++ b/libavfilter/af_afirdsp.h
> > @@ -34,6 +34,7 @@ typedef struct AudioFIRDSPContext {
> > } AudioFIRDSPContext;
> >
> > void ff_afir_init_x86(AudioFIRDSPContext *s);
> > +void ff_afir_init_riscv(AudioFIRDSPContext *s);
>
> Nit: please stick to alphabetical order like most similar code.
>
> >
> > static void fcmul_add_c(float *sum, const float *t, const float *c,
> > ptrdiff_t len)
> > {
> > @@ -76,6 +77,8 @@ static av_unused void ff_afir_init(AudioFIRDSPContext *dsp)
> >
> > #if ARCH_X86
> > ff_afir_init_x86(dsp);
> > +#elif ARCH_RISCV
> > + ff_afir_init_riscv(dsp);
>
> Ditto.
>
> > #endif
> > }
> >
> > diff --git a/libavfilter/riscv/Makefile b/libavfilter/riscv/Makefile
> > new file mode 100644
> > index 0000000000..0b968a9c0d
> > --- /dev/null
> > +++ b/libavfilter/riscv/Makefile
> > @@ -0,0 +1,2 @@
> > +OBJS += riscv/af_afir_init.o
> > +RVV-OBJS += riscv/af_afir_rvv.o
> > diff --git a/libavfilter/riscv/af_afir_init.c b/libavfilter/riscv/af_afir_init.c
> > new file mode 100644
> > index 0000000000..13df8341e7
> > --- /dev/null
> > +++ b/libavfilter/riscv/af_afir_init.c
> > @@ -0,0 +1,39 @@
> > +/*
> > + * Copyright (c) 2023 Institute of Software Chinese Academy of Sciences (ISCAS).
> > + *
> > + * This file is part of FFmpeg.
> > + *
> > + * FFmpeg is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2.1 of the License, or (at your option) any later version.
> > + *
> > + * FFmpeg is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with FFmpeg; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > + */
> > +
> > +#include <stdint.h>
> > +
> > +#include "config.h"
> > +#include "libavutil/attributes.h"
> > +#include "libavutil/cpu.h"
> > +#include "libavfilter/af_afirdsp.h"
> > +
> > +void ff_fcmul_add_rvv(float *sum, const float *t, const float *c,
> > + ptrdiff_t len);
> > +
> > +av_cold void ff_afir_init_riscv(AudioFIRDSPContext *s)
> > +{
> > +#if HAVE_RVV
> > + int flags = av_get_cpu_flags();
> > +
> > + if (flags & AV_CPU_FLAG_RVV_F32)
>
> You need to check for Zba as well here. I doubt that we'll see hardware with V
> and without Zba in real life, but for the sake of correctness...
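For illustration, a minimal sketch of the combined check, written with stand-in
flag bits so that it compiles on its own. The SKETCH_* names and the helper are
hypothetical; in the real init function the flags would come from
av_get_cpu_flags(), and I believe libavutil/cpu.h exposes Zba as
AV_CPU_FLAG_RVB_ADDR, but please verify that against the header.

```c
#include <stdbool.h>

/* Stand-in flag bits for this sketch only; real code would use the
 * AV_CPU_FLAG_* values from libavutil/cpu.h and av_get_cpu_flags(). */
#define SKETCH_FLAG_RVV_F32  (1 << 0)  /* vectors of single-precision floats */
#define SKETCH_FLAG_RVB_ADDR (1 << 1)  /* Zba address generation (sh3add) */

/* Returns true only when every extension the assembly relies on is present. */
static bool sketch_can_use_fcmul_add_rvv(int flags)
{
    const int needed = SKETCH_FLAG_RVV_F32 | SKETCH_FLAG_RVB_ADDR;

    return (flags & needed) == needed;
}
```

Testing both bits in one mask keeps the function pointer untouched unless the
vector loads and the sh3add pointer updates are all available.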
>
> > + s->fcmul_add = ff_fcmul_add_rvv;
> > +#endif
> > +}
> > diff --git a/libavfilter/riscv/af_afir_rvv.S b/libavfilter/riscv/af_afir_rvv.S
> > new file mode 100644
> > index 0000000000..078cac8e7e
> > --- /dev/null
> > +++ b/libavfilter/riscv/af_afir_rvv.S
> > @@ -0,0 +1,61 @@
> > +/*
> > + * Copyright (c) 2023 Institute of Software Chinese Academy of Sciences (ISCAS).
> > + *
> > + * This file is part of FFmpeg.
> > + *
> > + * FFmpeg is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2.1 of the License, or (at your option) any later version.
> > + *
> > + * FFmpeg is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with FFmpeg; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > + */
> > +
> > +#include "libavutil/riscv/asm.S"
> > +
> > +// void ff_fcmul_add(float *sum, const float *t, const float *c, int len)
> > +func ff_fcmul_add_rvv, zve32f
> > + li t1, 32
> > +1:
> > + vsetvli t0, a3, e64, m4, ta, ma
>
> You can set SEW=32 and corresponding LMUL here. Then you can remove all other
> VSETVLI instances below. (Note that this will NOT work on draft 0.7.1
> hardware, but it does work on conformant hardware.)
>
> > + vle64.v v12, (a0)
>
> This requires 64-bit alignment. I don't know if this is correct for this
> specific filter, so I leave it to other people to comment here.
>
The arrays should be aligned, since they are allocated through libavutil calls.
The buffer sizes are aligned using av_cpu_max_align(), so if that returns the
correct value it should work.
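To make the requirement concrete, here is a small sketch of the two properties
vle64.v depends on: an 8-byte-aligned base pointer and sizes rounded up to the
alignment. The sketch_* helpers are hypothetical names for illustration, not
libavutil API; sketch_align_up() only mirrors what the FFALIGN() macro does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* True when ptr is a multiple of align bytes; align must be a power of two. */
static int sketch_is_aligned(const void *ptr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)ptr & (align - 1)) == 0;
}

/* Round size up to the next multiple of align (a power of two), in the
 * spirit of FFmpeg's FFALIGN() macro. */
static size_t sketch_align_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}
```

A debug build could assert sketch_is_aligned(sum, 8) before calling the vector
routine, assuming the pointers passed in are the start of the allocated buffers.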
>
> > + sub a3, a3, t0
> > + vsetvli zero, zero, e32, m2, ta, ma
> > + vnsrl.vx v8, v12, zero
> > + vnsrl.vx v10, v12, t1
> > + vsetvli zero, zero, e64, m4, ta, ma
> > + vle64.v v12, (a1)
> > + sh3add a1, t0, a1
> > + vsetvli zero, zero, e32, m2, ta, ma
> > + vnsrl.vx v0, v12, zero
> > + vnsrl.vx v2, v12, t1
> > + vsetvli zero, zero, e64, m4, ta, ma
> > + vle64.v v12, (a2)
> > + sh3add a2, t0, a2
> > + vsetvli zero, zero, e32, m2, ta, ma
> > + vnsrl.vx v4, v12, zero
> > + vnsrl.vx v6, v12, t1
> > + vfmacc.vv v8, v0, v4
> > + vfnmsac.vv v8, v2, v6
> > + vfmacc.vv v10, v0, v6
>
> Swap the two instructions above for better pipeline utilisation on in-order
> CPUs.
>
> > + vfmacc.vv v10, v2, v4
> > + vsseg2e32.v v8, (a0)
> > + sh3add a0, t0, a0
> > + bgtz a3, 1b
> > +
> > + flw fa0, 0(a1)
> > + flw fa1, 0(a2)
> > + flw fa2, 0(a0)
> > + fmul.s fa0, fa0, fa1
> > + fadd.s fa2, fa2, fa0
>
> It won't make much difference, but you can use a fused multiply-add here.
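For reference, a scalar C model of what the quoted loop and tail compute, read
off the assembly above (v0/v2 hold the real and imaginary parts of t, v4/v6
those of c, v8/v10 those of sum). The name sketch_fcmul_add is hypothetical,
and results may differ from the vector code in the last bit because of fused
rounding. The final statement is the multiply-add in question: it maps to a
single fmadd.s in assembly, or fmaf() from <math.h> in C.

```c
#include <stddef.h>

/* sum, t and c each hold len interleaved (re, im) pairs followed by one
 * trailing real value. */
static void sketch_fcmul_add(float *sum, const float *t, const float *c,
                             ptrdiff_t len)
{
    ptrdiff_t n;

    for (n = 0; n < len; n++) {
        const float tre = t[2 * n], tim = t[2 * n + 1];
        const float cre = c[2 * n], cim = c[2 * n + 1];

        /* Complex multiply-accumulate: sum += t * c */
        sum[2 * n]     += tre * cre - tim * cim;
        sum[2 * n + 1] += tre * cim + tim * cre;
    }

    /* Tail element: a plain multiply followed by an add, which is the
     * candidate for a fused multiply-add. */
    sum[2 * n] += t[2 * n] * c[2 * n];
}
```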
>
> > + fsw fa2, 0(a0)
> > +
> > + ret
> > +endfunc
>
> While you're at it, this looks like it could easily be adapted for the double
> precision version. In fact, it will be simpler, since you will have to use
> vlseg2e64 rather than vle128.v+vnsrl.vx+vnsrl.vx. But if you decide to
> implement that too, please keep it a separate patch.
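As a sketch of the load-and-split pattern the single-precision code uses, the
scalar C below treats each (re, im) float pair as one 64-bit word and extracts
the halves with shifts by 0 and 32, which is what vle64.v followed by two
vnsrl.vx does per element. The name sketch_deinterleave is hypothetical, and
the code assumes a little-endian target. For doubles the same trick would need
128-bit elements, which as far as I know RVV 1.0 does not provide, hence the
segmented load.

```c
#include <stdint.h>
#include <string.h>

/* Split n interleaved (re, im) float pairs into two planar arrays.
 * Assumes little-endian byte order, so the real part is the low word. */
static void sketch_deinterleave(const float *in, float *re, float *im,
                                size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t pair;
        uint32_t lo, hi;

        memcpy(&pair, in + 2 * i, sizeof(pair)); /* like vle64.v */
        lo = (uint32_t)(pair >> 0);              /* like vnsrl.vx by zero */
        hi = (uint32_t)(pair >> 32);             /* like vnsrl.vx by 32 */
        memcpy(&re[i], &lo, sizeof(lo));
        memcpy(&im[i], &hi, sizeof(hi));
    }
}
```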
>
> --
> レミ・デニ-クールモン
> http://www.remlab.net/