[FFmpeg-devel] [PATCH 1/7] lavc/me_cmp: R-V V pix_abs
Rémi Denis-Courmont
remi at remlab.net
Thu Feb 8 21:41:35 EET 2024
On Wednesday 7 February 2024, 02:01:23 EET, flow gg wrote:
> I think that holds in most cases, but for this particular function,
> reducing only once is actually slower.
>
> The currently submitted version takes roughly:
> pix_abs_0_0_rvv_i32: 136.2
>
> The version that reduces only once takes:
> pix_abs_0_0_rvv_i32: 169.2
You're only using one vector and half a vector respectively, so the
logarithmic cost of the reduction is relatively small.
But are you sure it wouldn't be faster to process multiple rows at a time,
with larger group multipliers (LMUL)?
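To make the question concrete, here is a scalar C model of that layout (a sketch under stated assumptions, not FFmpeg code). The hypothetical sad16_model treats two 16-pixel rows as one 32-lane group, roughly what e8 with LMUL=2 would hold at VLEN=128, keeps one 32-bit accumulator per lane, and performs the horizontal reduction only once after the loop. ROW_WIDTH, ROWS_PER_ITER and the lane layout are illustrative choices, and h is assumed to be a multiple of ROWS_PER_ITER.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants (assumptions, not FFmpeg definitions):
 * two 16-pixel rows are processed together as one 32-lane group. */
#define ROW_WIDTH     16
#define ROWS_PER_ITER 2
#define LANES         (ROW_WIDTH * ROWS_PER_ITER)

/* Hypothetical scalar model of a 16-wide SAD that consumes
 * ROWS_PER_ITER rows per iteration and reduces only once.
 * Assumes h is a multiple of ROWS_PER_ITER. */
static int sad16_model(const uint8_t *blk1, const uint8_t *blk2,
                       ptrdiff_t stride, int h)
{
    uint32_t acc[LANES] = { 0 };   /* per-lane partial sums, like a wide accumulator group */

    for (int y = 0; y < h; y += ROWS_PER_ITER) {
        for (int lane = 0; lane < LANES; lane++) {
            int r = lane / ROW_WIDTH;   /* which row of the pair this lane covers */
            int x = lane % ROW_WIDTH;   /* pixel column within that row */
            int d = blk1[r * stride + x] - blk2[r * stride + x];
            acc[lane] += (uint32_t)(d < 0 ? -d : d);   /* lane-wise add, no reduction */
        }
        blk1 += ROWS_PER_ITER * stride;
        blk2 += ROWS_PER_ITER * stride;
    }

    /* Single horizontal reduction, done once after the loop. */
    uint32_t sum = 0;
    for (int lane = 0; lane < LANES; lane++)
        sum += acc[lane];
    return (int)sum;
}
```

The loop body only does lane-wise additions, so the reduction is paid once per block rather than once per row; whether the wider groups actually win on a given core would still need to be benchmarked.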
> Here is the implementation of the version that reduces only once:
>
> func ff_pix_abs16_temp_rvv, zve32x
>         vsetivli    zero, 16, e32, m4, ta, ma
>         vmv.v.i     v24, 0
>         vmv.s.x     v0, zero
> 1:
>         vsetvli     zero, zero, e8, m1, tu, ma
>         vle8.v      v4, (a1)
>         vle8.v      v12, (a2)
>         addi        a4, a4, -1
>         vwsubu.vv   v16, v4, v12
>         add         a1, a1, a3
>         vwsubu.vv   v20, v12, v4
>         vsetvli     zero, zero, e16, m2, tu, ma
>         vmax.vv     v16, v16, v20
>         add         a2, a2, a3
>         vwadd.wv    v24, v24, v16
>         bnez        a4, 1b
>
>         vsetvli     zero, zero, e32, m4, ta, ma
>         vredsum.vs  v0, v24, v0   /* e32 sum; a widening reduction would need 64-bit elements, absent in Zve32x */
>         vmv.x.s     a0, v0
>         ret
> endfunc
>
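For reference, here is a scalar C equivalent of what the loop above computes, step by step (a sketch; the name pix_abs16_ref is hypothetical). The two widening subtractions yield a - b and b - a as 16-bit values, the signed maximum keeps the non-negative one, i.e. |a - b|, and the widening add folds it into a per-lane 32-bit accumulator. The argument order is inferred from the registers used above: a1 = blk1, a2 = blk2, a3 = stride, a4 = h.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference mirroring the RVV loop, one row per iteration. */
static int pix_abs16_ref(const uint8_t *blk1, const uint8_t *blk2,
                         ptrdiff_t stride, int h)
{
    int32_t acc[16] = { 0 };   /* models the e32 accumulator group v24 */

    for (; h > 0; h--) {       /* a4 counts rows down to zero */
        for (int x = 0; x < 16; x++) {
            /* In C the operands are promoted to int, so the differences are exact;
             * vwsubu.vv yields the same bits as 16-bit wrap-around values. */
            int16_t d0 = (int16_t)(blk1[x] - blk2[x]);   /* vwsubu.vv v16, v4, v12 */
            int16_t d1 = (int16_t)(blk2[x] - blk1[x]);   /* vwsubu.vv v20, v12, v4 */
            int16_t ad = d0 > d1 ? d0 : d1;              /* signed vmax.vv: |a - b| */
            acc[x] += ad;                                /* vwadd.wv into 32 bits */
        }
        blk1 += stride;   /* add a1, a1, a3 */
        blk2 += stride;   /* add a2, a2, a3 */
    }

    /* Single reduction of the 16 lanes after the loop. */
    int32_t sum = 0;
    for (int x = 0; x < 16; x++)
        sum += acc[x];
    return sum;
}
```

Because a - b always lies in [-255, 255], it fits in int16_t, which is why reading the wrapped unsigned 16-bit result as signed gives the exact difference and the signed maximum yields the absolute value.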
> On Wed, 7 Feb 2024 at 00:58, Rémi Denis-Courmont <remi at remlab.net> wrote:
>
> > Hi,
> >
> > To sum a vector, you should only reduce once, at the end of the function;
> > cf. how it's done in the existing scalar products. Reduction instructions
> > are (intrinsically) slow.
> >
> > --
> > Rémi Denis-Courmont
> > http://www.remlab.net/
--
Rémi Denis-Courmont
http://www.remlab.net/