[FFmpeg-devel] [PATCH v3 5/5] avcodec/ac3: Implement sum_square_butterfly_float for aarch64 NEON
Martin Storsjö
martin at martin.st
Thu Apr 4 16:01:01 EEST 2024
On Tue, 2 Apr 2024, Geoff Hill wrote:
> Signed-off-by: Geoff Hill <geoff at geoffhill.org>
> ---
> libavcodec/aarch64/ac3dsp_init_aarch64.c | 5 ++++
> libavcodec/aarch64/ac3dsp_neon.S | 35 ++++++++++++++++++++++++
> tests/checkasm/ac3dsp.c | 26 ++++++++++++++++++
> 3 files changed, 66 insertions(+)
>
> diff --git a/libavcodec/aarch64/ac3dsp_neon.S b/libavcodec/aarch64/ac3dsp_neon.S
> index fa8fcf2e47..4a78ec0b2a 100644
> --- a/libavcodec/aarch64/ac3dsp_neon.S
> +++ b/libavcodec/aarch64/ac3dsp_neon.S
> @@ -88,3 +88,38 @@ function ff_ac3_sum_square_butterfly_int32_neon, export=1
> st1 {v0.1d-v3.1d}, [x0]
> 1: ret
> endfunc
> +
> +function ff_ac3_sum_square_butterfly_float_neon, export=1
> + cbz w3, 1f
> + movi v0.4s, #0
> + movi v1.4s, #0
> + movi v2.4s, #0
> + movi v3.4s, #0
> +0: ld1 {v30.4s}, [x1], #16
> + ld1 {v31.4s}, [x2], #16
> + fadd v16.4s, v30.4s, v31.4s
> + fsub v17.4s, v30.4s, v31.4s
> + fmul v30.4s, v30.4s, v30.4s
> + fadd v0.4s, v0.4s, v30.4s
The arm version here used vmla instead of separate vmul+vadd - is there
any reason why we can't use fmla here?
// Martin