[FFmpeg-devel] [PATCH v3 4/5] avcodec/ac3: Implement sum_square_butterfly_int32 for aarch64 NEON

Martin Storsjö martin at martin.st
Thu Apr 4 15:58:57 EEST 2024


On Tue, 2 Apr 2024, Geoff Hill wrote:

> Signed-off-by: Geoff Hill <geoff at geoffhill.org>
> ---
> libavcodec/aarch64/ac3dsp_init_aarch64.c |  5 +++++
> libavcodec/aarch64/ac3dsp_neon.S         | 24 +++++++++++++++++++++
> tests/checkasm/ac3dsp.c                  | 27 ++++++++++++++++++++++++
> 3 files changed, 56 insertions(+)
>
> diff --git a/libavcodec/aarch64/ac3dsp_init_aarch64.c b/libavcodec/aarch64/ac3dsp_init_aarch64.c
> index 1bdc215b51..e95436c651 100644
> --- a/libavcodec/aarch64/ac3dsp_init_aarch64.c
> +++ b/libavcodec/aarch64/ac3dsp_init_aarch64.c
> @@ -28,6 +28,10 @@
> void ff_ac3_exponent_min_neon(uint8_t *exp, int num_reuse_blocks, int nb_coefs);
> void ff_ac3_extract_exponents_neon(uint8_t *exp, int32_t *coef, int nb_coefs);
> void ff_float_to_fixed24_neon(int32_t *dst, const float *src, size_t len);
> +void ff_ac3_sum_square_butterfly_int32_neon(int64_t sum[4],
> +                                            const int32_t *coef0,
> +                                            const int32_t *coef1,
> +                                            int len);
>
> av_cold void ff_ac3dsp_init_aarch64(AC3DSPContext *c)
> {
> @@ -37,4 +41,5 @@ av_cold void ff_ac3dsp_init_aarch64(AC3DSPContext *c)
>     c->ac3_exponent_min = ff_ac3_exponent_min_neon;
>     c->extract_exponents = ff_ac3_extract_exponents_neon;
>     c->float_to_fixed24 = ff_float_to_fixed24_neon;
> +    c->sum_square_butterfly_int32 = ff_ac3_sum_square_butterfly_int32_neon;
> }
> diff --git a/libavcodec/aarch64/ac3dsp_neon.S b/libavcodec/aarch64/ac3dsp_neon.S
> index b26f71a3f6..fa8fcf2e47 100644
> --- a/libavcodec/aarch64/ac3dsp_neon.S
> +++ b/libavcodec/aarch64/ac3dsp_neon.S
> @@ -64,3 +64,27 @@ function ff_float_to_fixed24_neon, export=1
>         b.ne        0b
>         ret
> endfunc
> +
> +function ff_ac3_sum_square_butterfly_int32_neon, export=1
> +        cbz         w3, 1f

The arm version of this patch doesn't have any corresponding check for
whether this parameter (len) is zero, and the checkasm test doesn't test
that behaviour either. Is a zero length never feasible (so we could leave
the check out here), or should we test for it and fix it in the other
assembly versions? In the latter case, it's of course OK to defer that to
a separate later patch, so it doesn't hold up this one.
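
To make the zero-length question concrete, here is a minimal scalar sketch
of what a len == 0 check could compare against. It assumes the C reference
zeroes all four sums before looping, so that len == 0 leaves them at zero;
that assumption should be verified against the real C version. The names
sum_square_butterfly_int32_model and zero_len_yields_zero_sums are
hypothetical and are not part of FFmpeg or checkasm.

```c
#include <stdint.h>

/* Hypothetical scalar model, not the FFmpeg source. Assumes the sums are
 * cleared first, then accumulated over len coefficient pairs. 64-bit
 * intermediates are used here to keep the sketch free of overflow. */
static void sum_square_butterfly_int32_model(int64_t sum[4],
                                             const int32_t *coef0,
                                             const int32_t *coef1,
                                             int len)
{
    sum[0] = sum[1] = sum[2] = sum[3] = 0;

    for (int i = 0; i < len; i++) {
        int64_t lt = coef0[i];
        int64_t rt = coef1[i];
        int64_t md = lt + rt;   /* mid  = left + right */
        int64_t sd = lt - rt;   /* side = left - right */

        sum[0] += lt * lt;
        sum[1] += rt * rt;
        sum[2] += md * md;
        sum[3] += sd * sd;
    }
}

/* Returns 1 if a zero-length call overwrites the output with zeros
 * and never needs to read the input. */
static int zero_len_yields_zero_sums(void)
{
    int64_t sum[4] = { -1, -1, -1, -1 };   /* poison the output */
    const int32_t dummy[1] = { 12345 };     /* must not contribute */

    sum_square_butterfly_int32_model(sum, dummy, dummy, 0);

    return sum[0] == 0 && sum[1] == 0 && sum[2] == 0 && sum[3] == 0;
}
```

If a zero length can actually reach this function, a checkasm case along
these lines would show whether an assembly version without an early exit
behaves the same way as the C one.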

// Martin



