[FFmpeg-devel] [PATCH] libavcodec/aarch64/hevcdsp_idct_neon.S: Also port add_residual functions.

Martin Storsjö martin at martin.st
Sat Jan 16 00:59:55 EET 2021


On Sun, 10 Jan 2021, Reimar.Doeffinger at gmx.de wrote:

> From: Reimar Döffinger <Reimar.Doeffinger at gmx.de>
>
> Speedup is fairly small, around 1.5%, but these are fairly simple.
> ---
> libavcodec/aarch64/hevcdsp_idct_neon.S    | 190 ++++++++++++++++++++++
> libavcodec/aarch64/hevcdsp_init_aarch64.c |  24 +++
> 2 files changed, 214 insertions(+)
>
> diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
> index 9f67e45..edd03a0 100644
> --- a/libavcodec/aarch64/hevcdsp_idct_neon.S
> +++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
> @@ -36,6 +36,196 @@ const trans, align=4
>         .short 31, 22, 13, 4
> endconst
> 
> +.macro clip10 in1, in2, c1, c2
> +        smax        \in1, \in1, \c1
> +        smax        \in2, \in2, \c1
> +        smin        \in1, \in1, \c2
> +        smin        \in2, \in2, \c2
> +.endm
> +
> +function ff_hevc_add_residual_4x4_8_neon, export=1
> +        ld1             {v0.8H-v1.8H}, [x1]
> +        ld1             {v2.S}[0], [x0], x2
> +        ld1             {v2.S}[1], [x0], x2
> +        ld1             {v2.S}[2], [x0], x2
> +        ld1             {v2.S}[3], [x0], x2
> +        sub             x0, x0, x2, lsl #2
> +        uxtl            v8.8H, v2.8B
> +        uxtl2           v9.8H, v2.16B
> +        sqadd           v0.8H, v0.8H, v8.8H

FWIW, as a matter of taste, I dislike the shouty uppercase form of element 
specifiers, like .8H here (as opposed to .8h). The code base contains both 
styles, but I'd say the lowercase form is more prevalent.

Overall, this patch looks good; I don't have much to comment on. I haven't 
tested it fully though, as it depends on the other patch, which still has 
a few issues (and fails checkasm).
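
For readers following along without the full patch, here is a rough scalar 
sketch in C of what the quoted 4x4 8-bit routine appears to compute. Reading 
the loads, x0 looks like the destination pixels, x1 the 16-bit residuals and 
x2 the line stride. The quote is cut off before the narrowing and store, so 
the final clamp to 0..255 is an assumption based on how a residual add 
normally works. The names clip_u8 and add_residual_4x4_8_sketch are 
hypothetical and do not come from FFmpeg.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp an int to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Illustrative scalar equivalent of a 4x4, 8-bit residual add.
 * dst:    top-left pixel of the block (assumed to be x0 in the asm)
 * res:    16 residuals, stored row by row (assumed to be x1)
 * stride: distance in bytes between pixel rows (assumed to be x2)
 * The clamp on store is assumed; the quoted asm stops before it.
 */
static void add_residual_4x4_8_sketch(uint8_t *dst, const int16_t *res,
                                      ptrdiff_t stride)
{
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++)
            dst[x] = clip_u8(dst[x] + res[x]);
        dst += stride;  /* next pixel row */
        res += 4;       /* next residual row */
    }
}
```

Something like this can serve as a reference to compare the NEON output 
against, and it makes the effect of the stride easy to see: only the first 
four bytes of each row are modified.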

// Martin

