[FFmpeg-devel] [PATCH] libavcodec/aarch64/hevcdsp_idct_neon.S: Also port add_residual functions.
Martin Storsjö
martin at martin.st
Sat Jan 16 00:59:55 EET 2021
On Sun, 10 Jan 2021, Reimar.Doeffinger at gmx.de wrote:
> From: Reimar Döffinger <Reimar.Doeffinger at gmx.de>
>
> Speedup is fairly small, around 1.5%, but these are fairly simple.
> ---
> libavcodec/aarch64/hevcdsp_idct_neon.S | 190 ++++++++++++++++++++++
> libavcodec/aarch64/hevcdsp_init_aarch64.c | 24 +++
> 2 files changed, 214 insertions(+)
>
> diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
> index 9f67e45..edd03a0 100644
> --- a/libavcodec/aarch64/hevcdsp_idct_neon.S
> +++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
> @@ -36,6 +36,196 @@ const trans, align=4
> .short 31, 22, 13, 4
> endconst
>
> +.macro clip10 in1, in2, c1, c2
> + smax \in1, \in1, \c1
> + smax \in2, \in2, \c1
> + smin \in1, \in1, \c2
> + smin \in2, \in2, \c2
> +.endm
> +
> +function ff_hevc_add_residual_4x4_8_neon, export=1
> + ld1 {v0.8H-v1.8H}, [x1]
> + ld1 {v2.S}[0], [x0], x2
> + ld1 {v2.S}[1], [x0], x2
> + ld1 {v2.S}[2], [x0], x2
> + ld1 {v2.S}[3], [x0], x2
> + sub x0, x0, x2, lsl #2
> + uxtl v8.8H, v2.8B
> + uxtl2 v9.8H, v2.16B
> + sqadd v0.8H, v0.8H, v8.8H

FWIW, as a matter of taste, I dislike the shouty uppercase form of element
specifiers such as .8H here. The code base contains both styles, but I'd say
the lowercase form (.8h) is more prevalent.
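
For reference, here is a scalar C sketch of what the quoted 4x4 routine computes. Each residual is added to the corresponding pixel and the result is clipped to 0..255. The model function mirrors the NEON steps lane by lane: uxtl widening, sqadd, then an unsigned saturating narrow. The narrow and store are not in the quoted excerpt, so the use of sqxtun there is an assumption. The names add_residual_ref, add_residual_4x4_model, sqadd16 and sqxtun8 are hypothetical helpers, not FFmpeg API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: dst[x] = clip(dst[x] + res, 0, 255), one row at a time.
 * Residuals are consumed in raster order, as x1 is in the NEON code. */
static void add_residual_ref(uint8_t *dst, const int16_t *res,
                             ptrdiff_t stride, int size)
{
    assert(size == 4 || size == 8 || size == 16 || size == 32);
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++) {
            int v = dst[x] + *res++;
            dst[x] = v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
        }
        dst += stride;
    }
}

/* Model of sqadd on one 16-bit lane: signed add, saturated to int16. */
static int16_t sqadd16(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + b;
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}

/* Model of sqxtun on one lane: signed 16-bit -> unsigned 8-bit, saturated. */
static uint8_t sqxtun8(int16_t v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Lane-wise model of the 4x4 NEON routine (narrowing step assumed). */
static void add_residual_4x4_model(uint8_t *dst, const int16_t *res,
                                   ptrdiff_t stride)
{
    for (int i = 0; i < 16; i++) {
        uint8_t *p = dst + (i / 4) * stride + (i % 4); /* ld1 {v2.S}[row] */
        int16_t wide = (int16_t)*p;                   /* uxtl/uxtl2: u8 -> u16 */
        *p = sqxtun8(sqadd16(res[i], wide));          /* sqadd, assumed sqxtun */
    }
}
```

Because uxtl yields values in 0..255, the 16-bit saturating add only clamps when the exact sum is already far outside 0..255, so the final unsigned saturating narrow produces the same pixel as the direct clip.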

Overall, this patch looks good; I don't see much to comment on. I haven't
tested it fully, though, since it depends on the other patch, which still
has a few issues (and fails checkasm).
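
The quoted clip10 macro clamps two vectors to [c1, c2] with smax followed by smin, which is a clamp as long as c1 <= c2. The 10-bit variants that call it are not in the quoted excerpt, so the bounds 0 and 1023 (the 10-bit pixel range) used in the sketch below are an assumption, and clip10_model is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Lane-wise model of the quoted clip10 macro on two 8-lane int16 vectors
 * (.8h). Assumed usage: c1 = 0, c2 = 1023 for 10-bit pixels. */
static void clip10_model(int16_t in1[8], int16_t in2[8], int16_t c1, int16_t c2)
{
    assert(c1 <= c2); /* smax-then-smin only acts as a clamp if c1 <= c2 */
    for (int i = 0; i < 8; i++) {
        if (in1[i] < c1) in1[i] = c1; /* smax \in1, \in1, \c1 */
        if (in2[i] < c1) in2[i] = c1; /* smax \in2, \in2, \c1 */
        if (in1[i] > c2) in1[i] = c2; /* smin \in1, \in1, \c2 */
        if (in2[i] > c2) in2[i] = c2; /* smin \in2, \in2, \c2 */
    }
}
```
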
// Martin