[FFmpeg-devel] [PATCH] avfilter/scene_sad: add AArch64 SIMD
zhilizhao
quinkblack at foxmail.com
Sun Feb 2 06:54:33 EET 2020
> On Feb 2, 2020, at 4:26 AM, Marton Balint <cus at passwd.hu> wrote:
>
>
>
> On Sat, 1 Feb 2020, quinkblack at foxmail.com wrote:
>
>> From: Zhao Zhili <quinkblack at foxmail.com>
>>
>> For 8 bit depth:
>> ./ffmpeg -threads 1 -f lavfi -t 10 -i 'yuvtestsrc=size=4096x2048,format=yuv444p' -vf 'freezedetect' -f null -benchmark -
>>
>> Test results on Snapdragon 845:
>> Before:
>> frame= 250 fps= 23 q=-0.0 Lsize=N/A time=00:00:10.00 bitrate=N/A speed=0.924x
>> bench: utime=8.360s stime=2.350s rtime=10.820s
>> After:
>> frame= 250 fps= 51 q=-0.0 Lsize=N/A time=00:00:10.00 bitrate=N/A speed=2.04x
>> bench: utime=2.650s stime=2.210s rtime=4.909s
>>
>> Test results on HiSilicon Kirin 970:
>> Before:
>> frame= 250 fps=6.0 q=-0.0 Lsize=N/A time=00:00:10.00 bitrate=N/A speed=0.239x
>> bench: utime=35.156s stime=6.604s rtime=41.820s
>> After:
>> frame= 250 fps= 10 q=-0.0 Lsize=N/A time=00:00:10.00 bitrate=N/A speed=0.403x
>> bench: utime=18.400s stime=6.376s rtime=24.798s
>>
>> For 16 bit depth:
>> ./ffmpeg -threads 1 -f lavfi -t 10 -i 'yuvtestsrc=size=4096x2048,format=yuv444p16' -vf 'freezedetect' -f null -benchmark -
>>
>> Test results on Snapdragon 845:
>> Before:
>> frame= 250 fps= 19 q=-0.0 Lsize=N/A time=00:00:10.00 bitrate=N/A speed=0.756x
>> bench: utime=8.700s stime=4.410s rtime=13.226s
>> After:
>> frame= 250 fps= 27 q=-0.0 Lsize=N/A time=00:00:10.00 bitrate=N/A speed=1.07x
>> bench: utime=4.920s stime=4.350s rtime=9.356s
>>
>> Test results on HiSilicon Kirin 970:
>> Before:
>> frame= 250 fps=4.0 q=-0.0 Lsize=N/A time=00:00:10.00 bitrate=N/A speed=0.161x
>> bench: utime=48.868s stime=13.124s rtime=62.110s
>> After:
>> frame= 250 fps=5.1 q=-0.0 Lsize=N/A time=00:00:10.00 bitrate=N/A speed=0.205x
>> bench: utime=35.600s stime=13.036s rtime=48.708s
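The wall-clock gains implied by the `rtime` figures above can be checked with a little C arithmetic. This is only a sketch on the quoted numbers; the `bench_result` struct and the `speedup` helper are illustrative names, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* One benchmark case: wall-clock rtime (seconds) before and after the patch. */
struct bench_result {
    const char *case_name;
    double rtime_before;
    double rtime_after;
};

/* rtime values copied from the results quoted above. */
static const struct bench_result bench_results[] = {
    { "8-bit, Snapdragon 845",       10.820,  4.909 },
    { "8-bit, HiSilicon Kirin 970",  41.820, 24.798 },
    { "16-bit, Snapdragon 845",      13.226,  9.356 },
    { "16-bit, HiSilicon Kirin 970", 62.110, 48.708 },
};

static const size_t bench_count =
    sizeof(bench_results) / sizeof(bench_results[0]);

/* Wall-clock speedup: rtime before divided by rtime after. */
static double speedup(const struct bench_result *r)
{
    assert(r->rtime_after > 0.0);
    return r->rtime_before / r->rtime_after;
}
```

With these values the ratios come out at roughly 2.20x and 1.69x for 8-bit and 1.41x and 1.28x for 16-bit (Snapdragon 845 and Kirin 970 respectively). The `stime` figures barely move between runs, so the savings show up almost entirely in `utime`.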
>> ---
>> libavfilter/aarch64/Makefile | 2 +
>> libavfilter/aarch64/scene_sad_init.c | 37 +++++++
>> libavfilter/aarch64/scene_sad_neon.S | 149 +++++++++++++++++++++++++++
>> libavfilter/scene_sad.c | 2 +
>> libavfilter/scene_sad.h | 2 +
>> 5 files changed, 192 insertions(+)
>> create mode 100644 libavfilter/aarch64/scene_sad_init.c
>> create mode 100644 libavfilter/aarch64/scene_sad_neon.S
>
> Does your ASM handle cases when width is not a multiple of the vector size? If not, then you should probably do something similar to what is done for X86.
>
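Yes. The vector loop only covers the largest multiple of the step (64 pixels for 8-bit, 32 for 16-bit); the code after `// scalar loop` then handles the remaining columns one pixel at a time, or the whole row when the width is smaller than one step. It supports any width and height >= 1.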
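To illustrate that split, here is a scalar C model of the 8-bit row loop: the first `width & ~63` columns correspond to the 64-byte NEON body (x9 in the asm), and the rest to the `// scalar loop` tail. It is a sketch, not the patch's code; `scene_sad8_model` and `SAD8_STEP` are hypothetical names, and the parameters mirror the prototype given in the asm comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAD8_STEP 64 /* bytes consumed per NEON iteration (x7 in the asm) */

/* Scalar model of the ff_scene_sad_neon control flow (hypothetical name). */
static void scene_sad8_model(const uint8_t *src1, ptrdiff_t stride1,
                             const uint8_t *src2, ptrdiff_t stride2,
                             ptrdiff_t width, ptrdiff_t height,
                             uint64_t *sum)
{
    /* Largest multiple of the step not exceeding width (x9 in the asm). */
    const ptrdiff_t bulk = width & ~(ptrdiff_t)(SAD8_STEP - 1);
    uint64_t sad = 0;

    assert(width >= 1 && height >= 1); /* the asm loops assume this */

    for (ptrdiff_t y = 0; y < height; y++) {
        ptrdiff_t x = 0;

        /* Vector part: whole 64-byte chunks (uabd + uaddlv in the asm). */
        for (; x < bulk; x += SAD8_STEP) {
            uint32_t chunk = 0; /* at most 64 * 255, no overflow */
            for (int i = 0; i < SAD8_STEP; i++) {
                int d = src1[x + i] - src2[x + i];
                chunk += (uint32_t)(d < 0 ? -d : d);
            }
            sad += chunk;
        }

        /* Scalar tail: leftover columns, one at a time. */
        for (; x < width; x++) {
            int d = src1[x] - src2[x];
            sad += (uint64_t)(d < 0 ? -d : d);
        }

        src1 += stride1;
        src2 += stride2;
    }
    *sum = sad;
}
```

For example, a 70-pixel row is processed as one 64-byte chunk plus a 6-column tail, while a 5-pixel row goes straight to the tail, which corresponds to the `b.lt 3f` branch taken when the width is below x7.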
> Thanks,
> Marton
>
>>
>> diff --git a/libavfilter/aarch64/Makefile b/libavfilter/aarch64/Makefile
>> index 6c727f9859..3a458f511f 100644
>> --- a/libavfilter/aarch64/Makefile
>> +++ b/libavfilter/aarch64/Makefile
>> @@ -1,7 +1,9 @@
>> OBJS-$(CONFIG_AFIR_FILTER) += aarch64/af_afir_init.o
>> OBJS-$(CONFIG_ANLMDN_FILTER) += aarch64/af_anlmdn_init.o
>> +OBJS-$(CONFIG_SCENE_SAD) += aarch64/scene_sad_init.o
>> OBJS-$(CONFIG_NLMEANS_FILTER) += aarch64/vf_nlmeans_init.o
>> NEON-OBJS-$(CONFIG_AFIR_FILTER) += aarch64/af_afir_neon.o
>> NEON-OBJS-$(CONFIG_ANLMDN_FILTER) += aarch64/af_anlmdn_neon.o
>> +NEON-OBJS-$(CONFIG_SCENE_SAD) += aarch64/scene_sad_neon.o
>> NEON-OBJS-$(CONFIG_NLMEANS_FILTER) += aarch64/vf_nlmeans_neon.o
>> diff --git a/libavfilter/aarch64/scene_sad_init.c b/libavfilter/aarch64/scene_sad_init.c
>> new file mode 100644
>> index 0000000000..8de769ac10
>> --- /dev/null
>> +++ b/libavfilter/aarch64/scene_sad_init.c
>> @@ -0,0 +1,37 @@
>> +/*
>> + * This file is part of FFmpeg.
>> + *
>> + * FFmpeg is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU Lesser General Public
>> + * License as published by the Free Software Foundation; either
>> + * version 2.1 of the License, or (at your option) any later version.
>> + *
>> + * FFmpeg is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
>> + * Lesser General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU Lesser General Public
>> + * License along with FFmpeg; if not, write to the Free Software
>> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>> + */
>> +
>> +#include "libavutil/aarch64/cpu.h"
>> +#include "libavfilter/scene_sad.h"
>> +
>> +void ff_scene_sad_neon(SCENE_SAD_PARAMS);
>> +
>> +void ff_scene_sad16_neon(SCENE_SAD_PARAMS);
>> +
>> +ff_scene_sad_fn ff_scene_sad_get_fn_aarch64(int depth)
>> +{
>> +    int cpu_flags = av_get_cpu_flags();
>> +    if (have_neon(cpu_flags)) {
>> +        if (depth == 8)
>> +            return ff_scene_sad_neon;
>> +        if (depth == 16)
>> +            return ff_scene_sad16_neon;
>> +    }
>> +
>> +    return NULL;
>> +}
>> diff --git a/libavfilter/aarch64/scene_sad_neon.S b/libavfilter/aarch64/scene_sad_neon.S
>> new file mode 100644
>> index 0000000000..5b3b027a53
>> --- /dev/null
>> +++ b/libavfilter/aarch64/scene_sad_neon.S
>> @@ -0,0 +1,149 @@
>> +/*
>> + * Copyright (c) 2020 Zhao Zhili
>> + *
>> + * This file is part of FFmpeg.
>> + *
>> + * FFmpeg is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU Lesser General Public
>> + * License as published by the Free Software Foundation; either
>> + * version 2.1 of the License, or (at your option) any later version.
>> + *
>> + * FFmpeg is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
>> + * Lesser General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU Lesser General Public
>> + * License along with FFmpeg; if not, write to the Free Software
>> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>> + */
>> +
>> +#include "libavutil/aarch64/asm.S"
>> +
>> +// void ff_scene_sadx_neon(const uint8_t *src1, ptrdiff_t stride1,
>> +//                         const uint8_t *src2, ptrdiff_t stride2,
>> +//                         ptrdiff_t width, ptrdiff_t height,
>> +//                         uint64_t *sum)
>> +.macro scene_sad_neon, depth=8
>> +        // x0: src1
>> +        // x1: stride1
>> +        // x2: src2
>> +        // x3: stride2
>> +        // x4: width
>> +        // x5: height
>> +        // x6: sum
>> +
>> +        // x7: step of width loop
>> +        // x8: index of row
>> +        // x9: width / x7 * x7
>> +        // x10: sad
>> +        // x11: index of column
>> +        // w12: src1[x]
>> +        // w13: src2[x]
>> +
>> +        mov             x8, xzr
>> +        mov             x10, xzr
>> +
>> +.if \depth == 8
>> +        mov             x7, #64
>> +        and             x9, x4, #0xFFFFFFFFFFFFFFC0
>> +.endif
>> +
>> +.if \depth == 16
>> +        mov             x7, #32
>> +        and             x9, x4, #0xFFFFFFFFFFFFFFE0
>> +.endif
>> +
>> +1:      cmp             x4, x7                  // check width
>> +        mov             x11, xzr
>> +        b.lt            3f
>> +
>> +        mov             v0.d[0], x10
>> +
>> +        // vector loop
>> +2:
>> +.if \depth == 8
>> +        add             x14, x0, x11
>> +        add             x15, x2, x11
>> +.endif
>> +
>> +.if \depth == 16
>> +        add             x14, x0, x11, lsl #1
>> +        add             x15, x2, x11, lsl #1
>> +.endif
>> +        ld1             {v16.4S, v17.4S, v18.4S, v19.4S}, [x14]
>> +        ld1             {v20.4S, v21.4S, v22.4S, v23.4S}, [x15]
>> +        add             x11, x11, x7
>> +        cmp             x9, x11
>> +
>> +.if \depth == 8
>> +        uabd            v16.16B, v16.16B, v20.16B
>> +        uabd            v17.16B, v17.16B, v21.16B
>> +        uabd            v18.16B, v18.16B, v22.16B
>> +        uabd            v19.16B, v19.16B, v23.16B
>> +        uaddlv          h16, v16.16B
>> +        uaddlv          h17, v17.16B
>> +        uaddlv          h18, v18.16B
>> +        uaddlv          h19, v19.16B
>> +.endif
>> +
>> +.if \depth == 16
>> +        uabd            v16.8H, v16.8H, v20.8H
>> +        uabd            v17.8H, v17.8H, v21.8H
>> +        uabd            v18.8H, v18.8H, v22.8H
>> +        uabd            v19.8H, v19.8H, v23.8H
>> +        uaddlv          s16, v16.8H
>> +        uaddlv          s17, v17.8H
>> +        uaddlv          s18, v18.8H
>> +        uaddlv          s19, v19.8H
>> +.endif
>> +
>> +        add             d16, d16, d17
>> +        add             d18, d18, d19
>> +        add             d0, d0, d16
>> +        add             d0, d0, d18
>> +
>> +        b.ne            2b
>> +
>> +        cmp             x9, x4
>> +        fmov            x10, d0
>> +        b.eq            4f
>> +
>> +        // scalar loop
>> +3:
>> +.if \depth == 8
>> +        ldrb            w12, [x0, x11]
>> +        ldrb            w13, [x2, x11]
>> +.endif
>> +
>> +.if \depth == 16
>> +        ldrh            w12, [x0, x11, lsl #1]
>> +        ldrh            w13, [x2, x11, lsl #1]
>> +.endif
>> +        add             x11, x11, #1
>> +        subs            w12, w12, w13
>> +        cneg            w12, w12, mi
>> +        add             x10, x10, x12
>> +        cmp             x11, x4
>> +        b.ne            3b
>> +
>> +        // next row
>> +4:
>> +        add             x8, x8, #1
>> +        add             x0, x0, x1
>> +        cmp             x8, x5
>> +        add             x2, x2, x3
>> +        b.ne            1b
>> +
>> +5:
>> +        str             x10, [x6]
>> +        ret
>> +.endm
>> +
>> +function ff_scene_sad_neon, export=1
>> +        scene_sad_neon  depth=8
>> +endfunc
>> +
>> +function ff_scene_sad16_neon, export=1
>> +        scene_sad_neon  depth=16
>> +endfunc
>> diff --git a/libavfilter/scene_sad.c b/libavfilter/scene_sad.c
>> index 73d3eacbfa..ee0c71f659 100644
>> --- a/libavfilter/scene_sad.c
>> +++ b/libavfilter/scene_sad.c
>> @@ -61,6 +61,8 @@ ff_scene_sad_fn ff_scene_sad_get_fn(int depth)
>>      ff_scene_sad_fn sad = NULL;
>>      if (ARCH_X86)
>>          sad = ff_scene_sad_get_fn_x86(depth);
>> +    if (ARCH_AARCH64)
>> +        sad = ff_scene_sad_get_fn_aarch64(depth);
>>      if (!sad) {
>>          if (depth == 8)
>>              sad = ff_scene_sad_c;
>> diff --git a/libavfilter/scene_sad.h b/libavfilter/scene_sad.h
>> index 173a051f2b..c868200dc4 100644
>> --- a/libavfilter/scene_sad.h
>> +++ b/libavfilter/scene_sad.h
>> @@ -37,6 +37,8 @@ void ff_scene_sad_c(SCENE_SAD_PARAMS);
>> void ff_scene_sad16_c(SCENE_SAD_PARAMS);
>> +ff_scene_sad_fn ff_scene_sad_get_fn_aarch64(int depth);
>> +
>> ff_scene_sad_fn ff_scene_sad_get_fn_x86(int depth);
>> ff_scene_sad_fn ff_scene_sad_get_fn(int depth);
>> --
>> 2.22.0
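The 16-bit variant in the patch follows the same pattern with a 32-pixel step (four `.8H` registers, 64 bytes) and column offsets scaled by two (`lsl #1`), while `stride1`/`stride2` are still added to the row pointers as byte counts. A hedged C model under those assumptions follows; `scene_sad16_model` and `SAD16_STEP` are hypothetical names, and the `_Static_assert`s only restate the bounds that keep each `uaddlv` reduction from overflowing (16 x 255 in a halfword, 8 x 65535 in a word).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAD16_STEP 32 /* pixels per NEON iteration: 4 registers x 8 halfwords */

/* uaddlv h16, v16.16B: a sum of 16 bytes must fit in 16 bits. */
_Static_assert(16u * UINT8_MAX <= UINT16_MAX, "8-bit reduction fits in h");
/* uaddlv s16, v16.8H: a sum of 8 halfwords must fit in 32 bits. */
_Static_assert(8ull * UINT16_MAX <= UINT32_MAX, "16-bit reduction fits in s");

/* Scalar model of the ff_scene_sad16_neon control flow; strides are in bytes. */
static void scene_sad16_model(const uint8_t *src1, ptrdiff_t stride1,
                              const uint8_t *src2, ptrdiff_t stride2,
                              ptrdiff_t width, ptrdiff_t height,
                              uint64_t *sum)
{
    const ptrdiff_t bulk = width & ~(ptrdiff_t)(SAD16_STEP - 1);
    uint64_t sad = 0;

    assert(width >= 1 && height >= 1);

    for (ptrdiff_t y = 0; y < height; y++) {
        /* Pixels are 16-bit; the asm scales the column index with lsl #1. */
        const uint16_t *p1 = (const uint16_t *)src1;
        const uint16_t *p2 = (const uint16_t *)src2;
        ptrdiff_t x = 0;

        /* Whole 32-pixel chunks, as covered by the vector loop. */
        for (; x < bulk; x += SAD16_STEP)
            for (int i = 0; i < SAD16_STEP; i++) {
                int32_t d = (int32_t)p1[x + i] - (int32_t)p2[x + i];
                sad += (uint64_t)(d < 0 ? -d : d);
            }

        /* Remaining pixels, as covered by the scalar loop. */
        for (; x < width; x++) {
            int32_t d = (int32_t)p1[x] - (int32_t)p2[x];
            sad += (uint64_t)(d < 0 ? -d : d);
        }

        src1 += stride1; /* byte stride, as in "add x0, x0, x1" */
        src2 += stride2;
    }
    *sum = sad;
}
```

Keeping the strides in bytes mirrors the asm's `add x0, x0, x1`; a C loop that advances `uint16_t` pointers would have to halve them instead.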
>>
>> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel at ffmpeg.org <mailto:ffmpeg-devel at ffmpeg.org>
>> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>
>>
>> To unsubscribe, visit link above, or email
>> ffmpeg-devel-request at ffmpeg.org <mailto:ffmpeg-devel-request at ffmpeg.org> with subject "unsubscribe".