[FFmpeg-devel] [PATCH v2 1/1] lavc/aarch64: add some neon pix_abs functions
Michael Niedermayer
michael at niedermayer.cc
Fri Apr 15 19:43:48 EEST 2022
On Thu, Apr 14, 2022 at 04:22:58PM +0000, Swinney, Jonathan wrote:
> - ff_pix_abs16_neon
> - ff_pix_abs16_xy2_neon
>
> In direct micro benchmarks of these ff functions versus their C implementations,
> these functions performed as follows on AWS Graviton 2:
>
> ff_pix_abs16_neon:
> c: benchmark ran 100000 iterations in 0.955383 seconds
> ff: benchmark ran 100000 iterations in 0.097669 seconds
>
> ff_pix_abs16_xy2_neon:
> c: benchmark ran 100000 iterations in 1.916759 seconds
> ff: benchmark ran 100000 iterations in 0.370729 seconds
>
> Signed-off-by: Jonathan Swinney <jswinney at amazon.com>
> ---
> libavcodec/aarch64/Makefile | 2 +
> libavcodec/aarch64/me_cmp_init_aarch64.c | 39 +++++
> libavcodec/aarch64/me_cmp_neon.S | 209 +++++++++++++++++++++++
> libavcodec/me_cmp.c | 2 +
> libavcodec/me_cmp.h | 1 +
> libavcodec/x86/me_cmp.asm | 7 +
> libavcodec/x86/me_cmp_init.c | 3 +
> tests/checkasm/Makefile | 2 +-
> tests/checkasm/checkasm.c | 1 +
> tests/checkasm/checkasm.h | 1 +
> tests/checkasm/motion.c | 155 +++++++++++++++++
> 11 files changed, 421 insertions(+), 1 deletion(-)
> create mode 100644 libavcodec/aarch64/me_cmp_init_aarch64.c
> create mode 100644 libavcodec/aarch64/me_cmp_neon.S
> create mode 100644 tests/checkasm/motion.c
>
[...]
> diff --git a/libavcodec/x86/me_cmp.asm b/libavcodec/x86/me_cmp.asm
> index ad06d485ab..f73b9f9161 100644
> --- a/libavcodec/x86/me_cmp.asm
> +++ b/libavcodec/x86/me_cmp.asm
> @@ -255,6 +255,7 @@ hadamard8x8_diff %+ SUFFIX:
>
> HSUM m0, m1, eax
> and rax, 0xFFFF
> + emms
> ret
>
> hadamard8_16_wrapper 0, 14
> @@ -345,6 +346,7 @@ cglobal sse%1, 5,5,8, v, pix1, pix2, lsize, h
>
> HADDD m7, m1
> movd eax, m7 ; return value
> + emms
> RET
> %endmacro
On which ARM chip did you test this?
[...]
> diff --git a/libavcodec/x86/me_cmp_init.c b/libavcodec/x86/me_cmp_init.c
> index 9af911bb88..b330868a38 100644
> --- a/libavcodec/x86/me_cmp_init.c
> +++ b/libavcodec/x86/me_cmp_init.c
> @@ -186,6 +186,8 @@ static int vsad_intra16_mmx(MpegEncContext *v, uint8_t *pix, uint8_t *dummy,
> : "r" (stride), "m" (h)
> : "%ecx");
>
> + emms_c();
> +
> return tmp & 0xFFFF;
> }
> #undef SUM
> @@ -418,6 +420,7 @@ static inline int sum_mmx(void)
> "paddw %%mm0, %%mm6 \n\t"
> "movd %%mm6, %0 \n\t"
> : "=r" (ret));
> + emms_c();
> return ret & 0xFFFF;
> }
hmmm
Also, before the patch:
checkasm: all 6153 tests passed
After it:
checkasm: all 3198 tests passed
That's on x86-64.
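
For context on what the new checkasm motion test has to compare: pix_abs16 is a sum of absolute differences (SAD) over a 16-pixel-wide block of h rows. A minimal scalar sketch of that computation, assuming 8-bit pixels and one stride shared by both blocks, could look like the code below. The name sad16_ref is hypothetical, and this is neither the code from the patch nor the C version in libavcodec/me_cmp.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference helper (not FFmpeg's actual code): sum of absolute
 * differences over a block that is 16 pixels wide and h rows tall.
 * Assumes both blocks use the same stride, given in bytes. */
static int sad16_ref(const uint8_t *pix1, const uint8_t *pix2,
                     ptrdiff_t stride, int h)
{
    int sum = 0;

    for (int y = 0; y < h; y++) {
        /* uint8_t operands promote to int, so the difference cannot wrap. */
        for (int x = 0; x < 16; x++)
            sum += abs(pix1[x] - pix2[x]);
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}
```

An asm version would be expected to return the same value as such a scalar loop for any input, which is the kind of equality a checkasm test checks.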
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Complexity theory is the science of finding the exact solution to an
approximation. Benchmarking OTOH is finding an approximation of the exact