[FFmpeg-devel] [PATCH 07/10] avcodec/vc1: Arm 64-bit NEON inverse transform fast paths

Martin Storsjö martin at martin.st
Wed Mar 30 16:49:48 EEST 2022


On Fri, 25 Mar 2022, Ben Avison wrote:

> checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows.
>
> vc1dsp.vc1_inv_trans_4x4_c: 158.2
> vc1dsp.vc1_inv_trans_4x4_neon: 65.7
> vc1dsp.vc1_inv_trans_4x4_dc_c: 86.5
> vc1dsp.vc1_inv_trans_4x4_dc_neon: 26.5
> vc1dsp.vc1_inv_trans_4x8_c: 335.2
> vc1dsp.vc1_inv_trans_4x8_neon: 106.2
> vc1dsp.vc1_inv_trans_4x8_dc_c: 151.2
> vc1dsp.vc1_inv_trans_4x8_dc_neon: 25.5
> vc1dsp.vc1_inv_trans_8x4_c: 365.7
> vc1dsp.vc1_inv_trans_8x4_neon: 97.2
> vc1dsp.vc1_inv_trans_8x4_dc_c: 139.7
> vc1dsp.vc1_inv_trans_8x4_dc_neon: 16.5
> vc1dsp.vc1_inv_trans_8x8_c: 547.7
> vc1dsp.vc1_inv_trans_8x8_neon: 137.0
> vc1dsp.vc1_inv_trans_8x8_dc_c: 268.2
> vc1dsp.vc1_inv_trans_8x8_dc_neon: 30.5
>
> Signed-off-by: Ben Avison <bavison at riscosopen.org>
> ---
> libavcodec/aarch64/vc1dsp_init_aarch64.c |  19 +
> libavcodec/aarch64/vc1dsp_neon.S         | 678 +++++++++++++++++++++++
> 2 files changed, 697 insertions(+)

Looks generally reasonable. Is it possible to factor out the individual 
transforms (so that you'd e.g. invoke the same macro twice in the 8x8 and 
4x4 functions) without too much loss? The downshift, which differs between 
the two passes, could either be left outside of the macro, or the downshift 
amount could be made a macro parameter.
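
To illustrate the idea in C rather than in the patch's assembly, here is a rough sketch of a 4-point pass with the rounding constant and downshift passed in as parameters. The helper names (inv_trans_4pt, inv_trans_4x4_sketch) are hypothetical, and the coefficients (17, 22, 10), rounding values (4, 64) and shifts (3, 7) are assumed from the C reference in libavcodec/vc1dsp.c, so check them against the tree. The final add-to-destination and clipping step is left out.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the patch: one 4-point inverse
 * transform pass. 'step' is the distance between input elements, and
 * 'rnd'/'shift' are parameters so both passes can share this body.
 * Relies on arithmetic right shift of negative values. */
static void inv_trans_4pt(const int16_t *src, int step, int out[4],
                          int rnd, int shift)
{
    int t1 = 17 * (src[0] + src[2 * step]) + rnd;
    int t2 = 17 * (src[0] - src[2 * step]) + rnd;
    int t3 = 22 * src[1 * step] + 10 * src[3 * step];
    int t4 = 22 * src[3 * step] - 10 * src[1 * step];

    out[0] = (t1 + t3) >> shift;
    out[1] = (t2 - t4) >> shift;
    out[2] = (t2 + t4) >> shift;
    out[3] = (t1 - t3) >> shift;
}

/* Sketch of a 4x4 function invoking the same helper twice. The block is
 * assumed to be stored with a row stride of 8 coefficients. */
static void inv_trans_4x4_sketch(int16_t block[4 * 8], int16_t res[4][4])
{
    int tmp[4];
    int i, j;

    /* First pass over rows: assumed rounding 4, downshift 3 */
    for (i = 0; i < 4; i++) {
        inv_trans_4pt(block + 8 * i, 1, tmp, 4, 3);
        for (j = 0; j < 4; j++)
            block[8 * i + j] = (int16_t)tmp[j];
    }

    /* Second pass over columns: assumed rounding 64, downshift 7 */
    for (i = 0; i < 4; i++) {
        inv_trans_4pt(block + i, 8, tmp, 64, 7);
        for (j = 0; j < 4; j++)
            res[j][i] = (int16_t)tmp[j];
    }
}
```

In the assembly, the equivalent would be a macro taking the shift amount as an argument, or a macro that stops before the downshift and leaves it to the caller.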

// Martin


