[FFmpeg-devel] [PATCH v2] lavc/aarch64/fdct: add neon-optimized fdct for aarch64

Martin Storsjö martin at martin.st
Wed Apr 17 11:19:49 EEST 2024


On Wed, 17 Apr 2024, Ramiro Polla wrote:

> The code is imported from libjpeg-turbo-3.0.1. The neon registers used
> have been changed to avoid modifying v8-v15.
> ---
> libavcodec/aarch64/Makefile               |   2 +
> libavcodec/aarch64/fdct.h                 |  26 ++
> libavcodec/aarch64/fdctdsp_init_aarch64.c |  39 +++
> libavcodec/aarch64/fdctdsp_neon.S         | 368 ++++++++++++++++++++++
> libavcodec/avcodec.h                      |   1 +
> libavcodec/fdctdsp.c                      |   4 +-
> libavcodec/fdctdsp.h                      |   2 +
> libavcodec/options_table.h                |   1 +
> libavcodec/tests/aarch64/dct.c            |   2 +
> tests/checkasm/Makefile                   |   1 +
> tests/checkasm/checkasm.c                 |   3 +
> tests/checkasm/checkasm.h                 |   1 +
> tests/checkasm/fdctdsp.c                  |  68 ++++
> tests/fate/checkasm.mak                   |   1 +
> 14 files changed, 518 insertions(+), 1 deletion(-)
> create mode 100644 libavcodec/aarch64/fdct.h
> create mode 100644 libavcodec/aarch64/fdctdsp_init_aarch64.c
> create mode 100644 libavcodec/aarch64/fdctdsp_neon.S
> create mode 100644 tests/checkasm/fdctdsp.c

Overall LGTM, thanks!

You may wish to split the checkasm test out into a separate patch that 
comes before the one adding the new implementation.

I was surprised by the header libavcodec/aarch64/fdct.h, which seemed 
redundant at first glance, but I see that it is needed for the dct test 
executable in libavcodec/tests/aarch64/dct.c, so I guess this is 
reasonable. (In most other asm implementations, we just declare the 
functions at the start of the *_init.c files.)
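
For illustration, the usual pattern looks roughly like the sketch below. 
All names here (ExampleFDCTContext, example_fdct_neon, 
example_fdctdsp_init, EXAMPLE_CPU_FLAG_NEON) are made up for this example 
and are not taken from the patch, and a C stub stands in for the symbol 
that the .S file would normally provide:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a DSP context that holds function pointers. */
typedef struct ExampleFDCTContext {
    void (*fdct)(int16_t *block);
} ExampleFDCTContext;

/* Hypothetical CPU feature bit, for illustration only. */
#define EXAMPLE_CPU_FLAG_NEON 1

/* Usual convention: the prototype for the asm routine is declared right
 * here at the top of the init file, because no other file needs it.
 * Once a second consumer (such as a standalone test program) also needs
 * the prototype, moving it into a small shared header avoids duplicating
 * the declaration. */
void example_fdct_neon(int16_t *block);

/* Stub so that the sketch links on its own; in a real tree this symbol
 * would come from the assembly file. It does not compute a DCT. */
void example_fdct_neon(int16_t *block)
{
    (void)block;
}

/* Install the optimized routine only when the CPU flag is present. */
void example_fdctdsp_init(ExampleFDCTContext *c, int cpu_flags)
{
    if (cpu_flags & EXAMPLE_CPU_FLAG_NEON)
        c->fdct = example_fdct_neon;
}
```

With a shared header, both the init file and the test program would 
include that header instead of repeating the prototype.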

// Martin


