[FFmpeg-devel] [PATCH] swscale/aarch64/rgb2rgb: add deinterleaveBytes neon implementation

Ramiro Polla ramiro.polla at gmail.com
Sun Sep 1 13:51:48 EEST 2024


On Sat, Aug 31, 2024 at 10:40 PM Michael Niedermayer
<michael at niedermayer.cc> wrote:
> On Fri, Aug 30, 2024 at 08:56:55PM +0200, Ramiro Polla wrote:
> >                                       A55               A76
> > deinterleave_bytes_c:             70342.0           34497.5
> > deinterleave_bytes_neon:          21594.5 ( 3.26x)   5535.2 ( 6.23x)
> > deinterleave_bytes_aligned_c:     71340.8           34651.2
> > deinterleave_bytes_aligned_neon:   8616.8 ( 8.28x)   3996.2 ( 8.67x)
> > ---
> >  libswscale/aarch64/rgb2rgb.c      |  4 ++
> >  libswscale/aarch64/rgb2rgb_neon.S | 59 +++++++++++++++++++++++
> >  tests/checkasm/sw_rgb.c           | 77 +++++++++++++++++++++++++++++++
> >  3 files changed, 140 insertions(+)
>
> this breaks fate on x86-64
>
> Test checkasm-sw_rgb failed. Look at tests/data/fate/checkasm-sw_rgb.err for details.

The sse2/avx implementations of deinterleaveBytes use LOOP_NVXX_TO_UV,
which checks the alignment of src (and can read unaligned data) but
always expects dst to be aligned. Should the unaligned versions of
these functions be modified to also support unaligned writes to dst?
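
To make the question concrete, here is a minimal C sketch of the
distinction. It is not the FFmpeg code: the single-row signature, the
helper names (is_aligned, can_use_aligned_path, deinterleave_row_ref)
and the 16-byte alignment are assumptions chosen for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if p is a multiple of `align` bytes.
 * `align` must be a power of two. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical predicate: an aligned SIMD path is only safe when the
 * source AND both destinations are aligned. Checking src alone is not
 * enough if the stores to dst1/dst2 are aligned stores.
 * 16 bytes is assumed here (one SSE2 register). */
static int can_use_aligned_path(const uint8_t *src,
                                const uint8_t *dst1,
                                const uint8_t *dst2)
{
    return is_aligned(src, 16) &&
           is_aligned(dst1, 16) &&
           is_aligned(dst2, 16);
}

/* Scalar reference for one row: split interleaved byte pairs from src
 * into two planes. It has no alignment requirement on any pointer, so
 * it can serve as the fallback when can_use_aligned_path() is false. */
static void deinterleave_row_ref(const uint8_t *src,
                                 uint8_t *dst1, uint8_t *dst2,
                                 int width)
{
    for (int i = 0; i < width; i++) {
        dst1[i] = src[2 * i];
        dst2[i] = src[2 * i + 1];
    }
}
```

With a predicate like this, a test that passes an aligned src together
with a dst offset by one byte would take the scalar (or an unaligned
SIMD) path instead of issuing aligned stores to an unaligned address.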

