[FFmpeg-devel] [aarch64] improve performance of ff_yuv2planeX_8_neon

Michael Niedermayer michael at niedermayer.cc
Sat Jan 4 22:01:29 EET 2020


On Sat, Jan 04, 2020 at 05:53:34PM +0100, Clément Bœsch wrote:
> On Tue, Dec 10, 2019 at 04:38:25PM -0600, Sebastian Pop wrote:
> > Hi,
> > 
> > This patch rewrites the innermost loop of ff_yuv2planeX_8_neon to avoid zips and
> > horizontal adds by using fused multiply adds. The patch also uses ld1r to load
> > one element and replicate it across all lanes of the vector. The patch also
> > improves the clipping code by removing the shift right instructions and
> > performing the shift with the shift-right narrow instructions.
> > 
> > I see 8% better performance on an m6g instance with neoverse-n1 CPUs:
> > $ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
> > before: t:0.014015 avg:0.014096 max:0.015018 min:0.013971
> > after:  t:0.012985 avg:0.013013 max:0.013996 min:0.012818
> > 
> > Tested with `make check` on aarch64-linux.
> > 
> > Please let me know how I can improve the patch.
> > 
> 
> Looks nice. I can't test currently but LGTM.

will apply

thx
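
For readers following the thread, the loop restructuring described in the
quoted patch can be modelled in plain C. This is only a scalar sketch, not the
NEON code from the patch: the names planeX_8_model, clip_u8 and LANES are
hypothetical, and the 12-bit dither shift and 19-bit final shift are assumed
from the C reference in libswscale. Each filter coefficient is loaded once and
reused for a whole block of pixels (the role ld1r plays), products go straight
into per-pixel accumulators (so no zip or horizontal add is needed), and the
final shift and clamp happen together (the role of a saturating shift-right
narrow).

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8 /* hypothetical block width, standing in for vector lanes */

/* Clamp to [0, 255], mirroring an unsigned saturating narrow. */
static uint8_t clip_u8(int32_t v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Scalar model of a tap-major vertical filter.
 * Assumptions: dst_w is a multiple of LANES, and >> on a negative
 * int32_t is an arithmetic shift (true on common compilers). */
static void planeX_8_model(const int16_t *filter, int filter_size,
                           const int16_t *const *src, uint8_t *dest,
                           int dst_w, const uint8_t *dither, int offset)
{
    assert(filter_size > 0 && dst_w % LANES == 0);

    for (int i = 0; i < dst_w; i += LANES) {
        int32_t acc[LANES];

        /* Seed each accumulator with its dither value. */
        for (int l = 0; l < LANES; l++)
            acc[l] = (int32_t)dither[(i + l + offset) & 7] << 12;

        for (int j = 0; j < filter_size; j++) {
            /* One coefficient load, shared by every lane. */
            const int32_t coeff = filter[j];
            for (int l = 0; l < LANES; l++)
                acc[l] += coeff * src[j][i + l];
        }

        /* Shift and clamp in one step per pixel. */
        for (int l = 0; l < LANES; l++)
            dest[i + l] = clip_u8(acc[l] >> 19);
    }
}
```

With two taps of 2048 each and both input rows holding 12800 (100 << 7), the
model writes 100 to the output, while slightly negative sums clamp to 0.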

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety -- Benjamin Franklin