[FFmpeg-devel] [PATCH v2] sws/aarch64: add {nv12, nv21, yuv420p, yuv422p}_to_{argb, rgba, abgr, bgra}_neon

Clément Bœsch u at pkh.me
Tue Mar 1 17:55:31 CET 2016


On Tue, Mar 01, 2016 at 05:18:36PM +0100, Michael Niedermayer wrote:
> On Tue, Mar 01, 2016 at 11:11:36AM +0100, Clément Bœsch wrote:
> > On Mon, Feb 29, 2016 at 10:55:49AM +0100, Clément Bœsch wrote:
> > > From: Clément Bœsch <clement at stupeflix.com>
> > > 
> > > ---
> > > Changes since latest version:
> > > - remove unused 32-bit path
> > > - make 16-bit path more accurate by mirroring the MMX code (still not bitexact)
> > > - the code was originally trying to process 2 lines at a time to save chroma
> > >   pre-mult computations and avoid re-reading the whole line; for some reason,
> > >   this actually made the code around twice as slow, for twice the complexity.
> > >   Dropping that complexity was a win-win.
> > > ---
> > >  libswscale/aarch64/Makefile           |   3 +
> > >  libswscale/aarch64/swscale_unscaled.c | 132 ++++++++++++++++++++++
> > >  libswscale/aarch64/yuv2rgb_neon.S     | 207 ++++++++++++++++++++++++++++++++++
> > >  libswscale/swscale_internal.h         |   1 +
> > >  libswscale/swscale_unscaled.c         |   2 +
> > >  5 files changed, 345 insertions(+)
> > >  create mode 100644 libswscale/aarch64/Makefile
> > >  create mode 100644 libswscale/aarch64/swscale_unscaled.c
> > >  create mode 100644 libswscale/aarch64/yuv2rgb_neon.S
> > > 
> > 
> > Random benchmark on Hikey (Cortex-A53):
> > 
> > ./ffmpeg -nostats -f lavfi -i testsrc2=s=uhd2160:d=1 -vf format=yuv420p,bench=start,format=rgba,bench=stop -f null -
> > 
> > (yuv420p to rgba in 3840x2160)
> > 
> > before:
> > [bench @ 0x2edfe1e0] t:0.181514 avg:0.181514 max:0.181514 min:0.181514
> > [bench @ 0x2edfe1e0] t:0.178870 avg:0.180192 max:0.181514 min:0.178870
> > [bench @ 0x2edfe1e0] t:0.164448 avg:0.174944 max:0.181514 min:0.164448
> > [bench @ 0x2edfe1e0] t:0.164801 avg:0.172408 max:0.181514 min:0.164448
> > [bench @ 0x2edfe1e0] t:0.164635 avg:0.170853 max:0.181514 min:0.164448
> > [bench @ 0x2edfe1e0] t:0.164756 avg:0.169837 max:0.181514 min:0.164448
> > [bench @ 0x2edfe1e0] t:0.164784 avg:0.169115 max:0.181514 min:0.164448
> > [bench @ 0x2edfe1e0] t:0.164413 avg:0.168527 max:0.181514 min:0.164413
> > [bench @ 0x2edfe1e0] t:0.164760 avg:0.168109 max:0.181514 min:0.164413
> > [bench @ 0x2edfe1e0] t:0.164647 avg:0.167762 max:0.181514 min:0.164413
> > [bench @ 0x2edfe1e0] t:0.164698 avg:0.167484 max:0.181514 min:0.164413
> > [bench @ 0x2edfe1e0] t:0.164600 avg:0.167243 max:0.181514 min:0.164413
> > [bench @ 0x2edfe1e0] t:0.164498 avg:0.167032 max:0.181514 min:0.164413
> > [bench @ 0x2edfe1e0] t:0.164765 avg:0.166870 max:0.181514 min:0.164413
> > [bench @ 0x2edfe1e0] t:0.164613 avg:0.166720 max:0.181514 min:0.164413
> > [bench @ 0x2edfe1e0] t:0.164781 avg:0.166598 max:0.181514 min:0.164413
> > [bench @ 0x2edfe1e0] t:0.164489 avg:0.166474 max:0.181514 min:0.164413
> > [bench @ 0x2edfe1e0] t:0.164432 avg:0.166361 max:0.181514 min:0.164413
> > [bench @ 0x2edfe1e0] t:0.164540 avg:0.166265 max:0.181514 min:0.164413
> > [bench @ 0x2edfe1e0] t:0.164524 avg:0.166178 max:0.181514 min:0.164413
> > [bench @ 0x2edfe1e0] t:0.165147 avg:0.166129 max:0.181514 min:0.164413
> > [bench @ 0x2edfe1e0] t:0.165484 avg:0.166099 max:0.181514 min:0.164413
> > [bench @ 0x2edfe1e0] t:0.165703 avg:0.166082 max:0.181514 min:0.164413
> > [bench @ 0x2edfe1e0] t:0.165643 avg:0.166064 max:0.181514 min:0.164413
> > [bench @ 0x2edfe1e0] t:0.165294 avg:0.166033 max:0.181514 min:0.164413
> > 
> > after:
> > [bench @ 0x16d871e0] t:0.042296 avg:0.042296 max:0.042296 min:0.042296
> > [bench @ 0x16d871e0] t:0.041986 avg:0.042141 max:0.042296 min:0.041986
> > [bench @ 0x16d871e0] t:0.027298 avg:0.037193 max:0.042296 min:0.027298
> > [bench @ 0x16d871e0] t:0.027388 avg:0.034742 max:0.042296 min:0.027298
> > [bench @ 0x16d871e0] t:0.027383 avg:0.033270 max:0.042296 min:0.027298
> > [bench @ 0x16d871e0] t:0.027366 avg:0.032286 max:0.042296 min:0.027298
> > [bench @ 0x16d871e0] t:0.027225 avg:0.031563 max:0.042296 min:0.027225
> > [bench @ 0x16d871e0] t:0.027685 avg:0.031078 max:0.042296 min:0.027225
> > [bench @ 0x16d871e0] t:0.027246 avg:0.030652 max:0.042296 min:0.027225
> > [bench @ 0x16d871e0] t:0.027363 avg:0.030323 max:0.042296 min:0.027225
> > [bench @ 0x16d871e0] t:0.027449 avg:0.030062 max:0.042296 min:0.027225
> > [bench @ 0x16d871e0] t:0.027582 avg:0.029855 max:0.042296 min:0.027225
> > [bench @ 0x16d871e0] t:0.027374 avg:0.029664 max:0.042296 min:0.027225
> > [bench @ 0x16d871e0] t:0.027429 avg:0.029505 max:0.042296 min:0.027225
> > [bench @ 0x16d871e0] t:0.027275 avg:0.029356 max:0.042296 min:0.027225
> > [bench @ 0x16d871e0] t:0.027573 avg:0.029244 max:0.042296 min:0.027225
> > [bench @ 0x16d871e0] t:0.027219 avg:0.029125 max:0.042296 min:0.027219
> > [bench @ 0x16d871e0] t:0.027392 avg:0.029029 max:0.042296 min:0.027219
> > [bench @ 0x16d871e0] t:0.027720 avg:0.028960 max:0.042296 min:0.027219
> > [bench @ 0x16d871e0] t:0.027449 avg:0.028884 max:0.042296 min:0.027219
> > [bench @ 0x16d871e0] t:0.027473 avg:0.028817 max:0.042296 min:0.027219
> > [bench @ 0x16d871e0] t:0.027444 avg:0.028755 max:0.042296 min:0.027219
> > [bench @ 0x16d871e0] t:0.027535 avg:0.028702 max:0.042296 min:0.027219
> > [bench @ 0x16d871e0] t:0.027607 avg:0.028656 max:0.042296 min:0.027219
> > [bench @ 0x16d871e0] t:0.027476 avg:0.028609 max:0.042296 min:0.027219
> 
> LGTM
> 
> thx
> 

Applied, thanks
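
For readers who want a single number out of the bench logs quoted above, here is a
minimal sketch that parses the final "before" and "after" lines and divides the
timings. The helper name last_stats and the regular expression are illustrative
assumptions, not part of FFmpeg; the two input lines are copied verbatim from the
logs in this thread. By both the running average and the minimum, the NEON path
comes out at roughly 6x faster on this run.

```python
import re

# Matches the numeric fields of a bench filter log line as shown in this thread.
BENCH_RE = re.compile(r"t:(\d+\.\d+) avg:(\d+\.\d+) max:(\d+\.\d+) min:(\d+\.\d+)")


def last_stats(log):
    """Return (avg, min) from the last bench line found in the log text.

    Hypothetical helper for illustration only.
    """
    matches = BENCH_RE.findall(log)
    if not matches:
        raise ValueError("no bench lines found")
    _t, avg, _max, minimum = matches[-1]
    return float(avg), float(minimum)


# Final lines of the "before" and "after" runs, copied from the logs above.
before = "[bench @ 0x2edfe1e0] t:0.165294 avg:0.166033 max:0.181514 min:0.164413"
after = "[bench @ 0x16d871e0] t:0.027476 avg:0.028609 max:0.042296 min:0.027219"

before_avg, before_min = last_stats(before)
after_avg, after_min = last_stats(after)

speedup_avg = before_avg / after_avg
speedup_min = before_min / after_min

print(f"speedup by running average: {speedup_avg:.2f}x")
print(f"speedup by minimum:         {speedup_min:.2f}x")
```

The minimum is usually the steadier figure here, since the first two frames of
each run are visibly slower than the rest and pull the running average up.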

[...]

-- 
Clément B.