[FFmpeg-devel] [PATCH] swscale/x86/output.asm: add x86-optimized planar gbr yuv2anyX functions

Michael Niedermayer michael at niedermayer.cc
Mon Oct 25 21:19:06 EEST 2021


On Sun, Oct 24, 2021 at 09:09:52PM -0700, mindmark at gmail.com wrote:
> From: Mark Reid <mindmark at gmail.com>
> 
> yuv2gbrp_full_X_4_512_c: 12096.6
> yuv2gbrp_full_X_4_512_sse2: 10782.6
> yuv2gbrp_full_X_4_512_sse4: 5143.6
> yuv2gbrp_full_X_4_512_avx2: 3000.1
> yuv2gbrap_full_X_4_512_c: 15463.1
> yuv2gbrap_full_X_4_512_sse2: 14296.6
> yuv2gbrap_full_X_4_512_sse4: 6319.1
> yuv2gbrap_full_X_4_512_avx2: 3554.1
> yuv2gbrp9be_full_X_4_512_c: 14281.6
> yuv2gbrp9be_full_X_4_512_sse2: 11206.1
> yuv2gbrp9be_full_X_4_512_sse4: 5033.6
> yuv2gbrp9be_full_X_4_512_avx2: 3012.6
> yuv2gbrp9le_full_X_4_512_c: 12688.6
> yuv2gbrp9le_full_X_4_512_sse2: 10914.1
> yuv2gbrp9le_full_X_4_512_sse4: 5144.6
> yuv2gbrp9le_full_X_4_512_avx2: 3014.6
> yuv2gbrp10be_full_X_4_512_c: 14257.6
> yuv2gbrp10be_full_X_4_512_sse2: 11089.6
> yuv2gbrp10be_full_X_4_512_sse4: 5039.1
> yuv2gbrp10be_full_X_4_512_avx2: 3001.1
> yuv2gbrp10le_full_X_4_512_c: 12098.6
> yuv2gbrp10le_full_X_4_512_sse2: 10884.1
> yuv2gbrp10le_full_X_4_512_sse4: 5138.1
> yuv2gbrp10le_full_X_4_512_avx2: 2999.6
> yuv2gbrap10be_full_X_4_512_c: 18549.6
> yuv2gbrap10be_full_X_4_512_sse2: 14538.6
> yuv2gbrap10be_full_X_4_512_sse4: 6292.6
> yuv2gbrap10be_full_X_4_512_avx2: 3583.6
> yuv2gbrap10le_full_X_4_512_c: 16631.1
> yuv2gbrap10le_full_X_4_512_sse2: 14190.6
> yuv2gbrap10le_full_X_4_512_sse4: 6348.1
> yuv2gbrap10le_full_X_4_512_avx2: 3554.6
> yuv2gbrp12be_full_X_4_512_c: 13555.1
> yuv2gbrp12be_full_X_4_512_sse2: 10952.1
> yuv2gbrp12be_full_X_4_512_sse4: 5137.6
> yuv2gbrp12be_full_X_4_512_avx2: 3009.6
> yuv2gbrp12le_full_X_4_512_c: 12082.6
> yuv2gbrp12le_full_X_4_512_sse2: 10891.1
> yuv2gbrp12le_full_X_4_512_sse4: 5184.1
> yuv2gbrp12le_full_X_4_512_avx2: 3011.1
> yuv2gbrap12be_full_X_4_512_c: 18689.6
> yuv2gbrap12be_full_X_4_512_sse2: 14522.6
> yuv2gbrap12be_full_X_4_512_sse4: 6237.6
> yuv2gbrap12be_full_X_4_512_avx2: 3585.6
> yuv2gbrap12le_full_X_4_512_c: 16760.6
> yuv2gbrap12le_full_X_4_512_sse2: 14202.1
> yuv2gbrap12le_full_X_4_512_sse4: 6252.1
> yuv2gbrap12le_full_X_4_512_avx2: 3591.1
> yuv2gbrp14be_full_X_4_512_c: 13555.6
> yuv2gbrp14be_full_X_4_512_sse2: 10949.1
> yuv2gbrp14be_full_X_4_512_sse4: 5185.1
> yuv2gbrp14be_full_X_4_512_avx2: 3012.1
> yuv2gbrp14le_full_X_4_512_c: 12068.1
> yuv2gbrp14le_full_X_4_512_sse2: 10883.6
> yuv2gbrp14le_full_X_4_512_sse4: 5145.1
> yuv2gbrp14le_full_X_4_512_avx2: 3007.1
> yuv2gbrp16be_full_X_4_512_c: 12383.6
> yuv2gbrp16be_full_X_4_512_sse2: 8230.6
> yuv2gbrp16be_full_X_4_512_sse4: 4765.6
> yuv2gbrp16be_full_X_4_512_avx2: 2742.6
> yuv2gbrp16le_full_X_4_512_c: 10906.1
> yuv2gbrp16le_full_X_4_512_sse2: 28732.1
> yuv2gbrp16le_full_X_4_512_sse4: 4709.6
> yuv2gbrp16le_full_X_4_512_avx2: 2753.1
> yuv2gbrap16be_full_X_4_512_c: 15472.6
> yuv2gbrap16be_full_X_4_512_sse2: 11021.6
> yuv2gbrap16be_full_X_4_512_sse4: 5487.6
> yuv2gbrap16be_full_X_4_512_avx2: 3143.6
> yuv2gbrap16le_full_X_4_512_c: 13668.6
> yuv2gbrap16le_full_X_4_512_sse2: 10562.1
> yuv2gbrap16le_full_X_4_512_sse4: 5506.6
> yuv2gbrap16le_full_X_4_512_avx2: 3149.6
> yuv2gbrpf32be_full_X_4_512_c: 15471.1
> yuv2gbrpf32be_full_X_4_512_sse2: 8524.6
> yuv2gbrpf32be_full_X_4_512_sse4: 4559.1
> yuv2gbrpf32be_full_X_4_512_avx2: 2388.1
> yuv2gbrpf32le_full_X_4_512_c: 14247.6
> yuv2gbrpf32le_full_X_4_512_sse2: 7600.6
> yuv2gbrpf32le_full_X_4_512_sse4: 4385.6
> yuv2gbrpf32le_full_X_4_512_avx2: 2258.6
> yuv2gbrapf32be_full_X_4_512_c: 18412.1
> yuv2gbrapf32be_full_X_4_512_sse2: 11353.6
> yuv2gbrapf32be_full_X_4_512_sse4: 5807.1
> yuv2gbrapf32be_full_X_4_512_avx2: 2928.1
> yuv2gbrapf32le_full_X_4_512_c: 16485.1
> yuv2gbrapf32le_full_X_4_512_sse2: 10202.1
> yuv2gbrapf32le_full_X_4_512_sse4: 5571.6
> yuv2gbrapf32le_full_X_4_512_avx2: 2847.6
> 
> 
> ---
>  libswscale/x86/output.asm | 440 +++++++++++++++++++++++++++++++++++++-
>  libswscale/x86/swscale.c  |  99 +++++++++
>  tests/checkasm/Makefile   |   2 +-
>  tests/checkasm/checkasm.c |   1 +
>  tests/checkasm/checkasm.h |   1 +
>  tests/checkasm/sw_gbrp.c  | 198 +++++++++++++++++
>  tests/fate/checkasm.mak   |   1 +
>  7 files changed, 740 insertions(+), 2 deletions(-)
>  create mode 100644 tests/checkasm/sw_gbrp.c

Seems to work.
I'll leave the asm review to people who have worked with asm more recently than me.

Also, if you or anyone else wants a random idea for swscale improvements:
we are missing a direct yuv->yuv converter for converting between different
yuv colorspaces. At the moment these are handled with an rgb intermediate.
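
A rough sketch of what such a direct converter could compute, assuming
normalized full-range values (Y in [0,1], Cb/Cr in [-0.5,0.5]) and that only
the luma coefficients differ between source and destination. The function
names are hypothetical and not part of swscale. It folds the yuv->rgb and
rgb->yuv steps into a single 3x3 matrix:

```c
#include <assert.h>

/* Hypothetical helper, not a swscale function: RGB -> YCbCr matrix for
 * luma coefficients kr and kb (kg is derived from them). */
static void rgb_to_yuv_matrix(double kr, double kb, double m[3][3])
{
    double kg = 1.0 - kr - kb;
    assert(kg > 0.0 && kr < 1.0 && kb < 1.0);
    /* Y  = kr*R + kg*G + kb*B */
    m[0][0] = kr;   m[0][1] = kg;   m[0][2] = kb;
    /* Cb = (B - Y) / (2 * (1 - kb)) */
    m[1][0] = -kr / (2.0 * (1.0 - kb));
    m[1][1] = -kg / (2.0 * (1.0 - kb));
    m[1][2] = 0.5;
    /* Cr = (R - Y) / (2 * (1 - kr)) */
    m[2][0] = 0.5;
    m[2][1] = -kg / (2.0 * (1.0 - kr));
    m[2][2] = -kb / (2.0 * (1.0 - kr));
}

/* Hypothetical helper: YCbCr -> RGB matrix, the inverse of the above. */
static void yuv_to_rgb_matrix(double kr, double kb, double m[3][3])
{
    double kg = 1.0 - kr - kb;
    assert(kg > 0.0 && kr < 1.0 && kb < 1.0);
    /* R = Y + 2*(1 - kr)*Cr */
    m[0][0] = 1.0;  m[0][1] = 0.0;  m[0][2] = 2.0 * (1.0 - kr);
    /* G = (Y - kr*R - kb*B) / kg */
    m[1][0] = 1.0;
    m[1][1] = -2.0 * kb * (1.0 - kb) / kg;
    m[1][2] = -2.0 * kr * (1.0 - kr) / kg;
    /* B = Y + 2*(1 - kb)*Cb */
    m[2][0] = 1.0;  m[2][1] = 2.0 * (1.0 - kb);  m[2][2] = 0.0;
}

/* out = a * b for 3x3 matrices; out must not alias a or b. */
static void mat3_mul(double a[3][3], double b[3][3], double out[3][3])
{
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++) {
            out[i][j] = 0.0;
            for (int k = 0; k < 3; k++)
                out[i][j] += a[i][k] * b[k][j];
        }
}

/* Direct source-YCbCr -> destination-YCbCr matrix, with no per-pixel
 * rgb intermediate: out = RGB2YUV(dst) * YUV2RGB(src). */
void yuv_to_yuv_matrix(double src_kr, double src_kb,
                       double dst_kr, double dst_kb, double out[3][3])
{
    double to_rgb[3][3], to_yuv[3][3];
    yuv_to_rgb_matrix(src_kr, src_kb, to_rgb);
    rgb_to_yuv_matrix(dst_kr, dst_kb, to_yuv);
    mat3_mul(to_yuv, to_rgb, out);
}
```

For example, yuv_to_yuv_matrix(0.299, 0.114, 0.2126, 0.0722, m) would give a
BT.601 -> BT.709 matrix under these assumptions. Limited-range offsets and
scaling, differing primaries or transfer functions, and fixed-point rounding
are all ignored here and would need extra handling in a real converter.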

thx

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

If you fake or manipulate statistics in a paper in physics you will never
get a job again.
If you fake or manipulate statistics in a paper in medicine you will get
a job for life at the pharma industry.