[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

Michael Niedermayer michael at niedermayer.cc
Thu Jan 14 02:11:44 EET 2021


On Mon, Jan 11, 2021 at 05:46:31PM +0100, Alan Kelly wrote:
> ---
>  Fixes a bug where if there is no offset and a tail which is not processed by the
>  sse3/avx2 version the dither is modified
>  Deletes mmx/mmxext yuv2yuvX version from swscale_template and adds it
>  to yuv2yuvX.asm to reduce code duplication and so that it may be used
>  to process the tail from the larger cardinal simd versions.
>  src argument of yuv2yuvX_* is now srcOffset, so that tails and offsets
>  are accounted for correctly.
>  Changes input size in checkasm so that this corner case is tested.
> 
>  libswscale/x86/Makefile           |   1 +
>  libswscale/x86/swscale.c          | 130 ++++++++++++----------------
>  libswscale/x86/swscale_template.c |  82 ------------------
>  libswscale/x86/yuv2yuvX.asm       | 136 ++++++++++++++++++++++++++++++
>  tests/checkasm/sw_scale.c         | 100 ++++++++++++++++++++++
>  5 files changed, 291 insertions(+), 158 deletions(-)
>  create mode 100644 libswscale/x86/yuv2yuvX.asm

This seems to be crashing again, unless I messed up testing.

(gdb) disassemble $rip-32,$rip+32
Dump of assembler code from 0x555555572f02 to 0x555555572f42:
   0x0000555555572f02 <ff_yuv2yuvX_avx2+162>:	int    $0x71
   0x0000555555572f04 <ff_yuv2yuvX_avx2+164>:	out    %al,$0x3
   0x0000555555572f06 <ff_yuv2yuvX_avx2+166>:	vpsraw $0x3,%ymm1,%ymm1
   0x0000555555572f0b <ff_yuv2yuvX_avx2+171>:	vpackuswb %ymm4,%ymm3,%ymm3
   0x0000555555572f0f <ff_yuv2yuvX_avx2+175>:	vpackuswb %ymm1,%ymm6,%ymm6
   0x0000555555572f13 <ff_yuv2yuvX_avx2+179>:	mov    (%rdi),%rdx
   0x0000555555572f16 <ff_yuv2yuvX_avx2+182>:	vpermq $0xd8,%ymm3,%ymm3
   0x0000555555572f1c <ff_yuv2yuvX_avx2+188>:	vpermq $0xd8,%ymm6,%ymm6
=> 0x0000555555572f22 <ff_yuv2yuvX_avx2+194>:	vmovdqa %ymm3,(%rcx,%rax,1)
   0x0000555555572f27 <ff_yuv2yuvX_avx2+199>:	vmovdqa %ymm6,0x20(%rcx,%rax,1)
   0x0000555555572f2d <ff_yuv2yuvX_avx2+205>:	add    $0x40,%rax
   0x0000555555572f31 <ff_yuv2yuvX_avx2+209>:	mov    %rdi,%rsi
   0x0000555555572f34 <ff_yuv2yuvX_avx2+212>:	cmp    %r8,%rax
   0x0000555555572f37 <ff_yuv2yuvX_avx2+215>:	jb     0x555555572eae <ff_yuv2yuvX_avx2+78>
   0x0000555555572f3d <ff_yuv2yuvX_avx2+221>:	vzeroupper 
   0x0000555555572f40 <ff_yuv2yuvX_avx2+224>:	retq   
   0x0000555555572f41 <ff_yuv2yuvX_avx2+225>:	nopw   %cs:0x0(%rax,%rax,1)
   
rax            0x0	0
rbx            0x30	48
rcx            0x55555583f470	93824995292272
rdx            0x55555585e500	93824995419392

#0  0x0000555555572f22 in ff_yuv2yuvX_avx2 ()
#1  0x00005555555724ee in yuv2yuvX_avx2 ()
#2  0x000055555556b4f6 in chr_planar_vscale ()
#3  0x0000555555566d41 in swscale ()
#4  0x0000555555568284 in sws_scale ()
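
One possible reading of the dump above, not something confirmed in this thread: the faulting instruction is a vmovdqa store of a ymm register, which faults unless the destination is 32-byte aligned, and rcx + rax = 0x55555583f470 appears to be only 16-byte aligned. The C sketch below just redoes that arithmetic. The helper names store_address and is_aligned are hypothetical and are not part of FFmpeg or the patch.

```c
#include <stdint.h>

/* Effective address of a store of the form (base, index, 1),
 * as in "vmovdqa %ymm3,(%rcx,%rax,1)". Hypothetical helper. */
uint64_t store_address(uint64_t base, uint64_t index)
{
    return base + index;
}

/* Returns 1 if addr is a multiple of align, 0 otherwise.
 * align must be a power of two. Hypothetical helper. */
int is_aligned(uint64_t addr, uint64_t align)
{
    return (addr & (align - 1)) == 0;
}
```

With the values from the register dump (rcx = 0x55555583f470, rax = 0x0), is_aligned(store_address(rcx, rax), 16) is 1 and is_aligned(store_address(rcx, rax), 32) is 0, since the low byte 0x70 leaves a remainder of 16 modulo 32. If that reading is right, the destination would need to be 32-byte aligned or the store would need to be an unaligned one; which of the two applies here is not established by the dump alone.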



[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

What does censorship reveal? It reveals fear. -- Julian Assange