[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

Alan Kelly alankelly at google.com
Thu Jan 14 10:28:28 EET 2021


Apologies for this. When I added mmx to the yasm file, I added a macro for
the stores that selects mova for mmx and movdqu for the others. However,
"if cpuflag(mmx)" evaluates to true for every instruction set, not just mmx,
so the aligned store was also used in the sse3 and avx2 versions. I have
replaced it with "if notcpuflag(sse3)".
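
To illustrate why the original condition selected the aligned store
everywhere, here is a minimal C model of cumulative CPU flags. It is only a
sketch: the bit values, the three-level chain and the uses_aligned_store_*
helpers are hypothetical, and the real flag chain has more levels between
mmx and sse3.

```c
/* Simplified model: each mask includes the masks of the sets it builds on.
 * The bit positions are hypothetical; only the cumulative structure matters. */
enum {
    CPUFLAGS_MMX  = 1 << 0,
    CPUFLAGS_SSE3 = (1 << 1) | CPUFLAGS_MMX,   /* sse3 implies mmx */
    CPUFLAGS_AVX2 = (1 << 2) | CPUFLAGS_SSE3   /* avx2 implies sse3 and mmx */
};

/* True when every bit of 'wanted' is set in the active flag set. */
static int cpuflag(int active, int wanted)
{
    return (active & wanted) == wanted;
}

static int notcpuflag(int active, int wanted)
{
    return !cpuflag(active, wanted);
}

/* Returns 1 when the aligned store (mova) is selected, 0 for movdqu. */
static int uses_aligned_store_old(int active)
{
    return cpuflag(active, CPUFLAGS_MMX);      /* true for mmx, sse3 and avx2 */
}

static int uses_aligned_store_new(int active)
{
    return notcpuflag(active, CPUFLAGS_SSE3);  /* true for mmx only */
}
```

With this model the old condition picks the aligned store for mmx, sse3 and
avx2 alike, while the new condition picks it only for mmx.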

The alignment in the checkasm test has been changed from 32 to 8 so that
the test catches alignment problems.
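
As a worked check of the alignment arithmetic, the sketch below tests the
destination address from the quoted gdb output (rcx = 0x55555583f470 with
rax = 0). The is_aligned helper is hypothetical and only illustrates the
calculation. vmovdqa with a ymm register requires a 32-byte aligned
address, so a destination that is only 16-byte aligned is consistent with
the fault at the marked store.

```c
#include <stdint.h>

/* Returns nonzero when addr is a multiple of align.
 * align must be a power of two (hypothetical helper for illustration). */
static int is_aligned(uint64_t addr, uint64_t align)
{
    return (addr & (align - 1)) == 0;
}

/* Destination of the faulting store: base in rcx plus index in rax,
 * both taken from the quoted register dump. */
static uint64_t crash_store_address(void)
{
    const uint64_t rcx = 0x55555583f470ULL;
    const uint64_t rax = 0x0ULL;
    return rcx + rax;
}
```

Here is_aligned(crash_store_address(), 16) is nonzero while
is_aligned(crash_store_address(), 32) is zero, since 0x470 is 16 modulo 32.
A test buffer aligned to only 8 bytes violates both the 16-byte and the
32-byte assumption, so an aligned store in either SIMD version would fault
in the test.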

On Thu, Jan 14, 2021 at 1:11 AM Michael Niedermayer <michael at niedermayer.cc>
wrote:

> On Mon, Jan 11, 2021 at 05:46:31PM +0100, Alan Kelly wrote:
> > ---
> >  Fixes a bug where if there is no offset and a tail which is not
> >  processed by the sse3/avx2 version the dither is modified
> >  Deletes mmx/mmxext yuv2yuvX version from swscale_template and adds it
> >  to yuv2yuvX.asm to reduce code duplication and so that it may be used
> >  to process the tail from the larger cardinal simd versions.
> >  src argument of yuv2yuvX_* is now srcOffset, so that tails and offsets
> >  are accounted for correctly.
> >  Changes input size in checkasm so that this corner case is tested.
> >
> >  libswscale/x86/Makefile           |   1 +
> >  libswscale/x86/swscale.c          | 130 ++++++++++++----------------
> >  libswscale/x86/swscale_template.c |  82 ------------------
> >  libswscale/x86/yuv2yuvX.asm       | 136 ++++++++++++++++++++++++++++++
> >  tests/checkasm/sw_scale.c         | 100 ++++++++++++++++++++++
> >  5 files changed, 291 insertions(+), 158 deletions(-)
> >  create mode 100644 libswscale/x86/yuv2yuvX.asm
>
> This seems to be crashing again unless i messed up testing
>
> (gdb) disassemble $rip-32,$rip+32
> Dump of assembler code from 0x555555572f02 to 0x555555572f42:
>    0x0000555555572f02 <ff_yuv2yuvX_avx2+162>:   int    $0x71
>    0x0000555555572f04 <ff_yuv2yuvX_avx2+164>:   out    %al,$0x3
>    0x0000555555572f06 <ff_yuv2yuvX_avx2+166>:   vpsraw $0x3,%ymm1,%ymm1
>    0x0000555555572f0b <ff_yuv2yuvX_avx2+171>:   vpackuswb %ymm4,%ymm3,%ymm3
>    0x0000555555572f0f <ff_yuv2yuvX_avx2+175>:   vpackuswb %ymm1,%ymm6,%ymm6
>    0x0000555555572f13 <ff_yuv2yuvX_avx2+179>:   mov    (%rdi),%rdx
>    0x0000555555572f16 <ff_yuv2yuvX_avx2+182>:   vpermq $0xd8,%ymm3,%ymm3
>    0x0000555555572f1c <ff_yuv2yuvX_avx2+188>:   vpermq $0xd8,%ymm6,%ymm6
> => 0x0000555555572f22 <ff_yuv2yuvX_avx2+194>:   vmovdqa %ymm3,(%rcx,%rax,1)
>    0x0000555555572f27 <ff_yuv2yuvX_avx2+199>:   vmovdqa %ymm6,0x20(%rcx,%rax,1)
>    0x0000555555572f2d <ff_yuv2yuvX_avx2+205>:   add    $0x40,%rax
>    0x0000555555572f31 <ff_yuv2yuvX_avx2+209>:   mov    %rdi,%rsi
>    0x0000555555572f34 <ff_yuv2yuvX_avx2+212>:   cmp    %r8,%rax
>    0x0000555555572f37 <ff_yuv2yuvX_avx2+215>:   jb     0x555555572eae <ff_yuv2yuvX_avx2+78>
>    0x0000555555572f3d <ff_yuv2yuvX_avx2+221>:   vzeroupper
>    0x0000555555572f40 <ff_yuv2yuvX_avx2+224>:   retq
>    0x0000555555572f41 <ff_yuv2yuvX_avx2+225>:   nopw   %cs:0x0(%rax,%rax,1)
>
> rax            0x0      0
> rbx            0x30     48
> rcx            0x55555583f470   93824995292272
> rdx            0x55555585e500   93824995419392
>
> #0  0x0000555555572f22 in ff_yuv2yuvX_avx2 ()
> #1  0x00005555555724ee in yuv2yuvX_avx2 ()
> #2  0x000055555556b4f6 in chr_planar_vscale ()
> #3  0x0000555555566d41 in swscale ()
> #4  0x0000555555568284 in sws_scale ()
>
>
>
> [...]
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> What does censorship reveal? It reveals fear. -- Julian Assange

