[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

Alan Kelly alankelly at google.com
Thu Jan 7 11:39:56 EET 2021


Thanks for your patience with this. I have replaced mova with movdqu, since
movu generated a compile error on ssse3. What system did this crash on?
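
For reference, the store address in the quoted dump can be checked by hand.
The faulting instruction is vmovdqa %ymm3,(%rcx,%rax,1), and a vmovdqa with a
ymm register requires its memory operand to be 32-byte aligned. The dump shows
rcx = 0x55555583f470 and rax = 0. Below is a minimal sketch of that check in C;
the helper name misalignment is made up for illustration and is not part of
FFmpeg.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not an FFmpeg function): returns how many bytes
 * addr lies past the previous multiple of align.
 * align must be a power of two; a result of 0 means addr is aligned.
 */
static unsigned misalignment(uint64_t addr, uint64_t align)
{
    return (unsigned)(addr & (align - 1));
}
```

With the values from the dump, misalignment(0x55555583f470 + 0, 16) is 0 and
misalignment(0x55555583f470 + 0, 32) is 16. The destination is therefore
16-byte aligned but not 32-byte aligned, which would be enough for an aligned
xmm store but not for the ymm store shown.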

On Wed, Jan 6, 2021 at 9:10 PM Michael Niedermayer <michael at niedermayer.cc>
wrote:

> On Tue, Jan 05, 2021 at 01:31:25PM +0100, Alan Kelly wrote:
> > Ping!
>
> crashes (due to alignment i think)
>
> (gdb) disassemble $rip-32,$rip+32
> Dump of assembler code from 0x5555555730a1 to 0x5555555730e1:
>    0x00005555555730a1 <ff_yuv2yuvX_avx2+161>:   int    $0x71
>    0x00005555555730a3 <ff_yuv2yuvX_avx2+163>:   out    %al,$0x3
>    0x00005555555730a5 <ff_yuv2yuvX_avx2+165>:   vpsraw $0x3,%ymm1,%ymm1
>    0x00005555555730aa <ff_yuv2yuvX_avx2+170>:   vpackuswb %ymm4,%ymm3,%ymm3
>    0x00005555555730ae <ff_yuv2yuvX_avx2+174>:   vpackuswb %ymm1,%ymm6,%ymm6
>    0x00005555555730b2 <ff_yuv2yuvX_avx2+178>:   mov    (%rdi),%rdx
>    0x00005555555730b5 <ff_yuv2yuvX_avx2+181>:   vpermq $0xd8,%ymm3,%ymm3
>    0x00005555555730bb <ff_yuv2yuvX_avx2+187>:   vpermq $0xd8,%ymm6,%ymm6
> => 0x00005555555730c1 <ff_yuv2yuvX_avx2+193>:   vmovdqa %ymm3,(%rcx,%rax,1)
>    0x00005555555730c6 <ff_yuv2yuvX_avx2+198>:   vmovdqa %ymm6,0x20(%rcx,%rax,1)
>    0x00005555555730cc <ff_yuv2yuvX_avx2+204>:   add    $0x40,%rax
>    0x00005555555730d0 <ff_yuv2yuvX_avx2+208>:   mov    %rdi,%rsi
>    0x00005555555730d3 <ff_yuv2yuvX_avx2+211>:   cmp    %r8,%rax
>    0x00005555555730d6 <ff_yuv2yuvX_avx2+214>:   jb     0x55555557304d <ff_yuv2yuvX_avx2+77>
>    0x00005555555730dc <ff_yuv2yuvX_avx2+220>:   vzeroupper
>    0x00005555555730df <ff_yuv2yuvX_avx2+223>:   retq
>    0x00005555555730e0 <yuv2rgb_c_48+0>: push   %r15
> End of assembler dump.
> (gdb) info all-registers
> rax            0x0      0
> rbx            0x0      0
> rcx            0x55555583f470   93824995292272
>
>
> [...]

