[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

Alan Kelly alankelly at google.com
Wed Dec 9 12:16:56 EET 2020


This function is tested by fate-filter-fps-r. I have also added a checkasm
test and bench.

I have done a lot more testing and benchmarking of this code and I am now
happy to activate the avx2 version because the performance is so good. On my
machine I get the following results for filter size 4 and offset 0. The
results for all other sizes/offsets are similar:

function                before     after    change
yuv2yuvX_4_0_mmx        1567.2    1563.1
yuv2yuvX_4_0_mmxext     1560.7    1560.1
yuv2yuvX_4_0_sse3        780.7     572.1    -26.7%
yuv2yuvX_4_0_avx2          n/a     341.1    -56.3%
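
The percentages above appear to be relative changes of the form
(new - old) / old, with the avx2 figure measured against the old sse3 timing
of 780.7. That reading is inferred from the numbers and is not stated in this
message. A minimal sketch of the arithmetic, with a hypothetical helper name:

```c
#include <stdio.h>

/* Hypothetical helper: relative change from old_time to new_time, in percent.
 * A negative result means the new version is faster. */
static double relative_change_percent(double old_time, double new_time)
{
    return (new_time - old_time) / old_time * 100.0;
}

/* Prints the two changes using the timings quoted in this message.
 * Assumption: the avx2 result is compared against the old sse3 timing. */
static void print_changes(void)
{
    printf("sse3: %.1f%%\n", relative_change_percent(780.7, 572.1));
    printf("avx2: %.1f%%\n", relative_change_percent(780.7, 341.1));
}
```

Under that assumption both quoted percentages are reproduced to one decimal
place.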

Interestingly, I discovered that the non-temporal store movntdq causes very
large variability in the test results and in many cases significantly
increases the execution time. I have replaced these stores with aligned
stores, which stabilised the runtimes. However, I am aware that
benchmarks often don't represent reality and that these non-temporal stores
were probably used for a good reason. If you think it better to use NT
stores, I will switch back to them.
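
To illustrate the two kinds of store being compared, here is a minimal sketch
using SSE2 intrinsics in C rather than the yasm used in the patch.
_mm_stream_si128 corresponds to movntdq and _mm_store_si128 to an aligned
movdqa store. The function names, the plain copy loop, and the requirements
that both pointers are 16-byte aligned and n is a multiple of 16 are
assumptions made for the example and are not taken from the patch.

```c
#include <emmintrin.h> /* SSE2: _mm_load_si128, _mm_store_si128, _mm_stream_si128 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy n bytes using regular aligned stores (movdqa).
 * Assumes dst and src are 16-byte aligned and n is a multiple of 16. */
static void copy_aligned_store(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i + 16 <= n; i += 16) {
        __m128i v = _mm_load_si128((const __m128i *)(src + i));
        _mm_store_si128((__m128i *)(dst + i), v);
    }
}

/* Hypothetical helper: same copy using non-temporal stores (movntdq),
 * which hint that the written data should bypass the cache.
 * Same alignment and size assumptions as above. */
static void copy_nt_store(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i + 16 <= n; i += 16) {
        __m128i v = _mm_load_si128((const __m128i *)(src + i));
        _mm_stream_si128((__m128i *)(dst + i), v);
    }
    /* Order the weakly ordered NT stores before any later stores. */
    _mm_sfence();
}
```

Both functions write the same bytes; they differ only in how the stores
interact with the cache. NT stores tend to help when the output will not be
read again soon and can hurt when it is consumed shortly afterwards, which
may be one reason a benchmark and a real workload disagree.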


On Fri, Dec 4, 2020 at 2:00 PM Anton Khirnov <anton at khirnov.net> wrote:

> Quoting Alan Kelly (2020-11-19 09:41:56)
> > ---
> >  All of Henrik's suggestions have been implemented. Additionally,
> >  m3 and m6 are permuted in avx2 before storing to ensure bit by bit
> >  identical results in avx2.
> >  libswscale/x86/Makefile     |   1 +
> >  libswscale/x86/swscale.c    |  75 +++--------------------
> >  libswscale/x86/yuv2yuvX.asm | 118 ++++++++++++++++++++++++++++++++++++
> >  3 files changed, 129 insertions(+), 65 deletions(-)
> >  create mode 100644 libswscale/x86/yuv2yuvX.asm
>
> Is this function tested by FATE?
> I did some brief testing and apparently it gets called during
> fate-filter-shuffleplanes-dup-luma, but the results do not change even
> if I comment out the whole function.
>
> Also, it seems like you are adding an AVX2 version of the function, but
> I don't see it being used.
>
> --
> Anton Khirnov

