[FFmpeg-devel] [PATCH] Add support for "omp simd" pragma.

Mon Jan 11 13:03:55 EET 2021

On Mon, Jan 11, 2021 at 1:26 AM Carl Eugen Hoyos <ceffmpeg at gmail.com> wrote:

> On Sun, Jan 10, 2021 at 19:55, Lynne <dev at lynne.ee> wrote:
> >
> > Jan 10, 2021, 17:43 by Reimar.Doeffinger at gmx.de:
> >
> > > From: Reimar Döffinger <Reimar.Doeffinger at gmx.de>
> > >
> > > This requests that loops be vectorized using SIMD
> > > instructions.
> > > The performance increase is far from hand-optimized
> > > assembly but still significant over the plain C version.
> > > Typical values are a 2-4x speedup where a hand-written
> > > version would achieve 4x-10x.
> > > So it is far from a replacement; however, some architectures
> > > will get hand-written assembler quite late or not at all,
> > > and this is a good improvement for a trivial amount of work.
> > > The cause of the gap to hand-written code, besides the
> > > compiler being a compiler, is usually that it does not manage
> > > to use saturating instructions and thus has to use 32-bit
> > > operations where saturating 16-bit operations would actually
> > > be sufficient.
> > > Other causes are for example the av_clip functions that
> > > are not ideal for vectorization (and even as scalar code
> > > not optimal for any modern CPU that has either CSEL or
> > > MAX/MIN instructions).
> > > And of course this only works for relatively simple
> > > loops; the IDCT functions, for example, did not seem possible
> > > to optimize that way.
> > > Also note that while clang may accept the code and sometimes
> > > produces warnings, it does not seem to do anything actually
> > > useful at all.
> > > Here are example measurements using gcc 10 under Linux (in a VM,
> > > unfortunately) on AArch64 on Apple M1:
> > > Command:
> > > time ./ffplay_g LG\ 4K\ HDR\ Demo\ -\ New\ York.ts -t 10 -autoexit -threads 1 -noframedrop
> > >
> > > Original code:
> > > real    0m19.572s
> > > user    0m23.386s
> > > sys     0m0.213s
> > >
> > > Changing all put_hevc:
> > > real    0m15.648s
> > > user    0m19.503s (83.4% of original)
> > > sys     0m0.186s
> > >
> > > In addition changing add_residual:
> > > real    0m15.424s
> > > user    0m19.278s (82.4% of original)
> > > sys     0m0.133s
> > >
> > > In addition changing planar copy dither:
> > > real    0m15.040s
> > > user    0m18.874s (80.7% of original)
> > > sys     0m0.168s
> > >
> >
> > I think I have to disagree.
>
> > The performance gains are marginal
>
> This sounds wrong.
>

I disagree with Carl.
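
For anyone following along without the patch open, the kind of loop being
discussed can be sketched roughly as below. This is a minimal sketch, not
code from the patch: clip_u8() and add_residual_sketch() are hypothetical
names, and the clamp is written as two plain ternaries on the assumption
that the compiler can turn those into MIN/MAX or CSEL more readily than a
branchy clip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not FFmpeg's av_clip_uint8(): clamp to [0, 255]
 * using two ternaries that map naturally onto MIN/MAX or CSEL. */
static inline uint8_t clip_u8(int v)
{
    v = v < 0 ? 0 : v;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Hypothetical loop in the spirit of add_residual: add a 16-bit
 * residual to 8-bit pixels and clamp the result.
 * The pragma only asks the compiler to vectorize the loop; it does
 * not change what the loop computes. */
static void add_residual_sketch(uint8_t *dst, const int16_t *res, size_t n)
{
    assert(dst != NULL && res != NULL);
#pragma omp simd
    for (size_t i = 0; i < n; i++)
        dst[i] = clip_u8(dst[i] + res[i]);
}
```

With gcc the pragma should only take effect when building with
-fopenmp-simd (or -fopenmp); otherwise it is ignored and the loop stays
ordinary scalar C. Whether the compiler then picks saturating 16-bit
instructions or widens to 32 bits is up to it.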

>
> Carl Eugen