[FFmpeg-devel] [PATCH] PPC64: Add versions of functions in libswscale/input.c optimized for POWER8 VSX SIMD.

Mon Jul 4 22:01:09 EEST 2016

On Mon, 2016-07-04 at 20:55 +0200, Hendrik Leppkes wrote:
> On Mon, Jul 4, 2016 at 5:20 PM, Dan Parrot <dan.parrot at mail.com> wrote:
> >> Why is this not faster?
> > Surprisingly, gcc is producing some badly suboptimal assembly. I need to
> > follow up with IBM's Linux Technology Center. The major issue is that
> > multiplication of vector quantities in C is generating as many
> > multiplications in assembly as would scalar multiplication in a loop. No
> > way that should be occurring.
> >
> 
> This is the reason why we generally don't allow intrinsic
> optimizations and instead ask people to write full assembly.
> It behaves more consistently everywhere.

Is this then a requirement to abandon the use of intrinsics for PPC64
SIMD and instead re-implement in assembly?
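
For anyone who wants to inspect the code generation discussed above, here is a
minimal sketch of an element-wise vector multiply next to its scalar
equivalent. It is not taken from the patch: it assumes the GCC/Clang generic
vector extension rather than the VSX intrinsics from altivec.h, and the names
v4si, mul_vec and mul_scalar are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical type name: four 32-bit lanes in one 16-byte vector,
 * declared with the GCC/Clang generic vector extension. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Element-wise multiply written directly on vector types. Whether this
 * becomes a single SIMD multiply or several scalar multiplies depends on
 * the compiler, the target and the optimization flags. */
static v4si mul_vec(v4si a, v4si b)
{
    return a * b;
}

/* Scalar reference: one multiply per element in a plain loop. */
static void mul_scalar(const int32_t *a, const int32_t *b,
                       int32_t *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}
```

Compiling a file like this with -O2 -S for the target in question (for
example with -mcpu=power8 on PPC64) and comparing the multiply instructions
emitted for the two functions should show whether the vector form is being
scalarized.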