[FFmpeg-devel] [PATCH] PPC64: Add versions of functions in libswscale/input.c optimized for POWER8 VSX SIMD.

Mon Jul 4 21:55:18 EEST 2016

On Mon, Jul 4, 2016 at 5:20 PM, Dan Parrot <dan.parrot at mail.com> wrote:
>> Why is this not faster?
> Surprisingly, gcc is producing some badly suboptimal assembly. I need to
> follow up with IBM's Linux Technology Center. The major issue is that
> multiplication of vector quantities in C is generating as many
> multiplications in assembly as would scalar multiplication in a loop. No
> way that should be occurring.
>

This is why we generally don't allow intrinsics-based optimizations
and ask people to write full assembly instead. Hand-written assembly
behaves more consistently everywhere.
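
To make the concern concrete, here is a minimal sketch of the pattern
under discussion. It is not taken from the patch: it assumes GCC/Clang
generic vector extensions rather than the VSX intrinsics the patch uses,
and the names v4si, mul_scalar and mul_vector are hypothetical. The
source expresses one multiply per four lanes, but what the compiler
emits for it is up to the compiler.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 4 x int32 vector type built on GCC/Clang generic vector
 * extensions. This is an assumption for illustration, not the VSX type
 * used in the patch. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Scalar reference: one multiplication per element. */
static void mul_scalar(const int32_t *a, const int32_t *b, int32_t *out, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}

/* Vector form: a single C-level multiply covers four lanes. Whether this
 * becomes one SIMD multiply or several scalar ones depends on the
 * compiler, its version, and the target flags. */
static void mul_vector(const int32_t *a, const int32_t *b, int32_t *out, int n)
{
    int i = 0;

    assert(n >= 0);
    for (; i + 4 <= n; i += 4) {
        v4si va, vb, vr;
        memcpy(&va, a + i, sizeof va);   /* unaligned-safe load */
        memcpy(&vb, b + i, sizeof vb);
        vr = va * vb;                    /* element-wise multiply */
        memcpy(out + i, &vr, sizeof vr); /* unaligned-safe store */
    }
    for (; i < n; i++)                   /* scalar tail */
        out[i] = a[i] * b[i];
}
```

Both functions produce the same results, so the only way to see the
difference is to inspect the output of something like "gcc -O2 -S"
(adding -mcpu=power8 -mvsx on a PPC64 target) and count the multiply
instructions in each function.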

- Hendrik