[FFmpeg-devel] [PATCH] PPC64: Add versions of functions in libswscale/input.c optimized for POWER8 VSX SIMD.
dan.parrot at mail.com
Mon Jul 4 21:43:44 EEST 2016
> > Just to make sure I don't misunderstand:
> > Does this mean intrinsics are suboptimal to write assembly
> > code?
> Here's what I mean: All variables below are of type "vector int"
> 1. v0 = v2 * v3
> 2. v0 = v4 * v5 + v6 * v7 + v8 * v9
> The first statement produces 1 multiply, 1 multiply-sum and 1 addition
> instruction in assembly.
> The second produces 6 multiply, 6 multiply-sum, and 10 addition
> instructions in assembly! I expected 3, 3, 3 of the respective
> operations (three times the counts from (1)) plus 2 additions.
The operation counts given above were obtained using gcc 5.3.1 on
Fedora 22. I just created a simple test with those same statements and
compiled it using gcc 6.1.1 on Fedora 24. The assembly operation counts
there are what I had initially expected, and more reasonable.
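
A test along these lines can be sketched as follows. This is not the
actual test file: it assumes the GCC/Clang generic vector extension as a
portable stand-in for the AltiVec/VSX "vector int" type, and the names
v4si, mul_one and mul_three_sum are hypothetical.

```c
/* Sketch only: v4si stands in for "vector int" (four 32-bit ints).
 * The typedef and function names are illustrative assumptions. */
typedef int v4si __attribute__((vector_size(16)));

/* Statement 1: v0 = v2 * v3
 * One element-wise multiply. */
v4si mul_one(v4si v2, v4si v3)
{
    return v2 * v3;
}

/* Statement 2: v0 = v4 * v5 + v6 * v7 + v8 * v9
 * Three element-wise multiplies combined with two additions. */
v4si mul_three_sum(v4si v4, v4si v5, v4si v6,
                   v4si v7, v4si v8, v4si v9)
{
    return v4 * v5 + v6 * v7 + v8 * v9;
}
```

Compiling such a file with "gcc -O2 -S" (on POWER8, presumably with
-mcpu=power8 as well) and counting the instructions emitted for each
function is one way to compare compiler versions.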
So, I'm going to move my ffmpeg development onto the Fedora 24 cloud
image and see if the SIMD performance there is better than it was on Fedora
22. The reason I'm moving to Fedora 24 instead of trying to upgrade gcc
on Fedora 22 is that I've learned to prefer standard pre-installed
images to the wrecks I've managed to create doing my own sysadmin on the