[FFmpeg-devel] [PATCH] PPC64: Add versions of functions in libswscale/input.c optimized for POWER8 VSX SIMD.
h.leppkes at gmail.com
Wed Jul 6 10:07:12 EEST 2016
On Wed, Jul 6, 2016 at 4:37 AM, Dan Parrot <dan.parrot at mail.com> wrote:
> Finish providing SIMD versions for POWER8 VSX of functions in libswscale/input.c. That should allow trac ticket #5570 to be closed.
> The speedups obtained for the functions are:
> abgrToA_c 1.19
> bgr24ToUV_c 1.23
> bgr24ToUV_half_c 1.37
> bgr24ToY_c_vsx 1.43
> nv12ToUV_c 1.05
> nv21ToUV_c 1.06
> planar_rgb_to_uv 1.25
> planar_rgb_to_y 1.26
> rgb24ToUV_c 1.11
> rgb24ToUV_half_c 1.10
> rgb24ToY_c 0.92
> rgbaToA_c 0.88
> uyvyToUV_c 1.05
> uyvyToY_c 1.15
> yuy2ToUV_c 1.07
> yuy2ToY_c 1.17
> yvy2ToUV_c 1.05
SIMD implementations that in the best case improve the speed by 43%
(and in some cases are *slower*) seem barely worth it. One would expect
a proper SIMD implementation to offer increases of 100% or more; at
least that's the general expectation on x86 with SSE/AVX.
So the question here is: is that VSX being bad, or the intrinsics
being bad? What would the speedup be with properly hand-written ASM? If
hand-written ASM can give us the usual 100-200% improvements we would
expect from SIMD, then that is what should generally be favored.
Also, one further thought:
From the commit message, it sounds like you might only be doing this
for the bounty in #5570. Do you plan to maintain these optimizations
in the future?