[FFmpeg-devel] [PATCH] PPC64: Add versions of functions in libswscale/input.c optimized for POWER8 VSX SIMD.

Tue Jul 5 07:39:48 EEST 2016

On Mon, 2016-07-04 at 23:31 -0500, Dan Parrot wrote:
> On Mon, 2016-07-04 at 09:20 +0000, Carl Eugen Hoyos wrote:
> > Dan Parrot <dan.parrot <at> mail.com> writes:
> > 
> > > The dataset used was the entire FATE regression suite.
> > 
> > I don't think this is a particularly useful testcase:
> > It takes very long but mostly tests other things.
> > 
> > Did you test if using ffmpeg -benchmark -f rawvideo -i /dev/zero... 
> > showed different results?
> > I believe this should be both easier and faster to test.
> > 
> > > name: rgb24ToY_c_vsx. 
> > > no. of calls: 9999. min: 3832 ns. avg: 4709 ns. max: 37550 ns. 
> > > total: 47093533 ns. 
> > > 
> > > name: rgb24ToY_c. 
> > > no. of calls: 9999. min: 3809 ns. avg: 4707 ns. max: 29041 ns. 
> > > total: 47072923 ns.
> > 
> > Without any data, I would have thought that this is the most 
> > important function (and "no. of calls" seems to confirm this).
> > 
> > Why is this not faster?

I believe I have already answered, in earlier posts, all the questions
you raised. To satisfy my own curiosity, I also used SystemTap to probe
a run of the entire FATE regression suite. Here are the same two
functions, this time built with GCC 6.1.1 instead of 5.3.1 (the results
are representative of all the other functions):

name: rgb24ToY_c_vsx.
no. of calls: 9999. min: 3053 ns. avg: 3298 ns. max: 69359 ns.
total: 32983050 ns.

name: rgb24ToY_c.
no. of calls: 9999. min: 3040 ns. avg: 4056 ns. max: 79159 ns.
total: 40561568 ns.

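The SIMD code shows a non-trivial improvement: the average time per call
drops from 4056 ns to 3298 ns, roughly 19% lower, while the minimum is
essentially unchanged. So: would you accept and apply the patch?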
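
For anyone who wants to check the arithmetic, here is a minimal Python
sketch that derives the relative change from the totals reported above.
The constants are copied from the SystemTap output for the GCC 6.1.1 run;
the helper names percent_reduction and speedup are made up for this
illustration and are not part of the patch or of FFmpeg.

```python
# Figures copied from the SystemTap output above (GCC 6.1.1 run).
CALLS = 9999
TOTAL_NS_C = 40561568      # rgb24ToY_c (scalar C version)
TOTAL_NS_VSX = 32983050    # rgb24ToY_c_vsx (VSX SIMD version)


def percent_reduction(baseline_ns, optimized_ns):
    """Return how much smaller optimized_ns is than baseline_ns, in percent."""
    return 100.0 * (baseline_ns - optimized_ns) / baseline_ns


def speedup(baseline_ns, optimized_ns):
    """Return baseline/optimized time ratio (>1 means optimized is faster)."""
    return baseline_ns / optimized_ns


if __name__ == "__main__":
    # Average time per call, recomputed from total / number of calls.
    print("avg C:     %.1f ns" % (TOTAL_NS_C / CALLS))
    print("avg VSX:   %.1f ns" % (TOTAL_NS_VSX / CALLS))
    print("reduction: %.1f %%" % percent_reduction(TOTAL_NS_C, TOTAL_NS_VSX))
    print("speedup:   %.2fx" % speedup(TOTAL_NS_C, TOTAL_NS_VSX))
```

This only restates the numbers already given. It says nothing about
run-to-run variance, which the max values suggest is not negligible.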