[FFmpeg-devel] [PATCH] PPC64: Add versions of functions in libswscale/input.c optimized for POWER8 VSX SIMD.
dan.parrot at mail.com
Tue Jul 5 07:39:48 EEST 2016
On Mon, 2016-07-04 at 23:31 -0500, Dan Parrot wrote:
> On Mon, 2016-07-04 at 09:20 +0000, Carl Eugen Hoyos wrote:
> > Dan Parrot <dan.parrot <at> mail.com> writes:
> > > The dataset used was the entire FATE regression suite.
> > I don't think this is a particularly useful testcase:
> > It takes very long but mostly tests other things.
> > Did you test if using ffmpeg -benchmark -f rawvideo -i /dev/zero...
> > showed different results?
> > I believe this should be both easier and faster to test.
> > > name: rgb24ToY_c_vsx.
> > > no. of calls: 9999. min: 3832 ns. avg: 4709 ns. max: 37550 ns.
> > > total: 47093533 ns.
> > >
> > > name: rgb24ToY_c.
> > > no. of calls: 9999. min: 3809 ns. avg: 4707 ns. max: 29041 ns.
> > > total: 47072923 ns.
> > Without any data, I would have thought that this is the most
> > important function (and "no. of calls" seems to confirm this).
> > Why is this not faster?
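To make the question concrete, here is a simplified scalar sketch of what an RGB24-to-luma input function does for each pixel. It is not the libswscale source. The names `rgb24_to_y_pixel` and `rgb24_to_y_sketch` are hypothetical, and the sketch uses the common 8-bit BT.601 integer approximation Y = ((66R + 129G + 25B + 128) >> 8) + 16 instead of libswscale's own coefficient table and higher-precision intermediate output. One plausible reason such a loop gains little from SIMD is its packed 3-byte-per-pixel layout: each vector load must be de-interleaved into R, G and B lanes with permutes before the multiply-adds, which eats into the savings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 8-bit BT.601 limited-range luma for one pixel,
 * using the well-known integer approximation (not libswscale's exact math). */
static inline uint8_t rgb24_to_y_pixel(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);
}

/* Hypothetical scalar loop: src holds `width` packed pixels laid out as
 * R,G,B,R,G,B,... (3 bytes each); dst receives one luma byte per pixel.
 * A SIMD version must first de-interleave this 3-byte stride into
 * separate R, G and B vectors, which costs extra permute instructions. */
static void rgb24_to_y_sketch(uint8_t *dst, const uint8_t *src, size_t width)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < width; i++) {
        const uint8_t *px = src + 3 * i;
        dst[i] = rgb24_to_y_pixel(px[0], px[1], px[2]);
    }
}
```

With this approximation, black maps to 16, white to 235 and pure red to 82, i.e. the limited "studio" luma range.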
I believe I have answered all the questions you raised in earlier posts.
Finally, just to satisfy my curiosity, I used SystemTap to probe a run of
the entire FATE regression suite. Here are the same two functions, this
time built with GCC 6.1.1 instead of 5.3.1 (this pair is representative
of all the other functions):
no. of calls: 9999. min: 3053 ns. avg: 3298 ns. max: 69359 ns. total:
no. of calls: 9999. min: 3040 ns. avg: 4056 ns. max: 79159 ns. total:
The SIMD code shows a non-trivial improvement in average time per call.
So, would you accept and apply the patch?
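The figures above have the shape of SystemTap's statistical aggregates (count, min, avg, max, sum). Below is a minimal C sketch of the same bookkeeping. The `call_stats` type and its helpers are hypothetical and only illustrate how the numbers relate. The average is the total divided by the number of calls with integer truncation, which is consistent with the earlier run (47093533 ns over 9999 calls gives 4709 ns). Comparing the two averages from the GCC 6.1.1 run, 3298 ns against 4056 ns is about 18.7% less time per call.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-function timing record mirroring the probe output:
 * number of calls, min, max and total duration (all in ns). */
struct call_stats {
    uint64_t count;
    uint64_t min_ns;
    uint64_t max_ns;
    uint64_t total_ns;
};

/* Record the duration of one call. */
static void call_stats_add(struct call_stats *s, uint64_t ns)
{
    if (s->count == 0 || ns < s->min_ns)
        s->min_ns = ns;
    if (ns > s->max_ns)
        s->max_ns = ns;
    s->total_ns += ns;
    s->count++;
}

/* Average per call with integer truncation: total / calls. */
static uint64_t call_stats_avg(const struct call_stats *s)
{
    assert(s->count > 0);
    return s->total_ns / s->count;
}

/* Percentage of time saved by `fast_avg_ns` relative to `slow_avg_ns`;
 * a negative result means the first average is actually slower. */
static double percent_saved(uint64_t fast_avg_ns, uint64_t slow_avg_ns)
{
    assert(slow_avg_ns > 0);
    return 100.0 * ((double)slow_avg_ns - (double)fast_avg_ns)
           / (double)slow_avg_ns;
}

/* Print in the same layout as the probe output quoted above. */
static void call_stats_print(const char *name, const struct call_stats *s)
{
    printf("name: %s.\n", name);
    printf("no. of calls: %llu. min: %llu ns. avg: %llu ns. max: %llu ns.\n"
           "total: %llu ns.\n",
           (unsigned long long)s->count, (unsigned long long)s->min_ns,
           (unsigned long long)call_stats_avg(s),
           (unsigned long long)s->max_ns, (unsigned long long)s->total_ns);
}
```

Because the maximum in each run is far above the average, a handful of slow outliers can move the average noticeably; comparing the minimum alongside it gives a rough sense of how much of a difference is noise.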