[FFmpeg-devel] [PATCH] PPC64: Add versions of functions in libswscale/input.c optimized for POWER8 VSX SIMD.

Dan Parrot dan.parrot at mail.com
Mon Jul 4 11:22:25 EEST 2016


On Mon, 2016-07-04 at 06:22 +0000, Carl Eugen Hoyos wrote:
> Dan Parrot <dan.parrot <at> mail.com> writes:
> 
> > Finish providing SIMD versions for POWER8 VSX of functions 
> > in libswscale/input.c
> > That should allow trac ticket #5570 to be closed.
> 
> Please add some numbers:
> Either for single functions or for a single ffmpeg command.
> (for rgb/bgr, mono is irrelevant)
> 
> Carl Eugen

The data below show the running time of each function, measured with
SystemTap while running the entire FATE regression suite. For functions
called more than 9999 times, only the first 9999 calls are counted. The
SIMD versions have the suffix "_vsx". All times are in nanoseconds.

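As an illustration of how figures of this kind are accumulated, here is
a minimal Python sketch of a per-function accumulator that stops
counting after a fixed number of calls. It is not the SystemTap script
used for the measurements (that script is not shown in this message);
the class name CallStats and its limit parameter are hypothetical. The
average uses integer division, which matches the truncated averages in
the figures (for example 4373993 / 1408 is reported as 3106).

```python
class CallStats:
    """Hypothetical accumulator for per-call timings of one function."""

    def __init__(self, limit=9999):
        self.limit = limit   # only the first `limit` calls are counted
        self.calls = 0
        self.total = 0       # sum of counted timings, in nanoseconds
        self.min = None
        self.max = None

    def record(self, elapsed_ns):
        """Add one timing; return False once the call limit is reached."""
        if self.calls >= self.limit:
            return False
        self.calls += 1
        self.total += elapsed_ns
        self.min = elapsed_ns if self.min is None else min(self.min, elapsed_ns)
        self.max = elapsed_ns if self.max is None else max(self.max, elapsed_ns)
        return True

    @property
    def avg(self):
        """Integer average of the counted timings (0 if nothing was counted)."""
        return self.total // self.calls if self.calls else 0
```

With limit=9999, any call after the 9999th leaves the statistics
unchanged, which is why several entries report exactly 9999 calls.
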
---------------------------------------------------------------------------------------------------------------
name: abgrToA_c_vsx. 
no. of calls: 1408. min: 2772 ns. avg: 3106 ns. max: 44282 ns. total: 4373993 ns.

name: abgrToA_c. 
no. of calls: 1408. min: 3088 ns. avg: 3385 ns. max: 24698 ns. total: 4766911 ns.
---------------------------------------------------------------------------------------------------------------
name: bgr24ToUV_c_vsx. 
no. of calls: 288. min: 5213 ns. avg: 5452 ns. max: 26635 ns. total: 1570338 ns.

name: bgr24ToUV_c. 
no. of calls: 288. min: 5351 ns. avg: 5636 ns. max: 27284 ns. total: 1623277 ns.
---------------------------------------------------------------------------------------------------------------
name: bgr24ToUV_half_c_vsx. 
no. of calls: 9999. min: 4792 ns. avg: 4941 ns. max: 34340 ns. total: 49411622 ns.

name: bgr24ToUV_half_c. 
no. of calls: 9999. min: 4795 ns. avg: 6012 ns. max: 66135 ns. total: 60122454 ns.
---------------------------------------------------------------------------------------------------------------
name: bgr24ToY_c_vsx. 
no. of calls: 9999. min: 4475 ns. avg: 4654 ns. max: 28739 ns. total: 46539077 ns.

name: bgr24ToY_c. 
no. of calls: 9999. min: 4551 ns. avg: 5974 ns. max: 218357 ns. total: 59741865 ns.
---------------------------------------------------------------------------------------------------------------
name: monoblack2Y_c_vsx. 
no. of calls: 288. min: 2902 ns. avg: 3102 ns. max: 25454 ns. total: 893490 ns.

name: monoblack2Y_c. 
no. of calls: 288. min: 3011 ns. avg: 3203 ns. max: 26008 ns. total: 922515 ns.
---------------------------------------------------------------------------------------------------------------
name: monowhite2Y_c_vsx. 
no. of calls: 9999. min: 2813 ns. avg: 3025 ns. max: 81510 ns. total: 30248113 ns.

name: monowhite2Y_c. 
no. of calls: 9999. min: 2692 ns. avg: 2891 ns. max: 43653 ns. total: 28911676 ns.
---------------------------------------------------------------------------------------------------------------
name: nv12ToUV_c_vsx. 
no. of calls: 144. min: 2709 ns. avg: 2960 ns. max: 26249 ns. total: 426364 ns.

name: nv12ToUV_c. 
no. of calls: 144. min: 2930 ns. avg: 3169 ns. max: 24483 ns. total: 456353 ns.
---------------------------------------------------------------------------------------------------------------
name: nv21ToUV_c_vsx. 
no. of calls: 144. min: 2707 ns. avg: 3001 ns. max: 26050 ns. total: 432150 ns.

name: nv21ToUV_c. 
no. of calls: 144. min: 2887 ns. avg: 3141 ns. max: 24704 ns. total: 452426 ns.
---------------------------------------------------------------------------------------------------------------
name: planar_rgb_to_a_vsx. 
no. of calls: 288. min: 2977 ns. avg: 3223 ns. max: 24993 ns. total: 928305 ns.

name: planar_rgb_to_a. 
no. of calls: 288. min: 3306 ns. avg: 3538 ns. max: 24350 ns. total: 1019154 ns.
---------------------------------------------------------------------------------------------------------------
name: planar_rgb_to_uv_vsx. 
no. of calls: 576. min: 5092 ns. avg: 5295 ns. max: 27170 ns. total: 3050431 ns.

name: planar_rgb_to_uv. 
no. of calls: 576. min: 5605 ns. avg: 5864 ns. max: 26177 ns. total: 3377983 ns.
---------------------------------------------------------------------------------------------------------------
name: planar_rgb_to_y_vsx. 
no. of calls: 576. min: 4459 ns. avg: 4666 ns. max: 27760 ns. total: 2688039 ns.

name: planar_rgb_to_y. 
no. of calls: 576. min: 4877 ns. avg: 5149 ns. max: 27879 ns. total: 2965982 ns.
---------------------------------------------------------------------------------------------------------------
name: rgb24ToUV_c_vsx. 
no. of calls: 688. min: 4090 ns. avg: 4791 ns. max: 25602 ns. total: 3296223 ns.

name: rgb24ToUV_c. 
no. of calls: 688. min: 4077 ns. avg: 4891 ns. max: 26629 ns. total: 3365385 ns.
---------------------------------------------------------------------------------------------------------------
name: rgb24ToUV_half_c_vsx. 
no. of calls: 9999. min: 4062 ns. avg: 5074 ns. max: 975914 ns. total: 50738567 ns.

name: rgb24ToUV_half_c. 
no. of calls: 9999. min: 4003 ns. avg: 4961 ns. max: 36193 ns. total: 49613559 ns.
---------------------------------------------------------------------------------------------------------------
name: rgb24ToY_c_vsx. 
no. of calls: 9999. min: 3832 ns. avg: 4709 ns. max: 37550 ns. total: 47093533 ns.

name: rgb24ToY_c. 
no. of calls: 9999. min: 3809 ns. avg: 4707 ns. max: 29041 ns. total: 47072923 ns.
---------------------------------------------------------------------------------------------------------------
name: rgbaToA_c_vsx. 
no. of calls: 9999. min: 2786 ns. avg: 3922 ns. max: 43927 ns. total: 39220318 ns.

name: rgbaToA_c. 
no. of calls: 9999. min: 3215 ns. avg: 4765 ns. max: 85278 ns. total: 47649759 ns.
---------------------------------------------------------------------------------------------------------------
name: uyvyToUV_c_vsx. 
no. of calls: 288. min: 2717 ns. avg: 2972 ns. max: 26487 ns. total: 856120 ns.

name: uyvyToUV_c. 
no. of calls: 288. min: 2922 ns. avg: 3106 ns. max: 23510 ns. total: 894647 ns.
---------------------------------------------------------------------------------------------------------------
name: uyvyToY_c_vsx. 
no. of calls: 576. min: 2694 ns. avg: 2964 ns. max: 26149 ns. total: 1707300 ns.

name: uyvyToY_c. 
no. of calls: 576. min: 3147 ns. avg: 3322 ns. max: 25753 ns. total: 1913776 ns.
---------------------------------------------------------------------------------------------------------------
name: yuy2ToUV_c_vsx. 
no. of calls: 288. min: 2735 ns. avg: 2931 ns. max: 25100 ns. total: 844318 ns.

name: yuy2ToUV_c. 
no. of calls: 288. min: 2883 ns. avg: 37953 ns. max: 10053265 ns. total: 10930731 ns.
---------------------------------------------------------------------------------------------------------------
name: yuy2ToY_c_vsx. 
no. of calls: 864. min: 2773 ns. avg: 3011 ns. max: 26182 ns. total: 2601765 ns.

name: yuy2ToY_c. 
no. of calls: 864. min: 3166 ns. avg: 3349 ns. max: 25229 ns. total: 2893687 ns.
---------------------------------------------------------------------------------------------------------------
name: yvy2ToUV_c_vsx. 
no. of calls: 288. min: 2717 ns. avg: 3035 ns. max: 39961 ns. total: 874154 ns.

name: yvy2ToUV_c.
no. of calls: 288. min: 2911 ns. avg: 3066 ns. max: 23858 ns. total: 883254 ns.
---------------------------------------------------------------------------------------------------------------
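
To compare each pair of entries, the ratio of the scalar time to the
"_vsx" time can be computed for both the average and the minimum. The
Python sketch below parses entries in the format used above and prints
both ratios; the function names parse_records and compare are
hypothetical, and the sample holds two pairs copied from the data above.
The minimum is included because the average is sensitive to outliers:
yuy2ToUV_c has a maximum of 10053265 ns out of a total of 10930731 ns,
which lifts its average to 37953 ns even though its minimum is 2883 ns.

```python
import re

# One entry: "name: X. no. of calls: N. min: A ns. avg: B ns. max: C ns. total: D ns."
# \s* also matches newlines, so entries wrapped across lines still parse.
RECORD = re.compile(
    r"name:\s*(\w+)\.\s*"
    r"no\. of calls:\s*(\d+)\.\s*"
    r"min:\s*(\d+) ns\.\s*avg:\s*(\d+) ns\.\s*"
    r"max:\s*(\d+) ns\.\s*total:\s*(\d+) ns\."
)

def parse_records(text):
    """Return {function name: {"calls", "min", "avg", "max", "total"}}."""
    records = {}
    for m in RECORD.finditer(text):
        calls, tmin, avg, tmax, total = (int(g) for g in m.groups()[1:])
        records[m.group(1)] = {
            "calls": calls, "min": tmin, "avg": avg, "max": tmax, "total": total,
        }
    return records

def compare(records, suffix="_vsx"):
    """Return scalar/SIMD time ratios; a value above 1 means SIMD was faster."""
    ratios = {}
    for name, simd in records.items():
        if not name.endswith(suffix):
            continue
        base = name[: -len(suffix)]
        scalar = records.get(base)
        if scalar is None:
            continue  # no matching scalar entry
        ratios[base] = {
            "avg_ratio": scalar["avg"] / simd["avg"],
            "min_ratio": scalar["min"] / simd["min"],
        }
    return ratios

# Two pairs copied from the figures above.
SAMPLE = """
name: rgbaToA_c_vsx.
no. of calls: 9999. min: 2786 ns. avg: 3922 ns. max: 43927 ns. total:
39220318 ns.

name: rgbaToA_c.
no. of calls: 9999. min: 3215 ns. avg: 4765 ns. max: 85278 ns. total: 47649759 ns.

name: yuy2ToUV_c_vsx.
no. of calls: 288. min: 2735 ns. avg: 2931 ns. max: 25100 ns. total:
844318 ns.

name: yuy2ToUV_c.
no. of calls: 288. min: 2883 ns. avg: 37953 ns. max: 10053265 ns. total: 10930731 ns.
"""

if __name__ == "__main__":
    for base, r in compare(parse_records(SAMPLE)).items():
        print(f"{base}: avg x{r['avg_ratio']:.2f}, min x{r['min_ratio']:.2f}")
```
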


