[FFmpeg-devel] VP8 profiling status

Ronald S. Bultje rsbultje
Mon Jul 19 22:06:48 CEST 2010


Hi,

here's profiling of ffplay Elephants_Dream.webm before and after SIMD,
as an FYI. Right now we spend most of our time in the arithmetic coder
(particularly decode_block_coeffs, but also vp8_decode_frame). We also
spend (imo) way too much time in put_vp8_epel8_h6; I'll try to figure
out what's wrong with that function (it looks OK at a quick glance...).
libvpx has a pretty confusing macro mess for block coefficient decoding,
which apparently helps speed it up. We might want to look into doing
something like that (or other ways of optimizing this).

time ./ffmpeg_g -i Elephants_Dream-360p-Stereo.webm -f md5 -an -vcodec rawvideo -
SIMD (today's SVN plus the patches I submitted for SSE2/MMX2/MMX loopfilter):
real	1m50.678s
user	1m28.240s
sys	0m1.335s

C (original C-only decoder as committed a few weeks ago):
real	3m21.502s
user	2m46.615s
sys	0m1.812s

libvpx (git HEAD of today 7a89d4c3d4e5470be8f1eae25e08a90ab47e16a3,
--target=x86-darwin9-gcc with --enable-runtime-cpu-detect):
real	2m5.146s
user	1m32.009s
sys	0m1.520s
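
For reference, the ratios implied by the three runs above can be worked out with a short script. This is only a sketch: parse_time and the user_times dictionary are names made up for illustration, and it compares the "user" figures only, on the assumption that they are the most representative of decoder CPU cost.

```python
import re

def parse_time(text):
    """Convert a `time`-style value such as '1m28.240s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text)
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

# 'user' times copied verbatim from the three runs above.
user_times = {
    "ffmpeg SIMD": parse_time("1m28.240s"),
    "ffmpeg C": parse_time("2m46.615s"),
    "libvpx": parse_time("1m32.009s"),
}

# Express every run relative to the SIMD build.
baseline = user_times["ffmpeg SIMD"]
for name, seconds in user_times.items():
    print(f"{name}: {seconds:.3f}s user, {seconds / baseline:.2f}x the SIMD build")
```

On these numbers the C-only decoder needs about 1.89x the user time of the SIMD build, and libvpx about 1.04x.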

Profiling data (using Shark):

before (summary - MC: 30.8%, inner LF: 20.5%, mbedge LF: 26.2%):
	11.9%	11.9%	ffplay_g	put_vp8_epel16_h6v6_c	
	8.1%	8.1%	ffplay_g	vp8_v_loop_filter16_inner_c	
	8.1%	8.1%	ffplay_g	vp8_v_loop_filter8uv_c	
	7.2%	7.2%	ffplay_g	put_vp8_epel16_v6_c	
	7.2%	7.2%	ffplay_g	vp8_h_loop_filter16_inner_c	
	6.7%	6.7%	ffplay_g	vp8_v_loop_filter16_c	
	5.7%	5.7%	ffplay_g	vp8_h_loop_filter8uv_c	
	5.7%	5.7%	ffplay_g	decode_block_coeffs	
	5.7%	5.7%	ffplay_g	vp8_h_loop_filter16_c	
	4.3%	4.3%	ffplay_g	vp8_decode_frame	
	3.0%	3.0%	libSystem.B.dylib	__memcpy	
	2.9%	2.9%	ffplay_g	vp8_v_loop_filter8uv_inner_c	
	2.7%	2.7%	ffplay_g	put_vp8_epel16_h6_c	
	2.4%	2.4%	ffplay_g	ff_emulated_edge_mc	
	2.3%	2.3%	ffplay_g	vp8_h_loop_filter8uv_inner_c	
	1.7%	1.7%	ffplay_g	put_vp8_epel8_v4_c	
	1.6%	1.6%	ffplay_g	put_vp8_epel8_h4v4_c	
	1.3%	1.3%	ffplay_g	put_vp8_epel8_h6v4_c	
	1.3%	1.3%	ffplay_g	put_vp8_epel8_h4v6_c	
	1.2%	1.2%	ffplay_g	put_vp8_epel8_h6v6_c	
	0.9%	0.9%	ffplay_g	put_vp8_epel8_v6_c	
	0.9%	0.9%	ffplay_g	vp8_idct_dc_add_c	
	0.8%	0.8%	ffplay_g	vp8_idct_add_c	
	0.6%	0.6%	ffplay_g	filter_mb	
[.. cut here ..]

after (summary - MC: 24.5%, inner LF: 8.1%, mbedge LF: 14.9%):
	17.0%	17.0%	ffplay_g	decode_block_coeffs	
	14.3%	14.3%	ffplay_g	ff_put_vp8_epel8_h6_sse2	
	9.3%	9.3%	ffplay_g	vp8_decode_frame	
	7.0%	7.0%	libSystem.B.dylib	__memcpy	
	5.4%	5.4%	ffplay_g	ff_emulated_edge_mc	
	5.2%	5.2%	ffplay_g	ff_vp8_h_loop_filter16_mbedge_mmxext	
	5.1%	5.1%	ffplay_g	ff_put_vp8_epel8_v6_sse2	
	4.8%	4.8%	ffplay_g	ff_vp8_h_loop_filter8_mbedge_mmxext	
	3.7%	3.7%	ffplay_g	ff_vp8_h_loop_filter16_inner_mmxext	
	2.7%	2.7%	ffplay_g	ff_put_vp8_epel8_h4_sse2	
	2.6%	2.6%	ffplay_g	ff_vp8_v_loop_filter16_mbedge_mmxext	
	2.6%	2.6%	ffplay_g	ff_vp8_v_loop_filter16_inner_sse2	
	2.3%	2.3%	ffplay_g	ff_vp8_v_loop_filter8_mbedge_mmxext	
	1.8%	1.8%	ffplay_g	inter_predict	
	1.5%	1.5%	ffplay_g	ff_vp8_idct_dc_add_mmx	
	1.5%	1.5%	ffplay_g	filter_mb	
	1.4%	1.4%	ffplay_g	ff_put_vp8_epel8_v4_sse2	
	1.1%	1.1%	ffplay_g	ff_vp8_h_loop_filter8_inner_mmxext	
	1.0%	1.0%	ffplay_g	ff_put_vp8_pixels16_sse	
	0.8%	0.8%	ffplay_g	read_mv_component	
	0.8%	0.8%	ffplay_g	decode_frame_header	
	0.7%	0.7%	ffplay_g	ff_vp8_v_loop_filter8_inner_sse2	
	0.7%	0.7%	ffplay_g	ff_vp8_idct_add_mmx	
	0.7%	0.7%	ffplay_g	clear_blocks_sse	
	0.7%	0.7%	ffplay_g	idct_mb	
[.. rest cut off ..]
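
The summary line for the "after" listing can be reproduced from the rows shown by grouping symbols by name. The sketch below does that; the category() rule is a guess at how the groups were formed (anything with "loop_filter" in the name counts as a loop filter, "_inner" marks the inner variant, and the epel/pixels functions count as MC), not necessarily how the summary was originally computed.

```python
# (self %, symbol) rows copied from the "after" listing above.
AFTER = [
    (17.0, "decode_block_coeffs"),
    (14.3, "ff_put_vp8_epel8_h6_sse2"),
    (9.3, "vp8_decode_frame"),
    (5.2, "ff_vp8_h_loop_filter16_mbedge_mmxext"),
    (5.1, "ff_put_vp8_epel8_v6_sse2"),
    (4.8, "ff_vp8_h_loop_filter8_mbedge_mmxext"),
    (3.7, "ff_vp8_h_loop_filter16_inner_mmxext"),
    (2.7, "ff_put_vp8_epel8_h4_sse2"),
    (2.6, "ff_vp8_v_loop_filter16_mbedge_mmxext"),
    (2.6, "ff_vp8_v_loop_filter16_inner_sse2"),
    (2.3, "ff_vp8_v_loop_filter8_mbedge_mmxext"),
    (1.4, "ff_put_vp8_epel8_v4_sse2"),
    (1.1, "ff_vp8_h_loop_filter8_inner_mmxext"),
    (1.0, "ff_put_vp8_pixels16_sse"),
    (0.7, "ff_vp8_v_loop_filter8_inner_sse2"),
]

def category(symbol):
    """Assumed grouping rule, based only on the symbol name."""
    if "loop_filter" in symbol:
        return "inner LF" if "_inner" in symbol else "mbedge LF"
    if "_epel" in symbol or "_pixels" in symbol:
        return "MC"
    return "other"

def summarize(rows):
    """Sum the self-time percentages per category, rounded to one decimal."""
    totals = {}
    for percent, symbol in rows:
        key = category(symbol)
        totals[key] = totals.get(key, 0.0) + percent
    return {key: round(value, 1) for key, value in totals.items()}

print(summarize(AFTER))
```

With the rows above this gives MC 24.5, inner LF 8.1 and mbedge LF 14.9, matching the summary line. Applied to the "before" listing the visible MC rows would come up slightly short of the stated 30.8%, since part of that listing is cut off.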

libvpx:
	13.7%	13.7%	ffplay_g	vp8_decode_mb_tokens	
	10.0%	10.0%	libSystem.B.dylib	__memcpy	
	6.6%	6.6%	ffplay_g	vp8_mbloop_filter_vertical_edge_sse2	
	5.8%	5.8%	ffplay_g	vp8_filter_block1d16_h6_sse2	
	5.0%	5.0%	ffplay_g	vp8_mbloop_filter_vertical_edge_uv_sse2	
	4.1%	4.1%	ffplay_g	vp8_loop_filter_vertical_edge_sse2	
	4.0%	4.0%	ffplay_g	vp8_filter_block1d16_v6_sse2	
	3.8%	3.8%	ffplay_g	vp8_filter_block1d8_h6_sse2	
	3.3%	3.3%	ffplay_g	vp8_decode_mode_mvs	
	2.7%	2.7%	ffplay_g	vp8_mbloop_filter_horizontal_edge_uv_sse2	
	2.3%	2.3%	ffplay_g	vp8_mbloop_filter_horizontal_edge_sse2	
	2.1%	2.1%	ffplay_g	vp8_decode_macroblock	
	2.0%	2.0%	ffplay_g	vp8_filter_block1d16_h6_only_sse2	
	1.9%	1.9%	ffplay_g	vp8_find_near_mvs	
	1.8%	1.8%	ffplay_g	vp8_predict_intra4x4	
	1.8%	1.8%	ffplay_g	vp8_loop_filter_horizontal_edge_sse2	
	1.6%	1.6%	ffplay_g	vp8_loop_filter_frame	
	1.4%	1.4%	ffplay_g	vp8_dc_only_idct_mmx	
	1.3%	1.3%	ffplay_g	vp8_copy_mem16x16_sse2	
	1.2%	1.2%	ffplay_g	vp8_decode_mb_row	
	1.2%	1.2%	ffplay_g	vp8_filter_block1d8_v6_only_sse2	
	1.2%	1.2%	ffplay_g	vp8_copy_mem8x8_mmx	
	1.1%	1.1%	ffplay_g	vp8_loop_filter_vertical_edge_uv_sse2	
	1.1%	1.1%	ffplay_g	vp8_kfread_modes	
	1.0%	1.0%	ffplay_g	vp8_recon4b_sse2	
	1.0%	1.0%	ffplay_g	vp8_filter_block1d8_h6_only_sse2	
	1.0%	1.0%	ffplay_g	vp8_filter_block1d8_v6_sse2	
	0.9%	0.9%	ffplay_g	vp8_setup_intra_recon	
	0.9%	0.9%	ffplay_g	vp8_recon2b_sse2	
	0.9%	0.9%	ffplay_g	vp8_unpack_block1d16_h6_sse2	
	0.8%	0.8%	ffplay_g	vp8_recon16x16mb	
	0.8%	0.8%	ffplay_g	vp8_loop_filter_horizontal_edge_uv_sse2	
	0.6%	0.6%	ffplay_g	vp8_dequant_dc_idct_mmx	
[.. cut off here ..]


