[FFmpeg-devel] VP8 profiling status
Ronald S. Bultje
Mon Jul 19 22:06:48 CEST 2010
Hi,
as an FYI, here's profiling of ffplay on Elephants_Dream.webm before and
after SIMD. Right now we spend most of our time in the arithmetic coder
(particularly decode_block_coeffs, but also vp8_decode_frame). We also
spend (imo) way too much time in put_vp8_epel8_h6 (14.3% in the "after"
profile, as ff_put_vp8_epel8_h6_sse2); I'll try to figure out what's
wrong with that function (it looks ok at a quick glance...). libvpx has
some pretty confusing macro-mess for block coefficient decoding which
apparently helps speed it up; we might want to look into doing something
like that (or other ways of optimizing this).
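For readers unfamiliar with why the arithmetic coder is hard to speed up, here is a minimal sketch of a VP8-style boolean decoder, modeled on the one described in the VP8 bitstream specification (RFC 6386). It is not FFmpeg's or libvpx's actual code. The class name, method names and the zero-padding past the end of the buffer are assumptions made for illustration. The point is that each decoded bit depends on the range and value left behind by the previous one, so the loop is inherently serial and does not map onto SIMD.

```python
class BoolDecoder:
    """Illustrative VP8-style boolean (arithmetic) decoder; not FFmpeg's code."""

    def __init__(self, data: bytes):
        self.data = data
        # The first two bytes are preloaded into the value register.
        self.value = (self._byte(0) << 8) | self._byte(1)
        self.pos = 2
        self.range = 255
        self.bit_count = 0

    def _byte(self, i: int) -> int:
        # Assumption: reads past the end of the buffer yield zero.
        return self.data[i] if i < len(self.data) else 0

    def get(self, prob: int) -> int:
        """Decode one bool; prob (1..255) is the chance of a 0, out of 256."""
        split = 1 + (((self.range - 1) * prob) >> 8)
        big_split = split << 8
        if self.value >= big_split:
            bit = 1
            self.range -= split
            self.value -= big_split
        else:
            bit = 0
            self.range = split
        # Renormalize one bit at a time until range is back in 128..255,
        # pulling in a fresh input byte after every eight shifts.
        while self.range < 128:
            self.value <<= 1
            self.range <<= 1
            self.bit_count += 1
            if self.bit_count == 8:
                self.bit_count = 0
                self.value |= self._byte(self.pos)
                self.pos += 1
        return bit


if __name__ == "__main__":
    dec = BoolDecoder(bytes(8))
    print([dec.get(128) for _ in range(8)])
```

Coefficient decoding calls a routine like `get()` many times per block, each time with a probability chosen from the tokens decoded so far, which is presumably why both decoders show their token decoding function at the top of the SIMD profiles.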
time ./ffmpeg_g -i Elephants_Dream-360p-Stereo.webm -f md5 -an -vcodec rawvideo -
SIMD (today's SVN plus the patches I submitted for SSE2/MMX2/MMX loopfilter):
real 1m50.678s
user 1m28.240s
sys 0m1.335s
C (original C-only decoder as committed a few weeks ago):
real 3m21.502s
user 2m46.615s
sys 0m1.812s
libvpx (git HEAD of today 7a89d4c3d4e5470be8f1eae25e08a90ab47e16a3,
--target=x86-darwin9-gcc with --enable-runtime-cpu-detect):
real 2m5.146s
user 1m32.009s
sys 0m1.520s
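To put the three runs side by side, the user times can be converted to seconds and divided. The following is a small sketch of that arithmetic. The helper names and the dictionary labels are made up for illustration, and the numbers are copied from the timings above.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '1m28.240s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# User times copied from the three runs above.
USER_TIMES = {
    "ffmpeg SIMD": "1m28.240s",
    "ffmpeg C": "2m46.615s",
    "libvpx": "1m32.009s",
}


def ratio(slower: str, faster: str) -> float:
    """Return how many times longer `slower` ran than `faster` (user time)."""
    return to_seconds(USER_TIMES[slower]) / to_seconds(USER_TIMES[faster])


if __name__ == "__main__":
    print(f"C / SIMD:      {ratio('ffmpeg C', 'ffmpeg SIMD'):.2f}x")
    print(f"libvpx / SIMD: {ratio('libvpx', 'ffmpeg SIMD'):.2f}x")
```

By user time, the C-only decoder takes roughly 1.89 times as long as the SIMD build, and libvpx roughly 1.04 times as long. This is a single run on one file, so the smaller gap should be read with some caution.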
Profiling data (using Shark):
before (summary - MC: 30.8%, inner LF: 20.5%, mbedge LF: 26.2%):
11.9% 11.9% ffplay_g put_vp8_epel16_h6v6_c
8.1% 8.1% ffplay_g vp8_v_loop_filter16_inner_c
8.1% 8.1% ffplay_g vp8_v_loop_filter8uv_c
7.2% 7.2% ffplay_g put_vp8_epel16_v6_c
7.2% 7.2% ffplay_g vp8_h_loop_filter16_inner_c
6.7% 6.7% ffplay_g vp8_v_loop_filter16_c
5.7% 5.7% ffplay_g vp8_h_loop_filter8uv_c
5.7% 5.7% ffplay_g decode_block_coeffs
5.7% 5.7% ffplay_g vp8_h_loop_filter16_c
4.3% 4.3% ffplay_g vp8_decode_frame
3.0% 3.0% libSystem.B.dylib __memcpy
2.9% 2.9% ffplay_g vp8_v_loop_filter8uv_inner_c
2.7% 2.7% ffplay_g put_vp8_epel16_h6_c
2.4% 2.4% ffplay_g ff_emulated_edge_mc
2.3% 2.3% ffplay_g vp8_h_loop_filter8uv_inner_c
1.7% 1.7% ffplay_g put_vp8_epel8_v4_c
1.6% 1.6% ffplay_g put_vp8_epel8_h4v4_c
1.3% 1.3% ffplay_g put_vp8_epel8_h6v4_c
1.3% 1.3% ffplay_g put_vp8_epel8_h4v6_c
1.2% 1.2% ffplay_g put_vp8_epel8_h6v6_c
0.9% 0.9% ffplay_g put_vp8_epel8_v6_c
0.9% 0.9% ffplay_g vp8_idct_dc_add_c
0.8% 0.8% ffplay_g vp8_idct_add_c
0.6% 0.6% ffplay_g filter_mb
[.. cut here ..]
after (summary - MC: 24.5%, inner LF: 8.1%, mbedge LF: 14.9%):
17.0% 17.0% ffplay_g decode_block_coeffs
14.3% 14.3% ffplay_g ff_put_vp8_epel8_h6_sse2
9.3% 9.3% ffplay_g vp8_decode_frame
7.0% 7.0% libSystem.B.dylib __memcpy
5.4% 5.4% ffplay_g ff_emulated_edge_mc
5.2% 5.2% ffplay_g ff_vp8_h_loop_filter16_mbedge_mmxext
5.1% 5.1% ffplay_g ff_put_vp8_epel8_v6_sse2
4.8% 4.8% ffplay_g ff_vp8_h_loop_filter8_mbedge_mmxext
3.7% 3.7% ffplay_g ff_vp8_h_loop_filter16_inner_mmxext
2.7% 2.7% ffplay_g ff_put_vp8_epel8_h4_sse2
2.6% 2.6% ffplay_g ff_vp8_v_loop_filter16_mbedge_mmxext
2.6% 2.6% ffplay_g ff_vp8_v_loop_filter16_inner_sse2
2.3% 2.3% ffplay_g ff_vp8_v_loop_filter8_mbedge_mmxext
1.8% 1.8% ffplay_g inter_predict
1.5% 1.5% ffplay_g ff_vp8_idct_dc_add_mmx
1.5% 1.5% ffplay_g filter_mb
1.4% 1.4% ffplay_g ff_put_vp8_epel8_v4_sse2
1.1% 1.1% ffplay_g ff_vp8_h_loop_filter8_inner_mmxext
1.0% 1.0% ffplay_g ff_put_vp8_pixels16_sse
0.8% 0.8% ffplay_g read_mv_component
0.8% 0.8% ffplay_g decode_frame_header
0.7% 0.7% ffplay_g ff_vp8_v_loop_filter8_inner_sse2
0.7% 0.7% ffplay_g ff_vp8_idct_add_mmx
0.7% 0.7% ffplay_g clear_blocks_sse
0.7% 0.7% ffplay_g idct_mb
[.. rest cut off ..]
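The summary percentages in the "after" heading can be reproduced by grouping the listed functions by name. The sketch below does that. The grouping rule (names containing "epel" or "pixels" count as MC, loop filter names are split on "inner" and "mbedge") is inferred from the fact that the sums match the heading, not something stated explicitly, and the function names in the script are illustrative.

```python
from collections import defaultdict

# (percent, symbol) rows copied from the "after" profile above; only the
# motion compensation and loop filter entries are listed.
AFTER_ROWS = [
    (14.3, "ff_put_vp8_epel8_h6_sse2"),
    (5.2, "ff_vp8_h_loop_filter16_mbedge_mmxext"),
    (5.1, "ff_put_vp8_epel8_v6_sse2"),
    (4.8, "ff_vp8_h_loop_filter8_mbedge_mmxext"),
    (3.7, "ff_vp8_h_loop_filter16_inner_mmxext"),
    (2.7, "ff_put_vp8_epel8_h4_sse2"),
    (2.6, "ff_vp8_v_loop_filter16_mbedge_mmxext"),
    (2.6, "ff_vp8_v_loop_filter16_inner_sse2"),
    (2.3, "ff_vp8_v_loop_filter8_mbedge_mmxext"),
    (1.4, "ff_put_vp8_epel8_v4_sse2"),
    (1.1, "ff_vp8_h_loop_filter8_inner_mmxext"),
    (1.0, "ff_put_vp8_pixels16_sse"),
    (0.7, "ff_vp8_v_loop_filter8_inner_sse2"),
]


def classify(symbol: str) -> str:
    """Assign a symbol to a summary bucket based on its name (assumed rule)."""
    if "loop_filter" in symbol:
        if "mbedge" in symbol:
            return "mbedge LF"
        if "inner" in symbol:
            return "inner LF"
        return "other"
    if "epel" in symbol or "pixels" in symbol:
        return "MC"
    return "other"


def summarize(rows):
    """Sum the percentages per bucket, rounded to one decimal place."""
    totals = defaultdict(float)
    for percent, symbol in rows:
        totals[classify(symbol)] += percent
    return {bucket: round(total, 1) for bucket, total in totals.items()}


if __name__ == "__main__":
    for bucket, total in summarize(AFTER_ROWS).items():
        print(f"{bucket}: {total}%")
```

The same rule does not apply directly to the "before" listing, because the C function names there do not carry an "mbedge" suffix.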
libvpx:
13.7% 13.7% ffplay_g vp8_decode_mb_tokens
10.0% 10.0% libSystem.B.dylib __memcpy
6.6% 6.6% ffplay_g vp8_mbloop_filter_vertical_edge_sse2
5.8% 5.8% ffplay_g vp8_filter_block1d16_h6_sse2
5.0% 5.0% ffplay_g vp8_mbloop_filter_vertical_edge_uv_sse2
4.1% 4.1% ffplay_g vp8_loop_filter_vertical_edge_sse2
4.0% 4.0% ffplay_g vp8_filter_block1d16_v6_sse2
3.8% 3.8% ffplay_g vp8_filter_block1d8_h6_sse2
3.3% 3.3% ffplay_g vp8_decode_mode_mvs
2.7% 2.7% ffplay_g vp8_mbloop_filter_horizontal_edge_uv_sse2
2.3% 2.3% ffplay_g vp8_mbloop_filter_horizontal_edge_sse2
2.1% 2.1% ffplay_g vp8_decode_macroblock
2.0% 2.0% ffplay_g vp8_filter_block1d16_h6_only_sse2
1.9% 1.9% ffplay_g vp8_find_near_mvs
1.8% 1.8% ffplay_g vp8_predict_intra4x4
1.8% 1.8% ffplay_g vp8_loop_filter_horizontal_edge_sse2
1.6% 1.6% ffplay_g vp8_loop_filter_frame
1.4% 1.4% ffplay_g vp8_dc_only_idct_mmx
1.3% 1.3% ffplay_g vp8_copy_mem16x16_sse2
1.2% 1.2% ffplay_g vp8_decode_mb_row
1.2% 1.2% ffplay_g vp8_filter_block1d8_v6_only_sse2
1.2% 1.2% ffplay_g vp8_copy_mem8x8_mmx
1.1% 1.1% ffplay_g vp8_loop_filter_vertical_edge_uv_sse2
1.1% 1.1% ffplay_g vp8_kfread_modes
1.0% 1.0% ffplay_g vp8_recon4b_sse2
1.0% 1.0% ffplay_g vp8_filter_block1d8_h6_only_sse2
1.0% 1.0% ffplay_g vp8_filter_block1d8_v6_sse2
0.9% 0.9% ffplay_g vp8_setup_intra_recon
0.9% 0.9% ffplay_g vp8_recon2b_sse2
0.9% 0.9% ffplay_g vp8_unpack_block1d16_h6_sse2
0.8% 0.8% ffplay_g vp8_recon16x16mb
0.8% 0.8% ffplay_g vp8_loop_filter_horizontal_edge_uv_sse2
0.6% 0.6% ffplay_g vp8_dequant_dc_idct_mmx
[.. cut off here ..]