[MPlayer-dev-eng] [PATCH] Add support for CoreAVC h264 codec
Loren Merritt
lorenm at u.washington.edu
Thu Oct 5 20:38:47 CEST 2006
On Thu, 5 Oct 2006, Luca Barbato wrote:
> Michael Niedermayer wrote:
>
>> Could you post benchmarks of coreavc and ffh264 on h264 videos encoded
>> with various parameters? That should be VERY useful for finding out which
>> parts are faster than ffh264, which could then help to improve ffh264.
>>
>> things to test
>> * low resolution where all reference frames+1 _easily_ fit in the L2 cache
>> * CABAC / CAVLC
>> * high bitrate / low bitrate
>> * intra only
>> * B frames vs. no B frames
>> * loop filter / disabled loop filter
>
> Please also use oprofile (latest release/CVS) or equivalent tools.
> I recently started digging a bit, and it looks like this:
>
> On G4 you spend 23% of the time memcpy'ing data (split equally between
> small copies and big copies; I implemented a naive small copy and
> shaved 1/25 off the time).
>
> On G5 you spend nearly 10% of the time in memcpy (the glibc in Gentoo
> includes G5-specific improvements over the standard memcpy).
Really? There aren't any calls to memcpy in the macroblock layer, only in
the slice header and codec init. memcpy doesn't even show up in the
oprofile output here (i.e. it takes less than 0.001% of CPU time).
Unless you're measuring the memcpy to video memory, since ffh264 doesn't
do direct rendering? But those would all be big copies.
--Loren Merritt
-------------- next part --------------
libavcodec h264 decoder. Content: High profile, 480p, 1 Mbit/s.
(# and ## mark variable parts of a function name, such as filter
direction, block size, quarter-pel position or prediction mode; all
matching functions are summed into one row. Lines without a sample
count are category subtotals.)

 samples    % cpu  function name
          29.7180  motion compensation
  367398  11.6777    put_h264_qpel16_mc##_mmx2
  203687   6.4742    put_h264_chroma_mc8_mmx
  126108   4.0083    ff_h264_biweight_#x#_mmx2
   68761   2.1856    put_h264_chroma_mc2_c
   41837   1.3298    put_h264_qpel8_mc##_mmx2
   34502   1.0966    put_h264_chroma_mc4_mmx
   30452   0.9679    put_h264_qpel4_mc##_mmx2
   28621   0.9097    prefetch_mmx2
   17555   0.5580    biweight_h264_pixels2x2_c
   15421   0.4902    draw_edges_mmx
     628   0.0200    ff_emulated_edge_mc
           0.3869  intra
    4405   0.1400    pred8x8l_#
    3120   0.0992    pred8x8c_#
    2731   0.0868    pred4x4_#
    1915   0.0609    pred16x16_#
           1.0699  dct
   18988   0.6035    ff_h264_idct8_add_mmx
    8109   0.2577    ff_h264_idct_add_mmx
    4624   0.1470    ff_h264_idct_dc_add_mmx2
    1941   0.0617    ff_h264_idct8_dc_add_mmx2
           9.6097  deblocking
   97586   3.1018    h264_#_loop_filter_luma_mmx2
   68366   2.1730    filter_mb_fast
   54949   1.7465    h264_#_loop_filter_chroma_mmx2
   37682   1.1977    h264_loop_filter_strength_mmx2
   34575   1.0990    filter_mb_edge#
    9177   0.2917    filter_mb
          18.6672  bitstream parsing
  298997   9.5036    get_cabac
  160229   5.0929    decode_cabac_residual
   44466   1.4133    decode_cabac_mb_skip
   34223   1.0878    decode_cabac_mb_cbp_luma
   22523   0.7159    decode_cabac_mb_mvd
    6429   0.2043    decode_cabac_mb_type
    5782   0.1838    decode_cabac_mb_ref
    4682   0.1488    decode_cabac_mb_dqp
    3541   0.1126    decode_cabac_mb_cbp_chroma
    2791   0.0887    decode_cabac_mb_intra4x4_pred_mode
    1224   0.0389    ff_init_cabac_states
     606   0.0193    decode_cabac_mb_chroma_pre_mode
     478   0.0152    decode_cabac_intra_mb_type
     352   0.0112    decode_cabac_p_mb_sub_type
     339   0.0108    get_ue_golomb
     337   0.0107    decode_cabac_b_mb_sub_type
     212   0.0067    decode_cabac_mb_transform_size
      84   0.0027    ff_init_cabac_decoder
          40.5155  control-flow and overhead
  317413  10.0889    mc_part
  297271   9.4487    hl_decode_mb
  208579   6.6297    fill_caches
  156143   4.9630    decode_mb_skip
  114398   3.6361    decode_mb_cabac
   53610   1.7040    pred_direct_motion
   40733   1.2947    clear_blocks_mmx
   30472   0.9685    write_back_motion
   22394   0.7118    decode_slice
    9498   0.3019    write_back_non_zero_count
    8628   0.2742    pred_motion
    7052   0.2241    decode_slice_header
    4676   0.1486    decode_nal_units
    1069   0.0340    compute_mb_neighbors
     527   0.0168    pred_intra_mode
     501   0.0159    get_chroma_qp
     368   0.0117    get_dct8x8_allowed
     358   0.0114    pred_8x16_motion
     249   0.0079    fetch_diagonal_mv
     217   0.0069    write_back_intra_pred_mode
     195   0.0062    pred_16x8_motion
     178   0.0057    check_intra_pred_mode
     152   0.0048    check_intra4x4_pred_mode