[MPlayer-dev-eng] [PATCH]Add support for CoreAVC h264 codec

Loren Merritt lorenm at u.washington.edu
Thu Oct 5 20:38:47 CEST 2006


On Thu, 5 Oct 2006, Luca Barbato wrote:
> Michael Niedermayer wrote:
>
>> could you post benchmarks with h264 videos with various parameters with
>> coreavc and ffh264? that should be VERY useful to find out which parts are
>> faster than ffh264 which could then help to improve ffh264
>>
>> things to test
>> * low resolution where all reference frames+1 _easily_ fit in the L2 cache
>> * CABAC / CAVLC
>> * high bitrate / low bitrate
>> * intra only
>> * B frames vs. no B frames
>> * loop filter / disabled loop filter
>
> Please also use oprofile (latest release/cvs) or equivalent tools,
> recently I started digging a bit and it looks like this:
>
> for G4 you spend 23% of the time memcpying data (equally balanced between
> small copies and big copies; I implemented a naive smallcopy and shaved
> 1/25 off the time)
>
> for G5 you spend nearly 10% of the time on it (the glibc in gentoo includes
> G5-specific improvements over standard memcpy)

Really? There aren't any calls to memcpy in the macroblock layer, only in 
the slice header and codec init. memcpy doesn't even show up on the 
oprofile here (i.e. less than .001% cpu time).
Unless you're measuring the memcpy to video memory, since ffh264 doesn't 
do direct rendering? But that would be all big copies.

--Loren Merritt
-------------- next part --------------
libavcodec h264 decoder. content: high profile, 480p, 1 Mbit/s

samples   % cpu   function name

         29.7180    motion compensation
367398   11.6777  put_h264_qpel16_mc##_mmx2
203687    6.4742  put_h264_chroma_mc8_mmx
126108    4.0083  ff_h264_biweight_#x#_mmx2
68761     2.1856  put_h264_chroma_mc2_c
41837     1.3298  put_h264_qpel8_mc##_mmx2
34502     1.0966  put_h264_chroma_mc4_mmx
30452     0.9679  put_h264_qpel4_mc##_mmx2
28621     0.9097  prefetch_mmx2
17555     0.5580  biweight_h264_pixels2x2_c
15421     0.4902  draw_edges_mmx
628       0.0200  ff_emulated_edge_mc

          0.3869    intra
4405      0.1400  pred8x8l_#
3120      0.0992  pred8x8c_#
2731      0.0868  pred4x4_#
1915      0.0609  pred16x16_#

          1.0699    dct
18988     0.6035  ff_h264_idct8_add_mmx
8109      0.2577  ff_h264_idct_add_mmx
4624      0.1470  ff_h264_idct_dc_add_mmx2
1941      0.0617  ff_h264_idct8_dc_add_mmx2

          9.6097    deblocking
97586     3.1018  h264_#_loop_filter_luma_mmx2
68366     2.1730  filter_mb_fast
54949     1.7465  h264_#_loop_filter_chroma_mmx2
37682     1.1977  h264_loop_filter_strength_mmx2
34575     1.0990  filter_mb_edge#
9177      0.2917  filter_mb

         18.6672    bitstream parsing
298997    9.5036  get_cabac
160229    5.0929  decode_cabac_residual
44466     1.4133  decode_cabac_mb_skip
34223     1.0878  decode_cabac_mb_cbp_luma
22523     0.7159  decode_cabac_mb_mvd
6429      0.2043  decode_cabac_mb_type
5782      0.1838  decode_cabac_mb_ref
4682      0.1488  decode_cabac_mb_dqp
3541      0.1126  decode_cabac_mb_cbp_chroma
2791      0.0887  decode_cabac_mb_intra4x4_pred_mode
1224      0.0389  ff_init_cabac_states
606       0.0193  decode_cabac_mb_chroma_pre_mode
478       0.0152  decode_cabac_intra_mb_type
352       0.0112  decode_cabac_p_mb_sub_type
339       0.0108  get_ue_golomb
337       0.0107  decode_cabac_b_mb_sub_type
212       0.0067  decode_cabac_mb_transform_size
84        0.0027  ff_init_cabac_decoder

         40.5155    control-flow and overhead
317413   10.0889  mc_part
297271    9.4487  hl_decode_mb
208579    6.6297  fill_caches
156143    4.9630  decode_mb_skip
114398    3.6361  decode_mb_cabac
53610     1.7040  pred_direct_motion
40733     1.2947  clear_blocks_mmx
30472     0.9685  write_back_motion
22394     0.7118  decode_slice
9498      0.3019  write_back_non_zero_count
8628      0.2742  pred_motion
7052      0.2241  decode_slice_header
4676      0.1486  decode_nal_units
1069      0.0340  compute_mb_neighbors
527       0.0168  pred_intra_mode
501       0.0159  get_chroma_qp
368       0.0117  get_dct8x8_allowed
358       0.0114  pred_8x16_motion
249       0.0079  fetch_diagonal_mv
217       0.0069  write_back_intra_pred_mode
195       0.0062  pred_16x8_motion
178       0.0057  check_intra_pred_mode
152       0.0048  check_intra4x4_pred_mode
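
The per-category figures in the listing are the sums of the per-function
percentages beneath them. For anyone who wants to regroup the numbers
differently, here is a minimal sketch in Python. It assumes the rows have
been retyped by hand with a category tag in front of each line; that input
layout and the name category_totals are made up for this illustration and
are not something oprofile emits. Only the deblocking and dct rows are
included to keep it short.

```python
from collections import defaultdict

# Excerpt of the listing above. The leading category tag on each row is an
# assumption of this sketch; the samples, percentages and names are copied
# from the profile.
ROWS = """\
deblocking 97586 3.1018 h264_#_loop_filter_luma_mmx2
deblocking 68366 2.1730 filter_mb_fast
deblocking 54949 1.7465 h264_#_loop_filter_chroma_mmx2
deblocking 37682 1.1977 h264_loop_filter_strength_mmx2
deblocking 34575 1.0990 filter_mb_edge#
deblocking 9177 0.2917 filter_mb
dct 18988 0.6035 ff_h264_idct8_add_mmx
dct 8109 0.2577 ff_h264_idct_add_mmx
dct 4624 0.1470 ff_h264_idct_dc_add_mmx2
dct 1941 0.0617 ff_h264_idct8_dc_add_mmx2
"""


def category_totals(text):
    """Sum the % cpu column per category tag."""
    totals = defaultdict(float)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        category, _samples, percent, _name = line.split()
        totals[category] += float(percent)
    return dict(totals)


totals = category_totals(ROWS)
# Print the categories from most to least expensive.
for category, percent in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{category:12s} {percent:8.4f}")
```

The two totals should agree with the deblocking and dct lines in the listing,
up to floating-point rounding.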

