[FFmpeg-devel] [PATCH] h264.c/decode_cabac_residual optimization
Alexander Strange
astrange
Tue Jul 1 23:23:01 CEST 2008
On Jul 1, 2008, at 2:44 PM, Michael Niedermayer wrote:
> On Tue, Jul 01, 2008 at 01:14:56PM -0400, Alexander Strange wrote:
>>
>> 1 - Don't check for encoding in hl_decode_mb, since there's no
>> encoder.
>> 2 - Turn off cloning hl_decode_mb under CONFIG_SMALL.
>> 3 - Clone decode_cabac_residual depending on cat. This gets rid of
>> several
>> cat checks and the if (qmul) for every coeff.
>>
>> On Core 2:
>> Before-
>> 24205 dezicycles in decode_cabac_residual, 1048505 runs, 71 skips
>> 23079 dezicycles in decode_cabac_residual, 2096992 runs, 160 skips
>> 22186 dezicycles in decode_cabac_residual, 4193933 runs, 371 skips
>> 22444 dezicycles in decode_cabac_residual, 8387862 runs, 746 skips
>>
>> After-
>> 24035 dezicycles in decode_cabac_residual, 1048502 runs, 74 skips
>> 22922 dezicycles in decode_cabac_residual, 2096981 runs, 171 skips
>> 22037 dezicycles in decode_cabac_residual, 4193955 runs, 349 skips
>> 22293 dezicycles in decode_cabac_residual, 8387927 runs, 681 skips
>>
>> cabac_residual's time is definitely not normally distributed, so I
>> wouldn't
>> trust the cycle counts for anything, but at least it went down.
>>
>> 4 - reindention
>> 5 - Simplify "for( coeff_count--; coeff_count >= 0; coeff_count-- )".
>> 6 - Reorder the cat checks based on frequency, mostly from counting
>> block
>> types in x264 output. gcc statically predicts if == to be not
>> taken, so the
>> more common one should be in the else block. Moving cat == 5 to the
>> top of
>> the if/elses also makes it more obvious that it can be merged into
>> the
>> significance map.
>> 7 - Use get_cabac_bypass_sign in both branches of the loop.
>>
>> Before-
>> 24043 dezicycles in decode_cabac_residual, 524159 runs, 129 skips
>> 24462 dezicycles in decode_cabac_residual, 1048335 runs, 241 skips
>> 23323 dezicycles in decode_cabac_residual, 2096708 runs, 444 skips
>> 22416 dezicycles in decode_cabac_residual, 4193483 runs, 821 skips
>>
>> After-
>> 23318 dezicycles in decode_cabac_residual, 524112 runs, 176 skips
>> 23712 dezicycles in decode_cabac_residual, 1048286 runs, 290 skips
>> 22608 dezicycles in decode_cabac_residual, 2096607 runs, 545 skips
>> 21731 dezicycles in decode_cabac_residual, 4193412 runs, 892 skips
>>
>> There are some interesting things to look at after these-
>> - gcc pointlessly unrolls the "while( coeff_abs < 15 &&
>> get_cabac( CC, ctx
>> ) )" loop into taking up about half of the compiled function when
>> x86 asm
>> is used, so it could be rewritten in asm to fix that.
>> - get_cabac_bypass* could use cmov, and refill() is so small it
>> might take
>> less time than a branch mispredict penalty.
>> - The asm operands for get_cabac itself could be relaxed (not
>> requiring
>> output to eax, using +m/+g, that kind of thing). I tried this
>> earlier and
>> got bad results, but maybe it'll work better after this.
>> - refill* should use AV_RB16.
>> - I'm not sure what CABAC_ON_STACK is used for.
>>
>
>> But I don't want to spend too much time not doing SoC - someone
>> else can
>> try those if they want.
>
> fine, whoever wants to work on this should post
> proper benchmarks, not 1 run after half a dozen changes
BTW, that benchmark was only for the last patch.
>> diff -ru --exclude='*svn*' ffmpeg-/libavcodec/h264.c ffmpeg/
>> libavcodec/h264.c
>> --- ffmpeg-/libavcodec/h264.c 2008-06-30 14:47:21.000000000 -0400
>> +++ ffmpeg/libavcodec/h264.c 2008-06-30 14:47:36.000000000 -0400
>> @@ -2726,10 +2726,7 @@
>> MpegEncContext * const s = &h->s;
>> const int mb_xy= h->mb_xy;
>> const int mb_type= s->current_picture.mb_type[mb_xy];
>> - int is_complex = FRAME_MBAFF || MB_FIELD ||
>> IS_INTRA_PCM(mb_type) || s->codec_id != CODEC_ID_H264 ||
>> (ENABLE_GRAY && (s->flags&CODEC_FLAG_GRAY)) || s->encoding;
>> -
>> - if(!s->decode)
>> - return;
>> + int is_complex = FRAME_MBAFF || MB_FIELD ||
>> IS_INTRA_PCM(mb_type) || s->codec_id != CODEC_ID_H264 ||
>> (ENABLE_GRAY && (s->flags&CODEC_FLAG_GRAY));
>
> the if(!decode) should be under a #ifdef h264_encoder
attached, and split the very long line
> diff -ru --exclude='*svn*' ffmpeg-/libavcodec/h264.c ffmpeg/
> libavcodec/h264.c
>>
>> --- ffmpeg-/libavcodec/h264.c 2008-06-30 14:47:53.000000000 -0400
>> +++ ffmpeg/libavcodec/h264.c 2008-06-30 14:47:59.000000000 -0400
>> @@ -5517,7 +5517,7 @@
>> }
>> }
>>
>> - for( coeff_count--; coeff_count >= 0; coeff_count-- ) {
>> + while( coeff_count-- ) {
>> uint8_t *ctx = coeff_abs_level1_ctx[node_ctx] +
>> abs_level_m1_ctx_base;
>>
>> int j= scantable[index[coeff_count]];
>
> ok if faster or same speed
Same speed with two compilers here, slightly smaller code. Since the
compiler practically randomly rewrites a simple loop condition like
this, I don't want to claim it's any faster.
> [...]
>> diff -ru --exclude='*svn*' ffmpeg-/libavcodec/h264.c ffmpeg/
>> libavcodec/h264.c
>> --- ffmpeg-/libavcodec/h264.c 2008-06-30 14:48:05.000000000 -0400
>> +++ ffmpeg/libavcodec/h264.c 2008-06-30 14:48:11.000000000 -0400
>> @@ -5552,11 +5552,9 @@
>> }
>>
>> if( is_dc ) {
>> - if( get_cabac_bypass( CC ) ) block[j] = -coeff_abs;
>> - else block[j] =
>> coeff_abs;
>> + block[j] = get_cabac_bypass_sign( CC, -coeff_abs );
>> }else{
>> - if( get_cabac_bypass( CC ) ) block[j] = (-
>> coeff_abs * qmul[j] + 32) >> 6;
>> - else block[j] =
>> ( coeff_abs * qmul[j] + 32) >> 6;
>> + block[j] = (get_cabac_bypass_sign( CC, -
>> coeff_abs ) * qmul[j] + 32) >> 6;
>> }
>
> ok if faster or same speed
Slightly faster, small code size reduction.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1-noencoding.diff
Type: text/x-diff
Size: 1151 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080701/ca063210/attachment.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2-hldecodesmall.diff
Type: text/x-diff
Size: 729 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080701/ca063210/attachment-0001.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 8-split-line.diff
Type: text/x-diff
Size: 766 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080701/ca063210/attachment-0002.diff>
More information about the ffmpeg-devel
mailing list