[FFmpeg-devel] [PATCH] H.264: Faster DC handling (idct_dc and more)

Tue Feb 3 19:05:46 CET 2009

On Mon, Jan 26, 2009 at 05:32:14PM -0800, Jason Garrett-Glaser wrote:
> The first of these patches modifies ffmpeg to implement x264-style
> scan8; this means storing nnz from luma/chroma DC in scan8.  This is
> extremely useful because it means one can check with a single if
> whether luma/chroma DC exists.  Normally, you can only do it in the
> case of CABAC where the data is stored in the cbp.  This also
> simplifies decode_residual a lot and will surely be useful for future
> optimizations.
> 
> The second of these patches ports an 8x8 idct_dc function from x264
> which I just wrote (so new, in fact, it isn't in x264 yet!).  This one
> is LGPL.  The entire patch drops chroma idct/dequant time from ~268
> clocks to ~205 clocks on a Core 2 Conroe.  Note that there *are*
> warnings relating to pointers in this patch--suggestions welcome on
> the best way to clean them up without making the source messy.
> 
> And yes, I tested adding the ifs I used in the DC-only section to the
> other chroma decoding section--it makes it slower, probably because
> you almost never have chroma AC without chroma DC.
> 
> Dark Shikari

first iam sorry for not reviewing this sooner ...

[...]
> @@ -4113,21 +4146,14 @@
>  
>      //FIXME put trailing_onex into the context
>  
> -    if(n == CHROMA_DC_BLOCK_INDEX){
> +    if(n >= CHROMA_DC_BLOCK_INDEX){
>          coeff_token= get_vlc2(gb, chroma_dc_coeff_token_vlc.table, CHROMA_DC_COEFF_TOKEN_VLC_BITS, 1);
> -        total_coeff= coeff_token>>2;
>      }else{
> -        if(n == LUMA_DC_BLOCK_INDEX){
> -            total_coeff= pred_non_zero_count(h, 0);
> -            coeff_token= get_vlc2(gb, coeff_token_vlc[ coeff_token_table_index[total_coeff] ].table, COEFF_TOKEN_VLC_BITS, 2);
> -            total_coeff= coeff_token>>2;
> -        }else{
> -            total_coeff= pred_non_zero_count(h, n);
> -            coeff_token= get_vlc2(gb, coeff_token_vlc[ coeff_token_table_index[total_coeff] ].table, COEFF_TOKEN_VLC_BITS, 2);
> -            total_coeff= coeff_token>>2;
> -            h->non_zero_count_cache[ scan8[n] ]= total_coeff;
> -        }

> +        total_coeff= pred_non_zero_count(h, n == LUMA_DC_BLOCK_INDEX ? 0 : n);

it would be faster maybe if its not inlined if scan8[0] : scan8[n] would be
passed
instead of 0 : n

[...]
> @@ -5299,10 +5325,14 @@
>      assert(coeff_count > 0);
>  
>      if( is_dc ) {
> -        if( cat == 0 )
> +        if( cat == 0 ){
>              h->cbp_table[h->mb_xy] |= 0x100;
> -        else
> +            h->non_zero_count_cache[scan8[LUMA_DC_BLOCK_INDEX]] = coeff_count;

> +        }
> +        else{

the } should be on the same line as the else

except these the first patch is ok

[...]

> @@ -2621,23 +2642,35 @@
>                      }
>                  }
>              }else{
> -                chroma_dc_dequant_idct_c(h->mb + 16*16, h->chroma_qp[0], h->dequant4_coeff[IS_INTRA(mb_type) ? 1:4][h->chroma_qp[0]][0]);
> -                chroma_dc_dequant_idct_c(h->mb + 16*16+4*16, h->chroma_qp[1], h->dequant4_coeff[IS_INTRA(mb_type) ? 2:5][h->chroma_qp[1]][0]);
> -                if(is_h264){
> -                    idct_add = s->dsp.h264_idct_add;
> -                    idct_dc_add = s->dsp.h264_idct_dc_add;
> -                    for(i=16; i<16+8; i++){
> -                        if(h->non_zero_count_cache[ scan8[i] ])
> -                            idct_add   (dest[(i&4)>>2] + block_offset[i], h->mb + i*16, uvlinesize);
> -                        else if(h->mb[i*16])
> -                            idct_dc_add(dest[(i&4)>>2] + block_offset[i], h->mb + i*16, uvlinesize);
[...]
> +                    chroma_dc_dequant_idct_unpack_c(h->mb + 16*16, h->chroma_qp[0], h->dequant4_coeff[IS_INTRA(mb_type) ? 1:4][h->chroma_qp[0]][0]);
> +                    chroma_dc_dequant_idct_unpack_c(h->mb + 16*16+4*16, h->chroma_qp[1], h->dequant4_coeff[IS_INTRA(mb_type) ? 2:5][h->chroma_qp[1]][0]);
> +                    if(is_h264){
> +                        idct_add = s->dsp.h264_idct_add;
> +                        idct_dc_add = s->dsp.h264_idct_dc_add;
> +                        for(i=16; i<16+8; i++){
> +                            if(h->non_zero_count_cache[ scan8[i] ])
> +                                idct_add   (dest[(i&4)>>2] + block_offset[i], h->mb + i*16, uvlinesize);
> +                            else if(h->mb[i*16])
> +                                idct_dc_add(dest[(i&4)>>2] + block_offset[i], h->mb + i*16, uvlinesize);
>                          }

please dont mix indention changes with functional ones
this makes things hard to review

[...]
> Index: libavcodec/h264.h
> ===================================================================
> --- libavcodec/h264.h	(revision 16807)
> +++ libavcodec/h264.h	(working copy)
> @@ -36,8 +36,8 @@
>  #define interlaced_dct interlaced_dct_is_a_bad_name
>  #define mb_intra mb_intra_is_not_initialized_see_mb_type
>  
> -#define LUMA_DC_BLOCK_INDEX   25
> -#define CHROMA_DC_BLOCK_INDEX 26
> +#define LUMA_DC_BLOCK_INDEX    24
> +#define CHROMA_DC_BLOCK_INDEX  25
>  
>  #define CHROMA_DC_COEFF_TOKEN_VLC_BITS 8
>  #define COEFF_TOKEN_VLC_BITS           8

that hunk is duplicated

[...]
> @@ -2896,6 +2898,13 @@
>          }
>  #endif
>  
> +#ifdef HAVE_YASM

this should be #if now

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Frequently ignored awnser#1 FFmpeg bugs should be sent to our bugtracker. User
questions about the command line tools should be sent to the ffmpeg-user ML.
And questions about how to use libav* should be sent to the libav-user ML.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090203/cfd7f699/attachment.pgp>