[FFmpeg-devel] [PATCH] H.264: Faster DC handling (idct_dc and more)
Michael Niedermayer
michaelni
Tue Feb 3 19:05:46 CET 2009
On Mon, Jan 26, 2009 at 05:32:14PM -0800, Jason Garrett-Glaser wrote:
> The first of these patches modifies ffmpeg to implement x264-style
> scan8; this means storing nnz from luma/chroma DC in scan8. This is
> extremely useful because it means one can check with a single if
> whether luma/chroma DC exists. Normally, you can only do it in the
> case of CABAC where the data is stored in the cbp. This also
> simplifies decode_residual a lot and will surely be useful for future
> optimizations.
>
> The second of these patches ports an 8x8 idct_dc function from x264
> which I just wrote (so new, in fact, it isn't in x264 yet!). This one
> is LGPL. The entire patch drops chroma idct/dequant time from ~268
> clocks to ~205 clocks on a Core 2 Conroe. Note that there *are*
> warnings relating to pointers in this patch--suggestions welcome on
> the best way to clean them up without making the source messy.
>
> And yes, I tested adding the ifs I used in the DC-only section to the
> other chroma decoding section--it makes it slower, probably because
> you almost never have chroma AC without chroma DC.
>
> Dark Shikari
first iam sorry for not reviewing this sooner ...
[...]
> @@ -4113,21 +4146,14 @@
>
> //FIXME put trailing_onex into the context
>
> - if(n == CHROMA_DC_BLOCK_INDEX){
> + if(n >= CHROMA_DC_BLOCK_INDEX){
> coeff_token= get_vlc2(gb, chroma_dc_coeff_token_vlc.table, CHROMA_DC_COEFF_TOKEN_VLC_BITS, 1);
> - total_coeff= coeff_token>>2;
> }else{
> - if(n == LUMA_DC_BLOCK_INDEX){
> - total_coeff= pred_non_zero_count(h, 0);
> - coeff_token= get_vlc2(gb, coeff_token_vlc[ coeff_token_table_index[total_coeff] ].table, COEFF_TOKEN_VLC_BITS, 2);
> - total_coeff= coeff_token>>2;
> - }else{
> - total_coeff= pred_non_zero_count(h, n);
> - coeff_token= get_vlc2(gb, coeff_token_vlc[ coeff_token_table_index[total_coeff] ].table, COEFF_TOKEN_VLC_BITS, 2);
> - total_coeff= coeff_token>>2;
> - h->non_zero_count_cache[ scan8[n] ]= total_coeff;
> - }
> + total_coeff= pred_non_zero_count(h, n == LUMA_DC_BLOCK_INDEX ? 0 : n);
it would be faster maybe if its not inlined if scan8[0] : scan8[n] would be
passed
instead of 0 : n
[...]
> @@ -5299,10 +5325,14 @@
> assert(coeff_count > 0);
>
> if( is_dc ) {
> - if( cat == 0 )
> + if( cat == 0 ){
> h->cbp_table[h->mb_xy] |= 0x100;
> - else
> + h->non_zero_count_cache[scan8[LUMA_DC_BLOCK_INDEX]] = coeff_count;
> + }
> + else{
the } should be on the same line as the else
except these the first patch is ok
[...]
> @@ -2621,23 +2642,35 @@
> }
> }
> }else{
> - chroma_dc_dequant_idct_c(h->mb + 16*16, h->chroma_qp[0], h->dequant4_coeff[IS_INTRA(mb_type) ? 1:4][h->chroma_qp[0]][0]);
> - chroma_dc_dequant_idct_c(h->mb + 16*16+4*16, h->chroma_qp[1], h->dequant4_coeff[IS_INTRA(mb_type) ? 2:5][h->chroma_qp[1]][0]);
> - if(is_h264){
> - idct_add = s->dsp.h264_idct_add;
> - idct_dc_add = s->dsp.h264_idct_dc_add;
> - for(i=16; i<16+8; i++){
> - if(h->non_zero_count_cache[ scan8[i] ])
> - idct_add (dest[(i&4)>>2] + block_offset[i], h->mb + i*16, uvlinesize);
> - else if(h->mb[i*16])
> - idct_dc_add(dest[(i&4)>>2] + block_offset[i], h->mb + i*16, uvlinesize);
[...]
> + chroma_dc_dequant_idct_unpack_c(h->mb + 16*16, h->chroma_qp[0], h->dequant4_coeff[IS_INTRA(mb_type) ? 1:4][h->chroma_qp[0]][0]);
> + chroma_dc_dequant_idct_unpack_c(h->mb + 16*16+4*16, h->chroma_qp[1], h->dequant4_coeff[IS_INTRA(mb_type) ? 2:5][h->chroma_qp[1]][0]);
> + if(is_h264){
> + idct_add = s->dsp.h264_idct_add;
> + idct_dc_add = s->dsp.h264_idct_dc_add;
> + for(i=16; i<16+8; i++){
> + if(h->non_zero_count_cache[ scan8[i] ])
> + idct_add (dest[(i&4)>>2] + block_offset[i], h->mb + i*16, uvlinesize);
> + else if(h->mb[i*16])
> + idct_dc_add(dest[(i&4)>>2] + block_offset[i], h->mb + i*16, uvlinesize);
> }
please dont mix indention changes with functional ones
this makes things hard to review
[...]
> Index: libavcodec/h264.h
> ===================================================================
> --- libavcodec/h264.h (revision 16807)
> +++ libavcodec/h264.h (working copy)
> @@ -36,8 +36,8 @@
> #define interlaced_dct interlaced_dct_is_a_bad_name
> #define mb_intra mb_intra_is_not_initialized_see_mb_type
>
> -#define LUMA_DC_BLOCK_INDEX 25
> -#define CHROMA_DC_BLOCK_INDEX 26
> +#define LUMA_DC_BLOCK_INDEX 24
> +#define CHROMA_DC_BLOCK_INDEX 25
>
> #define CHROMA_DC_COEFF_TOKEN_VLC_BITS 8
> #define COEFF_TOKEN_VLC_BITS 8
that hunk is duplicated
[...]
> @@ -2896,6 +2898,13 @@
> }
> #endif
>
> +#ifdef HAVE_YASM
this should be #if now
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Frequently ignored awnser#1 FFmpeg bugs should be sent to our bugtracker. User
questions about the command line tools should be sent to the ffmpeg-user ML.
And questions about how to use libav* should be sent to the libav-user ML.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090203/cfd7f699/attachment.pgp>
More information about the ffmpeg-devel
mailing list