[FFmpeg-devel] multithreaded H.264 / macroblock layer

Fri Jun 19 15:12:21 CEST 2009

On Sat, Jun 13, 2009 at 12:55 PM, Andreas Öman <andreas at lonelycoder.com> wrote:
>
> I tried to split the entropy decoding and picture reconstruction into
> two threads a few years ago. You can follow the thread here:
>
> http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2007-September/035861.html
>
> I would not expect the patch to apply cleanly to an up-to-date
> repository. But you might get a few ideas out of it.

Hello,

I'm having some problems with separating entropy decoding and picture
reconstruction.
I have looked into Andreas Öman's patch from two years ago and borrowed
the functions that save and restore the per-macroblock data which has
to be preserved when separating the decoding stages. I then introduced
two separate loops, one for entropy decoding and one for picture
reconstruction, each processing the macroblocks in raster scan order;
nothing is multithreaded yet. After adding h->mb_xy to the
save/restore functions I got a picture which at least showed moving
objects, but with heavy distortions. Have a look here:

http://www-user.rhrk.uni-kl.de/~brocksch/snapshot.png

It's a scene from the popular rush_hour sample video.
You can already see the cars, and their movements are correct, so to
me it looks like some intra-related information is missing.
Can someone tell me which kind of information is probably missing in
the picture reconstruction stage, so that I can include it in the
save/restore functions? I'm using ffmpeg 0.5. The sample video is not
coded in MBAFF mode.

I already did a diff of the h264.h files from the revision I am using
and the one Andreas used, to see whether new per-macroblock data was
added which I would have to save/restore. So far I couldn't find
anything new that is per-macroblock only. I'm also looking into the
decode_mb_cabac() function to see which information is overwritten
with each new macroblock. This is quite complex, however, since many
other functions are called from there. I've tried saving/restoring
more than ten additional variables without success. For anyone who is
interested, my code is below. It'd be great if somebody could give me
advice.

Regards
Martin

This is the data I am saving and restoring for each macroblock:

typedef struct H264mb {
//data saved by Andreas
    int mb_x, mb_y;
    int qscale;
    int chroma_qp[2]; //QPc
    int chroma_pred_mode;
    int intra16x16_pred_mode;
    unsigned int topleft_samples_available;
    unsigned int topright_samples_available;
    int8_t intra4x4_pred_mode_cache[5*8];
    uint8_t non_zero_count_cache[6*8];
    int16_t mv_cache[2][5*8][2];
    int8_t ref_cache[2][5*8];
    int cbp;
    int top_mb_xy;
    int left_mb_xy[2];
    //unsigned int sub_mb_type[4];
    uint16_t sub_mb_type[4];
    DCTELEM mb[16*24];

//things I've tried (making more or less sense)
    int mb_xy;
    int is_complex;
    int luma_weight_flag[2];
    int chroma_weight_flag[2];
    int prev_mb_skipped;
    int next_mb_skipped;
    int last_qscale_diff;
    int mb_field_decoding_flag;
    int mb_mbaff;
    unsigned int ref_count[2];
    unsigned int top_samples_available;
    unsigned int left_samples_available;
    int top_cbp;
    int left_cbp;
    int mv_cache_clean[2];
    int neighbor_transform_size;
    int16_t mvd_cache[2][5*8][2];
    uint8_t direct_cache[5*8];
} H264mb;
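
To make clear what the save/restore pair does, here is a minimal
sketch on toy types. It is not the real H264Context or the real
copy_context_to_mb()/copy_mb_to_context(); ToyCtx, ToyMb,
toy_context_to_mb() and toy_mb_to_context() are made-up names, and the
field selection is only a small subset of the struct above. The idea
is simply that every field the entropy stage overwrites per macroblock
gets copied out, and copied back before reconstruction, while
per-slice state is left alone.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the decoder context (NOT H264Context). */
typedef struct ToyCtx {
    int mb_x, mb_y;
    int qscale;
    int cbp;
    uint8_t non_zero_count_cache[6 * 8];
    int16_t mb[16 * 24];   /* coefficient buffer, overwritten per macroblock */
    int frame_num;         /* per-slice state, deliberately not snapshotted  */
} ToyCtx;

/* Hypothetical per-macroblock snapshot, mirroring a subset of H264mb. */
typedef struct ToyMb {
    int mb_x, mb_y;
    int qscale;
    int cbp;
    uint8_t non_zero_count_cache[6 * 8];
    int16_t mb[16 * 24];
} ToyMb;

/* Save: copy everything the entropy stage wrote for this macroblock. */
static void toy_context_to_mb(ToyMb *dst, const ToyCtx *src)
{
    assert(dst && src);
    dst->mb_x   = src->mb_x;
    dst->mb_y   = src->mb_y;
    dst->qscale = src->qscale;
    dst->cbp    = src->cbp;
    memcpy(dst->non_zero_count_cache, src->non_zero_count_cache,
           sizeof(dst->non_zero_count_cache));
    memcpy(dst->mb, src->mb, sizeof(dst->mb));
}

/* Restore: put the snapshot back before the reconstruction stage runs. */
static void toy_mb_to_context(ToyCtx *dst, const ToyMb *src)
{
    assert(dst && src);
    dst->mb_x   = src->mb_x;
    dst->mb_y   = src->mb_y;
    dst->qscale = src->qscale;
    dst->cbp    = src->cbp;
    memcpy(dst->non_zero_count_cache, src->non_zero_count_cache,
           sizeof(dst->non_zero_count_cache));
    memcpy(dst->mb, src->mb, sizeof(dst->mb));
}
```

Any field that reconstruction reads but that is missing from the
snapshot will silently keep the value of the last entropy-decoded
macroblock, which is the kind of omission I am trying to track down.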

This is the relevant part of my modified decode_slice() function
(copy_context_to_mb() and copy_mb_to_context() are the functions that
save/restore the per-macroblock data listed above):

for(;;){
//START_TIMER
            int ret = decode_mb_cabac(h);
            int eos;
//STOP_TIMER("decode_mb_cabac")

            //if(ret>=0) hl_decode_mb(h);
            copy_context_to_mb(h->blocks + count, h);

            if( ret >= 0 && FRAME_MBAFF ) { //FIXME optimal? or let mb_decode decode 16x32 ?
                fprintf(stderr, "WARNING: MBAFF\n");
                s->mb_y++;
                count++;

                ret = decode_mb_cabac(h);

                //if(ret>=0) hl_decode_mb(h);
                copy_context_to_mb(h->blocks + count, h);
                s->mb_y--;
            }
            eos = get_cabac_terminate( &h->cabac );

            if( ret < 0 || h->cabac.bytestream > h->cabac.bytestream_end + 2) {
                av_log(h->s.avctx, AV_LOG_ERROR,
                       "error while decoding MB %d %d, bytestream (%td)\n",
                       s->mb_x, s->mb_y,
                       h->cabac.bytestream_end - h->cabac.bytestream);
                ff_er_add_slice(s, s->resync_mb_x, s->resync_mb_y, s->mb_x, s->mb_y,
                                (AC_ERROR|DC_ERROR|MV_ERROR)&part_mask);
                return -1;
            }

            if( ++s->mb_x >= s->mb_width ) {
                s->mb_x = 0;
                ff_draw_horiz_band(s, 16*s->mb_y, 16);
                ++s->mb_y;
                if(FIELD_OR_MBAFF_PICTURE) {
                    ++s->mb_y;
                }
            }

            if( eos || s->mb_y >= s->mb_height ) {
                break;
            }

            count++;
}

for(i=0;i<=count;i++){
    copy_mb_to_context(h, h->blocks + i);
    hl_decode_mb(h);
}

tprintf(s->avctx, "slice end %d %d\n", get_bits_count(&s->gb), s->gb.size_in_bits);
ff_er_add_slice(s, s->resync_mb_x, s->resync_mb_y, s->mb_x-1, s->mb_y,
                (AC_END|DC_END|MV_END)&part_mask);
av_free(h->blocks);
return 0;
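
For reference, the two-loop structure above boils down to the
following toy, which compiles on its own. It is only a sketch under
assumptions: ToyCtx, ToyMb, toy_entropy_decode(), toy_reconstruct()
and toy_decode_slice() are hypothetical stand-ins (one byte per
macroblock instead of real coefficients), not ffmpeg code. The point
it illustrates is that the second loop must take everything, including
the macroblock position, from the saved record, because the context
still holds whatever the last entropy-decoded macroblock left behind.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-macroblock record and context (not ffmpeg types). */
typedef struct ToyMb  { int mb_x, mb_y; uint8_t dc; } ToyMb;
typedef struct ToyCtx { int mb_x, mb_y, mb_width, mb_height; uint8_t dc; } ToyCtx;

/* Stand-in for decode_mb_cabac(): overwrites per-macroblock state. */
static void toy_entropy_decode(ToyCtx *c)
{
    c->dc = (uint8_t)(c->mb_y * c->mb_width + c->mb_x + 1);
}

/* Stand-in for hl_decode_mb(): writes one sample per macroblock. */
static void toy_reconstruct(const ToyCtx *c, uint8_t *picture)
{
    picture[c->mb_y * c->mb_width + c->mb_x] = c->dc;
}

/* Returns the number of macroblocks reconstructed, or -1 on failure. */
static int toy_decode_slice(ToyCtx *c, uint8_t *picture)
{
    int total = c->mb_width * c->mb_height;
    int count = 0, i;
    ToyMb *blocks = malloc((size_t)total * sizeof(*blocks));

    if (!blocks)
        return -1;

    /* Pass 1: entropy decoding in raster order, one record per macroblock. */
    for (c->mb_y = 0; c->mb_y < c->mb_height; c->mb_y++) {
        for (c->mb_x = 0; c->mb_x < c->mb_width; c->mb_x++) {
            toy_entropy_decode(c);
            assert(count < total);
            blocks[count].mb_x = c->mb_x;
            blocks[count].mb_y = c->mb_y;
            blocks[count].dc   = c->dc;
            count++;
        }
    }

    /* Pass 2: restore each record, then reconstruct from it. */
    for (i = 0; i < count; i++) {
        c->mb_x = blocks[i].mb_x;
        c->mb_y = blocks[i].mb_y;
        c->dc   = blocks[i].dc;
        toy_reconstruct(c, picture);
    }

    free(blocks);
    return count;
}
```

Here count is the number of records, so the second loop runs while
i < count. In my code above count is the index of the last record
(the final count++ is skipped by the break), hence the i <= count.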