[FFmpeg-devel] [PATCH] h264 parallelized, (was: Parallelized h264 proof-of-concept)
Andreas Öman
andreas
Sat Jun 16 23:03:22 CEST 2007
Hi
Michael Niedermayer wrote:
> Hi
>
> On Fri, Jun 15, 2007 at 10:10:54PM +0200, Andreas ?man wrote:
>> Andreas ?man wrote:
>>> Hi
>>>
>>> Michael Niedermayer wrote:
>>>> Hi
>>>>
>>>>
>>>> av_free() + av_malloc() or pass an argument to MPV_common_init()
>>> I'll extend MPV_common_init() with an additional argument then.
>>>
>>>>> static void filter_mb_fast( H264Context *h, int mb_x, int mb_y, uint8_t
>>>>> *img_y, uint8_t *img_cb, uint8_t *img_cr, unsigned int linesize,
>>>>> unsigned int uvlinesize);
>>>>> +static void execute_decode_slices(H264Context *h, int reset);
>>>> cant you order the new functions so as to avoid that?
>>> Hm, not really. decode_slice_header() needs to be able to
>>> fire off any pending slices in case the deblocking-type changes
>>> within a frame. (AFAIK this is valid according to the specs,
>>> perhaps I'm wrong?)
>>>
>> Okay, here are the finalized patches.
>>
>> #1 - Extend MPV_common_init() with an addition arg for context size
>> when doing multi threading.
>
> hmm, seeing the patch, i think i would prefer some simpler solution,
I tend to agree, this is a bit too intrusive. But i thought
you were okay with it.
> maybe adding the size to MpegEncContext? or even better adding
> thread_context[] to H264Context, this would also avoid the casts to
> H264Context
Personally, i like the idea of having mpegvideo.c do most of the
thread context setup. In the long run i guess it will be less
code duplication and such things.
How about just letting MPV_common_init use ctx->codec->priv_data_size
for malloc'ing the contexts? Or is that too inflexible, in that case
I tend to favor adding "priv_data_size" to MpegEncContext
>> #4 - Slice level parallelism for deblocking type 0 and 2
>>
>> regression tests passes
>
> regression tests ? theres no h.264 regression test ...
> i assume you mean you tested this on several h.264 streams and the
> output is binary identical ...
It was more to make sure that i didn't screw anything else up when
doing the MPV_common_init() thingie.
I've tested with all the h264 samples from mphq and they
are all md5sum identical. I've also got a few off-list replies
with people testing the code whom reported back various issues
which I've fixed (Thanks to all of you).
> [...]
>> @@ -3022,8 +3067,18 @@
>> MpegEncContext * const s = &h->s;
>> int temp8, i;
>> uint64_t temp64;
>> - int deblock_left = (s->mb_x > 0);
>> - int deblock_top = (s->mb_y > 0);
>> + int deblock_left;
>> + int deblock_top;
>> + int mb_xy;
>> +
>> + if(h->deblocking_filter == 2) {
>> + mb_xy = s->mb_x + s->mb_y*s->mb_stride;
>> + deblock_left = h->slice_table[mb_xy] == h->slice_table[mb_xy - 1];
>> + deblock_top = h->slice_table[mb_xy] == h->slice_table[h->top_mb_xy];
>> + } else {
>> + deblock_left = (s->mb_x > 0);
>> + deblock_top = (s->mb_y > 0);
>> + }
>
> is this multitrheading specific? or a deblocking_filter == 2 fix? in the later
> case it should be in a seperate patch
It's required for multi-threading to work correctly. Otherwise,
the xchg-code might incorrectly swap in invalid data since
no deblocking has yet completed for the previous row
(at least, thats how I understand it)
Having said that, it does give a slight performance improvement
for deblocking_type == 2, so i guess it could go in as a separate patch.
> [...]
>> if(!FRAME_MBAFF){
>> int qp_thresh = 15 - h->slice_alpha_c0_offset - FFMAX(0, h->pps.chroma_qp_index_offset);
>> int qp = s->current_picture.qscale_table[mb_xy];
>> - if(qp <= qp_thresh
>> - && (mb_x == 0 || ((qp + s->current_picture.qscale_table[mb_xy-1] + 1)>>1) <= qp_thresh)
>> - && (mb_y == 0 || ((qp + s->current_picture.qscale_table[h->top_mb_xy] + 1)>>1) <= qp_thresh)){
>> - return;
>> +
>> + if(qp <= qp_thresh) {
>> + if(h->deblocking_filter == 1) {
>> + if((mb_x == 0 || ((qp + s->current_picture.qscale_table[mb_xy-1] + 1)>>1) <= qp_thresh) &&
>> + (mb_y == 0 || ((qp + s->current_picture.qscale_table[h->top_mb_xy] + 1)>>1) <= qp_thresh))
>> + return;
>> + } else {
>> + int mb_slice = h->slice_table[mb_xy];
>> + int left_qp, top_qp;
>> +
>> + left_qp = h->slice_table[mb_xy - 1] == mb_slice ? ((qp + s->current_picture.qscale_table[mb_xy-1] + 1)>>1) : 0;
>> + top_qp = h->slice_table[h->top_mb_xy] == mb_slice ? ((qp + s->current_picture.qscale_table[h->top_mb_xy] + 1)>>1) : 0;
>> + if(left_qp <= qp_thresh &&
>> + top_qp <= qp_thresh)
>> + return;
>> + }
>> }
>> }
>
> is this specific to threads?
Hm .. now when i come to think of it, it might not be needed at all.
The only time when the current code would incorrectly leave out a
deblocking round is when it happens on a slice boundary (the previous
slice may not be complete yet, and thus
qscale_table[mb_xy - <somevalue>] may be incorrect (it's still the
value from the previous frame).
These cases will be ruled out anyway, a bit further down
in the code. I'll rip this change out.
>> +
>> + h->max_contexts = avctx->thread_count > 0 ? avctx->thread_count : 1;
>
> i think thread_count must be >0
Hmm, set by whom? The application certainly doesn't need to and AFAIK
lavc doesn't peg it up to 1 either.
>
>
> [...]
>> + hx = (H264Context *)s->thread_context[h->current_context] ? (H264Context *)s->thread_context[h->current_context] : h;
>
> what about s->thread_context[0] == h, i think that would avoid this check?
> or does that cause other parts of the code to become more complex?
s->thread_context[0] is initialized during MPV_common_init(),
which is not called until the first slice is processed.
That's why I added this check. But you are right, i'm gonna add
s->thread_context[0] = h;
to decode_init() to avoid this check.
Good point.
>
>
> [...]
>> case NAL_DPA:
>> - init_get_bits(&s->gb, ptr, bit_length);
>> - h->intra_gb_ptr=
>> - h->inter_gb_ptr= NULL;
>> - s->data_partitioning = 1;
>> + init_get_bits(&hx->s.gb, ptr, bit_length);
>> + hx->intra_gb_ptr=
>> + hx->inter_gb_ptr= NULL;
>> + hx->s.data_partitioning = 1;
>>
>> - if(decode_slice_header(h) < 0){
>> + if(decode_slice_header(hx, h) < 0){
>> av_log(h->s.avctx, AV_LOG_ERROR, "decode_slice_header error\n");
>> }
>
> indention is wrong here
I'll look into it. btw parts of decode_nal_units() in svn is incorrectly
indented, wouldn't it be wise to fix that up? Should i send a patch?
>
>
> [...]
>> Index: libavcodec/h264.h
>> ===================================================================
>> --- libavcodec/h264.h (revision 9281)
>> +++ libavcodec/h264.h (working copy)
>> @@ -381,6 +381,16 @@
>> const uint8_t *field_scan8x8_cavlc_q0;
>>
>> int x264_build;
>> +
>> + /* Slice-based multi threading members.
>> + * These are only used in the "master" context */
>
> doxygen supports comments on groups of variables, this should be used
> here
Ok
More information about the ffmpeg-devel
mailing list