[FFmpeg-devel] [PATCH] h264: integrate clear_blocks calls with IDCT.

Fri Feb 8 20:53:14 CET 2013

Hi,

On Fri, Feb 8, 2013 at 9:53 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Fri, Feb 08, 2013 at 09:09:41AM -0800, Ronald S. Bultje wrote:
>> Hi,
>>
>> On Fri, Feb 8, 2013 at 4:51 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> > On Thu, Feb 07, 2013 at 10:20:52PM -0800, Ronald S. Bultje wrote:
>> >> From: "Ronald S. Bultje" <rsbultje at gmail.com>
>> >>
>> >> In case of no-transform, integrate it with put_pixels4/8(). In case
>> >> of intra PCM, do an explicit memset(0) call. Together, this makes
>> >> the H264 decoder almost-independent of dsputil.
>> >>
>> >> (PPC and Arm assembly not yet ported.)
>> >
>> > breaks fate-h264-lossless
>>
>> Will check, I thought it worked locally.
>>
>> >> diff --git a/libavcodec/h264_mb_template.c b/libavcodec/h264_mb_template.c
>> >> index a841555..4fe2030 100644
>> >> --- a/libavcodec/h264_mb_template.c
>> >> +++ b/libavcodec/h264_mb_template.c
>> >> @@ -133,6 +133,7 @@ static av_noinline void FUNC(hl_decode_mb)(H264Context *h)
>> >>                      }
>> >>                  }
>> >>              }
>> >> +            memset(h->mb, 0, ff_h264_mb_sizes[h->sps.chroma_format_idc] * 2);
>> >>          } else {
>> >>              for (i = 0; i < 16; i++)
>> >>                  memcpy(dest_y + i * linesize, (uint8_t *)h->mb + i * 16, 16);
>> >> @@ -151,6 +152,7 @@ static av_noinline void FUNC(hl_decode_mb)(H264Context *h)
>> >>                      }
>> >>                  }
>> >>              }
>> >> +            memset(h->mb, 0, ff_h264_mb_sizes[h->sps.chroma_format_idc]);
>> >>          }
>> >>      } else {
>> >>          if (IS_INTRA(mb_type)) {
>> >
>> > why should these (and other cases) use memset instead of (the faster)
>> > clear_block* ?
>>
>> The first is init code, so speed is irrelevant.
>>
>> The second remaining use of memset(0) is intra PCM. Two comments on
>> that. First, from h264_cabac.c:
>>         // We assume these blocks are very rare so we do not optimize it.
>>         // FIXME The two following lines get the bitstream position in the cabac
>>         // decode, I think it should be done by a function in cabac.h
>> (or cabac.c).
>> I like that concept if it removes dsputil dependencies.
>>
>> Then second:
>>         // The pixels are stored in the same order as levels in h->mb array.
>>         if ((int) (h->cabac.bytestream_end - ptr) < mb_size)
>>             return -1;
>>         memcpy(h->mb, ptr, mb_size); ptr+=mb_size;
>> There is no reason to memcpy here, the source data remains valid. We
>> can just store the pointer and read from there directly. If we do
>> that, we don't need to clear h->mb for intraPCM anymore either. That
>> was my longer-term plan, I can do that in the next patch iteration.
>
> i dont object to replacing clear_block* where it has no meassureable
> speed effect
> but iam not sure if making only specific dsp util contexts but not
> a misc (what doesnt fit in the others) context available to
> h264 is really a good idea.
> It might work out now but the first function that one wants to
> optimize that doesnt fit in the existing specific dsp contexts would
> need to be put somewhere else.
> And if a misc context is available, clear_block probably would be
> in it and could then still be used instead of memset even if the
> speed difference is negligible

That would probably be videodsp.

I disagree clear_blocks would go there. It is _always_ faster to clear
as part of the IDCT, so we should move towards that. clear_blocks is
just legacy which we keep around to not break codecs that haven't been
converted to the new logic yet.

Ronald