[FFmpeg-devel] [PATCH 6/9] vp9: add keyframe profile 2/3 support.

Wed May 6 19:56:42 CEST 2015

Hi,

On Wed, May 6, 2015 at 1:40 PM, Carl Eugen Hoyos <cehoyos at ag.or.at> wrote:
> Ronald S. Bultje <rsbultje <at> gmail.com> writes:
> > +static void vert_4x4_c(uint8_t *_dst, ptrdiff_t stride,
> > +                       const uint8_t *left, const uint8_t *_top)
>
> Once upon a time, it was claimed that we must not
> use identifiers starting with "_".

Well, they're not really variable names, just pre-cast placeholders. I'm
basically just copying the approach that hevc/h264 templating uses. For
example:

static void FUNC(put_hevc_pel_pixels)(int16_t *dst,
                                      uint8_t *_src, ptrdiff_t _srcstride,
                                      int height, intptr_t mx, intptr_t my,
                                      int width)
{
    int x, y;
    pixel *src          = (pixel *)_src;
    ptrdiff_t srcstride = _srcstride / sizeof(pixel);

Note that all pre-cast placeholders start with a _ to prevent name clashes
with the post-cast variables of interest.
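
To make the pattern concrete, here is a minimal, self-contained sketch of a
vertical 4x4 predictor written in that style. It is an illustration, not the
code from the patch: the name vert_4x4_sketch and the fixed uint16_t typedef
for pixel are assumptions (in the real templates, pixel is selected per bit
depth by the template machinery).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for the per-bit-depth typedef the templates provide. */
typedef uint16_t pixel;

/* Copy the row above into each of the 4 rows of a 4x4 block.
 * _dst and _top are the untyped (byte pointer) placeholders; dst and top
 * are the typed views that the body actually works with. */
static void vert_4x4_sketch(uint8_t *_dst, ptrdiff_t stride,
                            const uint8_t *left, const uint8_t *_top)
{
    pixel *dst       = (pixel *)_dst;
    const pixel *top = (const pixel *)_top;
    int x, y;

    (void)left; /* unused by vertical prediction */

    /* The caller passes the stride in bytes; convert it to pixels. */
    stride /= (ptrdiff_t)sizeof(pixel);

    for (y = 0; y < 4; y++) {
        for (x = 0; x < 4; x++)
            dst[x] = top[x];
        dst += stride;
    }
}
```

The public signature stays byte-based, so the same function pointer type
works for every bit depth, and the underscore keeps the placeholder from
colliding with the typed name used in the body.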

> Would it be slower to decode to YUV420P16 and set
> bits_per_coded_sample? (Just being curious.)

Someone capable and interested would need to test this. I simply copied the
hevc/h264 approach.

Theoretically, I think certain parts would be faster if we kept p10/p12,
e.g. the flat loop filter (since it can be done in 16 bits, and would need
to be done in 32 bits if we used p16). I also think the directional
predictors (3-tap, specifically) would be slightly more complex in p16 than
in p10/p12 (see also how we do it for 8-bit to keep it in 8 bits instead of
going to 16 bits). Each of these is admittedly minor, but it's still a
factor. Overall, I'd expect the effect to be in the low single-digit
percents, or perhaps even a fraction of a percent, but I would absolutely
expect a small performance gain from using p10/p12 over p16 with
bits_per_coded_sample. Also note that most of this would only be noticeable
after SIMD optimizations; in C there would be no difference.
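
The 16-bit vs. 32-bit argument can be checked with a bit of arithmetic. The
sketch below computes the largest intermediate value for two filter shapes
and tests whether it fits in an unsigned 16-bit lane. The assumptions are
mine, not from the patch: the 3-tap predictor is taken as
(a + 2*b + c + 2) >> 2 (weights sum to 4), and the widest flat filter as
weights summing to 16 with a rounding term of 8. All function names are
hypothetical.

```c
#include <stdint.h>

/* Largest intermediate of a filter whose non-negative weights sum to
 * weight_sum, evaluated on maximum-valued samples, plus the rounding term. */
static uint32_t max_intermediate(int bit_depth, uint32_t weight_sum,
                                 uint32_t rounding)
{
    uint32_t max_sample = (1u << bit_depth) - 1;
    return weight_sum * max_sample + rounding;
}

/* Non-zero if v can be held in an unsigned 16-bit lane. */
static int fits_in_16_bits(uint32_t v)
{
    return v <= UINT16_MAX;
}

/* Assumed 3-tap smoothing filter: (a + 2*b + c + 2) >> 2. */
static int three_tap_fits_16(int bit_depth)
{
    return fits_in_16_bits(max_intermediate(bit_depth, 4, 2));
}

/* Assumed widest flat filter: weights sum to 16, then (sum + 8) >> 4. */
static int flat16_fits_16(int bit_depth)
{
    return fits_in_16_bits(max_intermediate(bit_depth, 16, 8));
}
```

Under those assumptions, 12-bit samples give a worst case of
16 * 4095 + 8 = 65528 for the flat filter, which just fits in 16 bits, while
full-range 16-bit samples overflow even the 3-tap case
(4 * 65535 + 2 = 262142). That is the headroom that would be lost by
treating everything as p16.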

Ronald