[FFmpeg-devel] [PATCH] VP8 MMX optimizations (MC and IDCT dc_add)

Wed Jun 23 13:15:25 CEST 2010

Jason Garrett-Glaser <darkshikari at gmail.com> writes:

> +static void ff_put_vp8_epel16_h ## TAPNUMX ## v ## TAPNUMY ## _ ## INSTR( \
> +                                                uint8_t *dst, \
> +                                                uint8_t *src, \
> +                                                int stride, int height, \
> +                                                int mx, int my) \
> +{ \
> +    uint8_t tmp_arr[stride * (16 + TAPNUMY - 1)], \

This is insane.  Not only is it a VLA, which is bad in itself, it's a
HUGE one.  For an HD video it will be roughly 40 KB, far more than
should go on the stack, and you're only using 16 bytes of each row.
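The arithmetic behind the stack-size complaint, as a small C sketch. It assumes a stride equal to the 1920-pixel luma width of a 1080p frame (real strides include edge padding and are somewhat larger) and TAPNUMY = 6. The helper names are hypothetical, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Size of the VLA in the patch: one full frame row per temp row.
 * taps plays the role of TAPNUMY; a 16-row block needs 16 + taps - 1 rows. */
static size_t vla_temp_bytes(size_t stride, int taps)
{
    assert(taps > 0);
    return stride * (size_t)(16 + taps - 1);
}

/* Bytes actually touched: the epel16 block is only 16 pixels wide. */
static size_t used_temp_bytes(int taps)
{
    assert(taps > 0);
    return 16 * (size_t)(16 + taps - 1);
}
```

With these assumptions, `vla_temp_bytes(1920, 6)` is 40320 bytes, while `used_temp_bytes(6)` is only 336.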

> +           *tmp = tmp_arr + stride * (TAPNUMY / 2 - 1); \
> + \
> +    ff_put_vp8_epel16_h ## TAPNUMX ## _ ##INSTR(tmp_arr, \
> +                                                src - stride * (TAPNUMY / 2 - 1), \
> +                                                stride, \
> +                                                height + TAPNUMY - 1, mx, my); \
> +    ff_put_vp8_epel16_v ## TAPNUMY ## _ ##INSTR(dst, tmp, stride, \
> +                                                height, mx, my); \
> +}

Change these functions to take separate source and dest strides, and
make the temp array a sensible size.  Aligning the temp array is
probably a good idea too.
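A plain-C sketch of the suggested restructuring, for the 6-tap/6-tap case only. The function names, the `int` filter-array parameters and the rounding `(sum + 64) >> 7` (taps assumed to total 128) are illustrative assumptions, not the FFmpeg API or the MMX code under review. The horizontal pass writes a tightly packed, 16-byte-aligned 16 x 21 temp buffer using its own stride of 16; the vertical pass reads it back with that stride and writes `dst` with the caller's stride.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_W 16
#define TAPS    6

/* Round a filter sum (taps assumed to total 128) and clamp to 0..255. */
static uint8_t round_clip(int sum)
{
    int v = sum + 64;
    if (v < 0)
        return 0;
    v >>= 7;
    return v > 255 ? 255 : (uint8_t)v;
}

/* Horizontal 6-tap pass: taps cover src[x-2] .. src[x+3]. */
static void put_epel16_h6_c(uint8_t *dst, ptrdiff_t dststride,
                            const uint8_t *src, ptrdiff_t srcstride,
                            int h, const int f[TAPS])
{
    for (int y = 0; y < h; y++, dst += dststride, src += srcstride)
        for (int x = 0; x < BLOCK_W; x++) {
            int sum = 0;
            for (int k = 0; k < TAPS; k++)
                sum += f[k] * src[x - 2 + k];
            dst[x] = round_clip(sum);
        }
}

/* Vertical 6-tap pass: taps cover rows y-2 .. y+3. */
static void put_epel16_v6_c(uint8_t *dst, ptrdiff_t dststride,
                            const uint8_t *src, ptrdiff_t srcstride,
                            int h, const int f[TAPS])
{
    for (int y = 0; y < h; y++, dst += dststride, src += srcstride)
        for (int x = 0; x < BLOCK_W; x++) {
            int sum = 0;
            for (int k = 0; k < TAPS; k++)
                sum += f[k] * src[x + (k - 2) * srcstride];
            dst[x] = round_clip(sum);
        }
}

/* Two-pass h6v6 with separate strides and a fixed-size, aligned temp. */
static void put_epel16_h6v6_c(uint8_t *dst, ptrdiff_t dststride,
                              const uint8_t *src, ptrdiff_t srcstride,
                              int h, const int hf[TAPS], const int vf[TAPS])
{
    /* 16 columns x (16 + TAPS - 1) rows = 336 bytes, whatever the frame width. */
    _Alignas(16) uint8_t tmp_arr[BLOCK_W * (16 + TAPS - 1)];
    /* Skip the TAPS / 2 - 1 rows above the block that the v pass reads. */
    const uint8_t *tmp = tmp_arr + BLOCK_W * (TAPS / 2 - 1);

    assert(h > 0 && h <= 16);
    put_epel16_h6_c(tmp_arr, BLOCK_W, src - srcstride * (TAPS / 2 - 1),
                    srcstride, h + TAPS - 1, hf);
    put_epel16_v6_c(dst, dststride, tmp, BLOCK_W, h, vf);
}
```

Because the temp buffer no longer depends on `stride`, its size is a compile-time constant and the VLA disappears. `_Alignas(16)` is the C11 spelling; FFmpeg code would more likely use its own `DECLARE_ALIGNED` macro for the same purpose.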

-- 
Måns Rullgård
mans at mansr.com