[FFmpeg-devel] [PATCH 3/3] avcodec/scpr: optimize shift loop.

Sat Sep 9 00:43:06 EEST 2017

On 9/8/2017 6:29 PM, Michael Niedermayer wrote:
> Speeds code up from 50sec to 15sec
> 
> Fixes Timeout
> Fixes: 3242/clusterfuzz-testcase-5811951672229888
> 
> Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> Signed-off-by: Michael Niedermayer <michael at niedermayer.cc>
> ---
>  libavcodec/scpr.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/libavcodec/scpr.c b/libavcodec/scpr.c
> index 37fbe7a106..2ef63a7bf8 100644
> --- a/libavcodec/scpr.c
> +++ b/libavcodec/scpr.c
> @@ -827,7 +827,16 @@ static int decode_frame(AVCodecContext *avctx, void *data, int *got_frame,
>              return ret;
>  
>          for (y = 0; y < avctx->height; y++) {
> -            for (x = 0; x < avctx->width * 4; x++) {
> +            if (!(((uintptr_t)dst) & 7)) {
> +                uint64_t *dst64 = (uint64_t *)dst;
> +                int w = avctx->width>>1;
> +                for (x = 0; x < w; x++) {
> +                    dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL;

Shouldn't this be used only if HAVE_FAST_64BIT is true, with a version
shifting four bytes at a time used otherwise? That's how we do it almost
everywhere else.
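
A minimal sketch of what such a split could look like, not the actual
patch. shift_row() is a hypothetical helper, and HAVE_FAST_64BIT is
defined locally only as a stand-in for the value configure would
normally provide. The sketch uses 0xF8 rather than the 0xFC mask from
the quoted hunk, so that the wide path matches the byte loop for any
input; with 0xFC the two agree only when the top bit of every input
byte is zero.

```c
#include <stdint.h>

/* Stand-in for the value FFmpeg's configure would normally define. */
#ifndef HAVE_FAST_64BIT
#define HAVE_FAST_64BIT 1
#endif

/* Hypothetical helper: shift every byte of one row left by 3 bits. */
static void shift_row(uint8_t *dst, int nbytes)
{
    int x = 0;

#if HAVE_FAST_64BIT
    /* Eight bytes per iteration, only when dst is 8-byte aligned. */
    if (!((uintptr_t)dst & 7)) {
        uint64_t *dst64 = (uint64_t *)dst;
        int w = nbytes >> 3;
        for (x = 0; x < w; x++)
            /* The mask clears the 3 bits that leak in from the neighbouring byte. */
            dst64[x] = (dst64[x] << 3) & 0xF8F8F8F8F8F8F8F8ULL;
        x *= 8;
    }
#else
    /* Four bytes per iteration, only when dst is 4-byte aligned. */
    if (!((uintptr_t)dst & 3)) {
        uint32_t *dst32 = (uint32_t *)dst;
        int w = nbytes >> 2;
        for (x = 0; x < w; x++)
            dst32[x] = (dst32[x] << 3) & 0xF8F8F8F8U;
        x *= 4;
    }
#endif

    /* Scalar tail; also the whole row when dst is not suitably aligned. */
    for (; x < nbytes; x++)
        dst[x] = (uint8_t)(dst[x] << 3);
}
```

In decode_frame() this would be called once per row with
nbytes = avctx->width * 4, before dst is advanced by the line size.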

The chances of anyone bothering to write SIMD for this decoder are
almost none, so adding optimized C loops is OK in this case.

> +                }
> +                x *= 8;
> +            } else
> +                x = 0;

How does this fix the timeout if the new code only runs when the pointer
is eight-byte aligned (or four-byte aligned once you add that)?
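
To make the concern concrete: dst advances by frame->linesize[0] per
row, so whether the fast path is taken depends on both the base pointer
and the stride. A rough sketch, where count_aligned_rows() is a
hypothetical helper and the values in the usage note are made up for
illustration:

```c
#include <stdint.h>

/* Hypothetical helper: count the rows whose start address is 8-byte
 * aligned, given the address of row 0 and the stride in bytes. */
static int count_aligned_rows(uintptr_t base, int linesize, int height)
{
    int y, n = 0;

    for (y = 0; y < height; y++) {
        /* Address of row y; unsigned arithmetic wraps, which keeps the
         * low bits correct even for a negative linesize. */
        uintptr_t row = base + (uintptr_t)y * (uintptr_t)linesize;
        if (!(row & 7))
            n++;
    }
    return n;
}
```

With an 8-byte-aligned base and a stride that is a multiple of 8, every
row qualifies; with base 64 and stride 30 only every fourth row does,
and with an odd base and stride 32 none do.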

> +            for (; x < avctx->width * 4; x++) {
>                  dst[x] = dst[x] << 3;
>              }
>              dst += frame->linesize[0];
>