[FFmpeg-devel] [PATCH 1/3] avutil/imgutils: Optimize writing 4 bytes in memset_bytes()

Wed Dec 26 03:12:13 EET 2018

On 12/25/2018 7:15 PM, Michael Niedermayer wrote:
> Fixes: Timeout
> Fixes: 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
> Before: Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 11294 ms
> After : Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 4249 ms
> 
> Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> Signed-off-by: Michael Niedermayer <michael at niedermayer.cc>
> ---
>  libavutil/imgutils.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
> index 4938a7ef67..cc38f1e878 100644
> --- a/libavutil/imgutils.c
> +++ b/libavutil/imgutils.c
> @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t dst_size, uint8_t *clear,
>          }
>      } else if (clear_size == 4) {
>          uint32_t val = AV_RN32(clear);
> +        uint64_t val8 = val * 0x100000001ULL;
> +        for (; dst_size >= 32; dst_size -= 32) {
> +            AV_WN64(dst   , val8); AV_WN64(dst+ 8, val8);
> +            AV_WN64(dst+16, val8); AV_WN64(dst+24, val8);
> +            dst += 32;
> +        }

This should be wrapped with a HAVE_FAST_64BIT preprocessor check.

Also, is it much slower if you also write one per loop like everywhere
else in the function? I'd prefer if things are consistent.
Similarly, you could add four and eight bytes loops to the clear_size ==
2 case above.

>          for (; dst_size >= 4; dst_size -= 4) {
>              AV_WN32(dst, val);
>              dst += 4;
>