[FFmpeg-devel] [PATCH 1/3] avutil/imgutils: Optimize writing 4 bytes in memset_bytes()
Marton Balint
cus at passwd.hu
Sat Jan 19 01:28:25 EET 2019
On Thu, 17 Jan 2019, Michael Niedermayer wrote:
> On Wed, Jan 16, 2019 at 08:00:22PM +0100, Marton Balint wrote:
>>
>>
>> On Tue, 15 Jan 2019, Michael Niedermayer wrote:
>>
>>> On Sun, Dec 30, 2018 at 07:15:49PM +0100, Marton Balint wrote:
>>>>
>>>>
>>>> On Fri, 28 Dec 2018, Michael Niedermayer wrote:
>>>>
>>>>> On Wed, Dec 26, 2018 at 10:16:47PM +0100, Marton Balint wrote:
>>>>>>
>>>>>>
>>>>>> On Wed, 26 Dec 2018, Paul B Mahol wrote:
>>>>>>
>>>>>>> On 12/26/18, Michael Niedermayer <michael at niedermayer.cc> wrote:
>>>>>>>> On Wed, Dec 26, 2018 at 04:32:17PM +0100, Paul B Mahol wrote:
>>>>>>>>> On 12/25/18, Michael Niedermayer <michael at niedermayer.cc> wrote:
>>>>>>>>>> Fixes: Timeout
>>>>>>>>>> Fixes: 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
>>>>>>>>>> Before: Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 11294 ms
>>>>>>>>>> After : Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 4249 ms
>>>>>>>>>>
>>>>>>>>>> Found-by: continuous fuzzing process
>>>>>>>>>> https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
>>>>>>>>>> Signed-off-by: Michael Niedermayer <michael at niedermayer.cc>
>>>>>>>>>> ---
>>>>>>>>>> libavutil/imgutils.c | 6 ++++++
>>>>>>>>>> 1 file changed, 6 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/libavutil/imgutils.c b/libavutil/imgutils.c
>>>>>>>>>> index 4938a7ef67..cc38f1e878 100644
>>>>>>>>>> --- a/libavutil/imgutils.c
>>>>>>>>>> +++ b/libavutil/imgutils.c
>>>>>>>>>> @@ -529,6 +529,12 @@ static void memset_bytes(uint8_t *dst, size_t dst_size, uint8_t *clear,
>>>>>>>>>>          }
>>>>>>>>>>      } else if (clear_size == 4) {
>>>>>>>>>>          uint32_t val = AV_RN32(clear);
>>>>>>>>>> +        uint64_t val8 = val * 0x100000001ULL;
>>>>>>>>>> +        for (; dst_size >= 32; dst_size -= 32) {
>>>>>>>>>> +            AV_WN64(dst   , val8); AV_WN64(dst+ 8, val8);
>>>>>>>>>> +            AV_WN64(dst+16, val8); AV_WN64(dst+24, val8);
>>>>>>>>>> +            dst += 32;
>>>>>>>>>> +        }
>>>>>>>>>>          for (; dst_size >= 4; dst_size -= 4) {
>>>>>>>>>>              AV_WN32(dst, val);
>>>>>>>>>>              dst += 4;
>>>>>>>>>> --
>>>>>>>>>> 2.20.1
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> NAK, implement special memset function instead.
>>>>>>>>
>>>>>>>> I can move the added loop into a separate function, if that's what you
>>>>>>>> suggest?
>>>>>>>
>>>>>>> No, don't do that.
>>>>>>>
>>>>>>>> All the code is already in a "special" memset though, this is
>>>>>>>> memset_bytes()
>>>>>>>>
>>>>>>>
>>>>>>> I guess the function is less useful if it's static, so any duplication
>>>>>>> should be avoided in the codebase.
>>>>>>
>>>>>> Doesn't av_memcpy_backptr do almost exactly what is needed here? That can
>>>>>> also be optimized further if needed.
>>>>>
>>>>> av_memcpy_backptr() copies data with overlap; it's more like a recursive
>>>>> memmove().
>>>>
>>>> So? As far as I can see, the memset_bytes function in imgutils.c can be
>>>> replaced with this:
>>>>
>>>>     if (clear_size > dst_size)
>>>>         clear_size = dst_size;
>>>>     memcpy(dst, clear, clear_size);
>>>>     av_memcpy_backptr(dst + clear_size, clear_size, dst_size - clear_size);
>>>>
>>>> I am not against an av_memset_bytes API addition, but I believe it should
>>>> share code with av_memcpy_backptr to avoid duplication.
>>>
>>> I've implemented this; it does not seem to be really faster in the testcase
>>
>> I guess it is not faster because you have not applied your original
>> optimization to fill32 in libavutil/mem.c. Could you compare the speed after
>> optimizing that the same way your original patch did it with imgutils
>> memset_bytes?
>
> sure, that makes it faster:
Thanks, both patches LGTM.
Regards,
Marton
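
Illustrative note on the two approaches discussed in this thread. The
following is a minimal standalone sketch, not the code from either patch. It
assumes plain C with no FFmpeg headers: memcpy() stands in for the
AV_RN32/AV_WN32/AV_WN64 unaligned accessors, a byte loop stands in for
av_memcpy_backptr(), and the names fill4_wide() and fill_pattern_overlap()
are hypothetical. The handling of the 1 to 3 trailing bytes in fill4_wide()
is also an assumption, since the quoted hunk does not show that part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill dst with a repeating 4-byte pattern, using
 * 64-bit stores for the bulk of the buffer as in the quoted hunk. */
static void fill4_wide(uint8_t *dst, size_t dst_size, const uint8_t *clear)
{
    uint32_t val;
    uint64_t val8;

    assert(dst && clear);
    memcpy(&val, clear, 4);            /* stand-in for AV_RN32(clear) */
    /* Replicate the 32-bit value into both halves of a 64-bit word. Both
     * halves are identical, so the stored bytes do not depend on endianness. */
    val8 = val * 0x100000001ULL;

    for (; dst_size >= 32; dst_size -= 32) {
        memcpy(dst,      &val8, 8);    /* stand-in for AV_WN64 */
        memcpy(dst + 8,  &val8, 8);
        memcpy(dst + 16, &val8, 8);
        memcpy(dst + 24, &val8, 8);
        dst += 32;
    }
    for (; dst_size >= 4; dst_size -= 4) {
        memcpy(dst, &val, 4);          /* stand-in for AV_WN32 */
        dst += 4;
    }
    /* Assumed tail handling: copy a prefix of the pattern. */
    if (dst_size)
        memcpy(dst, clear, dst_size);
}

/* Hypothetical helper: seed the buffer with one copy of the pattern, then
 * let a forward overlapping copy repeat it. The byte loop mimics what
 * av_memcpy_backptr(dst + clear_size, clear_size, dst_size - clear_size)
 * is used for in the suggestion above. */
static void fill_pattern_overlap(uint8_t *dst, size_t dst_size,
                                 const uint8_t *clear, size_t clear_size)
{
    size_t i;

    assert(dst && clear);
    if (!clear_size || !dst_size)
        return;
    if (clear_size > dst_size)
        clear_size = dst_size;
    memcpy(dst, clear, clear_size);
    /* Each byte is copied from clear_size bytes earlier in the same buffer. */
    for (i = clear_size; i < dst_size; i++)
        dst[i] = dst[i - clear_size];
}
```

Both helpers should produce the same bytes for a 4-byte pattern. The first
shows where the speedup in the original patch comes from (fewer, wider
stores); the second shows why the overlapping copy can replace the
special-cased loops, at which point the same widening has to be applied
inside the copy routine to see a comparable gain.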