[FFmpeg-devel] [PATCH 1/2] avcodec/lcldec: Optimize YUV422 case
James Almer
jamrial at gmail.com
Sun Jul 28 17:06:16 EEST 2019
On 7/28/2019 8:56 AM, Michael Niedermayer wrote:
> On Sun, Jul 28, 2019 at 12:45:36AM +0200, Reimar Döffinger wrote:
>>
>>
>> On 28.07.2019, at 00:31, Michael Niedermayer <michael at niedermayer.cc> wrote:
>>
>>> This merges several byte operations and avoids some shifts inside the loop
>>>
>>> Improves: Timeout (330sec -> 134sec)
>>> Improves: 15599/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MSZH_fuzzer-5658127116009472
>>>
>>> Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
>>> Signed-off-by: Michael Niedermayer <michael at niedermayer.cc>
>>> ---
>>> libavcodec/lcldec.c | 10 +++++-----
>>> 1 file changed, 5 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/libavcodec/lcldec.c b/libavcodec/lcldec.c
>>> index 104defa5f5..c3787b3cbe 100644
>>> --- a/libavcodec/lcldec.c
>>> +++ b/libavcodec/lcldec.c
>>> @@ -391,13 +391,13 @@ static int decode_frame(AVCodecContext *avctx, void *data, int *got_frame, AVPac
>>> break;
>>> case IMGTYPE_YUV422:
>>> for (row = 0; row < height; row++) {
>>> - for (col = 0; col < width - 3; col += 4) {
>>> + for (col = 0; col < (width - 2)>>1; col += 2) {
>>> memcpy(y_out + col, encoded, 4);
>>> encoded += 4;
>>> - u_out[ col >> 1 ] = *encoded++ + 128;
>>> - u_out[(col >> 1) + 1] = *encoded++ + 128;
>>> - v_out[ col >> 1 ] = *encoded++ + 128;
>>> - v_out[(col >> 1) + 1] = *encoded++ + 128;
>>> + AV_WN16(u_out + col, AV_RN16(encoded) ^ 0x8080);
>>> + encoded += 2;
>>> + AV_WN16(v_out + col, AV_RN16(encoded) ^ 0x8080);
>>> + encoded += 2;
>>
>> Huh? Surely the pixel stride used for y_out still needs to be double of the u/v one?
>
>> I suspect doing only the AV_RN16/xor optimization might be best, the one shift saved seems not worth the risk/complexity...
>
> If you want, I can remove the shift change?
> With the fixed shift change it's 155 sec; if I remove the shift optimization it's 170 sec.
>
> patch for the 155 case below:
>
> commit 56998b7d57a2cd0ed7f53981c50e76fd419cd86f (HEAD)
> Author: Michael Niedermayer <michael at niedermayer.cc>
> Date: Sat Jul 27 22:46:34 2019 +0200
>
> avcodec/lcldec: Optimize YUV422 case
>
> This merges several byte operations and avoids some shifts inside the loop
>
> Improves: Timeout (330sec -> 155sec)
> Improves: 15599/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MSZH_fuzzer-5658127116009472
>
> Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> Signed-off-by: Michael Niedermayer <michael at niedermayer.cc>
>
> diff --git a/libavcodec/lcldec.c b/libavcodec/lcldec.c
> index 104defa5f5..9e018ff5a9 100644
> --- a/libavcodec/lcldec.c
> +++ b/libavcodec/lcldec.c
> @@ -391,13 +391,13 @@ static int decode_frame(AVCodecContext *avctx, void *data, int *got_frame, AVPac
> break;
> case IMGTYPE_YUV422:
> for (row = 0; row < height; row++) {
> - for (col = 0; col < width - 3; col += 4) {
> - memcpy(y_out + col, encoded, 4);
> + for (col = 0; col < (width - 2)>>1; col += 2) {
> + memcpy(y_out + 2 * col, encoded, 4);
> encoded += 4;
> - u_out[ col >> 1 ] = *encoded++ + 128;
> - u_out[(col >> 1) + 1] = *encoded++ + 128;
> - v_out[ col >> 1 ] = *encoded++ + 128;
> - v_out[(col >> 1) + 1] = *encoded++ + 128;
> + AV_WN16(u_out + col, AV_RN16(encoded) ^ 0x8080);
> + encoded += 2;
> + AV_WN16(v_out + col, AV_RN16(encoded) ^ 0x8080);
> + encoded += 2;
> }
> y_out -= frame->linesize[0];
> u_out -= frame->linesize[1];
> [...]
As others have pointed out before, this kind of optimization is usually
meant for the SIMD implementations and not the C boilerplate/reference.
So prioritize readability over speed, if possible, when choosing which
version to apply.
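
To make the trade-off concrete, here is a minimal standalone sketch of the two
loop shapes under discussion. It is not the lcldec code: row_bytewise() and
row_merged() are hypothetical helper names, plain memcpy() stands in for
AV_RN16/AV_WN16, and each function handles a single row only. It relies on
(x + 128) truncated to 8 bits being equal to x ^ 0x80, and on 0x8080 having
the same value in both bytes, so the 16-bit XOR does not depend on endianness.

```c
#include <stdint.h>
#include <string.h>

/* Byte-wise form: col counts luma samples, so the chroma index is col >> 1. */
static void row_bytewise(const uint8_t *enc, int width,
                         uint8_t *y, uint8_t *u, uint8_t *v)
{
    for (int col = 0; col < width - 3; col += 4) {
        memcpy(y + col, enc, 4);
        enc += 4;
        u[ col >> 1     ] = *enc++ + 128;   /* wraps modulo 256 */
        u[(col >> 1) + 1] = *enc++ + 128;
        v[ col >> 1     ] = *enc++ + 128;
        v[(col >> 1) + 1] = *enc++ + 128;
    }
}

/* Merged form: col counts chroma samples, so the luma offset is 2 * col. */
static void row_merged(const uint8_t *enc, int width,
                       uint8_t *y, uint8_t *u, uint8_t *v)
{
    for (int col = 0; col < (width - 2) >> 1; col += 2) {
        uint16_t t;

        memcpy(y + 2 * col, enc, 4);
        enc += 4;

        memcpy(&t, enc, 2);          /* unaligned 16-bit read */
        t ^= 0x8080;                 /* flip the top bit of both bytes */
        memcpy(u + col, &t, 2);      /* unaligned 16-bit write */
        enc += 2;

        memcpy(&t, enc, 2);
        t ^= 0x8080;
        memcpy(v + col, &t, 2);
        enc += 2;
    }
}
```

Running both helpers over the same input buffer and comparing the three
output planes with memcmp() is a cheap way to check that the merged loop
keeps the luma offset at twice the chroma offset.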