[FFmpeg-devel] Re: [PATCH 1/3] avcodec/x86/vvc/vvc_alf: fix integer overflow

Thu May 30 19:31:57 EEST 2024

Ronald S. Bultje:
> From: Ronald S. Bultje <rsbultje at gmail.com>
> Sent: May 29, 2024 13:56
> To: Wu Jianhua
> Cc: FFmpeg development discussions and patches; Nuo Mi; James Almer
> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avcodec/x86/vvc/vvc_alf: fix integer overflow
> 
> Hi,
> 
> On Wed, May 29, 2024 at 3:44 PM Wu Jianhua <toqsxw at outlook.com> wrote:
> Ronald S. Bultje:
>> On Wed, May 29, 2024 at 11:38 AM <toqsxw at outlook.com> wrote:
>> +%else
>> +    vpunpcklqdq      m11, m2, m2
>> +    vpunpckhqdq      m12, m2, m2
>> +    vpunpcklwd       m11, m11, m14
>> +    vpunpcklwd       m12, m12, m14
>> +    paddd             m0, m11
>> +    paddd             m1, m12
>> +    packssdw          m0, m0, m1
>> +%endif
>
> [..]
> > Also, the whole thing just emulates a saturated add. Can't you use paddsw instead of paddw and be done with it? To add to Andreas' question: is saturating here normatively required?
> 
> > We didn't have any sample that failed because of this issue, except for the checksum with specific seeds. I think we can leave it unchanged until a real sample shows a problem.
> 
> @Nuomi to get more details.
> 
> I think "just" replacing paddw with paddsw is correct, since the input pixels are 12bit (so they could be either unsigned or signed), the filtered output is the result of packssdw (so signed words), and the desired output is 12bit pixels anyway; anything greater than that is clipped to 12bit range. So to me, it seems paddsw is a cheaper way to accomplish the same thing.
> 
> Ronald

Hi Ronald,

Yes, it does. I've tested paddsw and everything works well. It is indeed the cheaper way, with minimal performance loss.
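
To illustrate the argument quoted above, here is a minimal scalar sketch in C of one 16-bit lane. It is a model under stated assumptions, not the actual assembly: it assumes the filtered sum is a 32-bit value before packssdw, that the pixel is an unsigned value of at most 12 bits, and that the final clip to the pixel range uses signed comparisons. All function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* packssdw / paddsw behaviour for one lane: saturate to int16 range. */
static int32_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return v;
}

/* paddw behaviour for one lane: keep the low 16 bits, reinterpret as signed.
 * The uint16_t -> int16_t conversion assumes two's complement. */
static int32_t wrap16(int32_t v)
{
    return (int16_t)(uint16_t)v;
}

/* Assumed final clip to [0, 2^bit_depth - 1]. */
static int32_t clip_pixel(int32_t v, int bit_depth)
{
    assert(bit_depth > 0 && bit_depth < 16);
    int32_t max = (1 << bit_depth) - 1;
    return v < 0 ? 0 : v > max ? max : v;
}

/* Patch v1 model: widen the pixel, paddd, then packssdw. */
static int32_t path_widen(int32_t sum, uint16_t pixel, int bit_depth)
{
    return clip_pixel(sat16(sum + pixel), bit_depth);
}

/* Suggested model: packssdw first, then paddsw. */
static int32_t path_paddsw(int32_t sum, uint16_t pixel, int bit_depth)
{
    return clip_pixel(sat16(sat16(sum) + pixel), bit_depth);
}

/* Overflowing model: packssdw first, then the wrapping paddw. */
static int32_t path_paddw(int32_t sum, uint16_t pixel, int bit_depth)
{
    return clip_pixel(wrap16(sat16(sum) + pixel), bit_depth);
}

/* Count inputs where the widen path and the paddsw path disagree. */
static long count_mismatches(int bit_depth)
{
    long bad = 0;
    int32_t max_pixel = (1 << bit_depth) - 1;
    for (int32_t sum = -70000; sum <= 70000; sum++)
        for (int32_t p = 0; p <= max_pixel; p += 63)
            if (path_widen(sum, (uint16_t)p, bit_depth) !=
                path_paddsw(sum, (uint16_t)p, bit_depth))
                bad++;
    return bad;
}
```

In this model the two saturating paths can differ before the clip when the sum is far below the int16 range, but both results are negative there and clip to 0, so the clipped pixels agree. The wrapping add is the only one that turns a large positive sum into a negative value.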

I have sent v2.

Thanks for this.
Jianhua