[FFmpeg-devel] [PATCH v2 3/4] libswscale/x86/rgb2rgb: add uyvytoyuv422 avx2

chen chenm003 at 163.com
Thu Sep 30 10:39:39 EEST 2021




At 2021-09-30 15:23:08, "Wu, Jianhua" <jianhua.wu at intel.com> wrote:
>Min Chen wrote:
>> Sent: Thursday, September 30, 2021 10:29 AM
>> To: FFmpeg development discussions and patches <ffmpeg-
>> devel at ffmpeg.org>
>> Subject: Re: [FFmpeg-devel] [PATCH v2 3/4] libswscale/x86/rgb2rgb: add
>> uyvytoyuv422 avx2
>> 
>> Hello,
>> 
>> >+pb_shuffle_low: times 4 db 1, 3, 5, 7, 9, 11, 13, 15, -1, -1, -1, -1,
>> >+-1, -1, -1, -1
>> Why do we need "times 4" here?
>> AVX2 provides the VPBROADCASTQ instruction, which can load this constant
>> into a SIMD register.
>> 
>> Moreover, the U/V planes could use the same algorithm to get an improvement.
>> 
>> Regards,
>> Min Chen
>> 
>Hi Min Chen,
>
>Thank you very much for your helpful suggestions.
>
>Correct! It's not necessary to use "times 4" here. Funnily enough, I did try to avoid it
>when writing the code, but found no way to because I overlooked the VBROADCASTI128 instruction.
>
>As for the UV extraction, I evaluated the new method before deciding to keep
>the previous author's masterpiece. The existing code is better, since the pand instruction
>has a better reciprocal throughput (or issue latency).
>
>Best regards,
>Jianhua



Regarding VBROADCASTI128: we don't care about the high part of the result, so we only need the lowest 64 bits of the constant table. VPBROADCASTQ is enough.
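
To illustrate, here is a rough scalar C model of the idea. It is not the patch code. It assumes the constant is used as a VPSHUFB mask and that only the low 8 bytes of each 128-bit lane of the result are consumed afterwards. The names pshufb_lane, vpshufb256 and demo_shuffle are made up for this sketch.

```c
#include <stdint.h>
#include <string.h>

/* One 128-bit lane of PSHUFB: a mask byte with bit 7 set writes 0,
 * otherwise its low 4 bits index into the same lane of the source. */
void pshufb_lane(uint8_t *dst, const uint8_t *src, const uint8_t *mask)
{
    for (int i = 0; i < 16; i++)
        dst[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
}

/* A 256-bit VPSHUFB shuffles the two 128-bit lanes independently. */
void vpshufb256(uint8_t dst[32], const uint8_t src[32], const uint8_t mask[32])
{
    pshufb_lane(dst, src, mask);
    pshufb_lane(dst + 16, src + 16, mask + 16);
}

/* use_bcastq == 0: 16-byte table repeated in each lane
 *                  (what VBROADCASTI128 or a repeated table would give).
 * use_bcastq != 0: only the low 8 table bytes, repeated in every qword
 *                  (what VPBROADCASTQ would give). */
void demo_shuffle(int use_bcastq, const uint8_t src[32], uint8_t out[32])
{
    static const uint8_t tbl[16] = { 1, 3, 5, 7, 9, 11, 13, 15,
                                     0xFF, 0xFF, 0xFF, 0xFF,
                                     0xFF, 0xFF, 0xFF, 0xFF };
    uint8_t mask[32];

    if (use_bcastq) {
        for (int i = 0; i < 4; i++)
            memcpy(mask + 8 * i, tbl, 8);
    } else {
        for (int i = 0; i < 2; i++)
            memcpy(mask + 16 * i, tbl, 16);
    }
    vpshufb256(out, src, mask);
}
```

In this model both masks give the same low 8 bytes in each lane. Only the high 8 bytes differ (zero with the full table, a copy of the low half with the broadcast qword), which is harmless as long as the following code ignores or overwrites them.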


Regards,
Min Chen

