[FFmpeg-devel] [PATCH 3/4] libswscale/x86/rgb2rgb: add uyvytoyuv422 avx2

Tue Sep 28 10:13:20 EEST 2021

Min Chen wrote:
> 
> The current algorithm could be improved; could you combine this optimization
> with your patches? The extra VPERM makes the code a little slower.
> 
> 
> 
> On Haswell
> Current algorithm:
> RSHIFT_COPY m6, m2, 1 ; UYVY UYVY -> YVYU YVY...
> pand m6, m1 ; YxYx YxYx...
> RSHIFT_COPY m7, m3, 1 ; UYVY UYVY -> YVYU YVY...
> pand m7, m1 ; YxYx YxYx...
> packuswb m6, m7 ; YYYY YYYY...
> 
> 
> Latency:
> 1 + 1 + 1 + 1 + 1 = 5
> 
> 
> Proposed:
> pshufb m6, m2, mX ; UYVY UYVY -> xxxx YYYY
> pshufb m7, m3, mX
> punpcklqdq m6, m7 ; YYYY YYYY
> 
> 
> Latency:
> 1 + 1 + 1 = 3
> 
> 
> I guess the current algorithm is optimized for compatibility with SSE2,
> because PSHUFB was only added in SSSE3.
> Now that we are optimizing for AVX, AVX2 and AVX512, I suggest we use the
> proposed algorithm to get more performance.
> 
> 
> Regards,
> Min Chen
> 

Hi Min Chen,

Thanks for the careful review. You're right. 

Using the instructions added in AVX2/AVX-512 should be better. I'll try
your proposal and see whether it performs better. If so, I'll resubmit the patches.
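
As a sanity check that the two sequences extract the same luma bytes, here is
a minimal scalar sketch in Python. It models a single 128-bit register as a
list of 16 bytes (index 0 is the lowest byte) and is not FFmpeg code. The
helper names, the Y_SHUFFLE control vector standing in for mX, and the test
data are all assumptions made for illustration. Lane crossing on 256-bit
registers (the VPERM question) is not modelled.

```python
# Scalar model of 128-bit SIMD registers: lists of 16 byte values,
# index 0 being the lowest byte. Illustrative only, not FFmpeg code.

def psrldq(reg, n):
    """Shift the whole register right by n bytes, filling with zeros."""
    return reg[n:] + [0] * n

def pand(a, b):
    """Bytewise AND of two registers."""
    return [x & y for x, y in zip(a, b)]

def packuswb(a, b):
    """Pack the signed 16-bit words of a, then b, into unsigned saturated bytes."""
    def pack(reg):
        out = []
        for i in range(0, 16, 2):
            w = reg[i] | (reg[i + 1] << 8)
            if w >= 0x8000:          # interpret the word as signed
                w -= 0x10000
            out.append(min(max(w, 0), 255))
        return out
    return pack(a) + pack(b)

def pshufb(reg, ctrl):
    """Byte shuffle: a control byte with bit 7 set yields 0, else selects reg[ctrl & 15]."""
    return [0 if c & 0x80 else reg[c & 0x0F] for c in ctrl]

def punpcklqdq(a, b):
    """Concatenate the low 8 bytes of a with the low 8 bytes of b."""
    return a[:8] + b[:8]

# Mask keeping the low byte of every 16-bit word (the role of m1 in the thread).
WORD_LOW_BYTE_MASK = [0xFF, 0x00] * 8

# Hypothetical contents for mX: gather the odd bytes (Y samples in UYVY)
# into the low 8 bytes and zero the high 8 bytes.
Y_SHUFFLE = [1, 3, 5, 7, 9, 11, 13, 15] + [0x80] * 8

def luma_current(m2, m3):
    """Shift + mask + pack, as in the current sequence."""
    m6 = pand(psrldq(m2, 1), WORD_LOW_BYTE_MASK)
    m7 = pand(psrldq(m3, 1), WORD_LOW_BYTE_MASK)
    return packuswb(m6, m7)

def luma_proposed(m2, m3):
    """Two shuffles + one unpack, as in the proposed sequence."""
    m6 = pshufb(m2, Y_SHUFFLE)
    m7 = pshufb(m3, Y_SHUFFLE)
    return punpcklqdq(m6, m7)

# 32 arbitrary bytes of UYVY data (16 pixels), split over two registers.
src = [(i * 37 + 11) % 256 for i in range(32)]
m2, m3 = src[:16], src[16:]

if __name__ == "__main__":
    print("sequences agree:", luma_current(m2, m3) == luma_proposed(m2, m3))
```

This only checks that the results match; it says nothing about latency or
throughput, which still need to be measured on real hardware.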

Best regards,
Jianhua