[FFmpeg-devel] [PATCH] Added ff_v210_planar_unpack_aligned_avx2
Mike Stoner
mdstoner23 at yahoo.com
Thu Mar 7 08:50:47 EET 2019
Thanks for the feedback. You are right, I can use VPERMQ to free up a register. I can also remove the PAND mask by doing PSLLD/PSRLD. That eliminates the need for an x86-64 block.
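For reference, here is a scalar C sketch of why a shift pair can stand in for the PAND mask. It is only an illustration of the general identity, not the code from the patch: the function names are made up, and it assumes the usual v210 layout of three 10-bit components per 32-bit word (bits 0-9, 10-19, 20-29).

```c
#include <assert.h>
#include <stdint.h>

/* Middle 10-bit field (bits 10..19) of a 32-bit word, using a mask.
 * This corresponds to the shift + PAND approach. */
static uint32_t mid_with_mask(uint32_t w)
{
    return (w >> 10) & 0x3FFu;
}

/* Same field using only shifts, as PSLLD followed by PSRLD would do per
 * dword: the left shift discards bits 20..31, and the logical right
 * shift discards bits 0..9 and zero-fills the top. */
static uint32_t mid_with_shifts(uint32_t w)
{
    return (w << 12) >> 22;
}

/* Hypothetical self-check (not part of FFmpeg): both forms must agree. */
static void check_shift_vs_mask(void)
{
    const uint32_t samples[] = {
        0u, 0xFFFFFFFFu, 0x3FF00000u, 0x000FFC00u, 0x12345678u
    };
    for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        assert(mid_with_mask(samples[i]) == mid_with_shifts(samples[i]));
}
```

The shift-only form needs no mask constant, so no register has to hold one.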
I tried the naive 'unrolled' version with no permute, and it was much slower, about the same as the AVX/SSSE3 code. VPERMQ/D is a single shuffle uop on port 5, so it turns out to be useful.
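To make the lane-crossing behaviour concrete, here is a small scalar C model of what VPERMQ with an immediate does. The function name and the immediate 0xD8 in the usage note are illustrative assumptions, not values taken from the patch.

```c
#include <stdint.h>

/* Scalar model of "VPERMQ ymm, ymm, imm8": destination qword i is a copy
 * of source qword number (imm8 >> (2 * i)) & 3. The selected qword may
 * come from either 128-bit lane, which in-lane shuffles cannot do. */
static void vpermq_model(uint64_t dst[4], const uint64_t src[4], uint8_t imm8)
{
    uint64_t tmp[4]; /* temporary so that dst may alias src */

    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm8 >> (2 * i)) & 3];
    for (int i = 0; i < 4; i++)
        dst[i] = tmp[i];
}
```

For example, with imm8 = 0xD8 the source qwords {q0, q1, q2, q3} come out as {q0, q2, q1, q3}, which is the kind of reordering typically needed after in-lane pack or unpack instructions have interleaved the two 128-bit halves.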
I will submit a new patch with those improvements and the VBROADCASTI128 macro. I modeled my code on 'v210enc.asm', which could also be updated to use VBROADCASTI128.
Note that I'm running on Windows, and it looks like 'checkasm' performance benchmarking is only enabled on Linux. For my tests I put a 100x loop around the 'unpack_frame' call and ran:
ffmpeg.exe -s:v 1920x1080 -vcodec v210 -stream_loop 200 -i OddaView_1920x1080.v210 -f null -y NUL
If there is a better way, let me know...
Thanks,
Mike