[FFmpeg-devel] swscale/rgb2rgb : add X86_64 SIMD (SSSE3 and AVX2) for shuffly_bytes func
jamrial at gmail.com
Sun Mar 18 18:47:56 EET 2018
On 3/18/2018 1:28 PM, Nicolas George wrote:
> Martin Vignali (2018-03-18):
>> I ran the test again with a bigger width (512 instead of 128).
>> This is my result:
>> shuffle_bytes_0321_c: 128.6
>> shuffle_bytes_0321_ssse3: 41.6
>> shuffle_bytes_0321_avx2: 23.4
> IIUC, these benchmarks are expressed in CPU cycles. But what James says
> is that it can cause the CPU frequency to be throttled: if that happens,
> fewer cycles can take more time, and even worse, cause other unrelated code to
> take more time. A benchmark in actual time and typical use case would be
> needed to decide.
In any case, short of swscale being used without any decoding going on,
AVX2 code is most likely going to be used and said throttling will
already have taken place because of countless other functions.
And a 2x speedup from an AVX2 version is basically the best you're going
to get out of such an implementation.
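
For reference, the cycle counts quoted above give roughly 128.6 / 41.6 = 3.1x for SSSE3 over C and 41.6 / 23.4 = 1.8x for AVX2 over SSSE3. To get the wall-clock figure Nicolas asks for, something along these lines could be used. It is only a minimal sketch, not the code from the patch: shuffle_bytes_0321_ref and time_shuffle_ns are hypothetical names, and the byte order (keep bytes 0 and 2, swap bytes 1 and 3 in each 4-byte group) is assumed from the "0321" suffix.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical scalar reference (assumption: "0321" means the output
 * bytes of each 4-byte group are taken from input indices 0, 3, 2, 1). */
static void shuffle_bytes_0321_ref(const uint8_t *src, uint8_t *dst, size_t size)
{
    assert(size % 4 == 0); /* this sketch only handles whole 4-byte groups */
    for (size_t i = 0; i < size; i += 4) {
        dst[i + 0] = src[i + 0];
        dst[i + 1] = src[i + 3];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 1];
    }
}

typedef void (*shuffle_fn)(const uint8_t *src, uint8_t *dst, size_t size);

/* Average wall-clock nanoseconds per call over `iters` calls, measured
 * with the monotonic clock rather than a cycle counter. */
static double time_shuffle_ns(shuffle_fn fn, const uint8_t *src, uint8_t *dst,
                              size_t size, int iters)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++)
        fn(src, dst, size);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / iters;
}
```

Calling time_shuffle_ns() once per implementation (C, SSSE3, AVX2) on the same buffers would give numbers in nanoseconds instead of cycles, so a frequency drop would show up in the result. A loop like this still measures the function in isolation, so it says nothing about the effect on unrelated code, and an optimizing compiler may shorten the loop unless the output buffer is actually read afterwards.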