[FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds av_cpu_has_fast_gather to detect cpus with avx fast gather instruction

James Almer jamrial at gmail.com
Thu Jun 24 16:49:14 EEST 2021


On 6/24/2021 10:30 AM, Alan Kelly wrote:
> Hi,
> 
> Sorry for the late reply, busy on-call week. Thanks for your responses. I
> have looked at the code for cpuflags and what you suggested makes sense. I
> just have a question about naming. EXTERNAL_AVX2_FAST is already used in
> many places - it checks whether the flag AV_CPU_FLAG_AVXSLOW is set so I
> can't use this as it would change its meaning. Could I define a flag
> like AV_CPU_FLAG_CMOV, e.g. AV_CPU_FLAG_FAST_GATHER or similar? Or could you
> please suggest a better solution?

Add a new AV_CPU_FLAG_AVX2SLOW public define (use the available value 
0x2000000), then maybe add an internal EXTERNAL_AVX2_FAST_GATHER() 
helper macro that expands to CPUEXT_SUFFIX_FAST(flags, _EXTERNAL, AVX2).
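For illustration, here is a self-contained sketch of how the proposed flag and helper could fit together. It assumes a simplified stand-in for the CPUEXT_SUFFIX_FAST macro from libavutil/x86/cpu.h (the real header may differ in detail). HAVE_AVX2_EXTERNAL is normally generated by configure and is hard-coded here. The AVX2 and AVXSLOW bit values are meant to match libavutil/cpu.h but should be checked against the header. The hscale_impl enum and select_hscale() are hypothetical.

```c
/* Illustrative copies of the relevant bits; check against libavutil/cpu.h. */
#define AV_CPU_FLAG_AVX2     0x8000
#define AV_CPU_FLAG_AVXSLOW  0x8000000
#define AV_CPU_FLAG_AVX2SLOW 0x2000000   /* proposed new public define (free value) */

/* Normally produced by configure when the assembler supports AVX2. */
#define HAVE_AVX2_EXTERNAL 1

/* Simplified stand-in: true if the extension was built in, is reported by
 * the CPU, and is NOT marked slow via the matching <cpuext>SLOW flag. */
#define CPUEXT_SUFFIX_FAST(flags, suffix, cpuext)      \
    (HAVE_ ## cpuext ## suffix &&                      \
     ((flags) & AV_CPU_FLAG_ ## cpuext) &&             \
     !((flags) & AV_CPU_FLAG_ ## cpuext ## SLOW))

/* Proposed internal helper: the AVX2 path is used only when AVX2SLOW is clear. */
#define EXTERNAL_AVX2_FAST_GATHER(flags) CPUEXT_SUFFIX_FAST(flags, _EXTERNAL, AVX2)

/* Hypothetical choice between a gather-based AVX2 horizontal scaler and an
 * SSSE3 fallback (a real init function would also check SSSE3 itself). */
enum hscale_impl { HSCALE_SSSE3, HSCALE_AVX2_GATHER };

static enum hscale_impl select_hscale(int flags)
{
    return EXTERNAL_AVX2_FAST_GATHER(flags) ? HSCALE_AVX2_GATHER : HSCALE_SSSE3;
}
```

With this macro, AV_CPU_FLAG_AVXSLOW on its own does not disable the gather path. The detection code therefore has to set AV_CPU_FLAG_AVX2SLOW for those CPUs as well.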

The AV_CPU_FLAG_AVX2SLOW flag should be set for all CPUs currently
flagged as AV_CPU_FLAG_AVXSLOW, plus Haswell and all AMD CPUs prior to
Zen 3.
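Here is a rough sketch of that rule. It assumes a hypothetical cpu_id struct that carries the vendor plus the display family and model already decoded from CPUID. FFmpeg's actual detection code in libavutil/x86/cpu.c reads these values itself. The Haswell model numbers (0x3C, 0x3F, 0x45, 0x46) and the use of AMD family 0x19 as the Zen 3 boundary are assumptions that should be checked against the vendors' documentation.

```c
#include <stdbool.h>

/* Illustrative flag bits; AVX2SLOW is the proposed value 0x2000000. */
#define AV_CPU_FLAG_AVX2     0x8000
#define AV_CPU_FLAG_AVXSLOW  0x8000000
#define AV_CPU_FLAG_AVX2SLOW 0x2000000

enum cpu_vendor { CPU_VENDOR_INTEL, CPU_VENDOR_AMD, CPU_VENDOR_OTHER };

/* Hypothetical pre-decoded CPUID information. */
struct cpu_id {
    enum cpu_vendor vendor;
    int family;   /* display family (extended family folded in) */
    int model;    /* display model (extended model folded in) */
};

/* Assumed Haswell models: client, -E/EP, ULT, GT3e. */
static bool is_haswell(const struct cpu_id *id)
{
    if (id->vendor != CPU_VENDOR_INTEL || id->family != 6)
        return false;
    switch (id->model) {
    case 0x3C: case 0x3F: case 0x45: case 0x46:
        return true;
    default:
        return false;
    }
}

/* Returns flags with AV_CPU_FLAG_AVX2SLOW added where the rule applies. */
static int add_avx2slow(int flags, const struct cpu_id *id)
{
    bool slow = (flags & AV_CPU_FLAG_AVXSLOW) != 0        /* already slow AVX */
             || is_haswell(id)                            /* slow gathers */
             || (id->vendor == CPU_VENDOR_AMD && id->family < 0x19); /* pre-Zen 3 */
    return slow ? (flags | AV_CPU_FLAG_AVX2SLOW) : flags;
}
```

Setting the bit on CPUs without AVX2 is harmless, because a check like CPUEXT_SUFFIX_FAST also requires AV_CPU_FLAG_AVX2 to be set.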

Also, please don't top post when replying to an email.

> 
> Thanks
> 
> On Mon, Jun 14, 2021 at 2:17 PM James Almer <jamrial at gmail.com> wrote:
> 
>> On 6/14/2021 8:53 AM, Ronald S. Bultje wrote:
>>> Hi Alan,
>>>
>>> On Mon, Jun 14, 2021 at 7:20 AM Alan Kelly <
>>> alankelly-at-google.com at ffmpeg.org> wrote:
>>>
>>>> Broadwell and later have fast gather instructions.
>>>> ---
>>>>    This is so that the avx2 version of ff_hscale8to15X which uses gather
>>>>    instructions is only selected on machines where it will actually be
>>>>    faster.
>>>>
>>>
>>> We've in the past typically done this with a bit in the cpuflags return
>>> value. Can this be added there instead of being its own function?
>>>
>>> Also, what is the cycle count of ssse3/avx2 implementation for this
>>> specific function on Haswell? It would be good to note that in the
>>> respective patch so that we understand why the check was added.
>>
>> Between 9 and 12 on Haswell, 5 to 7 on Broadwell, and about 2 to 5 on
>> Skylake and newer, according to Agner's PDF, if I'm reading it right. It's
>> also slow on AMD before Zen 3.
>>
>> And yes, this should if anything be a new cpu flag and not a new function.
>> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel at ffmpeg.org
>> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>
>> To unsubscribe, visit link above, or email
>> ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".
>>