[FFmpeg-devel] [PATCH 2/2] libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.

Ronald S. Bultje rsbultje at gmail.com
Fri Jun 25 15:24:15 EEST 2021


Hi Alan,

On Fri, Jun 25, 2021 at 7:53 AM Alan Kelly <alankelly at google.com> wrote:

>
>
> On Fri, Jun 25, 2021 at 1:26 PM Ronald S. Bultje <rsbultje at gmail.com>
> wrote:
>
>> Hi Alan,
>>
>> On Fri, Jun 25, 2021 at 3:59 AM Alan Kelly <
>> alankelly-at-google.com at ffmpeg.org> wrote:
>>
>>> These functions replace all ff_hscale8to15_*_ssse3 when avx2 is
>>> available.
>>>
>>
>> Re-asking a question I asked before in the other thread:
>>
>> Also, what is the cycle count of ssse3/avx2 implementation for this
>> specific function on Haswell? It would be good to note that in the
>> respective patch so that we understand why the check was added.
>>
>> You should be able to find this in the checkasm --bench --test=X numbers
>> for this relevant function.
>>
>> Ronald
>>
>
> Hi Ronald,
>
>                                 Skylake   Haswell
> hscale_8_to_15_width4_ssse3       761.2     760
> hscale_8_to_15_width4_avx2        468.7     957
> hscale_8_to_15_width8_ssse3      1170.7    1032
> hscale_8_to_15_width8_avx2        865.7    1979
> hscale_8_to_15_width12_ssse3     2172.2    2472
> hscale_8_to_15_width12_avx2      1245.7    2901
> hscale_8_to_15_width16_ssse3     2244.2    2400
> hscale_8_to_15_width16_avx2      1647.2    3681
>
> As you can see, the avx2 version is catastrophic on Haswell: it is slower
> than ssse3 at every width. In the next iteration of the patch, I will
> update the description with these numbers.
>
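To make the comparison in the table above easier to read, here is a minimal sketch that turns the quoted cycle counts into ssse3/avx2 ratios per CPU. The numbers are copied from the table; the dictionary layout and the helper name `avx2_speedup` are only illustrative and are not part of the patch or of checkasm.

```python
# Cycle counts copied from the table above, as (Skylake, Haswell) pairs.
cycles = {
    4:  {"ssse3": (761.2, 760.0),   "avx2": (468.7, 957.0)},
    8:  {"ssse3": (1170.7, 1032.0), "avx2": (865.7, 1979.0)},
    12: {"ssse3": (2172.2, 2472.0), "avx2": (1245.7, 2901.0)},
    16: {"ssse3": (2244.2, 2400.0), "avx2": (1647.2, 3681.0)},
}


def avx2_speedup(width):
    """Return (skylake, haswell) ssse3/avx2 cycle ratios for one filter width.

    A ratio above 1 means avx2 needs fewer cycles than ssse3;
    a ratio below 1 means avx2 is slower.
    """
    ssse3 = cycles[width]["ssse3"]
    avx2 = cycles[width]["avx2"]
    return tuple(s / a for s, a in zip(ssse3, avx2))


for width in sorted(cycles):
    skylake, haswell = avx2_speedup(width)
    print(f"width{width}: Skylake {skylake:.2f}x, Haswell {haswell:.2f}x")
```

Every Skylake ratio comes out above 1 and every Haswell ratio below 1, which is the asymmetry that motivates the check.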

Thanks, that's very helpful. No further comments from me.

Ronald

