[FFmpeg-devel] [PATCH 2/2] libswscale: Adds ff_hscale8to15_4_avx2 and ff_hscale8to15_X4_avx2 for all filter sizes.

Alan Kelly alankelly at google.com
Fri Jun 25 14:52:51 EEST 2021


On Fri, Jun 25, 2021 at 1:26 PM Ronald S. Bultje <rsbultje at gmail.com> wrote:

> Hi Alan,
>
> On Fri, Jun 25, 2021 at 3:59 AM Alan Kelly <
> alankelly-at-google.com at ffmpeg.org> wrote:
>
>> These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available.
>>
>
> Re-asking a question I asked before in the other thread:
>
> Also, what is the cycle count of ssse3/avx2 implementation for this
> specific function on Haswell? It would be good to note that in the
> respective patch so that we understand why the check was added.
>
> You should be able to find this in the checkasm --bench --test=X numbers
> for this relevant function.
>
> Ronald
>

Hi Ronald,

                               Skylake   Haswell
hscale_8_to_15_width4_ssse3      761.2     760
hscale_8_to_15_width4_avx2       468.7     957
hscale_8_to_15_width8_ssse3     1170.7    1032
hscale_8_to_15_width8_avx2       865.7    1979
hscale_8_to_15_width12_ssse3    2172.2    2472
hscale_8_to_15_width12_avx2     1245.7    2901
hscale_8_to_15_width16_ssse3    2244.2    2400
hscale_8_to_15_width16_avx2     1647.2    3681

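As you can see, the avx2 version is catastrophic on Haswell: it is slower
than ssse3 at every width, by nearly 2x at width 8 (1979 vs. 1032), even
though it is faster than ssse3 at every width on Skylake. In the next
iteration of the patch, I will update the description with these numbers.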
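
To make the comparison easier to read, the avx2/ssse3 ratio per width can be
computed from the numbers above. This is only a minimal Python sketch and not
part of the patch; TIMINGS and avx2_over_ssse3 are names made up for
illustration, and the values are copied from the table as-is.

```python
# checkasm --bench timings copied from the table above (lower is faster).
# Outer keys are CPUs, inner keys are filter widths,
# values are (ssse3, avx2) pairs.
TIMINGS = {
    "Skylake": {
        4: (761.2, 468.7),
        8: (1170.7, 865.7),
        12: (2172.2, 1245.7),
        16: (2244.2, 1647.2),
    },
    "Haswell": {
        4: (760.0, 957.0),
        8: (1032.0, 1979.0),
        12: (2472.0, 2901.0),
        16: (2400.0, 3681.0),
    },
}


def avx2_over_ssse3(cpu):
    """Return {width: avx2_time / ssse3_time} for one CPU.

    A ratio above 1.0 means the avx2 version is slower than ssse3.
    """
    return {w: avx2 / ssse3 for w, (ssse3, avx2) in TIMINGS[cpu].items()}


for cpu in TIMINGS:
    for width, ratio in avx2_over_ssse3(cpu).items():
        verdict = "slower" if ratio > 1.0 else "faster"
        print(f"{cpu} width{width}: avx2/ssse3 = {ratio:.2f} ({verdict})")
```

Every Skylake ratio comes out below 1.0 and every Haswell ratio above 1.0,
which is why the avx2 functions need a CPU check before they replace ssse3.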

Thanks

