[FFmpeg-devel] [PATCH] swscale/x86: add avx2 {lum, chr}ConvertRange
Ramiro Polla
ramiro.polla at gmail.com
Fri Jun 7 20:43:10 EEST 2024
On Fri, Jun 7, 2024 at 7:38 PM Ramiro Polla <ramiro.polla at gmail.com> wrote:
>
> chrRangeFromJpeg_8_c: 49.4
> chrRangeFromJpeg_8_sse4: 15.9
> chrRangeFromJpeg_8_avx2: 22.6
> chrRangeFromJpeg_24_c: 129.4
> chrRangeFromJpeg_24_sse4: 32.1
> chrRangeFromJpeg_24_avx2: 35.1
> chrRangeFromJpeg_128_c: 534.6
> chrRangeFromJpeg_128_sse4: 165.6
> chrRangeFromJpeg_128_avx2: 100.4
> chrRangeFromJpeg_144_c: 735.6
> chrRangeFromJpeg_144_sse4: 185.1
> chrRangeFromJpeg_144_avx2: 109.4
> chrRangeFromJpeg_256_c: 634.6
> chrRangeFromJpeg_256_sse4: 323.6
> chrRangeFromJpeg_256_avx2: 192.6
> chrRangeFromJpeg_512_c: 1242.4
> chrRangeFromJpeg_512_sse4: 662.1
> chrRangeFromJpeg_512_avx2: 409.1
> chrRangeToJpeg_8_c: 39.6
> chrRangeToJpeg_8_sse4: 15.9
> chrRangeToJpeg_8_avx2: 25.4
> chrRangeToJpeg_24_c: 118.9
> chrRangeToJpeg_24_sse4: 32.9
> chrRangeToJpeg_24_avx2: 30.1
> chrRangeToJpeg_128_c: 636.9
> chrRangeToJpeg_128_sse4: 164.4
> chrRangeToJpeg_128_avx2: 96.6
> chrRangeToJpeg_144_c: 716.4
> chrRangeToJpeg_144_sse4: 187.1
> chrRangeToJpeg_144_avx2: 109.4
> chrRangeToJpeg_256_c: 1258.6
> chrRangeToJpeg_256_sse4: 326.1
> chrRangeToJpeg_256_avx2: 193.9
> chrRangeToJpeg_512_c: 2489.4
> chrRangeToJpeg_512_sse4: 662.1
> chrRangeToJpeg_512_avx2: 382.4
> lumRangeFromJpeg_8_c: 13.6
> lumRangeFromJpeg_8_sse4: 14.4
> lumRangeFromJpeg_8_avx2: 19.6
> lumRangeFromJpeg_24_c: 38.9
> lumRangeFromJpeg_24_sse4: 18.9
> lumRangeFromJpeg_24_avx2: 23.9
> lumRangeFromJpeg_128_c: 239.4
> lumRangeFromJpeg_128_sse4: 81.9
> lumRangeFromJpeg_128_avx2: 51.6
> lumRangeFromJpeg_144_c: 285.1
> lumRangeFromJpeg_144_sse4: 92.1
> lumRangeFromJpeg_144_avx2: 59.6
> lumRangeFromJpeg_256_c: 857.1
> lumRangeFromJpeg_256_sse4: 164.4
> lumRangeFromJpeg_256_avx2: 101.9
> lumRangeFromJpeg_512_c: 1028.6
> lumRangeFromJpeg_512_sse4: 335.6
> lumRangeFromJpeg_512_avx2: 201.4
> lumRangeToJpeg_8_c: 20.4
> lumRangeToJpeg_8_sse4: 14.4
> lumRangeToJpeg_8_avx2: 20.4
> lumRangeToJpeg_24_c: 58.1
> lumRangeToJpeg_24_sse4: 18.9
> lumRangeToJpeg_24_avx2: 22.6
> lumRangeToJpeg_128_c: 327.4
> lumRangeToJpeg_128_sse4: 83.4
> lumRangeToJpeg_128_avx2: 53.6
> lumRangeToJpeg_144_c: 375.6
> lumRangeToJpeg_144_sse4: 93.9
> lumRangeToJpeg_144_avx2: 58.9
> lumRangeToJpeg_256_c: 649.6
> lumRangeToJpeg_256_sse4: 162.1
> lumRangeToJpeg_256_avx2: 101.9
> lumRangeToJpeg_512_c: 1289.1
> lumRangeToJpeg_512_sse4: 335.6
> lumRangeToJpeg_512_avx2: 201.4
> ---
> libswscale/x86/range_convert.asm | 46 ++++++++++++++++++++++++++------
> libswscale/x86/swscale.c | 5 +++-
> 2 files changed, 42 insertions(+), 9 deletions(-)
>
> diff --git a/libswscale/x86/range_convert.asm b/libswscale/x86/range_convert.asm
> index 13983a386b..54c2f64769 100644
> --- a/libswscale/x86/range_convert.asm
> +++ b/libswscale/x86/range_convert.asm
[...]
> @@ -66,10 +66,19 @@ cglobal %1, 2, 3, 3, dst, width, x
> paddd m1, m5
> psrad m0, %4
> psrad m1, %4
> +%if mmsize == 16
> packssdw m0, m0
> packssdw m1, m1
> movq [dstq+xq*2], m0
> movq [dstq+xq*2+mmsize/2], m1
> +%else
> + vextracti128 xm7, ym0, 1
> + packssdw xm0, xm7
> + vextracti128 xm7, ym1, 1
> + packssdw xm1, xm7
> + movdqu [dstq+xq*2], xm0
> + movdqu [dstq+xq*2+mmsize/2], xm1
> +%endif
> add xq, mmsize / 2
> cmp xd, widthd
> jl .loop
Is there a cleaner way to do this packing in avx2 (or a macro that
would let the avx2 path share the same code as the non-avx2 path)?
Also, is there a cleaner way to store half of the register to memory
(instead of having to branch on mmsize with %if)?
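
For reference, here is a scalar C sketch of the lane behaviour behind the
question. It is only a model, not code from the patch. A 256-bit packssdw
packs each 128-bit lane independently, so packing m0 with m1 directly
leaves the four qwords in the order m0.lo, m1.lo, m0.hi, m1.hi. One
possible alternative to the vextracti128 sequence (an assumption, not
benchmarked here) is a single 256-bit pack followed by vpermq with
immediate 0xD8, then one full-width store. All function names below are
hypothetical helpers written for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Saturate a 32-bit value to int16_t, as packssdw does per element. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Model of 128-bit packssdw: 4 dwords from a, then 4 dwords from b. */
static void packssdw128(int16_t out[8], const int32_t a[4], const int32_t b[4])
{
    for (int i = 0; i < 4; i++) {
        out[i]     = sat16(a[i]);
        out[i + 4] = sat16(b[i]);
    }
}

/* Model of 256-bit vpackssdw: the 128-bit pack applied to each lane. */
static void packssdw256(int16_t out[16], const int32_t a[8], const int32_t b[8])
{
    packssdw128(out,     a,     b);
    packssdw128(out + 8, a + 4, b + 4);
}

/* Model of vpermq: output qword i is input qword ((imm >> 2*i) & 3). */
static void permq(int16_t out[16], const int16_t in[16], unsigned imm)
{
    int16_t tmp[16];
    for (int i = 0; i < 4; i++)
        memcpy(tmp + 4 * i, in + 4 * ((imm >> (2 * i)) & 3), 4 * sizeof(int16_t));
    memcpy(out, tmp, sizeof(tmp));
}

/* Path A, as in the patch: extract the high lane, then a 128-bit pack. */
static void pack_extract(int16_t out[16], const int32_t m0[8], const int32_t m1[8])
{
    packssdw128(out,     m0, m0 + 4); /* vextracti128 xm7, ym0, 1; packssdw xm0, xm7 */
    packssdw128(out + 8, m1, m1 + 4); /* vextracti128 xm7, ym1, 1; packssdw xm1, xm7 */
}

/* Path B, assumed alternative: one 256-bit pack, then reorder the qwords. */
static void pack_permute(int16_t out[16], const int32_t m0[8], const int32_t m1[8])
{
    packssdw256(out, m0, m1); /* qwords: m0.lo, m1.lo, m0.hi, m1.hi */
    permq(out, out, 0xD8);    /* qwords: m0.lo, m0.hi, m1.lo, m1.hi */
}

/* Returns 1 if both paths give identical output, including saturated values. */
static int pack_paths_match(void)
{
    int32_t m0[8], m1[8];
    int16_t a[16], b[16];
    for (int i = 0; i < 8; i++) {
        m0[i] = i * 10000 - 20000; /* -20000 .. 50000 */
        m1[i] = i * 12000 - 40000; /* -40000 .. 44000 */
    }
    pack_extract(a, m0, m1);
    pack_permute(b, m0, m1);
    return memcmp(a, b, sizeof(a)) == 0;
}
```

If the model holds, the same "pack m0 with m1, then one full-width store"
shape would also fit the mmsize == 16 case, where no permute is needed.
Whether that is actually faster than the current code would have to be
measured with checkasm.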