[FFmpeg-devel] [PATCH] vp9: add 32x32 idct AVX2 implementation.

Ronald S. Bultje rsbultje at gmail.com
Sat Jul 16 06:10:56 EEST 2016


Hi,

On Wed, Jul 13, 2016 at 12:37 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:

> About 1.8x speedup compared to the AVX version for the full IDCT. Other
> sub-IDCT scenarios also see speedups. Full --bench output for
> idct_32x32_add_${bpp}_${subidct}_${opt} (50k cycles):
>
> nop: 16.5
> vp9_inv_dct_dct_32x32_add_8_1_c: 2284.4
> vp9_inv_dct_dct_32x32_add_8_1_sse2: 145.0
> vp9_inv_dct_dct_32x32_add_8_1_ssse3: 137.4
> vp9_inv_dct_dct_32x32_add_8_1_avx: 137.1
> vp9_inv_dct_dct_32x32_add_8_1_avx2: 73.2
> vp9_inv_dct_dct_32x32_add_8_2_c: 14680.8
> vp9_inv_dct_dct_32x32_add_8_2_sse2: 2617.2
> vp9_inv_dct_dct_32x32_add_8_2_ssse3: 982.9
> vp9_inv_dct_dct_32x32_add_8_2_avx: 958.5
> vp9_inv_dct_dct_32x32_add_8_2_avx2: 704.2
> vp9_inv_dct_dct_32x32_add_8_4_c: 14443.1
> vp9_inv_dct_dct_32x32_add_8_4_sse2: 2717.1
> vp9_inv_dct_dct_32x32_add_8_4_ssse3: 965.7
> vp9_inv_dct_dct_32x32_add_8_4_avx: 1000.7
> vp9_inv_dct_dct_32x32_add_8_4_avx2: 717.1
> vp9_inv_dct_dct_32x32_add_8_8_c: 14436.4
> vp9_inv_dct_dct_32x32_add_8_8_sse2: 2671.8
> vp9_inv_dct_dct_32x32_add_8_8_ssse3: 1038.5
> vp9_inv_dct_dct_32x32_add_8_8_avx: 983.0
> vp9_inv_dct_dct_32x32_add_8_8_avx2: 729.4
> vp9_inv_dct_dct_32x32_add_8_16_c: 14614.7
> vp9_inv_dct_dct_32x32_add_8_16_sse2: 2701.7
> vp9_inv_dct_dct_32x32_add_8_16_ssse3: 1334.4
> vp9_inv_dct_dct_32x32_add_8_16_avx: 1276.7
> vp9_inv_dct_dct_32x32_add_8_16_avx2: 719.5
> vp9_inv_dct_dct_32x32_add_8_32_c: 14363.6
> vp9_inv_dct_dct_32x32_add_8_32_sse2: 2575.6
> vp9_inv_dct_dct_32x32_add_8_32_ssse3: 2633.9
> vp9_inv_dct_dct_32x32_add_8_32_avx: 2539.6
> vp9_inv_dct_dct_32x32_add_8_32_avx2: 1395.0
> ---
>  libavcodec/x86/vp9dsp_init.c |   2 +
>  libavcodec/x86/vp9itxfm.asm  | 223 ++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 222 insertions(+), 3 deletions(-)


Ping.

Ronald

