[FFmpeg-devel] [PATCH] x86/hevc_res_add: add ff_hevc_transform_add{8, 16, 32}_8_avx

Christophe Gisquet christophe.gisquet at gmail.com
Wed Aug 20 09:29:42 CEST 2014


Hi,

2014-08-20 4:55 GMT+02:00 James Almer <jamrial at gmail.com>:
> ~15% faster than sse2
[...]
> @@ -509,7 +509,11 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
>              if (ARCH_X86_64) {
>                  c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_8_avx;
>                  c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_8_avx;
> +
> +                c->transform_add[2]    = ff_hevc_transform_add16_8_avx;
> +                c->transform_add[3]    = ff_hevc_transform_add32_8_avx;

Does AVX imply ARCH_X86_64? (I didn't know that.) If it doesn't, the
register count looks fine for 32-bit builds too, meaning the condition
is unneeded for these two.
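
To make the suggestion concrete, here is a rough, untested sketch of the
init logic with the assignments moved out of the 64-bit block. Everything
in it is a stand-in: SketchDSPContext, the sketch_* stubs and the
have_avx / arch_x86_64 parameters are hypothetical names that only model
the real context, the asm entry points and the CPU/arch checks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the function pointer type; the real
 * prototype lives in libavcodec and may differ. */
typedef void (*transform_add_fn)(uint8_t *dst, int16_t *coeffs,
                                 ptrdiff_t stride);

/* Hypothetical, trimmed-down model of the DSP context. */
typedef struct SketchDSPContext {
    transform_add_fn transform_add[4];
} SketchDSPContext;

/* Empty stubs standing in for the asm functions named in the patch. */
void sketch_add8_avx(uint8_t *d, int16_t *c, ptrdiff_t s)  { (void)d; (void)c; (void)s; }
void sketch_add16_avx(uint8_t *d, int16_t *c, ptrdiff_t s) { (void)d; (void)c; (void)s; }
void sketch_add32_avx(uint8_t *d, int16_t *c, ptrdiff_t s) { (void)d; (void)c; (void)s; }

/* have_avx models the CPU flag check, arch_x86_64 models ARCH_X86_64. */
void sketch_init(SketchDSPContext *c, int have_avx, int arch_x86_64)
{
    if (!have_avx)
        return;

    if (arch_x86_64) {
        /* 64-bit-only entries (the loop filters in the quoted hunk)
         * would stay here. */
    }

    /* If the register count is fine on 32-bit, all three sizes can be
     * set without the ARCH_X86_64 guard. */
    c->transform_add[1] = sketch_add8_avx;
    c->transform_add[2] = sketch_add16_avx;
    c->transform_add[3] = sketch_add32_avx;
}
```

With this shape a 32-bit build with AVX would still pick up the 16 and
32 variants, assuming they really fit in 8 vector registers.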

>              }
> +            c->transform_add[1]    = ff_hevc_transform_add8_8_avx;

I'm not entirely sure, but this is instantiated through INIT_YMM avx2,
and I wouldn't expect any performance improvement beyond what the
3-operand form already gives?

So couldn't this one be instantiated to use xmm regs instead? (mmx may
be a burden, e.g. the need for emms and the need to rewrite it.)
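
For reference, a scalar model of what I understand the 8-wide, 8-bit add
to compute. This is only a sketch: the function name is hypothetical, and
the 8x8 block size and contiguous coefficient layout are my assumptions,
not taken from the patch. The point is the data width per row: 8 pixels
are 64 bits and 8 int16 coefficients are 128 bits, i.e. exactly one xmm
register, so wider registers would only help if several rows were
handled per iteration.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an int to the 8-bit pixel range [0, 255]. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Scalar sketch of an 8x8 residual add at 8-bit depth.
 * Hypothetical name; assumes coeffs holds 64 contiguous int16 values. */
void sketch_transform_add8_8(uint8_t *dst, const int16_t *coeffs,
                             ptrdiff_t stride)
{
    for (int y = 0; y < 8; y++) {
        /* One row: 8 pixels (64 bits) plus 8 coefficients (128 bits). */
        for (int x = 0; x < 8; x++)
            dst[x] = clip_uint8(dst[x] + coeffs[x]);
        dst    += stride;
        coeffs += 8;
    }
}
```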

-- 
Christophe

