[FFmpeg-devel] [PATCH] x86/hecv_res_add: add ff_hevc_transform_add{8, 16, 32}_8_avx
Christophe Gisquet
christophe.gisquet at gmail.com
Wed Aug 20 09:29:42 CEST 2014
Hi,
2014-08-20 4:55 GMT+02:00 James Almer <jamrial at gmail.com>:
> ~15% faster than sse2
[...]
> @@ -509,7 +509,11 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
> if (ARCH_X86_64) {
> c->hevc_v_loop_filter_luma = ff_hevc_v_loop_filter_luma_8_avx;
> c->hevc_h_loop_filter_luma = ff_hevc_h_loop_filter_luma_8_avx;
> +
> + c->transform_add[2] = ff_hevc_transform_add16_8_avx;
> + c->transform_add[3] = ff_hevc_transform_add32_8_avx;
Does AVX imply ARCH_X86_64? (I didn't know.) If not, the register count
seems fine, so the condition is unneeded.
> }
> + c->transform_add[1] = ff_hevc_transform_add8_8_avx;
I'm not entirely sure, but this is instantiated through INIT_YMM avx2,
and I wouldn't expect a performance improvement beyond the 3-operand form.
So couldn't this one be instantiated to use xmm regs instead? (mmx may be
a burden, e.g. the need for emms and the need to rewrite it.)
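
To make the first point concrete, here is a rough sketch of the init
layout I have in mind if the ARCH_X86_64 guard turns out to be unneeded.
All names (DemoDSPContext, demo_dsp_init, the DEMO_CPU_FLAG_* bits, the
stub functions and their signature) are made up for illustration and are
not the actual FFmpeg code; the stubs only stand in for the assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU feature bits, not FFmpeg's AV_CPU_FLAG_* values. */
#define DEMO_CPU_FLAG_SSE2 (1 << 0)
#define DEMO_CPU_FLAG_AVX  (1 << 1)

/* Assumed signature for a residual-add function. */
typedef void (*demo_transform_add_fn)(uint8_t *dst, const int16_t *coeffs,
                                      ptrdiff_t stride);

typedef struct DemoDSPContext {
    /* Indices 1..3 hold the 8x8, 16x16 and 32x32 variants. */
    demo_transform_add_fn transform_add[4];
} DemoDSPContext;

/* Empty stand-ins for the assembly implementations. */
#define DEMO_STUB(name)                                                     \
    static void name(uint8_t *dst, const int16_t *coeffs, ptrdiff_t stride) \
    { (void)dst; (void)coeffs; (void)stride; }

DEMO_STUB(demo_add8_sse2)
DEMO_STUB(demo_add16_sse2)
DEMO_STUB(demo_add32_sse2)
DEMO_STUB(demo_add8_avx)
DEMO_STUB(demo_add16_avx)
DEMO_STUB(demo_add32_avx)

void demo_dsp_init(DemoDSPContext *c, int cpu_flags, int arch_x86_64)
{
    if (cpu_flags & DEMO_CPU_FLAG_SSE2) {
        c->transform_add[1] = demo_add8_sse2;
        c->transform_add[2] = demo_add16_sse2;
        c->transform_add[3] = demo_add32_sse2;
    }
    if (cpu_flags & DEMO_CPU_FLAG_AVX) {
        if (arch_x86_64) {
            /* Only functions that really need the extra registers
             * available in 64-bit mode would stay in here. */
        }
        /* Assumed to fit in 8 vector registers, so not guarded. */
        c->transform_add[1] = demo_add8_avx;
        c->transform_add[2] = demo_add16_avx;
        c->transform_add[3] = demo_add32_avx;
    }
}
```

With this layout a 32-bit build on an AVX-capable CPU would also pick up
the 16x16 and 32x32 versions instead of falling back to sse2.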
--
Christophe