[FFmpeg-devel] [PATCH 1/7] x86: hevc_mc: add AVX2 optimizations
James Almer
jamrial at gmail.com
Fri Feb 6 01:15:28 CET 2015
On 05/02/15 4:20 PM, Christophe Gisquet wrote:
> From: plepere <pierre-edouard.lepere at insa-rennes.fr>
This should probably be changed to Pierre Edouard Lepere.
> +%if cpuflag(avx2) && (%0 == 3)
> +
> + vextracti128 xm10, m0, 1
> + vinserti128 m10, m1, xm10, 0
> + vinserti128 m0, m0, xm1, 1
> + mova m1, m10
> +
> + vextracti128 xm10, m2, 1
> + vinserti128 m10, m3, xm10, 0
> + vinserti128 m2, m2, xm3, 1
> + mova m3, m10
> +
> +
> + vextracti128 xm10, m4, 1
> + vinserti128 m10, m5, xm10, 0
> + vinserti128 m4, m4, xm5, 1
> + mova m5, m10
> +
> + vextracti128 xm10, m6, 1
> + vinserti128 m10, m7, xm10, 0
> + vinserti128 m6, m6, xm7, 1
> + mova m7, m10
> +%endif
I didn't check, but I think these can be simplified using vperm2i128.
That can be done in a separate patch anyway.
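
As an untested sketch of that idea for one register pair: with immediate
0x31 vperm2i128 gathers the two high lanes, and with 0x20 the two low lanes.
The C below is only a plain lane model (no intrinsics, all names made up for
illustration) that checks the two-instruction form against the sequence from
the patch. The candidate asm is shown in the comments and has not been
benchmarked or run through FATE.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Plain-C model of a 256-bit register: lane[0] = low 128 bits, lane[1] = high. */
typedef struct { uint64_t q[2]; } xmm_t;
typedef struct { xmm_t lane[2]; } ymm_t;

static xmm_t extracti128(ymm_t a, int idx)
{
    assert(idx == 0 || idx == 1);
    return a.lane[idx];
}

static ymm_t inserti128(ymm_t a, xmm_t b, int idx)
{
    assert(idx == 0 || idx == 1);
    a.lane[idx] = b;
    return a;
}

/* imm[1:0] picks the low result lane, imm[5:4] the high one:
 * 0 = a.low, 1 = a.high, 2 = b.low, 3 = b.high; bits 3 and 7 zero a lane. */
static ymm_t perm2i128(ymm_t a, ymm_t b, int imm)
{
    const xmm_t src[4] = { a.lane[0], a.lane[1], b.lane[0], b.lane[1] };
    const xmm_t zero   = { { 0, 0 } };
    ymm_t r;
    r.lane[0] = (imm & 0x08) ? zero : src[imm & 3];
    r.lane[1] = (imm & 0x80) ? zero : src[(imm >> 4) & 3];
    return r;
}

/* Sequence from the patch, with a = m0 and b = m1. */
static void swap_lanes_patch(ymm_t *a, ymm_t *b)
{
    xmm_t x10 = extracti128(*a, 1);          /* vextracti128 xm10, m0, 1      */
    ymm_t t   = inserti128(*b, x10, 0);      /* vinserti128  m10, m1, xm10, 0 */
    *a = inserti128(*a, b->lane[0], 1);      /* vinserti128  m0, m0, xm1, 1   */
    *b = t;                                  /* mova         m1, m10          */
}

/* Candidate replacement (assumption, not part of the patch). */
static void swap_lanes_perm(ymm_t *a, ymm_t *b)
{
    ymm_t t = perm2i128(*a, *b, 0x31);       /* vperm2i128 m10, m0, m1, 0x31 */
    *a = perm2i128(*a, *b, 0x20);            /* vperm2i128 m0,  m0, m1, 0x20 */
    *b = t;                                  /* mova       m1,  m10          */
}

/* Returns 1 when both sequences leave identical register contents. */
static int lane_swaps_match(void)
{
    ymm_t a1 = { { { { 1, 2 } }, { { 3, 4 } } } };
    ymm_t b1 = { { { { 5, 6 } }, { { 7, 8 } } } };
    ymm_t a2 = a1, b2 = b1;

    swap_lanes_patch(&a1, &b1);
    swap_lanes_perm(&a2, &b2);

    return !memcmp(&a1, &a2, sizeof(a1)) && !memcmp(&b1, &b2, sizeof(b1));
}
```

If the model holds on real hardware, each four-instruction group would drop
to three, and the same pattern would apply to the m2/m3, m4/m5 and m6/m7 pairs.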
> @@ -619,6 +761,89 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
> c->idct_dc[3] = ff_hevc_idct32x32_dc_8_avx2;
> if (ARCH_X86_64) {
> SAO_BAND_INIT(8, avx2);
> + c->put_hevc_epel[7][0][0] = ff_hevc_put_hevc_pel_pixels32_8_avx2;
> + c->put_hevc_epel[8][0][0] = ff_hevc_put_hevc_pel_pixels48_8_avx2;
> + c->put_hevc_epel[9][0][0] = ff_hevc_put_hevc_pel_pixels64_8_avx2;
[...]
It would be nice if all this was compressed into a couple of macros, like with SSE4.
But that's cosmetic and not a blocker.
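
A rough sketch of the kind of macro meant here. Everything in it is
hypothetical: DemoDSPContext, the demo_ functions and the DEMO_ macros are
stand-ins for illustration and are not the existing SSE4 macros or the real
HEVCDSPContext. It only shows how the three width assignments (indices 7, 8
and 9 for widths 32, 48 and 64, as in the hunk above) could come from one
macro call.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types and functions, not FFmpeg's. */
typedef void (*demo_mc_fn)(void);
typedef struct DemoDSPContext {
    demo_mc_fn put_hevc_epel[10][2][2];
} DemoDSPContext;

static void demo_put_hevc_pel_pixels32_8_avx2(void) {}
static void demo_put_hevc_pel_pixels48_8_avx2(void) {}
static void demo_put_hevc_pel_pixels64_8_avx2(void) {}

/* One table entry: builds the function name by token pasting. */
#define DEMO_PEL_LINK(tab, idx, my, mx, fname, width, bitd, opt) \
    (tab)[idx][my][mx] = demo_put_hevc_##fname##width##_##bitd##_##opt

/* The three wide block sizes that the AVX2 code covers in this sketch. */
#define DEMO_PEL_LINKS_32_64(tab, my, mx, fname, bitd, opt)      \
    do {                                                         \
        DEMO_PEL_LINK(tab, 7, my, mx, fname, 32, bitd, opt);     \
        DEMO_PEL_LINK(tab, 8, my, mx, fname, 48, bitd, opt);     \
        DEMO_PEL_LINK(tab, 9, my, mx, fname, 64, bitd, opt);     \
    } while (0)

static void demo_init_avx2(DemoDSPContext *c)
{
    assert(c != NULL);
    /* Replaces three hand-written assignments. */
    DEMO_PEL_LINKS_32_64(c->put_hevc_epel, 0, 0, pel_pixels, 8, avx2);
}

/* Returns 1 when the macro filled the expected three slots. */
static int demo_links_ok(void)
{
    DemoDSPContext c = { 0 };
    demo_init_avx2(&c);
    return c.put_hevc_epel[7][0][0] == demo_put_hevc_pel_pixels32_8_avx2 &&
           c.put_hevc_epel[8][0][0] == demo_put_hevc_pel_pixels48_8_avx2 &&
           c.put_hevc_epel[9][0][0] == demo_put_hevc_pel_pixels64_8_avx2;
}
```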
> }
>
> c->transform_add[2] = ff_hevc_transform_add16_10_avx2;
>
Should be OK if it passes FATE and compiles with yasm <= 1.1.0 (there are C wrappers,
and those usually need stricter checks for HAVE_AVX2_EXTERNAL because dead code
elimination doesn't seem to trigger until after preprocessing is done).
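
To illustrate the concern as I read it: a C wrapper defined at file scope
refers to the asm symbol no matter what a runtime check in the init function
says, so it needs a preprocessor guard and not just an if (). The sketch
below is a minimal stand-alone model. The Demo names and ff_demo_add1_avx2
are hypothetical, and HAVE_AVX2_EXTERNAL is defaulted to 0 here to mimic an
assembler without AVX2 support (in a real build it comes from configure).

```c
#include <assert.h>
#include <stddef.h>

#ifndef HAVE_AVX2_EXTERNAL
#define HAVE_AVX2_EXTERNAL 0   /* assumption: assembler too old for AVX2 */
#endif

typedef int (*demo_fn)(int);
typedef struct DemoContext { demo_fn add1; } DemoContext;

/* C fallback that is always available. */
static int demo_add1_c(int x) { return x + 1; }

#if HAVE_AVX2_EXTERNAL
/* Hypothetical asm symbol, only present when the AVX2 asm was assembled. */
int ff_demo_add1_avx2(int x);

/* Wrapper at file scope: without the #if above, this reference to the
 * asm symbol would remain in builds where the symbol does not exist. */
static int demo_add1_wrapper_avx2(int x) { return ff_demo_add1_avx2(x); }
#endif

static void demo_init(DemoContext *c, int cpu_has_avx2)
{
    assert(c != NULL);
    c->add1 = demo_add1_c;
#if HAVE_AVX2_EXTERNAL
    if (cpu_has_avx2)
        c->add1 = demo_add1_wrapper_avx2;
#else
    (void)cpu_has_avx2;        /* nothing to select without the asm */
#endif
}

/* Returns 1 when the build without AVX2 asm falls back to the C version. */
static int demo_fallback_ok(void)
{
    DemoContext c;
    demo_init(&c, 1);
    return c.add1 == demo_add1_c && c.add1(41) == 42;
}
```

With HAVE_AVX2_EXTERNAL at 0 this builds and links even though
ff_demo_add1_avx2 is never defined, which is the property the yasm <= 1.1.0
build needs.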