[FFmpeg-devel] [PATCH 2/2] x86/dsputilenc: implement SSE2 versions of pix_{sum16, norm1}
Christophe Gisquet
christophe.gisquet at gmail.com
Tue May 27 10:09:39 CEST 2014
Hi,
2014-05-27 9:27 GMT+02:00 James Almer <jamrial at gmail.com>:
> - mov r2, r1
> - neg r2
> - shl r2, 4
> - sub r0, r2
[...]
> - movd eax, m6
> - and eax, 0xffff
> + paddw m4, m3
> +%if mmsize == 8
> + add r0, r1
> +%else
> + lea r0, [r0+r1*2]
> +%endif
> + dec r2
> + jne .loop
Sorry for not doing my work of sufficiently studying these changes, but...
Are you removing some corner cases handling? The top part is maybe a
better loop control in your code, but the second part you remove looks
like some overflow prevention.
> INIT_MMX mmx
> +PIX_SUM16 0, 16
> +INIT_XMM sse2
> +PIX_SUM16 6, 8
> +
[...]
> +INIT_MMX mmx
> +PIX_NORM1 0, 16
> +INIT_XMM sse2
> +PIX_NORM1 6, 8
PHADDW may map to an ssse3 instruction. I suspect it's probably not
worth the 3% cycle count decrease, but do you have plans about it? (eg
in a later patch)
> --- a/libavutil/x86/x86util.asm
> +++ b/libavutil/x86/x86util.asm
> @@ -288,7 +288,12 @@
> paddd %1, %2
> %endif
> %if notcpuflag(xop) || sizeof%1 != 16
> +%if cpuflag(mmxext)
> PSHUFLW %2, %1, q0032
> +%else ; mmx
> + mova %2, %1
> + psrlq %2, 32
> +%endif
> paddd %1, %2
> %endif
> %undef %1
If I'm not mistaken, x264 folks don't care about mmx, so no need to
wait for their opinion on this change. This part OK then.
Not tested but looks good to me otherwise.
Best regards,
--
Christophe
More information about the ffmpeg-devel
mailing list