[FFmpeg-devel] [PATCH 1/2] x86/vf_blend: add sse and ssse3 extremity functions
Ivan Kalvachev
ikalvachev at gmail.com
Wed Jun 28 02:19:37 EEST 2017
On 6/27/17, James Almer <jamrial at gmail.com> wrote:
> Signed-off-by: James Almer <jamrial at gmail.com>
> ---
> libavfilter/x86/vf_blend.asm | 25 +++++++++++++++++++++++++
> libavfilter/x86/vf_blend_init.c | 4 ++++
> tests/checkasm/vf_blend.c | 1 +
> 3 files changed, 30 insertions(+)
>
> diff --git a/libavfilter/x86/vf_blend.asm b/libavfilter/x86/vf_blend.asm
> index 33b1ad1496..25f6f5affc 100644
> --- a/libavfilter/x86/vf_blend.asm
> +++ b/libavfilter/x86/vf_blend.asm
> @@ -286,6 +286,31 @@ BLEND_INIT difference, 3
> jl .loop
> BLEND_END
>
> +BLEND_INIT extremity, 8
> + pxor m2, m2
> + mova m4, [pw_255]
> +.nextrow:
> + mov xq, widthq
> +
> + .loop:
> + movu m0, [topq + xq]
> + movu m1, [bottomq + xq]
> + punpckhbw m5, m0, m2
> + punpcklbw m0, m2
> + punpckhbw m6, m1, m2
> + punpcklbw m1, m2
> + psubw m3, m4, m0
> + psubw m7, m4, m5
> + psubw m3, m1
> + psubw m7, m6
> + ABS1 m3, m1
> + ABS1 m7, m6
Minor nitpick:
there is an ABS2 macro that takes 4 parameters and performs
two interleaved ABS1 operations, which is (hopefully) faster on sse2.
It should generate exactly the same code on ssse3.
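
To illustrate what the two ABS steps compute, here is a rough scalar sketch in C of one 16-bit lane of the loop quoted above. It assumes the extremity blend is |255 - top - bottom| per byte, and that the pre-ssse3 path of ABS1/ABS2 takes the absolute value as max(x, 0 - x) through a temporary register (pxor/psubw/pmaxsw), while ssse3 uses pabsw directly. The function names are made up for this example and are not part of FFmpeg. If the argument order of ABS2 is (a, b, tmp_a, tmp_b), the suggested replacement would presumably be "ABS2 m3, m7, m1, m6", but that order is an assumption worth checking against x86util.asm.

```c
#include <stdint.h>
#include <stdlib.h>

/* Scalar reference, assuming extremity = |255 - top - bottom| per byte. */
static uint8_t extremity_ref(uint8_t top, uint8_t bottom)
{
    return (uint8_t)abs(255 - top - bottom);
}

/* Assumed pre-ssse3 idiom for one signed 16-bit lane:
 * tmp = 0 - x (pxor + psubw), then x = max(x, tmp) (pmaxsw). */
static int16_t abs_via_max(int16_t x)
{
    int16_t neg = (int16_t)(0 - x);
    return x > neg ? x : neg;
}

/* One lane of the quoted loop after the bytes are widened to words. */
static uint8_t extremity_lane(uint8_t top, uint8_t bottom)
{
    int16_t v = (int16_t)(255 - top);  /* psubw m3, m4, m0 */
    v = (int16_t)(v - bottom);         /* psubw m3, m1     */
    return (uint8_t)abs_via_max(v);    /* ABS1 m3, m1      */
}

/* Returns 1 if the lane model matches the reference for all inputs. */
static int extremity_models_agree(void)
{
    for (int t = 0; t < 256; t++)
        for (int b = 0; b < 256; b++)
            if (extremity_lane((uint8_t)t, (uint8_t)b) !=
                extremity_ref((uint8_t)t, (uint8_t)b))
                return 0;
    return 1;
}
```

The intermediate value stays within -255..255, so it fits a signed word and the result fits a byte. ABS2 would do the same work for m3 and m7, only with the instructions of the two ABS1 expansions interleaved, so the result should not change.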