[FFmpeg-devel] [PATCH] vf_interlace: Add SIMD for lowpass filter

James Almer jamrial at gmail.com
Mon Nov 10 23:04:23 CET 2014


On 10/11/14 6:42 PM, Kieran Kunhya wrote:

Can't test since it doesn't apply cleanly, but here are a few comments anyway.

> diff --git a/libavfilter/x86/vf_interlace.asm b/libavfilter/x86/vf_interlace.asm
> new file mode 100644
> index 0000000..40b10fc
> --- /dev/null
> +++ b/libavfilter/x86/vf_interlace.asm
> @@ -0,0 +1,80 @@
> +;*****************************************************************************
> +;* x86-optimized functions for interlace filter
> +;*
> +;* Copyright (C) 2014 Kieran Kunhya <kierank at obe.tv>
> +;*
> +;* This file is part of Libav.
> +;*
> +;* Libav is free software; you can redistribute it and/or modify
> +;* it under the terms of the GNU General Public License as published by
> +;* the Free Software Foundation; either version 2 of the License, or
> +;* (at your option) any later version.
> +;*
> +;* Libav is distributed in the hope that it will be useful,
> +;* but WITHOUT ANY WARRANTY; without even the implied warranty of
> +;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +;* GNU General Public License for more details.
> +;*
> +;* You should have received a copy of the GNU General Public License along
> +;* with Libav; if not, write to the Free Software Foundation, Inc.,
> +;* 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
> +;******************************************************************************
> +
> +%include "libavutil/x86/x86util.asm"
> +
> +SECTION_RODATA
> +
> +pw_1: times  8 dw 1
> +
> +SECTION .text
> +
> +%macro LOWPASS_LINE 0
> +cglobal lowpass_line, 5, 5

You're using m6, so you need to declare 7 xmm regs (i.e. "cglobal lowpass_line, 5, 5, 7").
Also, naming the regs would be better than using r*.

> +    add r0, r1
> +    add r2, r1
> +    add r3, r1
> +    add r4, r1
> +    neg r1
> +
> +    pxor m6, m6
> +
> +.loop
> +    mova m0, [r2+r1]
> +    punpcklbw m1, m0, m6
> +    punpckhbw m0, m6
> +    psllw m0, 1
> +    psllw m1, 1
> +
> +    mova m2, [r3+r1]
> +    punpcklbw m3, m2, m6
> +    punpckhbw m2, m6
> +
> +    mova m4, [r4+r1]
> +    punpcklbw m5, m4, m6
> +    punpckhbw m4, m6
> +
> +    paddw m1, m3
> +    paddw m1, m5
> +
> +    paddw m0, m2
> +    paddw m0, m4
> +
> +    paddw m0, [pw_1]
> +    paddw m1, [pw_1]
> +
> +    psrlw m0, 2
> +    psrlw m1, 2

Can't pavgw be used here?
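
One possible reading of that suggestion, as a rough scalar sketch rather than a tested patch: pavgw computes (a + b + 1) >> 1 on unsigned words, so averaging the doubled r2 line against the sum of the r3 and r4 lines and then shifting right by one should give the same result as the paddw [pw_1] / psrlw 2 sequence, without needing pw_1. The function names below are made up for illustration, and "cur", "above" and "below" are assumed labels for the lines read through r2, r3 and r4.

```c
#include <stdint.h>

/* Scalar model of one pavgw lane: unsigned 16-bit average, rounded up. */
static uint16_t pavgw_lane(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a + b + 1) >> 1);
}

/* What the quoted loop computes per pixel: the r2 line is doubled (psllw),
 * the r3 and r4 lines are added, then +1 (pw_1) and >> 2 (psrlw). */
static uint8_t lowpass_ref(uint8_t cur, uint8_t above, uint8_t below)
{
    return (uint8_t)((2 * cur + above + below + 1) >> 2);
}

/* Hypothetical variant: pavgw(2 * cur, above + below), then psrlw by 1. */
static uint8_t lowpass_pavgw(uint8_t cur, uint8_t above, uint8_t below)
{
    uint16_t doubled    = (uint16_t)(cur << 1);
    uint16_t neighbours = (uint16_t)(above + below);
    return (uint8_t)(pavgw_lane(doubled, neighbours) >> 1);
}

/* Exhaustive check: number of 8-bit input triples where the two disagree. */
static unsigned long count_mismatches(void)
{
    unsigned long bad = 0;
    for (int c = 0; c < 256; c++)
        for (int a = 0; a < 256; a++)
            for (int b = 0; b < 256; b++)
                if (lowpass_ref((uint8_t)c, (uint8_t)a, (uint8_t)b) !=
                    lowpass_pavgw((uint8_t)c, (uint8_t)a, (uint8_t)b))
                    bad++;
    return bad;
}
```

count_mismatches() returns 0 here, since shifting (x + 1) >> 1 right by one more bit is the same as (x + 1) >> 2 for unsigned x. Whether this is actually faster than the current sequence would still need benchmarking.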

> +
> +    packuswb m1, m0
> +    mova [r0+r1], m1
> +
> +    add r1, mmsize
> +    jl .loop
> +REP_RET
> +%endmacro
> +
> +INIT_XMM sse2
> +LOWPASS_LINE
> +
> +INIT_XMM avx
> +LOWPASS_LINE
