[FFmpeg-devel] [PATCH] avcodec/takdec: add x86 SIMD for rest of decorrelation modes

James Almer jamrial at gmail.com
Tue Oct 6 04:24:56 CEST 2015


On 10/5/2015 8:04 PM, Paul B Mahol wrote:
> diff --git a/libavcodec/x86/takdsp.asm b/libavcodec/x86/takdsp.asm
> new file mode 100644
> index 0000000..bc881bf
> --- /dev/null
> +++ b/libavcodec/x86/takdsp.asm
> @@ -0,0 +1,105 @@
> +;******************************************************************************
> +;* TAK DSP SIMD optimizations
> +;*
> +;* Copyright (C) 2015 Paul B Mahol
> +;*
> +;* This file is part of FFmpeg.
> +;*
> +;* FFmpeg is free software; you can redistribute it and/or
> +;* modify it under the terms of the GNU Lesser General Public
> +;* License as published by the Free Software Foundation; either
> +;* version 2.1 of the License, or (at your option) any later version.
> +;*
> +;* FFmpeg is distributed in the hope that it will be useful,
> +;* but WITHOUT ANY WARRANTY; without even the implied warranty of
> +;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +;* Lesser General Public License for more details.
> +;*
> +;* You should have received a copy of the GNU Lesser General Public
> +;* License along with FFmpeg; if not, write to the Free Software
> +;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> +;******************************************************************************
> +
> +%include "libavutil/x86/x86util.asm"
> +
> +SECTION_RODATA
> +
> +pd_128: times 4 dd 128
> +
> +SECTION .text
> +
> +INIT_XMM sse2
> +cglobal tak_decorrelate_ls, 3, 3, 2, p1, p2, length

As I said before, add this ahead of the loop:

shl lengthd, 2   ; length *= sizeof(int32)
add p1q, lengthq
add p2q, lengthq
neg lengthq

> +    .loop:
> +    mova                 m0, [p1q+mmsize*0]
> +    mova                 m1, [p1q+mmsize*1]
> +    paddd                m0, [p2q+mmsize*0]
> +    paddd                m1, [p2q+mmsize*1]
> +    mova     [p2q+mmsize*0], m0
> +    mova     [p2q+mmsize*1], m1

Address these as [p{1,2}q+lengthq+mmsize*{0,1}] instead.

> +    add                 p1q, mmsize*2
> +    add                 p2q, mmsize*2
> +    sub             lengthd, mmsize/2
> +    jg .loop

The pointers no longer need to advance, so those four lines become:

add lengthq, mmsize*2
jl .loop
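
For readers less used to this loop shape, here is a rough scalar sketch of the same idea in C. It is only an illustration, not the actual takdsp code: the function name is made up, and it handles one element per iteration where the SIMD loop handles eight. Both pointers are moved to the end of their buffers and a single negative index counts up to zero, so the loop needs one add and one branch instead of two pointer increments plus a separate counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar illustration of the suggested loop structure.
 * Computes p2[i] += p1[i] for i in [0, length), like the quoted loop. */
static void decorrelate_ls_sketch(const int32_t *p1, int32_t *p2, int length)
{
    ptrdiff_t i;

    assert(length >= 0);

    /* Point both arrays at their ends (the "add p1q, lengthq" step). */
    p1 += length;
    p2 += length;

    /* Negative index runs up to zero (the "neg lengthq" step), so the
     * loop condition is just a sign check, like "jl .loop". */
    for (i = -(ptrdiff_t)length; i < 0; i++) {
        /* Unsigned add so the sum wraps the way paddd does. */
        p2[i] = (int32_t)((uint32_t)p1[i] + (uint32_t)p2[i]);
    }
}
```

In the assembly version the index is in bytes rather than elements, which is why length is shifted left by 2 first and the step is mmsize*2.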

Same for every other function.
Should be ok to commit after those changes if nobody else comments.

