[FFmpeg-devel] [PATCH] vp9: add x86 simd (sse2/ssse3) for iadst4 10bpp functions.

Henrik Gramner henrik at gramner.com
Tue Oct 6 20:41:26 CEST 2015


On Tue, Oct 6, 2015 at 5:42 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> +cglobal vp9_%1_%3_4x4_add_10, 3, 3, 0, dst, stride, block, eob
[...]
> +    mova                m0, [blockq+0*16+0]
> +    mova                m4, [blockq+0*16+8]
> +    mova                m1, [blockq+1*16+0]
> +    mova                m5, [blockq+1*16+8]
> +    packssdw            m0, m4
> +    packssdw            m1, m5
> +    mova                m2, [blockq+2*16+0]
> +    mova                m4, [blockq+2*16+8]
> +    mova                m3, [blockq+3*16+0]
> +    mova                m5, [blockq+3*16+8]
> +    packssdw            m2, m4
> +    packssdw            m3, m5

Use packssdw with a memory operand as the second operand instead of
loading the high halves into m4/m5 with a separate mova first.
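
Roughly, the quoted loads could then become something like the fragment below. This is an untested sketch, not a drop-in replacement: it assumes these are MMX registers (mmsize 8, as the +0/+8 offsets suggest), so the 8-byte memory operands have no alignment requirement, and it assumes nothing later relies on the values previously left in m4/m5.

```
    ; load the low two dwords of each row, then pack with the high two
    ; dwords taken straight from memory (packssdw mm, m64)
    mova                m0, [blockq+0*16+0]
    mova                m1, [blockq+1*16+0]
    packssdw            m0, [blockq+0*16+8]
    packssdw            m1, [blockq+1*16+8]
    mova                m2, [blockq+2*16+0]
    mova                m3, [blockq+3*16+0]
    packssdw            m2, [blockq+2*16+8]
    packssdw            m3, [blockq+3*16+8]
```

That drops four mova instructions and leaves m4/m5 free as temporaries.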

The mixing of MMX and SSE is quite ugly in general, but whatever works.

