[FFmpeg-devel] [PATCH] x86: hevc: adding transform_add
Ronald S. Bultje
rsbultje at gmail.com
Wed Jul 30 23:12:44 CEST 2014
Hi,
On Wed, Jul 30, 2014 at 5:04 PM, James Almer <jamrial at gmail.com> wrote:
> On 30/07/14 10:33 AM, Pierre Edouard Lepere wrote:
>
> > +%macro TR_ADD_INIT_SSE_8 2
> > + movu m4, [r1]
> > + movu m6, [r1+16]
> > + movu m8, [r1+32]
> > + movu m10, [r1+48]
>
> You can use mova here, and probably in place of every other movu as well.
>
> > + lea %1, [%2*3]
> > + pxor m5, m5
> > + psubw m5, m4
> > + packuswb m4, m4
> > + packuswb m5, m5
> > + pxor m7, m7
> > + psubw m7, m6
> > + packuswb m6, m6
> > + packuswb m7, m7
> > + pxor m9, m9
> > + psubw m9, m8
> > + packuswb m8, m8
> > + packuswb m9, m9
> > + pxor m11, m11
> > + psubw m11, m10
> > + packuswb m10, m10
> > + packuswb m11, m11
> > +%endmacro
> >
> > +%macro TR_ADD_OP_SSE 4
> > + %1 m0, [%2 ]
> > + %1 m1, [%2+%3 ]
> > + %1 m2, [%2+%3*2]
> > + %1 m3, [%2+%4 ]
> > + paddusb m0, m4
> > + paddusb m1, m6
> > + paddusb m2, m8
> > + paddusb m3, m10
> > + psubusb m0, m5
> > + psubusb m1, m7
> > + psubusb m2, m9
> > + psubusb m3, m11
> > + %1 [%2 ], m0
> > + %1 [%2+%3 ], m1
> > + %1 [%2+2*%3], m2
> > + %1 [%2+%4 ], m3
> > +%endmacro
>
> You can use packuswb to pack two regs into one, like you did in
> TR_ADD_INIT_SSE_16.
> Then you simply use movq+movhps to load and store data, like so:
>
> %macro TR_ADD_INIT_SSE_8 2
> mova m4, [r1]
> mova m6, [r1+16]
> mova m0, [r1+32]
> mova m2, [r1+48]
> lea %1, [%2*3]
> pxor m5, m5
> psubw m5, m4
> pxor m7, m7
> psubw m7, m6
> pxor m1, m1
> psubw m1, m0
> packuswb m4, m0
> packuswb m5, m1
> pxor m3, m3
> psubw m3, m2
> packuswb m6, m2
> packuswb m7, m3
> %endmacro
>
> %macro TR_ADD_OP_SSE 4
> movq m0, [%2 ]
> movq m1, [%2+%3 ]
> movhps m0, [%2+%3*2]
> movhps m1, [%2+%4 ]
> paddusb m0, m4
> paddusb m1, m6
> psubusb m0, m5
> psubusb m1, m7
> movq [%2 ], m0
> movq [%2+%3 ], m1
> movhps [%2+2*%3], m0
> movhps [%2+%4 ], m1
> %endmacro
Why all these memory round-trips?
Ronald
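
For readers following the quoted macros, here is a minimal scalar sketch in C of what each byte lane appears to compute: the coefficient and its 16-bit negation are each packed to unsigned bytes (packuswb), the pixel gets a saturating add of the first (paddusb), then a saturating subtract of the second (psubusb). The helper names (packus, addus, subus, tr_add_lane, tr_add_ref) are made up for this illustration and are not part of the patch or of FFmpeg. It models a single lane only and says nothing about register allocation or load/store strategy.

```c
#include <stdint.h>

/* One lane of packuswb: signed 16-bit to unsigned 8-bit with saturation. */
static uint8_t packus(int16_t v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* One lane of paddusb: unsigned saturating byte add. */
static uint8_t addus(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return s > 255 ? 255 : (uint8_t)s;
}

/* One lane of psubusb: unsigned saturating byte subtract. */
static uint8_t subus(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Hypothetical per-lane model of the quoted sequence for one pixel. */
static uint8_t tr_add_lane(uint8_t pixel, int16_t coeff)
{
    /* pxor + psubw: 16-bit wrapping negation (0 - coeff).
     * The narrowing cast assumes the usual two's-complement behaviour. */
    int16_t negated = (int16_t)(uint16_t)(0u - (uint16_t)coeff);

    uint8_t pos = packus(coeff);   /* max(coeff, 0), capped at 255  */
    uint8_t neg = packus(negated); /* max(-coeff, 0), capped at 255 */

    return subus(addus(pixel, pos), neg);
}

/* Reference: add in wider precision, then clip to the 8-bit range. */
static uint8_t tr_add_ref(uint8_t pixel, int16_t coeff)
{
    int v = pixel + coeff;
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}
```

At most one of pos and neg is nonzero for any coefficient, so the add/subtract pair matches the clipped sum for every coefficient from -32767 to 32767. The one exception in this model is -32768: its 16-bit negation wraps back to -32768, both packed values become 0, and the pixel is left unchanged. Whether that value can occur in practice depends on the coefficient range the caller guarantees, which this sketch does not assume.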