[FFmpeg-devel] [PATCH] vp9: add 32x32 idct AVX2 implementation.

Ronald S. Bultje rsbultje at gmail.com
Tue Jul 19 17:38:15 EEST 2016


Hi,

On Sat, Jul 16, 2016 at 5:55 AM, Henrik Gramner <henrik at gramner.com> wrote:

> On Wed, Jul 13, 2016 at 6:37 PM, Ronald S. Bultje <rsbultje at gmail.com>
> wrote:
> > +cglobal vp9_idct_idct_32x32_add, 4, 9, 16, 2048, dst, stride, block, eob
> [...]
> > +    movd               xm0, [blockq]
> > +    mova                m1, [pw_11585x2]
> > +    pmulhrsw            m0, m1
> > +    pmulhrsw            m0, m1
> > +    vpbroadcastw        m0, xm0
> > +    pmulhrsw            m0, [pw_512]
>
> [..]

> Is it mathematically possible to merge consecutive pmulhrsw
> instructions into a single one using a different constant? I'm
> guessing no, but I'm not sure.


To my knowledge: no. The intermediate rounding step discards the least
significant bits before the second mul. Merging the muls would remove that
step, which would change the integer result.
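
As a rough illustration of the rounding argument, here is a minimal scalar
sketch in Python. It models one 16-bit lane of pmulhrsw as
(a * b + 0x4000) >> 15 and assumes pw_11585x2 holds the value 11585 * 2, as
the name suggests. The helper names (pmulhrsw, two_step, matching_constants)
are made up for this example and are not taken from the patch.

```python
def pmulhrsw(a, b):
    """Scalar model of one signed 16-bit lane of PMULHRSW."""
    r = (a * b + 0x4000) >> 15               # multiply, round, keep high bits
    return ((r + 0x8000) & 0xFFFF) - 0x8000  # wrap to int16 (-32768 * -32768 case)

# Assumption: pw_11585x2 is a vector of words equal to 11585 * 2.
PW_11585X2 = 11585 * 2

def two_step(x):
    """Two consecutive multiplies, as in the quoted DC-only path."""
    return pmulhrsw(pmulhrsw(x, PW_11585X2), PW_11585X2)

def matching_constants(inputs):
    """All int16 constants c for which one pmulhrsw(x, c) equals two_step(x)
    on every given input."""
    return [c for c in range(-32768, 32768)
            if all(pmulhrsw(x, c) == two_step(x) for x in inputs)]

if __name__ == "__main__":
    # The "obvious" merged constant 16384 (0.5 in Q15) already differs at x = 3.
    print(two_step(3), pmulhrsw(3, 16384))
    # Searching every int16 constant over the inputs 1, 2, 3 finds none.
    print(matching_constants(range(1, 4)))
```

In this model, x = 1 needs a constant of at least 16384 to produce 1, while
x = 3 needs a constant of at most 16383 to produce 1, so no single constant
can reproduce the two-step result for both.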

Ronald

