[FFmpeg-devel] [PATCH] vp9: add 32x32 idct AVX2 implementation.
Ronald S. Bultje
rsbultje at gmail.com
Tue Jul 19 17:38:15 EEST 2016
Hi,
On Sat, Jul 16, 2016 at 5:55 AM, Henrik Gramner <henrik at gramner.com> wrote:
> On Wed, Jul 13, 2016 at 6:37 PM, Ronald S. Bultje <rsbultje at gmail.com>
> wrote:
> > +cglobal vp9_idct_idct_32x32_add, 4, 9, 16, 2048, dst, stride, block, eob
> [...]
> > + movd xm0, [blockq]
> > + mova m1, [pw_11585x2]
> > + pmulhrsw m0, m1
> > + pmulhrsw m0, m1
> > + vpbroadcastw m0, xm0
> > + pmulhrsw m0, [pw_512]
>
> [...]
> Is it mathematically possible to merge consecutive pmulhrsw
> instructions into a single one using a different constant? I'm
> guessing no, but I'm not sure.
To my knowledge: no. The intermediate rounding step discards the least
significant bits before the second multiply. Merging the two multiplies
would skip that rounding, which would change the integer result.
Ronald