[FFmpeg-devel] [PATCH] vp9: add 32x32 idct AVX2 implementation.
Henrik Gramner
henrik at gramner.com
Sat Jul 16 12:55:24 EEST 2016
On Wed, Jul 13, 2016 at 6:37 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> +cglobal vp9_idct_idct_32x32_add, 4, 9, 16, 2048, dst, stride, block, eob
[...]
> + movd xm0, [blockq]
> + mova m1, [pw_11585x2]
> + pmulhrsw m0, m1
> + pmulhrsw m0, m1
> + vpbroadcastw m0, xm0
> + pmulhrsw m0, [pw_512]
The vpbroadcastw could be done from memory at the beginning, which
would get rid of the movd.
Is it mathematically possible to merge consecutive pmulhrsw
instructions into a single one using a different constant? I'm
guessing no, but I'm not sure.
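
One way to check this is a brute-force scalar model. The sketch below
assumes the usual per-lane definition of pmulhrsw, (a * b + 0x4000) >> 15,
and that pw_11585x2 holds 23170 (11585 * 2) in every lane. The helper names
are made up for illustration and are not part of the patch. It also relies
on >> being an arithmetic shift for negative values, which holds on common
compilers but is implementation-defined in C.

```c
#include <stdint.h>

/* Scalar model of one 16-bit lane of pmulhrsw: (a * b + 0x4000) >> 15.
 * Assumes >> on a negative int32_t is an arithmetic shift. */
static int16_t pmulhrsw_lane(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * b + 0x4000) >> 15);
}

/* Hypothetical helper: count the 16-bit constants c for which a single
 * pmulhrsw by c gives the same result as two consecutive pmulhrsw by k
 * for every possible 16-bit input. */
static int count_single_constant_matches(int16_t k)
{
    int matches = 0;
    for (int c = -32768; c <= 32767; c++) {
        int ok = 1;
        /* Stop scanning inputs as soon as one mismatch is found. */
        for (int a = -32768; a <= 32767 && ok; a++) {
            int16_t two_step = pmulhrsw_lane(pmulhrsw_lane((int16_t)a, k), k);
            ok = (two_step == pmulhrsw_lane((int16_t)a, (int16_t)c));
        }
        matches += ok;
    }
    return matches;
}
```

Under this model, count_single_constant_matches(23170) returns 0. The
intermediate rounding is what breaks it: an input of 1 needs a constant of
at least 16384 to round up to 1, while an input of 3 goes 3 -> 2 -> 1 in
two steps and needs a constant below 16384 to land on 1. So a merged
constant would only work if bit-exactness over the full input range were
not required.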
[...]
> + ; at the end of the loop, m7 should still be zero
> + ; use that to zero out block coefficients
> + ZERO_BLOCK blockq, 64, 16, m1
The comment says m7, but the code uses m1.
[...]
> + ; at the end of the loop, m7 should still be zero
> + ; use that to zero out block coefficients
> + ZERO_BLOCK blockq, 64, 32, m1
Ditto.