[FFmpeg-devel] [PATCH] vp9: minor refactorings in idct ssse3 assembly.
Clément Bœsch
u at pkh.me
Mon Dec 2 13:36:41 CET 2013
On Sun, Dec 01, 2013 at 08:17:28PM -0500, Ronald S. Bultje wrote:
> Make register usage in macros explicit; change mulsub_2w_4x to use 2
> instead of 3 temp registers. Use this free'ed register to load 11585x2
> once before the 2 1d idct4s instead of within each 1d idct4.
> ---
> libavcodec/x86/vp9itxfm.asm | 84 ++++++++++++++++++++++-----------------------
> 1 file changed, 41 insertions(+), 43 deletions(-)
>
> diff --git a/libavcodec/x86/vp9itxfm.asm b/libavcodec/x86/vp9itxfm.asm
> index 7b9d7df..730a26e 100644
> --- a/libavcodec/x86/vp9itxfm.asm
> +++ b/libavcodec/x86/vp9itxfm.asm
> @@ -51,33 +51,32 @@ SECTION .text
> %macro VP9_MULSUB_2W_2X 6 ; dst1, dst2, src (unchanged), round, coefs1, coefs2
> pmaddwd m%1, m%3, %5
> pmaddwd m%2, m%3, %6
> - paddd m%1, m%4
> - paddd m%2, m%4
> - psrad m%1, 14
> - psrad m%2, 14
> + paddd m%1, %4
> + paddd m%2, %4
> + psrad m%1, 14
> + psrad m%2, 14
Can't you keep this function unchanged? It seems this make the calls to
VP9_UNPACK_MULSUB_2W_4X take only one argument with the "m" form (m7),
which sounds a bit inconsistent and seems to create pointless diff.
> %endmacro
>
> -%macro VP9_UNPACK_MULSUB_2W_4X 4 ; dst1, dst2, coef1, coef2
> - punpckhwd m6, m%2, m%1
> - VP9_MULSUB_2W_2X 4, 5, 6, 7, [pw_m%3_%4], [pw_%4_%3]
> +%macro VP9_UNPACK_MULSUB_2W_4X 7 ; dst1, dst2, coef1, coef2, rnd, tmp1, tmp2
> + punpckhwd m%6, m%2, m%1
> + VP9_MULSUB_2W_2X %7, %6, %6, %5, [pw_m%3_%4], [pw_%4_%3]
> punpcklwd m%2, m%1
> - VP9_MULSUB_2W_2X %1, 6, %2, 7, [pw_m%3_%4], [pw_%4_%3]
> - packssdw m%1, m4
> - packssdw m6, m5
> - SWAP %2, 6
> + VP9_MULSUB_2W_2X %1, %2, %2, %5, [pw_m%3_%4], [pw_%4_%3]
> + packssdw m%1, m%7
> + packssdw m%2, m%6
> %endmacro
>
> -%macro VP9_STORE_2X 2
> - movh m6, [dstq]
> - movh m7, [dstq+strideq]
> - punpcklbw m6, m4
> - punpcklbw m7, m4
> - paddw m6, %1
> - paddw m7, %2
> - packuswb m6, m4
> - packuswb m7, m4
> - movh [dstq], m6
> - movh [dstq+strideq], m7
> +%macro VP9_STORE_2X 5 ; reg1, reg2, tmp1, tmp2, zero, zero
one trailing parameter
> + movh m%3, [dstq]
> + movh m%4, [dstq+strideq]
> + punpcklbw m%3, m%5
> + punpcklbw m%4, m%5
> + paddw m%3, m%1
> + paddw m%4, m%2
> + packuswb m%3, m%5
> + packuswb m%4, m%5
> + movh [dstq], m%3
> + movh [dstq+strideq], m%4
> %endmacro
>
[...]
Rest LGTM, thanks
Note: ideally the parametrization and the 11585 coeff pre-load could be
split, but I won't insist on it.
--
Clément B.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 490 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20131202/87a24ee4/attachment.asc>
More information about the ffmpeg-devel
mailing list