[FFmpeg-devel] [PATCH] vp9/x86: 16x16 iadst_idct, idct_iadst and iadst_iadst (ssse3+avx).
Clément Bœsch
u at pkh.me
Thu Jan 16 13:17:36 CET 2014
On Wed, Jan 15, 2014 at 09:04:41PM -0500, Ronald S. Bultje wrote:
> Sample timings on ped1080p.webm (of the ssse3 functions):
> iadst_idct: 4672 -> 1175 cycles
> idct_iadst: 4736 -> 1263 cycles
> iadst_iadst: 4924 -> 1438 cycles
> Total decoding time changed from 6.565s to 6.413s.
> ---
> libavcodec/x86/vp9dsp_init.c | 34 ++++--
> libavcodec/x86/vp9itxfm.asm | 272 ++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 293 insertions(+), 13 deletions(-)
>
[...]
> +%macro VP9_IADST16_1D 2 ; src, pass
> +%assign %%str 16*%2
> + mova m0, [%1+ 0*32] ; in0
> + mova m1, [%1+15*32] ; in15
> + mova m8, [%1+ 7*32] ; in7
> + mova m9, [%1+ 8*32] ; in8
> +
> + VP9_UNPACK_MULSUB_2D_4X 1, 0, 2, 3, 16364, 804 ; m1/2=t1[d], m0/3=t0[d]
> + VP9_UNPACK_MULSUB_2D_4X 8, 9, 11, 10, 11003, 12140 ; m8/11=t9[d], m9/10=t8[d]
> + VP9_RND_SH_SUMSUB_BA 9, 0, 10, 3, 4, [pd_8192] ; m9=t0[w], m0=t8[w]
> + VP9_RND_SH_SUMSUB_BA 8, 1, 11, 2, 4, [pd_8192] ; m8=t1[w], m1=t9[w]
> +
> + mova m11, [%1+ 2*32] ; in2
> + mova m10, [%1+13*32] ; in13
> + mova m3, [%1+ 5*32] ; in5
> + mova m2, [%1+10*32] ; in10
> +
> + VP9_UNPACK_MULSUB_2D_4X 10, 11, 6, 7, 15893, 3981 ; m3/6=t3[d], m2/7=t2[d]
> + VP9_UNPACK_MULSUB_2D_4X 3, 2, 4, 5, 8423, 14053 ; m10/4=t11[d], m11/5=t10[d]
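The comments on these two lines look entangled: if I read the macro
arguments right, the register names are swapped between them. The first
should read m10/6=t3[d], m11/7=t2[d] and the second m3/4=t11[d],
m2/5=t10[d].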
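
For anyone checking the comments against the math, here is a rough scalar C sketch of what I assume each VP9_UNPACK_MULSUB_2D_4X / VP9_RND_SH_SUMSUB_BA pair in the quoted hunk computes. It is based on the reference iadst16 first stage, not on the macro bodies (which are not shown here), so the sign and operand order of the asm may differ. The function names rotate_pair and round_sumsub are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper: rotate the input pair (a, b) by the 14-bit
 * fixed-point coefficients (c0, c1). The results stay 32-bit, which is
 * what the "[d]" suffix in the patch comments appears to denote. */
static void rotate_pair(int a, int b, int c0, int c1,
                        int32_t *t_even, int32_t *t_odd)
{
    *t_even = a * c0 + b * c1;
    *t_odd  = a * c1 - b * c0;
}

/* Hypothetical helper: add and subtract two 32-bit intermediates, then
 * round with 8192 (the pd_8192 constant) and shift right by 14 to get
 * back to 16-bit words ("[w]"). Assumes an arithmetic right shift for
 * negative values, as on the usual compilers. */
static void round_sumsub(int32_t x, int32_t y, int16_t *sum, int16_t *diff)
{
    *sum  = (int16_t)((x + y + 8192) >> 14);
    *diff = (int16_t)((x - y + 8192) >> 14);
}
```

Under that reading, the first quoted group would correspond to rotate_pair(in15, in0, 16364, 804, ...) and rotate_pair(in7, in8, 11003, 12140, ...), followed by one round_sumsub per even/odd pair to produce t0/t8 and t1/t9.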
[...]
Rest LGTM
--
Clément B.