[FFmpeg-devel] [PATCH 6/8 v2] x86/flacdsp: add a SSE2 version of wasted32
Lynne
dev at lynne.ee
Sun May 12 23:22:10 EEST 2024
On 12/05/2024 20:51, James Almer wrote:
> flac_wasted_32_c: 851.3
> flac_wasted_32_sse2: 41.3
>
> Signed-off-by: James Almer <jamrial at gmail.com>
> ---
> libavcodec/x86/flacdsp.asm | 24 ++++++++++++++++++++++++
> libavcodec/x86/flacdsp_init.c | 3 +++
> 2 files changed, 27 insertions(+)
>
> diff --git a/libavcodec/x86/flacdsp.asm b/libavcodec/x86/flacdsp.asm
> index f38eb7db76..21b2439bc0 100644
> --- a/libavcodec/x86/flacdsp.asm
> +++ b/libavcodec/x86/flacdsp.asm
> @@ -89,6 +89,30 @@ LPC_32 sse4, 32, psrlq
> LPC_32 xop, 32, psrlq
> %endif
>
> +INIT_XMM sse2
> +cglobal flac_wasted_32, 3,3,5, decoded, wasted, len
> + shl lend, 2
> + add decodedq, lenq
> + neg lenq
> + movd m4, wastedd
> +ALIGN 16
> +.loop:
> + mova m0, [decodedq+lenq+mmsize*0]
> + mova m1, [decodedq+lenq+mmsize*1]
> + mova m2, [decodedq+lenq+mmsize*2]
> + mova m3, [decodedq+lenq+mmsize*3]
> + pslld m0, m4
> + pslld m1, m4
> + pslld m2, m4
> + pslld m3, m4
> + mova [decodedq+lenq+mmsize*0], m0
> + mova [decodedq+lenq+mmsize*1], m1
> + mova [decodedq+lenq+mmsize*2], m2
> + mova [decodedq+lenq+mmsize*3], m3
> + add lenq, mmsize * 4
> + jl .loop
> + RET
Looks good
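
For readers following the quoted loop, here is a rough C reading of what it does. This is only a sketch, not the code in the tree: the names wasted32_ref and wasted32_blocked are made up for illustration, and the scalar version is an assumption about what the C fallback amounts to (shift each 32-bit sample left by `wasted` bits). The second function copies the asm control flow: point at the end of the buffer, count a negative offset up to zero, and handle four 16-byte vectors (16 samples) per iteration. Read that way, the loop always processes whole 16-sample blocks, so it appears to assume the buffer is aligned and padded past `len`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference: shift every sample left by `wasted` bits.
 * The shift is done on the unsigned value to avoid signed-overflow UB. */
static void wasted32_ref(int32_t *decoded, int wasted, int len)
{
    assert(wasted >= 0 && wasted < 32);
    for (int i = 0; i < len; i++)
        decoded[i] = (int32_t)((uint32_t)decoded[i] << wasted);
}

/* Hypothetical scalar mirror of the asm loop structure.
 * Offsets are counted in samples here, whereas the asm counts bytes
 * ("shl lend, 2"). The caller must provide padding up to the next
 * multiple of 16 samples, because whole blocks are always written. */
static void wasted32_blocked(int32_t *decoded, int wasted, int len)
{
    assert(wasted >= 0 && wasted < 32);
    int32_t *end = decoded + len;            /* "add decodedq, lenq" */
    ptrdiff_t off = -(ptrdiff_t)len;         /* "neg lenq" */
    do {
        /* four mova/pslld/mova groups = 16 samples per iteration */
        for (int j = 0; j < 16; j++)
            end[off + j] = (int32_t)((uint32_t)end[off + j] << wasted);
        off += 16;                           /* "add lenq, mmsize * 4" */
    } while (off < 0);                       /* "jl .loop" */
}
```

With len = 20, for example, the blocked version runs two iterations and rewrites 32 samples, while the reference touches only the first 20.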