[FFmpeg-devel] [PATCH 6/8 v2] x86/flacdsp: add a SSE2 version of wasted32

Lynne dev at lynne.ee
Sun May 12 23:22:10 EEST 2024


On 12/05/2024 20:51, James Almer wrote:
> flac_wasted_32_c: 851.3
> flac_wasted_32_sse2: 41.3
> 
> Signed-off-by: James Almer <jamrial at gmail.com>
> ---
>   libavcodec/x86/flacdsp.asm    | 24 ++++++++++++++++++++++++
>   libavcodec/x86/flacdsp_init.c |  3 +++
>   2 files changed, 27 insertions(+)
> 
> diff --git a/libavcodec/x86/flacdsp.asm b/libavcodec/x86/flacdsp.asm
> index f38eb7db76..21b2439bc0 100644
> --- a/libavcodec/x86/flacdsp.asm
> +++ b/libavcodec/x86/flacdsp.asm
> @@ -89,6 +89,30 @@ LPC_32 sse4, 32, psrlq
>   LPC_32 xop,  32, psrlq
>   %endif
>   
> +INIT_XMM sse2
> +cglobal flac_wasted_32, 3,3,5, decoded, wasted, len
> +    shl   lend, 2
> +    add   decodedq, lenq
> +    neg   lenq
> +    movd  m4, wastedd
> +ALIGN 16
> +.loop:
> +    mova  m0, [decodedq+lenq+mmsize*0]
> +    mova  m1, [decodedq+lenq+mmsize*1]
> +    mova  m2, [decodedq+lenq+mmsize*2]
> +    mova  m3, [decodedq+lenq+mmsize*3]
> +    pslld m0, m4
> +    pslld m1, m4
> +    pslld m2, m4
> +    pslld m3, m4
> +    mova  [decodedq+lenq+mmsize*0], m0
> +    mova  [decodedq+lenq+mmsize*1], m1
> +    mova  [decodedq+lenq+mmsize*2], m2
> +    mova  [decodedq+lenq+mmsize*3], m3
> +    add lenq, mmsize * 4
> +    jl .loop
> +    RET

Looks good
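
For anyone reading the asm against the scalar path, here is a rough C sketch of what the loop computes: each 32-bit sample is shifted left by the wasted-bits count. The name wasted32_ref is hypothetical, and this is not copied from the FFmpeg C implementation. The cast through uint32_t is my own choice, to avoid undefined behaviour when left-shifting negative values. The SSE2 version differs in that it uses aligned loads/stores (mova) and handles 4 * mmsize = 64 bytes, i.e. 16 samples, per iteration, so it presumably relies on the caller providing an aligned and suitably padded buffer.

```c
#include <stdint.h>

/* Hypothetical scalar reference for illustration only.
 * Assumes 0 <= wasted <= 31; a shift count of 32 or more is undefined in C. */
static void wasted32_ref(int32_t *decoded, int wasted, int len)
{
    for (int i = 0; i < len; i++) {
        /* Shift as unsigned so negative samples do not trigger UB. */
        decoded[i] = (int32_t)((uint32_t)decoded[i] << wasted);
    }
}
```

Unlike the asm, this sketch walks exactly len samples and has no alignment requirement.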

