[FFmpeg-devel] [PATCH 3/9] SBR DSP x86: implement SSE qmf_deint_bfly

Michael Niedermayer michaelni at gmx.at
Fri Apr 5 15:44:44 CEST 2013


On Thu, Apr 04, 2013 at 07:45:47PM +0000, Christophe Gisquet wrote:
> From 713 to 209 cycles on Arrandale and Win64.
> Having a loop counter is a 7 cycle gain.
> Unrolling is another 7 cycle gain.
> Working in reverse scan is another 6 cycles.
> ---
>  libavcodec/x86/sbrdsp.asm    | 28 ++++++++++++++++++++++++++++
>  libavcodec/x86/sbrdsp_init.c |  2 ++
>  2 files changed, 30 insertions(+)
> 
> diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm
> index 85e197a..573981a 100644
> --- a/libavcodec/x86/sbrdsp.asm
> +++ b/libavcodec/x86/sbrdsp.asm
> @@ -273,3 +273,31 @@ cglobal sbr_qmf_deint_neg, 2,3,3,v,src,vrev
>      cmp        vq, vrevq
>      jl      .loop
>      REP_RET
> +
> +INIT_XMM sse
> +; sbr_qmf_deint_bfly(float *v, const float *src0, const float *src1)
> +cglobal sbr_qmf_deint_bfly, 3,5,8, v,src0,src1,vrev,c
> +    mov        cq, 64*4-2*mmsize
> +    lea     vrevq, [vq + 64*4]
> +.loop:
> +    mova       m0, [src0q+cq]
> +    mova       m1, [src1q]
> +    mova       m4, [src0q+cq+mmsize]
> +    mova       m5, [src1q+mmsize]

> +    shufps m2, m0, m0, q0123
> +    shufps m3, m1, m1, q0123
> +    shufps m6, m4, m4, q0123
> +    shufps m7, m5, m5, q0123

Replacing these four shufps with pshufd takes this from 68 to 47 cycles
on Sandy Bridge.
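
To make the suggestion concrete, here is a small C sketch (not part of the
patch) showing that both shuffles produce the same reversed vector. The
helper names reverse4_shufps and reverse4_pshufd are made up for
illustration, and the scalar fallback is only there so the snippet builds
on non-x86 targets. Two things to keep in mind: pshufd is an SSE2
instruction, so the function would presumably have to move from
INIT_XMM sse to INIT_XMM sse2, and as far as I understand x86inc, the
three-operand shufps form expands to a mova plus a shufps in non-AVX
builds, while pshufd copies and shuffles in one instruction.

```c
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2; also pulls in the SSE float intrinsics */
#endif

/* Reverse four floats as "shufps m2, m0, m0, q0123" does.
 * dst and src must not overlap. */
static void reverse4_shufps(float dst[4], const float src[4])
{
#if defined(__SSE2__)
    __m128 a = _mm_loadu_ps(src);
    /* _MM_SHUFFLE(0, 1, 2, 3) == 0x1B, the same immediate as q0123 */
    _mm_storeu_ps(dst, _mm_shuffle_ps(a, a, _MM_SHUFFLE(0, 1, 2, 3)));
#else
    for (int i = 0; i < 4; i++)
        dst[i] = src[3 - i];
#endif
}

/* Same reversal through the integer shuffle (pshufd), which takes a
 * single source operand and writes a separate destination. */
static void reverse4_pshufd(float dst[4], const float src[4])
{
#if defined(__SSE2__)
    __m128i a = _mm_castps_si128(_mm_loadu_ps(src));
    __m128i r = _mm_shuffle_epi32(a, _MM_SHUFFLE(0, 1, 2, 3));
    _mm_storeu_ps(dst, _mm_castsi128_ps(r));
#else
    for (int i = 0; i < 4; i++)
        dst[i] = src[3 - i];
#endif
}
```

Both helpers turn {1, 2, 3, 4} into {4, 3, 2, 1}; only the instruction
selected differs.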


> +    addps      m5, m2
> +    subps      m0, m7
> +    addps      m1, m6
> +    subps      m4, m3
> +    mova  [vrevq], m1
> +    mova  [vrevq+mmsize], m5
> +    mova  [vq+cq], m0
> +    mova  [vq+cq+mmsize], m4
> +    add     src1q, 2*mmsize
> +    add     vrevq, 2*mmsize
> +    sub        cq, 2*mmsize
> +    jge     .loop

I tried reordering the instructions but didn't see a speed gain from it.
In theory, though, memory accesses might benefit from being done in order,
that is 8 7 6 5 4 3 instead of 7 8 5 6 3 4.
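
For checking any reordered variant, a scalar C sketch of what the loop
above appears to compute may help. It is derived from reading the asm, not
copied from the tree, and the name qmf_deint_bfly_ref is made up. It
assumes v holds 128 floats and src0/src1 hold 64 floats each.

```c
/* Scalar sketch of the butterfly: the lower half of v gets the
 * difference, the upper half gets the sum, with src1 read in reverse. */
static void qmf_deint_bfly_ref(float *v, const float *src0, const float *src1)
{
    for (int i = 0; i < 64; i++) {
        v[i]       = src0[i] - src1[63 - i];
        v[127 - i] = src0[i] + src1[63 - i];
    }
}
```

Any load/store order in the asm is fine as long as the output matches
this element for element.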

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

There will always be a question for which you do not know the correct answer.