[FFmpeg-devel] [PATCH 4/7] x86: sbrdsp: implement SSE hf_apply_noise

Jason Garrett-Glaser darkshikari at gmail.com
Sat Apr 6 20:50:26 CEST 2013


On Sat, Apr 6, 2013 at 6:44 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Sat, Apr 06, 2013 at 10:52:11AM +0000, Christophe Gisquet wrote:
>> 233 to 115(sse)/110(sse2) cycles on Arrandale and Win64.
>> Replacing the multiplication by s_m[m] by an andps and an xorps with
>> appropriate vectors is slower. Unrolling is a 15 cycles win.
>> ---
>>  libavcodec/x86/sbrdsp.asm    | 145 +++++++++++++++++++++++++++++++++++++++++++
>>  libavcodec/x86/sbrdsp_init.c |  32 ++++++++++
>>  2 files changed, 177 insertions(+)
>>
>> diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm
>> index 65c972e..a7998fa 100644
>> --- a/libavcodec/x86/sbrdsp.asm
>> +++ b/libavcodec/x86/sbrdsp.asm
>> @@ -26,6 +26,12 @@ SECTION_RODATA
>>  ps_mask         times 2 dd 1<<31, 0
>>  ps_mask2        times 2 dd 0, 1<<31
>>  ps_neg          times 4 dd 1<<31
>> +ps_noise0       times 2 dd  1.0,  0.0,
>> +ps_noise2       times 2 dd -1.0,  0.0
>> +ps_noise13      dd  0.0,  1.0, 0.0, -1.0
>> +                dd  0.0, -1.0, 0.0,  1.0
>> +                dd  0.0,  1.0, 0.0, -1.0
>> +cextern         sbr_noise_table
>>
>>  SECTION_TEXT
>>
>
>> @@ -358,3 +364,142 @@ SBR_QMF_DEINT_BFLY
>>
>>  INIT_XMM sse2
>>  SBR_QMF_DEINT_BFLY
>> +
>> +%if WIN64
>> +%define NREGS 0
>> +%else
>
>> +%ifndef PIC
>
> ifdef
>
>
> [...]
>> +%endif
>> +    mulps      m1, m3 ; m1 = q_filt[m] * ff_sbr_noise_table[noise]
>> +    mulps      m2, m4 ; m2 = q_filt[m] * ff_sbr_noise_table[noise]
>> +    mova       m3, [s_mq + count]
>> +    ; TODO: replace by a vpermd in AVX2
>
>> +%if cpuflag(sse2)
>> +    punpckhdq  m4, m3, m3
>> +    punpckldq  m3, m3, m3
>> +%else
>> +    unpckhps   m4, m3, m3
>> +    unpcklps   m3, m3, m3
>> +%endif
>
> it might make sense to do something in some header with a macro
> maybe so that punpckl/dq get turned into unpck* on SSE1

Maybe modify SBUTTERFLY to do that if SSE1 is on?  SBUTTERFLY is
basically this macro, I think.
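
For reference, here is a rough scalar C model of what the two branches quoted above compute. It is only a sketch: the vec4 type and the helper names are made up for illustration and are not part of FFmpeg or x86inc. The point is that unpcklps/unpckhps and punpckldq/punpckhdq just move 32-bit lanes without interpreting them, so with the same source in both operands each pair gives the same bits. The integer forms on XMM registers need SSE2, which is why the patch has the cpuflag(sse2) split.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-in for one XMM register holding 4 x 32-bit lanes. */
typedef struct { uint32_t lane[4]; } vec4;

/* Models unpcklps / punpckldq: interleave the two low lanes of a and b. */
static vec4 unpack_lo32(vec4 a, vec4 b)
{
    vec4 r = {{ a.lane[0], b.lane[0], a.lane[1], b.lane[1] }};
    return r;
}

/* Models unpckhps / punpckhdq: interleave the two high lanes of a and b. */
static vec4 unpack_hi32(vec4 a, vec4 b)
{
    vec4 r = {{ a.lane[2], b.lane[2], a.lane[3], b.lane[3] }};
    return r;
}

/* Copy four floats into the lane model bit-for-bit. */
static vec4 load_floats(const float *src)
{
    vec4 v;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(v.lane, src, sizeof v.lane);
    return v;
}

/* Copy the four lanes back out as floats bit-for-bit. */
static void store_floats(float *dst, vec4 v)
{
    memcpy(dst, v.lane, sizeof v.lane);
}
```

With m3 = {a, b, c, d}, unpack_lo32(m3, m3) gives {a, a, b, b} and unpack_hi32(m3, m3) gives {c, c, d, d}, i.e. each scalar is duplicated so it lines up with one (re, im) pair.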

Jason

