[FFmpeg-devel] [PATCH] x86/dcadec: add ff_lfe_fir1_float_{sse3, avx}

James Almer jamrial at gmail.com
Tue Feb 23 00:01:34 CET 2016


On 2/22/2016 7:44 PM, Christophe Gisquet wrote:
> Hi,
> 
> 2016-02-22 22:43 GMT+01:00 James Almer <jamrial at gmail.com>:
>> +.loop:
>> +%if cpuflag(avx)
>> +    cvtdq2ps  m4, [lfeq]
>> +    shufps    m5, m4, m4, q0123
>> +%elif cpuflag(sse2)
>> +    movu      m4, [lfeq]
>> +    cvtdq2ps  m4, m4
>> +    pshufd    m5, m4, q0123
>> +%endif
>> +
>> +.inner_loop:
>> +%if ARCH_X86_64
>> +    movaps    m6, [coeffq+cnt1q*4   ]
>> +    movaps    m7, [coeffq+cnt1q*4+16]
>> +    movaps    m8, [coeffq+cnt1q*4+32]
>> +    movaps    m9, [coeffq+cnt1q*4+48]
>> +    mulps     m0, m5, m6
>> +    mulps     m1, m5, m7
>> +    mulps     m2, m5, m8
>> +    mulps     m3, m5, m9
>> +%else
>> +    movaps    m6, [coeffq+cnt1q*4   ]
>> +    movaps    m7, [coeffq+cnt1q*4+16]
>> +    mulps     m0, m5, m6
>> +    mulps     m1, m5, m7
>> +    mulps     m2, m5, [coeffq+cnt1q*4+32]
>> +    mulps     m3, m5, [coeffq+cnt1q*4+48]
>> +%endif
> 
> Is OOE the reason why you don't move the common code out of those
> conditional blocks? Otherwise, it looks cleaner to me to do:

Not really. I just thought having x86_64 and x86_32 clearly separated
was easier to read.

>     movaps    m6, [coeffq+cnt1q*4   ]
>     movaps    m7, [coeffq+cnt1q*4+16]
>     mulps     m0, m3, m6
>     mulps     m1, m3, m7
> %if ARCH_X86_64
>     movaps    m8, [coeffq+cnt1q*4+32]
>     movaps    m9, [coeffq+cnt1q*4+48]
>     mulps     m2, m5, m8
>     mulps     m3, m5, m9
> %else
>     mulps     m2, m5, [coeffq+cnt1q*4+32]
>     mulps     m3, m5, [coeffq+cnt1q*4+48]
> %endif
> and let OOE do its job.
> 
> Secondly, m5 is not reused afterwards, so maybe replace m5 by m3 for
> all code up to this, and load something into m5 instead?

m4 and m5 both contain the lfe samples (m5 is m4 with its four lanes
reversed), so I can't overwrite either of them inside the inner loop.
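
To illustrate why both registers stay live, here is a rough scalar C
model of the setup before .inner_loop. It is only a sketch: the function
names cvt_lfe and reverse_lanes are made up for this example and are not
part of the patch, and it assumes the usual x86inc meaning of q0123
(lane order reversed).

```c
#include <stdint.h>

/* Hypothetical helper: scalar model of "cvtdq2ps m4, [lfeq]".
 * Converts four packed int32 LFE samples to float. */
static void cvt_lfe(const int32_t lfe[4], float m4[4])
{
    for (int i = 0; i < 4; i++)
        m4[i] = (float)lfe[i];
}

/* Hypothetical helper: scalar model of "shufps m5, m4, m4, q0123"
 * (AVX path) or "pshufd m5, m4, q0123" (SSE2 path).
 * Assumption: q0123 selects source lanes 3,2,1,0, i.e. a reversal. */
static void reverse_lanes(const float m4[4], float m5[4])
{
    for (int i = 0; i < 4; i++)
        m5[i] = m4[3 - i];
}
```

Both outputs are computed once per outer iteration, which is why neither
can be used as scratch space by the multiplies that follow.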

> 
>> +    haddps    m0, m1
>> +    haddps    m2, m3
>> +    haddps    m0, m2
>> +    movaps [samplesq+cnt1q], m0
> 
> I suppose you've already looked at most arrangements that would help
> doing fewer shuffles. And I don't see any obvious one either.
> 
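
For reference, this is roughly what the three haddps are doing, written
as scalar C. It is a sketch under my own naming (haddps_model and
reduce4 are hypothetical, not functions from the patch): each of m0..m3
holds four products, and the sequence collapses them into one vector
with one horizontal sum per lane.

```c
/* Hypothetical helper: scalar model of "haddps dst, src".
 * Result lanes: dst0+dst1, dst2+dst3, src0+src1, src2+src3. */
static void haddps_model(float dst[4], const float src[4])
{
    float r[4] = {
        dst[0] + dst[1], dst[2] + dst[3],
        src[0] + src[1], src[2] + src[3],
    };
    for (int i = 0; i < 4; i++)
        dst[i] = r[i];
}

/* Hypothetical helper: the three-instruction reduction from the patch.
 * On return, m0[k] is the sum of the four lanes that the k-th input
 * register held on entry; m2 is clobbered, as in the assembly. */
static void reduce4(float m0[4], const float m1[4],
                    float m2[4], const float m3[4])
{
    haddps_model(m0, m1);   /* haddps m0, m1 */
    haddps_model(m2, m3);   /* haddps m2, m3 */
    haddps_model(m0, m2);   /* haddps m0, m2 */
}
```

So four 4-wide products go in and four finished sums come out, ready for
the single movaps store.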


