[FFmpeg-devel] [PATCH] iirfilter: Rework FILTER_O2() to improve performance.

Justin Ruggles justin.ruggles
Mon Jan 31 22:36:07 CET 2011


On 01/31/2011 04:19 PM, M?ns Rullg?rd wrote:

> Justin Ruggles <justin.ruggles at gmail.com> writes:
> 
>> Store coefficient pointers locally.
>> Store state locally, then save at end.
>> Split up arithmetic.
>> ---
>>  libavcodec/iirfilter.c |   27 +++++++++++++++++++--------
>>  1 files changed, 19 insertions(+), 8 deletions(-)
>>
>> I was able to gcc to do what I wanted, but it's not pretty.
>>
>> Also, if a simplified version of ff_iir_filter_flt() is added that
>> doesn't read/save the state and always has sstep/dstep == 1, it can
>> gain about another 3% speedup.  I'll save that for later though
>> since it would be unnecessarily complicating something that's not
>> even used yet.
>>
>> Athlon64
>> current: 217731
>> patched: 195939
>> simple:  189951
>>
>> Atom
>> current: 569708
>> patched: 542916
>> simple:  524855
>>
>> Cheers,
>> Justin
>>
>> diff --git a/libavcodec/iirfilter.c b/libavcodec/iirfilter.c
>> index bc63c39..8398e22 100644
>> --- a/libavcodec/iirfilter.c
>> +++ b/libavcodec/iirfilter.c
>> @@ -257,19 +257,30 @@ av_cold struct FFIIRFilterState* ff_iir_filter_init_state(int order)
>>  }
>>  
>>  #define FILTER_O2(type, fmt) {                                          \
>> -    int i;                                                              \
>>      const type *src0 = src;                                             \
>>      type       *dst0 = dst;                                             \
>> -    for (i = 0; i < size; i++) {                                        \
>> -        float in = *src0   * c->gain  +                                 \
>> -                   s->x[0] * c->cy[0] +                                 \
>> -                   s->x[1] * c->cy[1];                                  \
>> -        CONV_##fmt(*dst0, s->x[0] + in + s->x[1] * c->cx[1])            \
>> -        s->x[0] = s->x[1];                                              \
>> -        s->x[1] = in;                                                   \
>> +    const float *g = &c->gain;                                          \
> 
> Why is this a pointer?
> 
>> +    const int   *x = c->cx;                                             \
>> +    const float *y = c->cy;                                             \
>> +    float s0 = s->x[0];                                                 \
>> +    float s1 = s->x[1];                                                 \
>> +    while (size--) {                                                    \
>> +        float in  = *src0 * *g;                                         \
>> +        float tmp = s0 * y[0];                                          \
>> +        in += tmp;                                                      \
>> +        tmp = s1 * y[1];                                                \
>> +        in += tmp;                                                      \
>> +        tmp = s1 * x[1];                                                \
>> +        s0 += tmp;                                                      \
>> +        s0 += in;                                                       \
> 
> Does using the 'tmp' variable really make it faster?  I have seen such
> things with gcc before, but I have to ask.
> 
>> +        CONV_##fmt(*dst0, s0)                                           \
>> +        s0 = s1;                                                        \
>> +        s1 = in;                                                        \
>>          src0 += sstep;                                                  \
>>          dst0 += dstep;                                                  \
>>      }                                                                   \
>> +    s->x[0] = s0;                                                       \
>> +    s->x[1] = s1;                                                       \
>>  }
>>  
>>  void ff_iir_filter(const struct FFIIRFilterCoeffs *c,
> 


Everything that seems weird about the function is only because the
obvious alternative is slower on my Athlon64.

I basically started by disassembling the gcc output and converting it to
yasm.  Then I tweaked and tweaked and tried every possible combination I
could think of.  Some really random stuff was significantly slower.
Once I got it to the fastest I could get it, I tried to get gcc to match
the asm version as closely as possible.  Merging any of those tmp values
makes it slower.  Changing any of the local pointers to values makes it
slower.

-Justin




More information about the ffmpeg-devel mailing list