[FFmpeg-devel] [PATCH] iirfilter: Use local variables for state in loop for FILTER_O2().
Justin Ruggles
justin.ruggles
Sun Jan 30 22:04:36 CET 2011
On 01/30/2011 02:26 PM, Justin Ruggles wrote:
> On 01/30/2011 02:05 PM, Justin Ruggles wrote:
>
>> On 01/30/2011 01:51 PM, M?ns Rullg?rd wrote:
>>
>>> Justin Ruggles <justin.ruggles at gmail.com> writes:
>>>
>>>> 4% faster 2nd order ff_iir_filter_flt().
>>>> ---
>>>> libavcodec/iirfilter.c | 12 +++++++-----
>>>> 1 files changed, 7 insertions(+), 5 deletions(-)
>>>>
>>>> I tried most of yesterday and this morning trying to make an asm
>>>> version of the float biquad filter, but nothing I came up with was
>>>> faster than what gcc did with the C version. I did, however manage
>>>> to speed up the C version by about 4% by adding local variables
>>>> inside the inner loop for the 2 states.
>>>>
>>>> diff --git a/libavcodec/iirfilter.c b/libavcodec/iirfilter.c
>>>> index bc63c39..dd593dd 100644
>>>> --- a/libavcodec/iirfilter.c
>>>> +++ b/libavcodec/iirfilter.c
>>>> @@ -261,11 +261,13 @@ av_cold struct FFIIRFilterState* ff_iir_filter_init_state(int order)
>>>> const type *src0 = src; \
>>>> type *dst0 = dst; \
>>>> for (i = 0; i < size; i++) { \
>>>> - float in = *src0 * c->gain + \
>>>> - s->x[0] * c->cy[0] + \
>>>> - s->x[1] * c->cy[1]; \
>>>> - CONV_##fmt(*dst0, s->x[0] + in + s->x[1] * c->cx[1]) \
>>>> - s->x[0] = s->x[1]; \
>>>> + float s0 = s->x[0]; \
>>>> + float s1 = s->x[1]; \
>>>> + float in = *src0 * c->gain + \
>>>> + s0 * c->cy[0] + \
>>>> + s1 * c->cy[1]; \
>>>> + CONV_##fmt(*dst0, in + s0 + s1 * c->cx[1]) \
>>>> + s->x[0] = s1; \
>>>> s->x[1] = in; \
>>>> src0 += sstep; \
>>>> dst0 += dstep; \
>>>
>>> Why do you do load/store the struct values in the loop? Wouldn't it
>>> be better to load the x[] values to locals before the loop and write
>>> them back after? You might try doing the same with c[xy] as well.
>>
>>
>> Yes, that was my first thought, and I tried with state and with coefs
>> outside the loop. They were slower.
>
>
> Looking at the disassembly of iirfilter.o, it seems that gcc does this
> anyway, but it probably does it more efficiently by itself than with me
> trying to force it.
Ok, ignore this patch. I'll send a new one shortly. I took a closer look
at the gcc output and figured out how to move the 2 state variables
outside the loop in a way that makes gcc not mess it up.
current: 217731 dezicycles
this patch: 209010
new patch I haven't sent yet: 189568
I'm going to play around more with the asm to see if I can tweak anymore
speed out of it. This function is a big portion of CPU time for AC3
transient detection, so I want to get it as fast as I can before adding
that to the AC3 encoder.
-Justin
More information about the ffmpeg-devel
mailing list