[FFmpeg-devel] [PATCH + RFC] Faster ff_celp_lp_synthesis_filterf() (and failed SSE SIMD version)
Vitor Sessak
vitor1001
Mon Dec 14 22:21:47 CET 2009
Vitor Sessak wrote:
> Michael Niedermayer wrote:
>> On Sun, Dec 13, 2009 at 08:55:08PM +0100, Vitor Sessak wrote:
>> [...]
>>> + old_out3 = old_out2;
>>> + old_out2 = old_out1;
>>> + old_out1 = old_out0;
>>> + old_out0 = out[-i-1];
>>> +
>>> + val = filter_coeffs[i];
>>> +
>>> + out0 -= val * old_out0;
>>> + out1 -= val * old_out1;
>>> + out2 -= val * old_out2;
>>> + out3 -= val * old_out3;
>>
>> old_out3 = out[-i-1];
>>
>> val = filter_coeffs[i];
>> out0 -= val * old_out3;
>> out1 -= val * old_out0;
>> out2 -= val * old_out1;
>> out3 -= val * old_out2;
>>
>> and similarly you can get rid of the other copies if you unroll it more
>
> Indeed, done. New patch attached.
>
> BTW, in my SSE code, there was a line of code missing:
>
>> DECLARE_ASM_CONST(16, uint32_t, mask[4]) = {0xFFFFFFFF, 0xFFFFFFFF,
>> 0xFFFFFFFF, 0x00000000};
>>
Err, this time without reinventing FFSWAP()...
-Vitor
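
For readers of the archive, here is a minimal sketch of the rotation trick
discussed above; it is not the contents of the attached patch. Four output
samples are accumulated at once, and the tap loop is unrolled by four so
that each step overwrites only the oldest old_outN and the variables swap
roles instead of being copied. The function names (lp_synth_ref,
lp_synth_unrolled, lp_synth_demo_max_diff) are hypothetical, and the sketch
assumes buffer_length and filter_length are multiples of 4 and that out[]
is preceded by filter_length samples of history. Using zero placeholders
for the not-yet-known outputs and fixing them up after the loop is a
simplification chosen here, not necessarily what the patch does.

```c
#include <assert.h>

/* Reference: one output sample per iteration.
 * out[-filter_length .. -1] must hold the previous output samples. */
static void lp_synth_ref(float *out, const float *coeffs, const float *in,
                         int buffer_length, int filter_length)
{
    for (int n = 0; n < buffer_length; n++) {
        float acc = in[n];
        for (int i = 0; i < filter_length; i++)
            acc -= coeffs[i] * out[n - i - 1];
        out[n] = acc;
    }
}

/* Sketch: four outputs per iteration, taps unrolled by four (assumed layout). */
static void lp_synth_unrolled(float *out, const float *coeffs, const float *in,
                              int buffer_length, int filter_length)
{
    assert(buffer_length % 4 == 0);
    assert(filter_length >= 4 && filter_length % 4 == 0);

    for (int n = 0; n < buffer_length; n += 4) {
        const float *hist = out + n;   /* hist[-1] is the newest known output */
        float out0 = in[n], out1 = in[n + 1], out2 = in[n + 2], out3 = in[n + 3];
        /* Zero placeholders for out[n], out[n+1], out[n+2], which are not
         * known yet; their contribution is added in the fix-up below. */
        float old_out0 = 0.0f, old_out1 = 0.0f, old_out2 = 0.0f, old_out3;
        float val;

        for (int i = 0; i < filter_length; i += 4) {
            /* Each step loads one new history sample into the oldest slot. */
            old_out3 = hist[-i - 1];
            val = coeffs[i];
            out0 -= val * old_out3;
            out1 -= val * old_out0;
            out2 -= val * old_out1;
            out3 -= val * old_out2;

            old_out2 = hist[-i - 2];
            val = coeffs[i + 1];
            out0 -= val * old_out2;
            out1 -= val * old_out3;
            out2 -= val * old_out0;
            out3 -= val * old_out1;

            old_out1 = hist[-i - 3];
            val = coeffs[i + 2];
            out0 -= val * old_out1;
            out1 -= val * old_out2;
            out2 -= val * old_out3;
            out3 -= val * old_out0;

            old_out0 = hist[-i - 4];
            val = coeffs[i + 3];
            out0 -= val * old_out0;
            out1 -= val * old_out1;
            out2 -= val * old_out2;
            out3 -= val * old_out3;
        }

        /* Fix-up: terms that depend on outputs computed in this block. */
        out1 -= coeffs[0] * out0;
        out2 -= coeffs[0] * out1 + coeffs[1] * out0;
        out3 -= coeffs[0] * out2 + coeffs[1] * out1 + coeffs[2] * out0;

        out[n]     = out0;
        out[n + 1] = out1;
        out[n + 2] = out2;
        out[n + 3] = out3;
    }
}

/* Hypothetical self-check: largest absolute difference between both versions. */
static float lp_synth_demo_max_diff(void)
{
    enum { TAPS = 8, LEN = 16 };
    static const float coeffs[TAPS] = { 0.4f, -0.2f, 0.1f, -0.05f,
                                        0.03f, -0.02f, 0.01f, -0.005f };
    float in[LEN], a[TAPS + LEN], b[TAPS + LEN], max_diff = 0.0f;

    for (int j = 0; j < TAPS; j++)
        a[j] = b[j] = 0.1f * (float)(j + 1);   /* made-up filter history */
    for (int n = 0; n < LEN; n++)
        in[n] = (float)(n % 5) - 2.0f;         /* made-up excitation */

    lp_synth_ref(a + TAPS, coeffs, in, LEN, TAPS);
    lp_synth_unrolled(b + TAPS, coeffs, in, LEN, TAPS);

    for (int n = TAPS; n < TAPS + LEN; n++) {
        float d = a[n] - b[n];
        if (d < 0.0f) d = -d;
        if (d > max_diff) max_diff = d;
    }
    return max_diff;
}
```

The two versions sum the taps in a different order, so their results can
differ by floating-point rounding; lp_synth_demo_max_diff() is meant to be
compared against a small tolerance rather than against zero.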
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lp_synthesis3.diff
Type: text/x-patch
Size: 3315 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20091214/d9b8f7dd/attachment.bin>