[FFmpeg-devel] [PATCH + RFC] Faster ff_celp_lp_synthesis_filterf() (and failed SSE SIMD version)

Vitor Sessak vitor1001
Mon Dec 14 19:53:00 CET 2009


Michael Niedermayer wrote:
> On Sun, Dec 13, 2009 at 08:55:08PM +0100, Vitor Sessak wrote:
> [...]
>> +            old_out3 = old_out2;
>> +            old_out2 = old_out1;
>> +            old_out1 = old_out0;
>> +            old_out0 = out[-i-1];
>> +
>> +            val = filter_coeffs[i];
>> +
>> +            out0 -= val * old_out0;
>> +            out1 -= val * old_out1;
>> +            out2 -= val * old_out2;
>> +            out3 -= val * old_out3;
> 
> old_out3 = out[-i-1];
> 
> val = filter_coeffs[i];
> out0 -= val * old_out3;
> out1 -= val * old_out0;
> out2 -= val * old_out1;
> out3 -= val * old_out2;
> 
> and similarly you can get rid of the other copies if you unroll it more

Indeed, done. New patch attached.
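
For anyone following the thread without the patch at hand, here is a
minimal standalone sketch of the rotation idea. It is not the code from
the attached patch: the function names taps_with_copies() and
taps_rotated() are made up for illustration, and it assumes that every
sample read is already known (so it only models the inner tap loop, not
the full recursive filter) and that len is a multiple of 4.

```c
#include <assert.h>

/* Hypothetical helper: apply len taps to four adjacent accumulators,
 * shifting the four-sample window with explicit copies on every tap.
 * Reads x[-len] .. x[2]; all of them are assumed to be valid. */
static void taps_with_copies(float acc[4], const float *x,
                             const float *coeffs, int len)
{
    float s0 = x[0], s1 = x[1], s2 = x[2], s3;
    int i;

    for (i = 0; i < len; i++) {
        float val = coeffs[i];

        /* three register copies per tap just to slide the window */
        s3 = s2;
        s2 = s1;
        s1 = s0;
        s0 = x[-i - 1];

        acc[0] -= val * s0;
        acc[1] -= val * s1;
        acc[2] -= val * s2;
        acc[3] -= val * s3;
    }
}

/* Same arithmetic, unrolled by four: the new sample overwrites the
 * variable holding the oldest one, and the roles of s0..s3 rotate
 * instead of their values being copied. */
static void taps_rotated(float acc[4], const float *x,
                         const float *coeffs, int len)
{
    float s0 = x[0], s1 = x[1], s2 = x[2], s3;
    int i;

    assert(len % 4 == 0); /* assumption of this sketch */

    for (i = 0; i < len; i += 4) {
        float val;

        val = coeffs[i];
        s3  = x[-i - 1];
        acc[0] -= val * s3;
        acc[1] -= val * s0;
        acc[2] -= val * s1;
        acc[3] -= val * s2;

        val = coeffs[i + 1];
        s2  = x[-i - 2];
        acc[0] -= val * s2;
        acc[1] -= val * s3;
        acc[2] -= val * s0;
        acc[3] -= val * s1;

        val = coeffs[i + 2];
        s1  = x[-i - 3];
        acc[0] -= val * s1;
        acc[1] -= val * s2;
        acc[2] -= val * s3;
        acc[3] -= val * s0;

        val = coeffs[i + 3];
        s0  = x[-i - 4];
        acc[0] -= val * s0;
        acc[1] -= val * s1;
        acc[2] -= val * s2;
        acc[3] -= val * s3;
    }
}
```

After the fourth tap the variables are back in their original roles, so
the loop body can repeat unchanged. Each accumulator sees the same
operations in the same order in both versions, so the results should
match.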

BTW, my SSE code was missing the following declaration:

> DECLARE_ASM_CONST(16, uint32_t, mask[4]) = {0xFFFFFFFF, 0xFFFFFFFF,
>                                             0xFFFFFFFF, 0x00000000};
> 

-Vitor
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lp_synthesis2.diff
Type: text/x-patch
Size: 3383 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20091214/a70c09d4/attachment.bin>


