[FFmpeg-devel] [PATCH] h264pred16x16 plane sse2/ssse3 optimizations

Ronald S. Bultje rsbultje
Thu Sep 30 02:56:13 CEST 2010


Hi,

On Wed, Sep 29, 2010 at 8:51 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Tue, Sep 28, 2010 at 10:31:51PM -0400, Ronald S. Bultje wrote:
>> +    lea          r4, [r0+r2*8-1]
>> +    lea          r3, [r0+r2*4-1]
>> +    add          r4, r2
>> +
>> +%ifdef ARCH_X86_64
>> +%define e_reg r11
>> +%else
>> +%define e_reg r0
>> +%endif
>> +
>
> i see a lot of r0-1, maybe r0 could be decreased by 1 somewhere?

Yes, this is actually both smaller/simpler and also faster. Changed.

>> +    movzx     e_reg, byte [r3+r1    ]
>> +    movzx        r5, byte [r4+r2*2  ]
>> +    sub          r5, e_reg
>> +    shl          r5, 2
>> +
>> +    movzx     e_reg, byte [r3       ]
>> +    movzx        r6, byte [r4+r2    ]
>> +    sub          r6, e_reg
>> +    lea          r5, [r5+r6*4]
>> +    sub          r5, r6
>> +
>> +    movzx     e_reg, byte [r3+r2    ]
>> +    movzx        r6, byte [r4       ]
>> +    sub          r6, e_reg
>> +    lea          r5, [r5+r6*2]
>> +
>> +    movzx     e_reg, byte [r3+r2*2  ]
>> +    movzx        r6, byte [r4+r1    ]
>> +    sub          r6, e_reg
>> +    add          r5, r6
>
> this and the shl 2 case look like they could be merged like
> add+shl->lea

Also changed.
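
For reference, the identity behind this change, sketched in plain C rather than asm. The function names and the parameters d4/d1 (the differences weighted 4 and 1) are made up for illustration and are not taken from the patch; multiplication stands in for the shift.

```c
/* Sequence as posted: scale the weight-4 difference first, then add the
 * weight-1 difference at the end of the block. */
static int shl_then_add(int d4, int d1)
{
    int acc = d4 * 4;   /* models: shl r5, 2  */
    acc += d1;          /* models: add r5, r6 */
    return acc;
}

/* Merged form: one scaled-index computation, i.e. something like
 * lea dst, [d1+d4*4], which does the scale and the add in one go. */
static int single_lea(int d4, int d1)
{
    return d1 + d4 * 4;
}
```

Both functions return the same value for any pair of pixel differences, so the shl can go away once the weight-1 difference is available at the point where the weight-4 one gets scaled.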

>> +    lea          r3, [r4+r2*4  ]
>> +
>> +    movzx     e_reg, byte [r0+r1  -1]
>> +    movzx        r6, byte [r3+r2*2  ]
>> +    sub          r6, e_reg
>> +    lea          r5, [r5+r6*8]
>> +
>> +    movzx     e_reg, byte [r0     -1]
>> +    movzx        r6, byte [r3+r2    ]
>> +    sub          r6, e_reg
>> +    lea          r5, [r5+r6*8]
>> +    sub          r5, r6
>
> the *7 with lea + sub can maybe be changed to an add into the *8 case and a
> subtract (replacing lea by add)
>
>> +    movzx     e_reg, byte [r0+r2  -1]
>> +    movzx        r6, byte [r3       ]
>> +    sub          r6, e_reg
>> +    lea          r5, [r5+r6*4]
>> +    lea          r5, [r5+r6*2]
>
> this could add into *4 and *2 cases to replace the 2 leas by 2 adds
> or to leas *2 into the *3 case reducing the 2 leas to 1
> similar tricks may be possible elsewhere

I didn't quite get these two, what exactly would you like me to try?
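
One possible reading of the two suggestions, written as plain arithmetic in C rather than asm. This is only a guess at what is meant: the names a, b, c (differences weighted 8, 7, 6) and x4, x3, x2 (differences weighted 4, 3, 2) are invented here, and the folded variants would presumably need the blocks reordered so the smaller-weight differences are still unscaled when the larger ones are added in.

```c
/* *8 and *7 as posted: lea (*8), lea (*8), sub. */
static int w8_w7_posted(int a, int b)
{
    int acc = a * 8;
    acc += b * 8;
    acc -= b;
    return acc;
}

/* Guess at the first suggestion: add b into a before the single *8,
 * then subtract b once, so one lea becomes an add. */
static int w8_w7_folded(int a, int b)
{
    int t = a + b;
    return t * 8 - b;
}

/* *6 as posted next to the *4 and *2 terms: two leas for c (*4 then *2). */
static int w6_posted(int x4, int x2, int c)
{
    return x4 * 4 + x2 * 2 + c * 4 + c * 2;
}

/* Guess at "add into *4 and *2 cases": two adds replace the two leas. */
static int w6_into_4_and_2(int x4, int x2, int c)
{
    return (x4 + c) * 4 + (x2 + c) * 2;
}

/* Guess at "*2 into the *3 case": one lea (x3 + c*2) feeds the *3 scaling. */
static int w6_into_3(int x3, int c)
{
    return (x3 + c * 2) * 3;   /* equals x3*3 + c*6 */
}
```

If this reading is right, each variant computes the same weighted sum as the posted code and only changes which instructions do the scaling.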

Ronald


