[FFmpeg-devel] [PATCH] Heavy optimization of IFF decoder

Sebastian Vater cdgs.basty
Tue Apr 27 22:40:36 CEST 2010


Michael Niedermayer a ?crit :
> On Tue, Apr 27, 2010 at 09:59:13PM +0200, Sebastian Vater wrote:
>   
>> Ronald S. Bultje a ?crit :
>>     
>>> Hi,
>>>
>>> On Tue, Apr 27, 2010 at 3:05 PM, Sebastian Vater
>>> <cdgs.basty at googlemail.com> wrote:
>>>   
>>>       
>>>> M?ns Rullg?rd a ?crit :
>>>>     
>>>>         
>>>>> Sebastian Vater <cdgs.basty at googlemail.com> writes:
>>>>>       
>>>>>           
>>>>>>  {
>>>>>>      GetBitContext gb;
>>>>>>      unsigned int i;
>>>>>>      const unsigned b = (buf_size * 8) + bps - 1;
>>>>>> +    const unsigned b32 = b & ~3;
>>>>>>      init_get_bits(&gb, buf, buf_size * 8);
>>>>>> -    for(i = 0; i < b; i++) {
>>>>>> +    for(i = 0; i < b32; i += 4) {
>>>>>> +        const uint32_t v = decodeplane8_tab[plane][get_bits(&gb, 4)];
>>>>>> +        AV_WN32A(dst+i, AV_RN32A(dst+i) | v);
>>>>>> +    }
>>>>>>
>>>>>>         
>>>>>>             
>>>>> I suggest using a local variable here, like this:
>>>>>
>>>>>     const uint32_t *lut = decodeplane8_tab[plane];
>>>>>     [...]
>>>>>     uint32_t v = lut[get_bits(...)];
>>>>>
>>>>> I don't trust gcc to do that on its own.
>>>>>
>>>>>       
>>>>>           
>>>> Didn't change anything, i.e. still 20% slower...what now?
>>>>     
>>>>         
>>> With that, it becomes from 20% slower to 3-4% faster for me. Check
>>> your performance again, or increase the amount of cycles in
>>> decodeplane8() (as Mans suggested earlier in this thread).
>>>   
>>>       
>> Well, do you know what I think now?
>>
>> It's just that my GCC 4.2.4 doesn't optimize that as well as
>> yours...updated patch attached!
>>     
>
> or you are on x86_32 while mans/ronald are on x86_64 and gcc runs out
> of registers (i didnt look at the asm so i could be totally wrong)
>
> [..]
>   
>> -    for(i = 0; i < b; i++) {
>> +    for(i = 0; i < b32; i += 4) {
>> +        const uint32_t v = lut[get_bits(&gb, 4)];
>> +        AV_WN32A(dst+i, AV_RN32A(dst+i) | v);
>> +    }
>>     
>
> speaking of registers
> for(dst; dst<end; dst+=4)
>
> would reduce the number of variables by 1
> also
>  const uint32_t v = AV_RN32A(dst) | lut[get_bits(&gb, 4)];
>  AV_WN32A(dst, v);
> might be worth a try
>
> and make sure dst is 4 byte aligned
>   

Great! It works now like a charm now for me, too.
I also got the 3% speedup ;)

Patch attached!

But could you try out and compare the new patch, if it doesn't cause any
side effects?
Load several 8bpp IFF files and see if something is truncated, etc.

I'm far too tired right now, and just falling to bed!
Good night and nice dreams!

-- 

Best regards,
                   :-) Basty/CDGS (-:

-------------- next part --------------
A non-text attachment was scrubbed...
Name: iff-opt-dp8-static.patch
Type: text/x-patch
Size: 1985 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100427/324b4692/attachment.bin>



More information about the ffmpeg-devel mailing list