[FFmpeg-devel] [PATCH] Heavy optimization of IFF decoder

Ronald S. Bultje rsbultje
Tue Apr 27 22:06:02 CEST 2010


Hi,

On Tue, Apr 27, 2010 at 3:59 PM, Sebastian Vater
<cdgs.basty at googlemail.com> wrote:
> Ronald S. Bultje wrote:
>> On Tue, Apr 27, 2010 at 3:05 PM, Sebastian Vater
>> <cdgs.basty at googlemail.com> wrote:
>>> Måns Rullgård wrote:
>>>
>>>> Sebastian Vater <cdgs.basty at googlemail.com> writes:
>>>>
>>>>>  {
>>>>>      GetBitContext gb;
>>>>>      unsigned int i;
>>>>>      const unsigned b = (buf_size * 8) + bps - 1;
>>>>> +    const unsigned b32 = b & ~3;
>>>>>      init_get_bits(&gb, buf, buf_size * 8);
>>>>> -    for(i = 0; i < b; i++) {
>>>>> +    for(i = 0; i < b32; i += 4) {
>>>>> +        const uint32_t v = decodeplane8_tab[plane][get_bits(&gb, 4)];
>>>>> +        AV_WN32A(dst+i, AV_RN32A(dst+i) | v);
>>>>> +    }
>>>>>
>>>>>
>>>> I suggest using a local variable here, like this:
>>>>
>>>>     const uint32_t *lut = decodeplane8_tab[plane];
>>>>     [...]
>>>>     uint32_t v = lut[get_bits(...)];
>>>>
>>>> I don't trust gcc to do that on its own.
>>>>
>>>>
>>> Didn't change anything, i.e. it's still 20% slower... what now?
>>>
>>
>> With that, it goes from 20% slower to 3-4% faster for me. Re-check
>> your performance numbers, or increase the number of cycles in
>> decodeplane8() (as Mans suggested earlier in this thread).
>>
>
> Well, do you know what I think now?
>
> It's just that my GCC 4.2.4 doesn't optimize that as well as
> yours. Updated patch attached!

Patch is fine with me. I'll await Michael's review before I apply this
one ;-). Also, it'd be good if you could post your performance data
here for us to review (mine went from 7.6k dezicycles with your original
patch to 7.4k dezicycles with this one; Mans saw similar improvements
on ARM).
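As a worked check of the figures above, assuming "dezicycles" refers to the unit printed by FFmpeg's START_TIMER/STOP_TIMER macros (the timed cycle count averaged over runs and multiplied by 10, i.e. tenths of a CPU cycle), the two measurements compare as follows:

```latex
\[
7.6\text{k dezicycles} \approx 760 \text{ cycles}, \qquad
7.4\text{k dezicycles} \approx 740 \text{ cycles}
\]
\[
\frac{7.6\text{k} - 7.4\text{k}}{7.6\text{k}} = \frac{0.2}{7.6} \approx 2.6\%
\]
```

That is roughly 20 fewer cycles per timed call of the updated patch relative to the original one.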

Ronald


