[FFmpeg-devel] [PATCH] Heavy optimization of IFF decoder

Michael Niedermayer michaelni
Tue Apr 27 22:16:04 CEST 2010


On Tue, Apr 27, 2010 at 09:59:13PM +0200, Sebastian Vater wrote:
> Ronald S. Bultje wrote:
> > Hi,
> >
> > On Tue, Apr 27, 2010 at 3:05 PM, Sebastian Vater
> > <cdgs.basty at googlemail.com> wrote:
> >   
> >> Måns Rullgård wrote:
> >>     
> >>> Sebastian Vater <cdgs.basty at googlemail.com> writes:
> >>>       
> >>>>  {
> >>>>      GetBitContext gb;
> >>>>      unsigned int i;
> >>>>      const unsigned b = (buf_size * 8) + bps - 1;
> >>>> +    const unsigned b32 = b & ~3;
> >>>>      init_get_bits(&gb, buf, buf_size * 8);
> >>>> -    for(i = 0; i < b; i++) {
> >>>> +    for(i = 0; i < b32; i += 4) {
> >>>> +        const uint32_t v = decodeplane8_tab[plane][get_bits(&gb, 4)];
> >>>> +        AV_WN32A(dst+i, AV_RN32A(dst+i) | v);
> >>>> +    }
> >>>>
> >>>>         
> >>> I suggest using a local variable here, like this:
> >>>
> >>>     const uint32_t *lut = decodeplane8_tab[plane];
> >>>     [...]
> >>>     uint32_t v = lut[get_bits(...)];
> >>>
> >>> I don't trust gcc to do that on its own.
> >>>
> >>>       
> >> Didn't change anything, i.e. still 20% slower...what now?
> >>     
> >
> > With that, it goes from 20% slower to 3-4% faster for me. Check
> > your performance again, or increase the amount of cycles in
> > decodeplane8() (as Mans suggested earlier in this thread).
> >   
> 
> Well, do you know what I think now?
> 
> It's just that my GCC 4.2.4 doesn't optimize that as well as
> yours...updated patch attached!

Or you are on x86_32 while Måns/Ronald are on x86_64, and gcc runs out
of registers (I didn't look at the asm, so I could be totally wrong).

[..]
> -    for(i = 0; i < b; i++) {
> +    for(i = 0; i < b32; i += 4) {
> +        const uint32_t v = lut[get_bits(&gb, 4)];
> +        AV_WN32A(dst+i, AV_RN32A(dst+i) | v);
> +    }

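Speaking of registers,
    for(; dst < end; dst += 4)

would reduce the number of variables by 1 (i goes away; end is dst plus the
number of output bytes, computed once before the loop).
Also,
    const uint32_t v = AV_RN32A(dst) | lut[get_bits(&gb, 4)];
    AV_WN32A(dst, v);
might be worth a try.

And make sure dst is 4-byte aligned, since the A variants of the macros
assume aligned access.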
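
To make the suggestion concrete, here is a rough standalone sketch of the
pointer-based loop. It is not the patch itself: rn32()/wn32() are hypothetical
memcpy-based stand-ins for AV_RN32A/AV_WN32A, get_nibble() stands in for
get_bits(&gb, 4), and build_lut() is an assumed way of filling one row of
decodeplane8_tab (each of the 4 input bits, MSB first, becomes one output byte
with bit "plane" set). The function name and signature are made up for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for AV_RN32A / AV_WN32A (memcpy keeps them portable). */
static uint32_t rn32(const uint8_t *p) { uint32_t v; memcpy(&v, p, 4); return v; }
static void wn32(uint8_t *p, uint32_t v) { memcpy(p, &v, 4); }

/* Hypothetical stand-in for get_bits(&gb, 4): reads nibbles, high nibble first. */
typedef struct { const uint8_t *buf; size_t pos; } NibbleReader;

static unsigned get_nibble(NibbleReader *r)
{
    unsigned byte = r->buf[r->pos >> 1];
    unsigned n = (r->pos & 1) ? (byte & 15) : (byte >> 4);
    r->pos++;
    return n;
}

/* Assumed table layout: input bit k (MSB first) sets bit `plane` in output
 * byte k. Filled byte by byte, so it does not depend on host endianness. */
static void build_lut(uint32_t lut[16], int plane)
{
    for (int n = 0; n < 16; n++) {
        uint8_t bytes[4];
        for (int k = 0; k < 4; k++)
            bytes[k] = (uint8_t)(((n >> (3 - k)) & 1) << plane);
        memcpy(&lut[n], bytes, 4);
    }
}

/* OR one bitplane row (src_size bytes) into dst, which holds src_size * 8 bytes. */
static void decodeplane8_sketch(uint8_t *dst, const uint8_t *src,
                                size_t src_size, int plane)
{
    uint32_t lut[16];
    NibbleReader r = { src, 0 };
    const uint8_t *end = dst + src_size * 8;  /* replaces the counter i */

    assert(plane >= 0 && plane < 8);
    build_lut(lut, plane);

    for (; dst < end; dst += 4) {
        const uint32_t v = rn32(dst) | lut[get_nibble(&r)];
        wn32(dst, v);
    }
}
```

The loop keeps only dst, end, the lut pointer and the reader state live, which
is the point of the suggestion on a register-starved target. Whether it is
actually faster still needs to be measured.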

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

He who knows, does not speak. He who speaks, does not know. -- Lao Tsu