[FFmpeg-devel] [PATCH] Faster ff_sqrt()

Michael Niedermayer michaelni
Sun Jan 13 20:49:17 CET 2008


On Sun, Jan 13, 2008 at 08:13:00PM +0100, Michael Niedermayer wrote:
> On Sun, Jan 13, 2008 at 05:50:52PM +0100, Vitor Sessak wrote:
> > Hi,
> >
> > I saw in the thread "Copy-on-write pages statistics" that roqaudioenc.c
> > is a big copy-on-write memory usage offender. I think that
> > increasing the memory usage of FFmpeg by 16kb just for a table used in a
> > game format encoder is not a good idea. So I'm sending a new, faster
> > ff_sqrt(), so I can remove the table completely without too much speed 
> > loss.
> >
> > Unfortunately, even with the new ff_sqrt, the encoder is not as fast as 
> > before :
> > (for a 3 min wav file input)
> >
> > Before:
> > 0m0.340s
> >
> > After:
> > 0m0.556s
> >
> > But it's the audio encoder of a VQ codec and it takes forever to encode the 
> > video. So I don't think a few ms more to encode the audio can make any 
> > difference...
> >
> > The code for ff_sqrt() was taken from public domain code at 
> > http://atoms.alife.co.uk/sqrt/ (Warning: original in java).
> [...]
> > +/**
> > + * Fast square-root. Completely accurate for x < 2147483648 (i.e. 2^31)...
> > + * Code from http://atoms.alife.co.uk/sqrt/
> > + */
> > +static inline int ff_sqrt(int x) {
> > +    int xn;
> >  
> ...
> > +    if (x >= 0x10000) {
> > +        if (x >= 0x1000000) {
> > +            if (x >= 0x10000000) {
> > +                if (x >= 0x40000000) {
> > +                    xn = ff_sqrt_tab[x >> 24] << 8;
> > +                } else {
> > +                    xn = ff_sqrt_tab[x >> 22] << 7;
> > +                }
> > +            } else {
> > +                if (x >= 0x4000000) {
> > +                    xn = ff_sqrt_tab[x >> 20] << 6;
> > +                } else {
> > +                    xn = ff_sqrt_tab[x >> 18] << 5;
> > +                }
> > +            }
> >  
> ...
> > +            xn = (xn + 1 + (x / xn)) >> 1;
> > +            xn = (xn + 1 + (x / xn)) >> 1;
> > +            return ((xn * xn) > x) ? --xn : xn;
> > +        } else {
> > +            if (x >= 0x100000) {
> > +                if (x >= 0x400000) {
> > +                    xn = ff_sqrt_tab[x >> 16] << 4;
> > +                } else {
> > +                    xn = ff_sqrt_tab[x >> 14] << 3;
> > +                }
> > +            } else {
> > +                if (x >= 0x40000) {
> > +                    xn = ff_sqrt_tab[x >> 12] << 2;
> > +                } else {
> > +                    xn = ff_sqrt_tab[x >> 10] << 1;
> > +                }
> > +            }
> > +
> > +            xn = (xn + 1 + (x / xn)) >> 1;
> > +
> > +            return ((xn * xn) > x) ? --xn : xn;
> >          }
> > +    } else {
> > +        if (x >= 0x100) {
> > +            if (x >= 0x1000) {
> > +                if (x >= 0x4000) {
> > +                    xn = (ff_sqrt_tab[x >> 8]) + 1;
> > +                } else {
> > +                    xn = (ff_sqrt_tab[x >> 6] >> 1) + 1;
> > +                }
> > +            } else {
> > +                if (x >= 0x400) {
> > +                    xn = (ff_sqrt_tab[x >> 4] >> 2) + 1;
> > +                } else {
> > +                    xn = (ff_sqrt_tab[x >> 2] >> 3) + 1;
> > +                }
> > +            }
> > +
> > +            return ((xn * xn) > x) ? --xn : xn;
> > +        } else {
> > +            if (x >= 0) {
> > +                return ff_sqrt_tab[x] >> 4;
> > +            }
> > +        }
> >      }
> > -    return ret;
> > +
> > +    return -1;
> 
> for a less idiotic and simpler variant i wrote a few years ago see
> http://guru.multimedia.cx/fast-integer-square-root/
> 
> if anyone can simplify this further that would be welcome :)

noone? well heres a simpler variant, though possibly slower (didnt benchmark)

static inline unsigned int sqrt4(unsigned int a)
{
    unsigned int b;

    if(a<(1<<16)){
        if(a<(1<<10)-3)      b=sqrt_tab[(a+ 3)>>2 ]>>3;
        else{
            if(a<(1<<14)-28) b=sqrt_tab[(a+28)>>6 ]>>1;
            else             b=sqrt_tab[ a    >>8 ];
        }
    }else{
        int s= (av_log2(a)-12)>>1;
        b= sqrt_tab[a>>(2*s+6)];
        b= (FASTDIV(a,b)>>s) + (b<<(s-2));
    }

    if(a<b*b) b--;

    return b;
}

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Many things microsoft did are stupid, but not doing something just because
microsoft did it is even more stupid. If everything ms did were stupid they
would be bankrupt already.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080113/af89725a/attachment.pgp>



More information about the ffmpeg-devel mailing list