[FFmpeg-devel] [PATCH] Faster ff_sqrt()
Michael Niedermayer
michaelni
Sun Jan 13 21:14:10 CET 2008
On Sun, Jan 13, 2008 at 08:49:17PM +0100, Michael Niedermayer wrote:
> On Sun, Jan 13, 2008 at 08:13:00PM +0100, Michael Niedermayer wrote:
> > On Sun, Jan 13, 2008 at 05:50:52PM +0100, Vitor Sessak wrote:
> > > Hi,
> > >
> > > I saw in the thread "Copy-on-write pages statistics" that roqaudioenc.c
> > > is a big copy-on-write memory usage offender. I think that
> > > increasing the memory usage of FFmpeg by 16kb just for a table used in a
> > > game format encoder is not a good idea. So I'm sending a new, faster
> > > ff_sqrt(), so I can remove the table completely without too much speed
> > > loss.
> > >
> > > Unfortunately, even with the new ff_sqrt, the encoder is not as fast as
> > > before :
> > > (for a 3 min wav file input)
> > >
> > > Before:
> > > 0m0.340s
> > >
> > > After:
> > > 0m0.556s
> > >
> > > But it's the audio encoder of a VQ codec and it takes forever to encode the
> > > video. So I don't think a few ms more to encode the audio can make any
> > > difference...
> > >
> > > The code for ff_sqrt() was taken from public domain code at
> > > http://atoms.alife.co.uk/sqrt/ (Warning: original in java).
> > [...]
> > > +/**
> > > + * Fast square-root. Completely accurate for x < 2147483648 (i.e. 2^31)...
> > > + * Code from http://atoms.alife.co.uk/sqrt/
> > > + */
> > > +static inline int ff_sqrt(int x) {
> > > + int xn;
> > >
> > ...
> > > + if (x >= 0x10000) {
> > > + if (x >= 0x1000000) {
> > > + if (x >= 0x10000000) {
> > > + if (x >= 0x40000000) {
> > > + xn = ff_sqrt_tab[x >> 24] << 8;
> > > + } else {
> > > + xn = ff_sqrt_tab[x >> 22] << 7;
> > > + }
> > > + } else {
> > > + if (x >= 0x4000000) {
> > > + xn = ff_sqrt_tab[x >> 20] << 6;
> > > + } else {
> > > + xn = ff_sqrt_tab[x >> 18] << 5;
> > > + }
> > > + }
> > >
> > ...
> > > + xn = (xn + 1 + (x / xn)) >> 1;
> > > + xn = (xn + 1 + (x / xn)) >> 1;
> > > + return ((xn * xn) > x) ? --xn : xn;
> > > + } else {
> > > + if (x >= 0x100000) {
> > > + if (x >= 0x400000) {
> > > + xn = ff_sqrt_tab[x >> 16] << 4;
> > > + } else {
> > > + xn = ff_sqrt_tab[x >> 14] << 3;
> > > + }
> > > + } else {
> > > + if (x >= 0x40000) {
> > > + xn = ff_sqrt_tab[x >> 12] << 2;
> > > + } else {
> > > + xn = ff_sqrt_tab[x >> 10] << 1;
> > > + }
> > > + }
> > > +
> > > + xn = (xn + 1 + (x / xn)) >> 1;
> > > +
> > > + return ((xn * xn) > x) ? --xn : xn;
> > > }
> > > + } else {
> > > + if (x >= 0x100) {
> > > + if (x >= 0x1000) {
> > > + if (x >= 0x4000) {
> > > + xn = (ff_sqrt_tab[x >> 8]) + 1;
> > > + } else {
> > > + xn = (ff_sqrt_tab[x >> 6] >> 1) + 1;
> > > + }
> > > + } else {
> > > + if (x >= 0x400) {
> > > + xn = (ff_sqrt_tab[x >> 4] >> 2) + 1;
> > > + } else {
> > > + xn = (ff_sqrt_tab[x >> 2] >> 3) + 1;
> > > + }
> > > + }
> > > +
> > > + return ((xn * xn) > x) ? --xn : xn;
> > > + } else {
> > > + if (x >= 0) {
> > > + return ff_sqrt_tab[x] >> 4;
> > > + }
> > > + }
> > > }
> > > - return ret;
> > > +
> > > + return -1;
> >
> > for a less idiotic and simpler variant i wrote a few years ago see
> > http://guru.multimedia.cx/fast-integer-square-root/
> >
> > if anyone can simplify this further that would be welcome :)
>
> noone? well heres a simpler variant, though possibly slower (didnt benchmark)
>
> static inline unsigned int sqrt4(unsigned int a)
> {
> unsigned int b;
>
> if(a<(1<<16)){
> if(a<(1<<10)-3) b=sqrt_tab[(a+ 3)>>2 ]>>3;
> else{
> if(a<(1<<14)-28) b=sqrt_tab[(a+28)>>6 ]>>1;
> else b=sqrt_tab[ a >>8 ];
> }
> }else{
> int s= (av_log2(a)-12)>>1;
> b= sqrt_tab[a>>(2*s+6)];
> b= (FASTDIV(a,b)>>s) + (b<<(s-2));
> }
>
> if(a<b*b) b--;
>
> return b;
> }
next one, just reordering the if() to make smaller values faster and some
cosmetics
static inline unsigned int sqrt3(unsigned int a)
{
unsigned int b;
if (a<(1<<10)- 3) b=sqrt_tab[(a+ 3)>>2 ]>>3;
else if(a<(1<<14)-28) b=sqrt_tab[(a+28)>>6 ]>>1;
else if(a<(1<<16) ) b=sqrt_tab[ a >>8 ] ;
else{
int s= av_log2(a)>>1;
b= sqrt_tab[a>>(2*s-6)];
b= (FASTDIV(a,b)>>(s-6)) + (b<<(s-8));
}
return b - (a<b*b);
}
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
I wish the Xiph folks would stop pretending they've got something they
do not. Somehow I fear this will remain a wish. -- M?ns Rullg?rd
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080113/50073997/attachment.pgp>
More information about the ffmpeg-devel
mailing list