[MPlayer-dev-eng] [OT] C-code Optimiation Contest

Tue Jul 15 14:37:45 CEST 2003

Felix Buenemann <atmosfear at users.sourceforge.net> writes:

> Hi,
>
> if you like to have some fun, try optimizing the attached simple matrix 
> multiply and post your results.
>
> The Rules:
> 1. you may only modify multiply_d.[ch] (NUM should stay 512 though)
> 2. you may change compiler and optims in Makefile
> 3. precision must stay the same
> 4. to compare, you should compile the orginal code:
>     make && copy matrix matrix.org && (./matrix.org >res.org)
> 5. then later you can compare results via:
>    make && (./matrix >res.txt) && diff -q res.org res.txt

only by changing the Algorithm (and using the -03-option) I got a
speedup of factor 14x on my Athlon. This should (more or less)
independent from the CPU (maybe some Pentium guys can check this). If
you now make specific optimzing you should get even more. Its just a
lousy implementation kind of Strasser-Algorithm. If you take the
original Algorithm you get even more speed, because, it needs only 7
recursions instead of 8 (I only wanted to do it fast and short, so I
took an easy variant of it).  So here is the code:

/* simple matrix multiply */
#include "multiply_d.h"

int i,j,k;

void rek_multiply(double a[][DIM], double b[][DIM], double c[][DIM],
		  int azeile, int aspalte, 
		  int bzeile, int bspalte, 
		  int czeile, int cspalte, 
		  int size)
{
  /* first rekursion finishen */
  if (size==8)
    {
      for(i=0; i<8; i++) 
	for(j=0; j<8; j++) 
	  for(k=0; k<8; k++) 
	    c[i+czeile][j+cspalte] += 
	      a[i+azeile][k+aspalte] * b[k+bzeile][j+bspalte];
      return ;
    }

  size/=2;

  rek_multiply(a,b,c, azeile, aspalte, bzeile, bspalte, 
	       czeile, cspalte, size);
  rek_multiply(a,b,c, azeile+size, aspalte, bzeile, bspalte, 
	       czeile+size, cspalte, size);
  rek_multiply(a,b,c, azeile, aspalte, bzeile, bspalte+size, 
		czeile, cspalte+size, size);
  rek_multiply(a,b,c, azeile+size, aspalte, bzeile, bspalte+size,
		czeile+size, cspalte+size, size);

  rek_multiply(a,b,c, azeile, aspalte+size, bzeile+size, bspalte, 
	       czeile, cspalte, size);
  rek_multiply(a,b,c, azeile+size, aspalte+size, bzeile+size, bspalte, 
	       czeile+size, cspalte, size);
  rek_multiply(a,b,c, azeile, aspalte+size, bzeile+size, bspalte+size, 
		czeile, cspalte+size, size);
  rek_multiply(a,b,c, azeile+size, aspalte+size, bzeile+size, bspalte+size,
		czeile+size, cspalte+size, size);

}

void multiply(double a[][DIM], double b[][DIM], double c[][DIM]) {

  rek_multiply(a,b,c, 0, 0, 0,0, 0,0, DIM);

}

Florian

PS: Is there somewhere in the mplayer-source a matrix-multiplication,
so that we arent offtopic at all?
-- 
int m,u,e=0;float l,_,I;main(){for(;1840-e;putchar((++e>907&&942>e?61-m:u)
["\t#*fg-pa.vwCh`lwp-e+#h`lwP##mbjqloE"]^3))for(u=_=l=0;79-(m=e%80)&&
I*l+_*_<6&&26-++u;_=2*l*_+e/80*.09-1,l=I)I=l*l-_*_-2+m/27.;}