[MPlayer-dev-eng] [OT] C-code Optimization Contest

Florian-Wolfgang Stock f.stock at t-online.de
Wed Jul 16 15:06:12 CEST 2003


Hello,

Felix Buenemann <atmosfear at users.sourceforge.net> writes:

> On Tuesday 15 July 2003 17:04, Felix Buenemann wrote:
>> On Tuesday 15 July 2003 14:37, Florian-Wolfgang Stock wrote:
>> > Felix Buenemann <atmosfear at users.sourceforge.net> writes:
>>
>> [...Florian's code...]
>> Not very fast here and the results are wrong:
>> Cycles: 1445112965  5.760
>> Files res.org and res.txt differ.
> I found your bug:
> - rek_multiply(a,b,c, 0, 0, 0,0, 0,0, DIM);
> + rek_multiply(a,b,c, 0, 0, 0,0, 0,0, NUM);
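Here is a tiny sketch of why passing DIM breaks the result. The rest of rek_multiply is not shown in the thread, so this assumes each recursion level halves size until it equals START_SIMPLE. The helper halves_cleanly is hypothetical. With DIM=516 the sizes run 516, 258, 129, and an odd size cannot be split into two equal halves, so rows and columns get dropped. The top level also covers the four padding columns.

```c
/* Hypothetical helper: returns 1 if repeatedly halving n lands exactly
   on base without ever halving an odd number, 0 otherwise. */
int halves_cleanly(int n, int base)
{
    while (n > base) {
        if (n % 2 != 0)
            return 0;   /* odd size: two halves of n/2 would miss one row/column */
        n /= 2;
    }
    return n == base;
}
```

With START_SIMPLE = 4, halves_cleanly(512, 4) returns 1 and halves_cleanly(516, 4) returns 0.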

Oh, the solution I posted was the one where DIM=NUM=512. Later, when I
set DIM to 516 as I mentioned, I of course changed the call too (and
moved the loop variables i, j, k into rek_multiply). I also changed the
threshold at which it falls back to the simple matrix multiplication.
It now looks like this:

#define START_SIMPLE 4

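void rek_multiply(double a[][DIM], double b[][DIM], double c[][DIM],
                  int azeile, int aspalte, /* row/column offset of the block in a */
                  int bzeile, int bspalte, /* row/column offset of the block in b */
                  int czeile, int cspalte, /* row/column offset of the block in c */
                  int size)                /* edge length of the square blocks */
{
  int i,j,k;

  /* end of recursion: multiply the small blocks directly */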
  if (size==START_SIMPLE)
    {
      for(i=0; i<START_SIMPLE; i++)
        for(j=0; j<START_SIMPLE; j++)
          for(k=0; k<START_SIMPLE; k++)
            c[i+czeile][j+cspalte] +=
              a[i+azeile][k+aspalte] * b[k+bzeile][j+bspalte];
      return ;
    }
.....
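The rest of rek_multiply is elided above. One common way to continue is to split each block into four quadrants and recurse eight times, since C11 = A11*B11 + A12*B21 and so on. This is only an assumption about what Florian's code does. The sketch uses that scheme with smaller sizes than the thread (NUM=128 and DIM=132 instead of 512 and 516), so the check against a plain triple loop runs quickly. NUM has to be START_SIMPLE times a power of two, and c must be zeroed first because the base case accumulates with +=. The helper rek_check is hypothetical.

```c
#include <string.h>

/* Sizes for this sketch only: the thread used NUM=512 and DIM=516.
   NUM must be START_SIMPLE times a power of two. */
#define NUM 128          /* logical matrix size */
#define DIM 132          /* padded row length (NUM + 4) */
#define START_SIMPLE 4   /* block size at which the recursion stops */

/* Assumed scheme: c block += a block * b block, all size x size,
   split into quadrants with eight recursive calls. */
static void rek_multiply(double a[][DIM], double b[][DIM], double c[][DIM],
                         int azeile, int aspalte,
                         int bzeile, int bspalte,
                         int czeile, int cspalte,
                         int size)
{
    int i, j, k, h;

    if (size == START_SIMPLE) {   /* base case, as in the thread */
        for (i = 0; i < START_SIMPLE; i++)
            for (j = 0; j < START_SIMPLE; j++)
                for (k = 0; k < START_SIMPLE; k++)
                    c[i + czeile][j + cspalte] +=
                        a[i + azeile][k + aspalte] * b[k + bzeile][j + bspalte];
        return;
    }

    h = size / 2;
    /* C11 += A11*B11 + A12*B21 */
    rek_multiply(a, b, c, azeile, aspalte, bzeile, bspalte, czeile, cspalte, h);
    rek_multiply(a, b, c, azeile, aspalte + h, bzeile + h, bspalte, czeile, cspalte, h);
    /* C12 += A11*B12 + A12*B22 */
    rek_multiply(a, b, c, azeile, aspalte, bzeile, bspalte + h, czeile, cspalte + h, h);
    rek_multiply(a, b, c, azeile, aspalte + h, bzeile + h, bspalte + h, czeile, cspalte + h, h);
    /* C21 += A21*B11 + A22*B21 */
    rek_multiply(a, b, c, azeile + h, aspalte, bzeile, bspalte, czeile + h, cspalte, h);
    rek_multiply(a, b, c, azeile + h, aspalte + h, bzeile + h, bspalte, czeile + h, cspalte, h);
    /* C22 += A21*B12 + A22*B22 */
    rek_multiply(a, b, c, azeile + h, aspalte, bzeile, bspalte + h, czeile + h, cspalte + h, h);
    rek_multiply(a, b, c, azeile + h, aspalte + h, bzeile + h, bspalte + h, czeile + h, cspalte + h, h);
}

/* Hypothetical self-check against a plain triple loop.
   Returns the largest absolute difference (0.0 if identical). */
double rek_check(void)
{
    static double a[NUM][DIM], b[NUM][DIM], c[NUM][DIM], ref[NUM][DIM];
    double diff, maxdiff = 0.0;
    int i, j, k;

    /* small integers keep every product and sum exact in double */
    for (i = 0; i < NUM; i++)
        for (j = 0; j < NUM; j++) {
            a[i][j] = (double)((i * 7 + j * 3) % 11) - 5.0;
            b[i][j] = (double)((i * 5 + j * 2) % 13) - 6.0;
        }
    memset(c, 0, sizeof c);     /* the base case accumulates with += */
    memset(ref, 0, sizeof ref);

    /* pass NUM, not DIM: the padding columns are not part of the matrix */
    rek_multiply(a, b, c, 0, 0, 0, 0, 0, 0, NUM);

    for (i = 0; i < NUM; i++)
        for (j = 0; j < NUM; j++)
            for (k = 0; k < NUM; k++)
                ref[i][j] += a[i][k] * b[k][j];

    for (i = 0; i < NUM; i++)
        for (j = 0; j < NUM; j++) {
            diff = c[i][j] - ref[i][j];
            if (diff < 0.0)
                diff = -diff;
            if (diff > maxdiff)
                maxdiff = diff;
        }
    return maxdiff;
}
```

Because every value is a small integer, both orders of summation give bit-identical results, so rek_check() should return exactly 0.0.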

This START_SIMPLE value decides at which matrix size the algorithm
falls back to the simple multiplication (simple_multiply). If you use
one of the other fast versions there instead, I think you can speed it
up even more. The value must be determined experimentally, but it
should be 4, 8 or 16 (maybe a different value fits better on a
Pentium-type CPU; someone should try). Wait...
I just tested on my girlfriend's computer (she has a Pentium II):
there the value 16 gives a speedup factor of 10x. So on that machine
you can't gain as much as on the Athlon. My impression is that the
code gcc generates for the original version on the Athlon is not as
good as its Pentium code, hence the gain on the Pentium is lower.
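The other fast versions from the contest are not shown here. As a stand-in, this sketch swaps the base case for a loop interchange (i-k-j order) that keeps the element of a in a local variable. Whether that helps on a given CPU, and which START_SIMPLE (4, 8 or 16) suits it, still has to be measured. The names block_ijk, block_ikj and kernels_agree, and the values DIM=20 and START_SIMPLE=16, are hypothetical.

```c
#include <string.h>

#define DIM 20            /* hypothetical padded row length for this demo */
#define START_SIMPLE 16   /* hypothetical block size (4, 8 or 16 per the thread) */

/* Base case as in the thread: i-j-k order; the inner loop walks
   down a column of b, i.e. with a stride of DIM doubles. */
static void block_ijk(double a[][DIM], double b[][DIM], double c[][DIM],
                      int ar, int ac, int br, int bc, int cr, int cc)
{
    int i, j, k;
    for (i = 0; i < START_SIMPLE; i++)
        for (j = 0; j < START_SIMPLE; j++)
            for (k = 0; k < START_SIMPLE; k++)
                c[i + cr][j + cc] += a[i + ar][k + ac] * b[k + br][j + bc];
}

/* Hypothetical faster base case: i-k-j order. a[i][k] is loaded once,
   and the inner loop walks one row of b and one row of c contiguously. */
static void block_ikj(double a[][DIM], double b[][DIM], double c[][DIM],
                      int ar, int ac, int br, int bc, int cr, int cc)
{
    int i, j, k;
    for (i = 0; i < START_SIMPLE; i++) {
        double *crow = &c[i + cr][cc];
        for (k = 0; k < START_SIMPLE; k++) {
            double aik = a[i + ar][k + ac];
            const double *brow = &b[k + br][bc];
            for (j = 0; j < START_SIMPLE; j++)
                crow[j] += aik * brow[j];
        }
    }
}

/* Returns 1 if both kernels give identical results on an offset block. */
int kernels_agree(void)
{
    static double a[DIM][DIM], b[DIM][DIM], c1[DIM][DIM], c2[DIM][DIM];
    int i, j;

    for (i = 0; i < DIM; i++)
        for (j = 0; j < DIM; j++) {
            /* small integers: every product and sum is exact in double */
            a[i][j] = (double)((i * 3 + j) % 7) - 3.0;
            b[i][j] = (double)((i + j * 5) % 9) - 4.0;
        }
    memset(c1, 0, sizeof c1);
    memset(c2, 0, sizeof c2);

    /* block offsets chosen so every index stays below DIM */
    block_ijk(a, b, c1, 1, 2, 3, 0, 2, 1);
    block_ikj(a, b, c2, 1, 2, 3, 0, 2, 1);

    for (i = 0; i < DIM; i++)
        for (j = 0; j < DIM; j++)
            if (c1[i][j] != c2[i][j])
                return 0;
    return 1;
}
```

To pick the threshold, one would time the whole multiplication once for each candidate START_SIMPLE on the target machine. The answer may differ between the Athlon and the Pentium II.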

Which processor do you use? Mine was an Athlon, and there the factor
of 33x was amazing (especially the 2x speedup from just changing DIM).
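The thread does not say why changing DIM alone gave a factor of 2. One common explanation is cache set conflicts, and that is an assumption here, not a measured result. With DIM=512, consecutive rows are 512*8 = 4096 bytes apart, so walking down a column of b keeps hitting the same few cache sets. The sketch counts the sets touched under an assumed cache geometry resembling the Athlon's L1 data cache: 64 KB, 2-way, 64-byte lines, hence 512 sets. It also assumes the array starts at a set-aligned address. sets_touched and its constants are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Assumed cache geometry, roughly an Athlon L1 data cache:
   64 KB, 2-way set associative, 64-byte lines -> 512 sets. */
#define LINE_BYTES 64
#define NUM_SETS   512
#define ROWS       512   /* NUM in the thread */

/* Count the distinct cache sets hit when walking down one column of a
   double[ROWS][dim] array starting at address 0 (stride dim doubles). */
int sets_touched(int dim)
{
    static char used[NUM_SETS];
    int i, count = 0;

    memset(used, 0, sizeof used);
    for (i = 0; i < ROWS; i++) {
        size_t addr = (size_t)i * (size_t)dim * sizeof(double);
        size_t set = (addr / LINE_BYTES) % NUM_SETS;
        if (!used[set]) {
            used[set] = 1;
            count++;
        }
    }
    return count;
}
```

Under these assumptions a 512-element column walk touches only 8 sets when DIM is 512, so with 2 ways at most 16 of its cache lines can be resident at once. With DIM=516 the walk spreads over at least 256 sets.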

Florian
-- 
int m,u,e=0;float l,_,I;main(){for(;1840-e;putchar((++e>907&&942>e?61-m:u)
["\t#*fg-pa.vwCh`lwp-e+#h`lwP##mbjqloE"]^3))for(u=_=l=0;79-(m=e%80)&&
I*l+_*_<6&&26-++u;_=2*l*_+e/80*.09-1,l=I)I=l*l-_*_-2+m/27.;}


