[MPlayer-dev-eng] Cache optimized rotation.

Reimar Döffinger Reimar.Doeffinger at stud.uni-karlsruhe.de
Mon Apr 9 23:26:08 CEST 2007


Hello,

As said by others, unified diff is preferred, and if it is only because
this style is confusing when used to unified diff.

On Thu, Apr 05, 2007 at 01:55:12AM +0200, David Bateman wrote:
> diff -rNc mplayer-checkout-2007-04-04.orig/libmpcodecs/vf_rotate.c mplayer-checkout-2007-04-04/libmpcodecs/vf_rotate.c
> *** mplayer-checkout-2007-04-04.orig/libmpcodecs/vf_rotate.c	2007-04-04 17:13:21.198521895 +0200
> --- mplayer-checkout-2007-04-04/libmpcodecs/vf_rotate.c	2007-04-04 17:13:37.648679316 +0200
> ***************
> *** 16,23 ****
> --- 16,33 ----
>       int direction;
>   };
>   
> + 
> + #ifndef ROTATE_BLOCK_SIZE
> + /* This needs to be adapted on a platform by platform basis */

comments should be doxygen-compatible

> + #define ROTATE_BLOCK_SIZE 8
> + #endif

pointless ifndef

> + 
> + #define MIN_BLOCK_SIZE 4
> + static unsigned char rotate_block [4 * ROTATE_BLOCK_SIZE * ROTATE_BLOCK_SIZE];

Shouldn't use non-constant global variables in filters.

> + 
>   static void rotate(unsigned char* dst,unsigned char* src,int dststride,int srcstride,int w,int h,int bpp,int dir){
>       int y;
> + 
>       if(dir&1){

cosmetics

>   	src+=srcstride*(w-1);
>   	srcstride*=-1;
> ***************
> *** 27,32 ****
> --- 37,188 ----
>   	dststride*=-1;
>       }
>   
> +   if (! (h % MIN_BLOCK_SIZE) && ! (w % MIN_BLOCK_SIZE))

do not use division unless it is absolutely unavoidable.
Just require a power of two and you don't need any division.

> +     {
> +       int x, i, j;
> +       int N = MIN_BLOCK_SIZE;

Usually all-big-letter names are only used for defines.

> +       
> +       while (N < ROTATE_BLOCK_SIZE && !(h % (N << 1)) && !(w % (N << 1)))
> + 	N <<= 1;

Please use only tabs or only space for indentation, don't mix them.

> +       /* Use a block rotate algorithm where an NxN block is copied to  */
> +       /* temporary memory and then recopied to the destination. This   */
> +       /* avoids many of the cache misses of the un-blocked rotation.   */
> +       /* The optimal blocksize depends on the number of L1 cache lines */
> +       /* and so should be independent of the bits per pixel.           */
> +       switch(bpp){
> +       case 1:

These cases are similar enough that you should use a inline function or
macro instead of duplicating the code.

> + 	{
> + 	  for(y=0;y<h;y+=N){
> + 	    unsigned char *dp = dst + dststride * y;
> + 	    unsigned char *sp = src + y;
> + 	    for(x=0;x<w;x+=N){
> + 	      unsigned char *dq = dp;
> + 	      unsigned char *sq = sp;
> + 	      unsigned char *p = rotate_block;
> + 	      for (i=0;i<N;i++) {
> + 		for (j=0;j<N;j++) {
> + 		  *p++ = sq[j];
> + 		}

combining *?++ and [j] ways for accessing stuff usually produces
suboptimal code (even more so on 64 bit systems), might be worth
checking the generated code in a disassembler.
Should also try something like
p += N;
sq += N;
j = -N;
do {
  p[j] = sq[j];
} while (++j);
(this assumes you assure N >= 1)

> diff -rNc mplayer-checkout-2007-04-04.orig/libmpcodecs/vf_rotate.c mplayer-checkout-2007-04-04/libmpcodecs/vf_rotate.c
> *** mplayer-checkout-2007-04-04.orig/libmpcodecs/vf_rotate.c	2007-04-04 17:13:21.198521895 +0200
> --- mplayer-checkout-2007-04-04/libmpcodecs/vf_rotate.c	2007-04-04 18:15:43.128679316 +0200
> ***************
> *** 27,52 ****
>   	dststride*=-1;
>       }
>   
> !     for(y=0;y<h;y++){

I prefer waiting for a unified diff instead of trying to figure out how
to read that...

Greetings,
Reimar Döffinger



More information about the MPlayer-dev-eng mailing list