[FFmpeg-devel] pre discussion around Blackfin dct_quantize_bfin routine

Tue Jun 12 11:28:09 CEST 2007

Hello,
On Mon, Jun 11, 2007 at 10:58:00PM -0400, Marc Hoffman wrote:
> This is my rough draft it works well with the small precision errors  
> due to 16bit arithmetic, actually its the same (as far as I can tell)  
> as the MMX quantizer. I ran my reference fixpoint against the mmx and  
> it seems to work on the cases I tried, its 1 bit off which seems  
> about right considering the truncation of the coeffs from 22 to  
> 16bits.  Can someone look over my shoulder and give a peak. I know I  
> have to clean up the codes a bit but I wanted to get someone else to  
> review my work up until now.  I have used inline asm, Michael this  
> should be pretty straight forward for you to get your head around.   
> This doubles the performance roughly.

Hmm. Ok, most is cleanup suggestions so I might be telling you things
you already know, still...

> #define clock()      ({ int _t; asm volatile ("%0=cycles;" : "=d" (_t)); _t; })

If you leave that in the final version you should probably use a less
collision-freindly name like bfin_clock or so.
Also that ({ }) construction is a nasty gcc-ism, you could avoid it by
making it a static always_inline fuction, though that increases the risk
of the compiler messing up...
And actually, maybe you could just implement START_TIMER and STOP_TIMER
in libavutil (or does that do something else)?
If not, maybe naming it START_PROF and END_PROF is at least more
consistent, and least the EPROF name is not quite obvious in its meaning
;-)

[...]
>            av_log (0,0,"%-20s: %12.4f\t%12.4f\n", TelemNames[i],v,v/64);
>    av_log (0,0,"%-20s: %12.4f\t%12.4f\n%20.4f\t%d\n", "total",s/TelemCnt,s/TelemCnt/64,s,TelemCnt);

it might have the same result but using NULL and one of the loglevel
defines as first av_log arguments instead of 0 looks probably better

> #define abs(x) (((x)<0)?-(x):(x))

I think FFABS from libavutil does exactly the same...

> #define L1CODE __attribute__ ((l1_text))

for consistency, I'd name it e.g. attribute_l1_text

[...]
>    /*  for(i=start_i; i<64; i++) {                           */
>    /*      sign     = (block[i]>>15)|1;                      */
>    /*      level    = ((abs(block[i])+bias[0])*qmat[i])>>16; */
>    /*      if (level < 0) level = 0;                         */
>    /*      max     |= level;                                 */
>    /*      level    = level * sign;                          */
>    /*      block[i] = level;                                 */
>    /*  } */

below you put the C code to the right of the asm. I'd say make it
consistent (possibly just remove it, but if not C is not necessarily the
best way to annotate it).

Unfortunately I don't know what it does so I'm not sure it really makes
sense, but I'd suggest splitting the long asm lines, e.g.:
>                  "     r1=abs r1 (v)                                    || r2=[%2++];\n\t"

vs.
>                   "     r1=abs r1 (v)"
>                   "       || r2=[%2++];\n\t"

or

>                  "     r1.h=(a1 =r1.h*r2.h), r1.l=(a0 =r1.l*r2.l) (tfu);             \n\t"
vs.
>                  "     r1.h=(a1 =r1.h*r2.h),
>                  "     r1.l=(a0 =r1.l*r2.l)
>                  "(tfu);             \n\t"

The best way of doing this of course depends on the actual meaning of
this, and my suggestions might be rather stupid thus ;-)

Greetings,
Reimar Doeffinger