[MPlayer-dev-eng] spp deblocking GREAT optimization !!!
Nikolaj Poroshin
nialpof at pisem.net
Fri Sep 3 17:28:09 CEST 2004
Hello,
>> NP> You can get a 4-6x speedup of the SPP filter by decomposing vert and
>> NP> horiz 1d dct/idct and decimating the horizontal ones. This way, the
>> NP> number of horiz passes will be 4 times lower for levels 4 & 5, or 8
>> NP> times lower for level 6 (which is rather useless, as noted in the
>> NP> original paper).
>> NP> Vert passes are more suitable for optimization :)
>>
>> NP> Next, you can use an implied non-flat threshold with AAN dct w/o scale.
>> NP> (BTW, it is an interesting question - which threshold matrix provides
>> NP> the best psnr?)
> if you did implement it and it is faster while still giving the same result
> besides rounding differences, you really should submit the code
> if you didn't implement it, you should, instead of asking others; we really
> don't lack ideas, but code & time to write it
My code is very similar to the original code, and it is more than 4 times
faster.
Actually, it matches the results of the original code with this threshold:
#include <math.h>    /* cosl, acosl, sqrtl, rint */
#include <string.h>  /* memset */

static void hardthresh_c(DCTELEM dst[64], DCTELEM src[64], int qp, uint8_t *permutation){
    int i, j, t;   /* j was used below without being declared */
    int bias= 0; //FIXME
    uint32_t thresholdm[64];

    t= qp*((1<<4) - bias) - 1;   /* the original flat threshold */

    /* Divide t by the AAN scale factors sqrt(2)*cos(k*pi/16), k=1..7
       (the factor for k=0 is 1); acosl(-1.0) is pi. */
    thresholdm[0]= t;
    for(j=1; j<8; j++)
        thresholdm[j]= (int)rint(t / (cosl(j*acosl(-1.0)/(long double)16.0)*sqrtl(2)));
    for(i=1; i<8; i++){
        thresholdm[i*8]= (int)rint(t / (cosl(i*acosl(-1.0)/(long double)16.0)*sqrtl(2)));
        for(j=1; j<8; j++){
            thresholdm[i*8+j]= (int)rint(t /
                ((cosl(i*acosl(-1.0)/(long double)16.0)*sqrtl(2))*
                 (cosl(j*acosl(-1.0)/(long double)16.0)*sqrtl(2))));
        }
    }

    memset(dst, 0, 64*sizeof(DCTELEM));
    dst[0]= (src[0] + 4)>>3;   /* DC is always kept */

    for(i=1; i<64; i++){
        int level= src[i];
        /* single unsigned compare: true when |level| > thresholdm[i] */
        if(((unsigned)(level+thresholdm[i]))>2*thresholdm[i]){
            const int j= permutation[i];
            dst[j]= (level + 4)>>3;
        }
    }
}
(See the "Next, you can use implied ..." above)
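The comparison inside the loop of hardthresh_c is a compact way of testing
|level| > threshold with a single unsigned compare. Here is a minimal sketch
of just that test, in case it is not obvious. The names exceeds_threshold and
check_against_abs are made up for this illustration and are not part of
vf_spp; the sketch assumes the threshold is small enough that 2*threshold
does not overflow 32 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 when |level| > threshold.
   If level < -threshold, the unsigned sum wraps around to a very large
   value, so both "too negative" and "too positive" pass the same test. */
int exceeds_threshold(int level, uint32_t threshold)
{
    return (uint32_t)level + threshold > 2u * threshold;
}

/* Hypothetical self-check: compare the trick with the plain abs() form
   over a range of coefficient values. */
void check_against_abs(uint32_t threshold)
{
    int level;
    for (level = -2048; level <= 2048; level++)
        assert(exceeds_threshold(level, threshold) ==
               (abs(level) > (int)threshold));
}
```

For example, with a threshold of 100, the values 100 and -100 are zeroed,
while 101 and -101 are kept.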
For an original flat threshold of t = 100, hardthresh_c builds this
threshold matrix:
 100   72   77   85  100  127  185  362
  72   52   55   61   72   92  133  261
  77   55   59   65   77   97  141  277
  85   61   65   72   85  108  157  308
 100   72   77   85  100  127  185  362
 127   92   97  108  127  162  235  461
 185  133  141  157  185  235  341  670
 362  261  277  308  362  461  670 1314
Can somebody compare the PSNR of this non-flat threshold with that of the
original flat threshold? Or even determine the optimum matrix (which would
be multiplied by the quantizer) PSNR-wise? That would be very interesting.
Probably genetic algorithms can help :). In practice, I can't do this myself.
--
Best regards,
Nikolaj mailto:nialpof at pisem.net