[MPlayer-dev-eng] spp deblocking GREAT optimization !!!

Michael Niedermayer michaelni at gmx.at
Fri Sep 3 13:00:18 CEST 2004


Hi

On Friday 03 September 2004 09:56, Nikolaj Poroshin wrote:
> Hello,
>
> NP> You can get a 4-6x speedup of the SPP filter by decomposing vert and
> NP> horiz 1d dct/idct and decimating the horizontal ones. This way, the
> NP> number of horiz passes will be 4 times lower for levels 4 & 5, or 8
> NP> times lower for level 6 (which is rather useless, as noted in the
> NP> original paper). Vert passes are more suitable for optimization :)
>
> NP> Next, you can use an implied non-flat threshold with AAN dct w/o scale.
> NP> (BTW, it is an interesting question - which threshold matrix provides
> NP> the best psnr?)
>
> A further explanation of the major point:
>
> SPP deblocking is a series of 2D 8x8 dct->threshold->idct. For example,
> at level 6 (the maximum), this is performed for all possible 8x8 block
> locations, row by row (or column by column), with step 1 in both X and
> Y. All results of the overlapping idcts are summed.
>
> 2D (i)dct (8x8) can be decomposed into horizontal & vertical passes of
> 1D (i)dcts. Let's take 1 column of the SPP process:
>
> -
> - <(1) - <(2)
> -      -
> -      -
>        -
>
> Here "-" is a 1D horizontal dct8 pass. It's obvious that the results of
> the (1) and (2) dcts are the same! (btw, not with all implementations).
> This means the horizontal 1D dct passes can be decimated.
>
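To make the sharing concrete, here is a minimal sketch in Python (not MPlayer
code; the helper names dct8, dct2d_direct, dct2d_from_rows and the test strip
are made up for illustration). It assumes a plain orthonormal DCT-II rather
than the AAN variant. The 9 rows of a strip are transformed once, and the 2D
DCT of two vertically adjacent 8x8 blocks is built from those shared rows.

```python
import math

N = 8

def dct8(v):
    """Orthonormal 1D DCT-II of 8 samples (assumed variant, not AAN)."""
    out = []
    for k in range(N):
        s = sum(v[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for n in range(N))
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(scale * s)
    return out

def dct2d_direct(block):
    """Reference: horizontal pass on every row, then vertical pass."""
    rows = [dct8(r) for r in block]
    cols = [dct8([rows[y][x] for y in range(N)]) for x in range(N)]
    return [[cols[x][k] for x in range(N)] for k in range(N)]

def dct2d_from_rows(row_dcts, y0):
    """Vertical pass only, reusing already transformed rows y0..y0+7."""
    cols = [dct8([row_dcts[y0 + y][x] for y in range(N)]) for x in range(N)]
    return [[cols[x][k] for x in range(N)] for k in range(N)]

# Hypothetical 9x8 strip: enough rows for two blocks shifted by one line.
strip = [[(37 * y + 11 * x + x * y) % 256 for x in range(N)] for y in range(N + 1)]

# 9 horizontal passes in total instead of 8 per block (16 for two blocks).
row_dcts = [dct8(r) for r in strip]

max_diff = 0.0
for y0 in (0, 1):
    shared = dct2d_from_rows(row_dcts, y0)
    direct = dct2d_direct(strip[y0:y0 + N])
    for k in range(N):
        for x in range(N):
            max_diff = max(max_diff, abs(shared[k][x] - direct[k][x]))
```

With more vertical shifts the saving grows, since every additional block
position costs only one new row transform.
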
> After vertical 1D dct->threshold->idct (which I don't know how to
> decimate :) ) :
>
>
> ==> -
> ==> -  +  - <==
> ==> -  +  - <==
> ==> -  +  - <==
>           - <==
>
> Here, "-" is a 1D horizontal Idct8 pass. "==>" and "<==" are the
> 'dataflows' from the vertical part. "+" is a sum of Idct8 passes and the
> place where the result becomes available. Due to Idct linearity,
> "==> -  +  - <==" can be rewritten as
>
> ==> \
>      +  -
> ==> /
>
> (So, vert passes accumulate data for a single horiz idct pass).
> This means decimation of horizontal 1D Idct passes!
>
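The linearity step can be checked the same way. A minimal sketch (the helper
idct8 and the two coefficient rows are made up for illustration, and an
orthonormal IDCT is assumed): summing two coefficient rows first and running
one horizontal IDCT gives the same samples as running two IDCTs and summing
their outputs. This holds for the IDCT only; the threshold is not linear, so
it has to be applied before the rows are accumulated.

```python
import math

N = 8

def idct8(c):
    """Orthonormal 1D inverse DCT (DCT-III) of 8 coefficients."""
    out = []
    for n in range(N):
        s = c[0] * math.sqrt(1.0 / N)
        s += sum(c[k] * math.sqrt(2.0 / N)
                 * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                 for k in range(1, N))
        out.append(s)
    return out

# Hypothetical coefficient rows arriving from two overlapping blocks
# after the vertical dct->threshold->idct part.
a = [12.0, -3.5, 0.0, 4.25, 0.0, -1.0, 0.5, 0.0]
b = [7.0, 2.0, -6.0, 0.0, 1.5, 0.0, 0.0, -0.75]

# Two horizontal IDCT passes, outputs summed afterwards.
sum_after = [p + q for p, q in zip(idct8(a), idct8(b))]

# Coefficients accumulated first, then a single horizontal IDCT pass.
sum_before = idct8([p + q for p, q in zip(a, b)])

max_diff = max(abs(p - q) for p, q in zip(sum_after, sum_before))
```
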
> Personally I'd like to see & compare _different_ implementations, since I
> know there are great developers here :) So I'd rather give ideas than
> ready code. However, the code is ready - on a Celeron 633, SPP level
> 4 is fully working on a ~512x288 Divx film.
if you did implement it and it is faster while still giving the same result
besides rounding differences, you really should submit the code
if you didn't implement it, you should, instead of asking others, we really
don't lack ideas but code & time to write it

-- 
Michael
level[i]= get_vlc(); i+=get_vlc();  (violates patent EP0266049)
median(mv[y-1][x], mv[y][x-1], mv[y+1][x+1]); (violates patent #5,905,535)
buf[i]= qp - buf[i-1];    (violates patent #?)
for more examples, see http://mplayerhq.hu/~michael/patent.html
stop it, see http://petition.eurolinux.org & http://petition.ffii.org/eubsa/en
