[MPlayer-dev-eng] New opensourced video codec NODAEMON

Sun Jul 25 22:45:30 CEST 2004

On Thu, 22 Jul 2004 22:06:33 +0900
Attila Kinali <attila at kinali.ch> wrote:

> On Tue, May 11, 2004 at 02:57:16PM +0200, Tomas Mraz wrote:
> > http://sourceforge.net/projects/dirac
> Has someone had a look at this ?

It's a work in progress.
At the moment the quality is good (similar to divx, I'd say), but
the speed is very low because the algorithm is complex and the code is
not optimized (less than 1 fps in my tests).
Very promising, nonetheless.
In a few words: divx is motion compensation + DCT, dirac is
motion compensation + wavelet (with better motion compensation, too).

> And while we are at this, does someone know a good
> (meaning easy to understand but complete) introduction to
> wavelet based audio/video coding ?

I have some experience with this.
What I've found is that the wavelet theory can be approached
either in a purely mathematical style (talking about "mother wavelet"
and "infinite series") or in a digital signal processing style
(talking about "filters" and "resampling").

More documentation is available for the first approach, but the
second approach is IMNSHO easier to understand for people with
a DSP background (that is, with knowledge of band-limited signals,
sampling, filters and the frequency domain).

I don't have a link to a good introduction, but the idea is extremely
simple.

1) Consider a sampled one-dimensional signal (e.g. audio); you have
N numbers.

2) Apply a low pass filter (let's call it L1) and a high pass filter (H1);
you have two sequences of N numbers.

3) As these two sequences are oversampled, discard every second sample;
you have two sequences of N/2 numbers (so N numbers again, overall).
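Steps 1-3 can be sketched in Python as follows. This is only an illustration: it assumes the simplest possible two-tap filters (the same x0+x1 / x0-x1 pair used in the example further down) and plain linear convolution; `convolve`, `analyze`, `L1` and `H1` are hypothetical names chosen here, not taken from any codec.

```python
def convolve(x, h):
    """Full linear convolution: y[n] = sum_k h[k] * x[n - k] (length len(x) + len(h) - 1)."""
    y = [0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += hk * xn
    return y


# Assumed two-tap filters, written as convolution taps [h[0], h[1]].
# At the kept positions n = 2k+1, L1 yields x[2k] + x[2k+1] and H1 yields x[2k] - x[2k+1].
L1 = [1, 1]
H1 = [-1, 1]


def analyze(x):
    """Steps 1-3 for a signal of even length N."""
    low = convolve(x, L1)   # step 2: N + 1 samples here, because the convolution is "full"
    high = convolve(x, H1)
    # Step 3: keep positions 1, 3, ..., N-1 only, so each band has N/2 samples.
    return low[1::2], high[1::2]


print(analyze([1, 2, 3, 4, 5, 6, 7, 8]))  # ([3, 7, 11, 15], [-1, -1, -1, -1])
```

Eight input numbers become 4 + 4 output numbers, as in step 3.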

Now the important part: the two sequences of N/2 numbers are equivalent
to the original sequence, because you can:

4) Upsample them by inserting a zero after each sample;
you have two sequences of N numbers.

5) Apply a low pass filter (L2) to the first one, a high pass filter (H2)
to the second one; you have two sequences of N numbers.

6) Add the two sequences; you have N numbers.

The final result is identical to the original sequence if L1, H1, L2 and H2
meet certain conditions (this is where some reasoning in the frequency domain
is necessary).

Steps 1-3 are a wavelet transform; steps 4-6 are the inverse wavelet transform.
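A minimal round trip through steps 1-6, again assuming two-tap filters and linear convolution. The synthesis pair L2 = [1/2, 1/2], H2 = [1/2, -1/2] is one choice that satisfies the reconstruction conditions for this particular L1/H1 pair (worked out by hand for this sketch); longer, properly designed filters need the frequency-domain reasoning mentioned above. All function names are hypothetical.

```python
def convolve(x, h):
    """Full linear convolution: y[n] = sum_k h[k] * x[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += hk * xn
    return y


L1, H1 = [1, 1], [-1, 1]          # analysis filters (assumed)
L2, H2 = [0.5, 0.5], [0.5, -0.5]  # synthesis filters matched to L1/H1 (assumed)


def analyze(x):
    """Steps 1-3: filter with L1 and H1, keep every second sample."""
    return convolve(x, L1)[1::2], convolve(x, H1)[1::2]


def upsample(c):
    """Step 4: insert a zero after each sample."""
    out = []
    for v in c:
        out += [v, 0.0]
    return out


def synthesize(low, high):
    """Steps 4-6: upsample, filter with L2 and H2, add the two results."""
    n = 2 * len(low)
    y_low = convolve(upsample(low), L2)[:n]
    y_high = convolve(upsample(high), H2)[:n]
    return [a + b for a, b in zip(y_low, y_high)]


s = [3, 1, 4, 1, 5, 9, 2, 6]
print(synthesize(*analyze(s)))  # [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
```

With these filters the output is exact with no delay; other filter families usually reconstruct the signal shifted by a few samples.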

But why is this useful for compression? Because:
a) the two N/2 sequences have different statistics (so custom entropy
coders can take advantage of that)
b) the first one is an approximation of the signal, while the second one
contains the extra details; you can heavily quantize the second without
much degradation
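Point b) can be tried out on a slowly varying signal. The sketch below applies one level of the x0+x1 / x0-x1 pair (written pairwise, without the convolution machinery), quantizes only the detail band with a coarse step, and measures the damage. The test signal and the step size of 8 are arbitrary assumptions for illustration.

```python
def forward(s):
    """One level: pairwise sums (approximation) and differences (detail)."""
    low = [s[i] + s[i + 1] for i in range(0, len(s), 2)]
    high = [s[i] - s[i + 1] for i in range(0, len(s), 2)]
    return low, high


def inverse(low, high):
    """Undo forward(): a = (sum + diff) / 2, b = (sum - diff) / 2."""
    out = []
    for l, h in zip(low, high):
        out += [(l + h) / 2, (l - h) / 2]
    return out


def quantize(c, step):
    """Uniform quantizer: round each coefficient to the nearest multiple of step."""
    return [step * round(v / step) for v in c]


signal = [100, 102, 103, 105, 106, 108, 109, 111]  # slowly varying, like a smooth area
low, high = forward(signal)
print(high)                                  # [-2, -2, -2, -2]
approx = inverse(low, quantize(high, 8))     # the detail band quantizes to all zeros
print(max(abs(a - b) for a, b in zip(signal, approx)))  # 1.0
```

The whole detail band collapses to zeros (very cheap to encode), yet no sample moves by more than 1.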

You usually apply the transformation again to the first N/2 sequence,
obtaining two N/4 sequences. So you have (N/4+N/4)+N/2 samples (coarse
approximation, big details, small details), and so on.
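Repeating the split on the approximation band gives the multi-level decomposition. This sketch keeps the pairwise x0+x1 / x0-x1 filters of the example below and assumes the length is divisible by 2 at every level; ordering the bands coarsest first is just a convention chosen here, and the names are hypothetical.

```python
def forward(s):
    """One level: pairwise sums (approximation) and differences (detail)."""
    low = [s[i] + s[i + 1] for i in range(0, len(s), 2)]
    high = [s[i] - s[i + 1] for i in range(0, len(s), 2)]
    return low, high


def decompose(s, levels):
    """Apply forward() repeatedly to the approximation band.

    Returns [approximation, coarsest detail, ..., finest detail].
    """
    details = []
    low = list(s)
    for _ in range(levels):
        low, high = forward(low)
        details.insert(0, high)  # each new (coarser) detail band goes in front
    return [low] + details


bands = decompose(list(range(16)), 3)
print([len(b) for b in bands])  # [2, 2, 4, 8]
```

The band sizes are N/8 + N/8 + N/4 + N/2, one level deeper than the (N/4+N/4)+N/2 case in the text.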

Example: we will use x0+x1 as low pass filter and x0-x1 as high pass filter.
A sequence of eight numbers:
  s= a b c d e f g h (8 samples)
After one step of transformation:
  sl= a+b c+d e+f g+h (4 samples)
  sh= a-b c-d e-f g-h (4 samples)
After a second transform step (applied to sl only):
  sll= a+b+c+d e+f+g+h (2 samples)
  slh= a+b-c-d e+f-g-h (2 samples)
  sh= a-b c-d e-f g-h (4 samples)
It is easy to see that you can reverse the calculation and find s:
from a sum a+b and a difference a-b you get a = ((a+b)+(a-b))/2 and
b = ((a+b)-(a-b))/2.
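The sketch below runs the eight-sample example with arbitrary numbers substituted for a ... h (an assumption for illustration) and then undoes both steps, second step first. Since the sum and the difference of a pair always add up to an even number, integer arithmetic stays exact.

```python
def forward(s):
    """One level: pairwise sums (x0+x1) and differences (x0-x1)."""
    low = [s[i] + s[i + 1] for i in range(0, len(s), 2)]
    high = [s[i] - s[i + 1] for i in range(0, len(s), 2)]
    return low, high


def inverse(low, high):
    """Recover each pair: a = (sum + diff) / 2, b = (sum - diff) / 2 (exact in integers)."""
    out = []
    for l, h in zip(low, high):
        out += [(l + h) // 2, (l - h) // 2]
    return out


s = [9, 7, 3, 5, 6, 10, 2, 6]      # a b c d e f g h
sl, sh = forward(s)                # sl = [16, 8, 16, 8], sh = [2, -2, -4, -4]
sll, slh = forward(sl)             # sll = [24, 24],      slh = [8, 8]

# Undo the second step (recovering sl), then the first one (recovering s).
restored = inverse(inverse(sll, slh), sh)
print(restored)                    # [9, 7, 3, 5, 6, 10, 2, 6]
```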

This has been a very crude simplification; in the real world we have
to consider that:

- the filters are longer than 2 samples and are carefully designed
- for two-dimensional signals (images) you split an N*N image into four
N/2*N/2 ones (approximation, horizontal detail, vertical detail, diagonal
detail); this is usually done by filtering along x and then along y
- the number of steps can be higher (you could reduce the first
sequence to one sample only, if you want)
- you have to be careful near the edges, because longer filters may
need samples outside the signal (number -1, -2, ... or N+1, N+2, ...),
which you don't have
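For the two-dimensional case, a one-level split can be sketched by running the same pairwise filter along every row and then along every column. This assumes the unnormalized x0+x1 / x0-x1 pair and an image with even width and height, and it ignores the filter-length and edge issues listed above; naming the quarters by "first letter = filter along x, second = filter along y" is a convention chosen here (texts differ on which one is called "horizontal detail").

```python
def forward(s):
    """1-D level: pairwise sums (low) and differences (high)."""
    low = [s[i] + s[i + 1] for i in range(0, len(s), 2)]
    high = [s[i] - s[i + 1] for i in range(0, len(s), 2)]
    return low, high


def transpose(m):
    return [list(col) for col in zip(*m)]


def split_2d(img):
    """One 2-D level: filter along x (rows), then along y (columns).

    Returns (ll, lh, hl, hh), each N/2 x N/2.
    """
    rows = [forward(r) for r in img]
    low_x = [lo for lo, _ in rows]    # N rows x N/2 columns
    high_x = [hi for _, hi in rows]

    def along_y(block):
        cols = [forward(c) for c in transpose(block)]
        return transpose([lo for lo, _ in cols]), transpose([hi for _, hi in cols])

    ll, lh = along_y(low_x)
    hl, hh = along_y(high_x)
    return ll, lh, hl, hh


img = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [50, 50, 50, 50],   # a horizontal edge
]
ll, lh, hl, hh = split_2d(img)
print(ll)  # [[40, 40], [120, 120]]
print(lh)  # [[0, 0], [-80, -80]]
```

A horizontal edge only shows up in the quarter that was high-pass filtered along y; the other two detail quarters are all zeros.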

The detail extraction strategy works very well on pictures, because
smooth areas have almost zero detail.
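A quick way to see this on a single row of pixels: the sketch below decomposes an assumed scanline made of two flat areas (arbitrary values 50 and 200) all the way down to one approximation sample, with the same pairwise x0+x1 / x0-x1 pair as the example above, and counts the detail coefficients that are not zero.

```python
def forward(s):
    """One level: pairwise sums (approximation) and differences (detail)."""
    low = [s[i] + s[i + 1] for i in range(0, len(s), 2)]
    high = [s[i] - s[i + 1] for i in range(0, len(s), 2)]
    return low, high


row = [50] * 9 + [200] * 7   # one scanline: flat dark area, then flat bright area
low, details = row, []
while len(low) > 1:          # 16 -> 8 -> 4 -> 2 -> 1 approximation samples
    low, high = forward(low)
    details += high

print(len(details), sum(1 for d in details if d != 0))  # 15 4
```

Only the coefficients touching the edge survive, one per level; the flat areas contribute nothing.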

In conclusion, you split the signal into particular sub-bands and
quantize/encode each of them as you like.
The real power of the wavelet approach is that you don't have
to choose a fixed number on which everything depends (such as the size
of the DCT blocks), and that there are no blocking artifacts at all.
The first point means that there is no intrinsic limit to your
compression factor (an 8x8 DCT approach has a sort of 64:1 limit).
The second point means that heavy compression just blurs the image
(DCT needs postprocessing to smooth block edges).
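One way to read the 64:1 figure, assuming it refers to keeping only the DC coefficient of each 8x8 block (before any entropy coding). On the wavelet side it assumes an N*N image, the four-way split described above repeated L times on the approximation, and counts only the approximation band:

```latex
\text{8x8 DCT, DC only:}\quad \frac{8 \cdot 8}{1} = 64

\text{wavelet, } L \text{ levels:}\quad \frac{N^2}{\left(N/2^L\right)^2} = 4^L,
\qquad 4^3 = 64,\; 4^4 = 256,\; 4^5 = 1024
```

Three levels reach the same 64:1 point, and each further level multiplies the ratio by 4, up to L = log2 N, where the approximation is a single sample.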

The whole process is a sort of DCT with adaptive block size
(e.g. 64x64 on the sky, 2x2 on the grass) and with partially
overlapped blocks.

I hope this tutorial was simple enough.

-- 
   Roberto Ragusa    mail at robertoragusa.it