[BoW] r14 - trunk/video_coding/qns.txt
Ramiro Polla
ramiro.polla at gmail.com
Wed Oct 15 17:20:07 CEST 2008
Hi,
On Wed, Oct 15, 2008 at 8:16 AM, darkshikari <subversion at mplayerhq.hu> wrote:
> Author: darkshikari
> Date: Wed Oct 15 12:16:49 2008
> New Revision: 14
>
> Log:
> Quantization noise shaping.
>
> Added:
> trunk/video_coding/qns.txt (contents, props changed)
>
> Added: trunk/video_coding/qns.txt
> ==============================================================================
> --- (empty file)
> +++ trunk/video_coding/qns.txt Wed Oct 15 12:16:49 2008
> @@ -0,0 +1,27 @@
> +Quantization Noise Shaping
> +
> +We have nice algorithms
> like trellis
What's trellis?
> to optimize quantization for optimal rate-distortion score
What's optimal rate-distortion and where does a score fit in?
> , using the sum of squared error (the basis of PSNR) as the distortion metric.
What's PSNR and what data do I run the sum of squared error with?
> But what about other metrics? PSNR is a special case because the sum of squared error is the same in both the frequency and spatial domains (for an orthonormal transform), according to Parseval's theorem
What's Parseval's theorem? What's the spatial domain? What's the
frequency domain? How do I get from one to the other?
> . However, this is not true of other metrics, and most spatial quality metrics cannot be measured in the frequency domain--in other words,
> each frequency component of the block
What block?
> is not independent of other frequency components with regard to effect on quality. This presents a problem, as it means finding the optimal quantization is no longer polynomial-time, but rather NP-hard, and thus completely intractable.
What's NP? What's intractable?
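A quick numerical check of the Parseval point, as a sketch only: it assumes an orthonormal floating-point 2D DCT-II (real codecs use scaled integer approximations of it), and the helper names are made up for illustration. The SSD between two blocks comes out the same whether measured on pixels or on DCT coefficients, while another metric such as SAD (sum of absolute differences) does not.

```python
import math
import random

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix: row k is the k-th cosine basis vector."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
            for k in range(n)]

def dct2(x):
    """2D forward transform of an n x n block: Y = C X C^T."""
    n = len(x)
    c = dct_matrix(n)
    t = [[sum(c[k][i] * x[i][j] for i in range(n)) for j in range(n)] for k in range(n)]
    return [[sum(t[k][j] * c[l][j] for j in range(n)) for l in range(n)] for k in range(n)]

def ssd(a, b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def sad(a, b):
    """Sum of absolute differences: Parseval's theorem says nothing about it."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

random.seed(1)
N = 4
orig = [[random.randint(0, 255) for _ in range(N)] for _ in range(N)]
# Stand-in for a decoded block: the original plus a small reconstruction error
decoded = [[v + random.uniform(-4.0, 4.0) for v in row] for row in orig]
f_orig, f_dec = dct2(orig), dct2(decoded)

print("SSD spatial:  ", ssd(orig, decoded))
print("SSD frequency:", ssd(f_orig, f_dec))  # same value, up to rounding error
print("SAD spatial:  ", sad(orig, decoded))
print("SAD frequency:", sad(f_orig, f_dec))  # generally a different value
```

Because SSD is preserved, the squared error of each frequency coefficient can be scored on its own, which is what lets trellis work coefficient by coefficient; SAD and other spatial metrics lose that property.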
> +However, there is a solution: a greedy algorithm known as QNS, or Quantization Noise Shaping. It works as follows:
> +
> +1. For each quantized
What's quantized?
> frequency coefficient, try raising or lowering the coefficient by 1.
> +2. Measure the RD score for each possibility by doing an inverse transform
Inverse of what transform on what?
> and taking the appropriate bit cost and spatial distortion measurements.
> +3. Pick the best of all the possibilities tried above, and apply it to the quantized coefficients.
> +4. Repeat this process until none of the options improves the RD score.
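The greedy loop in steps 1-4 can be sketched as follows. This is not ffmpeg's implementation: the transform is an orthonormal floating-point DCT, the RD score is the usual Lagrangian cost J = D + lambda * R, `rate()` is a hypothetical stand-in for a real entropy coder's bit cost, and the spatial metric is a weighted SSD with caller-supplied weights. None of the speed shortcuts are applied, so every trial runs a full inverse transform.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix: row k is the k-th cosine basis vector."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
            for k in range(n)]

def dct2(x, c):
    """Forward 2D transform: Y = C X C^T."""
    n = len(x)
    t = [[sum(c[k][i] * x[i][j] for i in range(n)) for j in range(n)] for k in range(n)]
    return [[sum(t[k][j] * c[l][j] for j in range(n)) for l in range(n)] for k in range(n)]

def idct2(y, c):
    """Inverse 2D transform: X = C^T Y C."""
    n = len(y)
    t = [[sum(c[k][i] * y[k][l] for k in range(n)) for l in range(n)] for i in range(n)]
    return [[sum(t[i][l] * c[l][j] for l in range(n)) for j in range(n)] for i in range(n)]

def rate(levels):
    """HYPOTHETICAL bit-cost proxy: zeros are free, larger levels cost more.
    A real encoder would query its entropy coder here."""
    return sum(2 * abs(v).bit_length() + 1 for row in levels for v in row if v != 0)

def distortion(orig, recon, weight):
    """Spatial-domain weighted SSD; any spatial metric could be used instead."""
    return sum(w * (o - r) ** 2
               for ro, rr, rw in zip(orig, recon, weight)
               for o, r, w in zip(ro, rr, rw))

def rd_cost(levels, orig, weight, qstep, lam, c):
    """RD score J = D + lambda * R, measured after a full inverse transform."""
    recon = idct2([[v * qstep for v in row] for row in levels], c)
    return distortion(orig, recon, weight) + lam * rate(levels)

def qns(orig, weight, qstep, lam, max_iters=1000):
    n = len(orig)
    c = dct_matrix(n)
    # Start from plain rounding of the transform coefficients
    levels = [[round(v / qstep) for v in row] for row in dct2(orig, c)]
    best = rd_cost(levels, orig, weight, qstep, lam, c)
    for _ in range(max_iters):
        best_move = None
        # Steps 1-2: try +1 and -1 on every level and score each trial
        for i in range(n):
            for j in range(n):
                for delta in (1, -1):
                    levels[i][j] += delta
                    cost = rd_cost(levels, orig, weight, qstep, lam, c)
                    levels[i][j] -= delta  # undo the trial
                    if cost < best:
                        best, best_move = cost, (i, j, delta)
        # Step 4: stop once no single change improves the score
        if best_move is None:
            break
        # Step 3: apply the best change found in this pass
        i, j, delta = best_move
        levels[i][j] += delta
    return levels, best

block = [[52, 55, 61, 66], [70, 61, 64, 73], [63, 59, 55, 90], [67, 61, 68, 104]]
flat = [[1.0] * 4 for _ in range(4)]  # uniform weights: plain SSD
levels, cost = qns(block, flat, qstep=10.0, lam=20.0)
print(levels, cost)
```

With uniform weights this reduces to ordinary SSD; passing per-pixel weights (for example, ones derived from local variance) is what makes the search genuinely spatial.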
> +
> +There are many shortcuts to improve performance:
> +
> +1. Don't look at any coefficient beyond the last nonzero coefficient.
> +2. Don't consider any value for a coefficient other than rounding its unquantized value up or down.
> +3. Use an optimized inverse transform that takes into account the fact that only a single coefficient has been changed.
> +4. Use a fast bit cost calculation algorithm, like the one used in trellis.
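Shortcuts 2 and 3 can be illustrated directly. Because the inverse transform is linear, changing one quantized level by delta adds delta * qstep times that coefficient's basis image to the reconstruction, so a trial costs O(n^2) work instead of a full inverse transform. This sketch assumes an orthonormal floating-point DCT; `update_recon` and `candidate_levels` are hypothetical helper names, not ffmpeg functions.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix: row k is the k-th cosine basis vector."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
            for k in range(n)]

def idct2(y, c):
    """Full inverse 2D transform: X = C^T Y C."""
    n = len(y)
    t = [[sum(c[k][i] * y[k][l] for k in range(n)) for l in range(n)] for i in range(n)]
    return [[sum(t[i][l] * c[l][j] for l in range(n)) for j in range(n)] for i in range(n)]

def update_recon(recon, c, k, l, delta, qstep):
    """Shortcut 3 (hypothetical helper): changing level (k, l) by delta adds
    delta * qstep times the (k, l) basis image C[k][i] * C[l][j] to the
    reconstruction, which is O(n^2) work instead of a full inverse transform."""
    n = len(recon)
    s = delta * qstep
    return [[recon[i][j] + s * c[k][i] * c[l][j] for j in range(n)] for i in range(n)]

def candidate_levels(coef, qstep):
    """Shortcut 2 (hypothetical helper): only rounding coef / qstep down or up
    is allowed, so a +1/-1 trial that leaves this set can be skipped."""
    q = coef / qstep
    return {math.floor(q), math.ceil(q)}

c = dct_matrix(4)
qstep = 10.0
levels = [[27, -3, 1, 0], [2, 0, 0, 0], [-1, 0, 0, 0], [0, 0, 0, 0]]
recon = idct2([[v * qstep for v in row] for row in levels], c)

# Raise level (0, 1) by one: incremental update vs. full inverse transform
fast = update_recon(recon, c, 0, 1, +1, qstep)
levels[0][1] += 1
full = idct2([[v * qstep for v in row] for row in levels], c)
print(max(abs(a - b) for ra, rb in zip(fast, full) for a, b in zip(ra, rb)))  # floating-point noise only

print(sorted(candidate_levels(23.0, qstep)))   # [2, 3]
print(sorted(candidate_levels(-23.0, qstep)))  # [-3, -2]
```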
> +
> +The most common spatial distortion measurement, and the one used in ffmpeg, is
> SSD (sum of squared differences) weighted by local 3x3 variance.
What?
> The purpose of this is to try to hide quantization noise
What's quantization noise?
> in areas with high detail, where it is less noticeable.
Why is it less noticeable?
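A sketch of a variance-weighted SSD. The commit only says "SSD weighted by local 3x3 variance"; the mapping weight = 1 / (1 + variance) used here is an assumption, chosen so that an error in a busy, high-variance area costs less than the same error in a flat area. It is not ffmpeg's exact formula, and the function names are hypothetical.

```python
def local_variance(img, y, x):
    """Variance of the 3x3 neighbourhood around (y, x), clipped at the edges."""
    h, w = len(img), len(img[0])
    vals = [img[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def noise_weights(img):
    """HYPOTHETICAL mapping: the weight falls as local variance rises."""
    return [[1.0 / (1.0 + local_variance(img, y, x)) for x in range(len(img[0]))]
            for y in range(len(img))]

def weighted_ssd(orig, recon, weights):
    """SSD where each pixel's squared error is scaled by its weight."""
    return sum(w * (o - r) ** 2
               for ro, rr, rw in zip(orig, recon, weights)
               for o, r, w in zip(ro, rr, rw))

W, H = 6, 4
# Left half flat, right half a high-contrast checkerboard
img = [[100 if x < 3 else (50 if (x + y) % 2 == 0 else 150) for x in range(W)]
       for y in range(H)]
weights = noise_weights(img)

flat_err = [row[:] for row in img]
flat_err[1][0] += 5   # error in the flat region
busy_err = [row[:] for row in img]
busy_err[1][5] += 5   # same error in the textured region

print(weighted_ssd(img, flat_err, weights))  # 25.0
print(weighted_ssd(img, busy_err, weights))  # much smaller
```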
[...]
I don't mean for you to answer these here; rather, try to imagine
what someone will be thinking when reading this.
A target audience has been defined with an expected minimum level of
knowledge (so some of the questions I asked here don't really need to
be explained in detail), but the text in this book is becoming very
terse.
Do we expect this book to be one that might be used by a professor
in a course, or as a reference for people who are already skilled in
multimedia coding?
Will it have a glossary and appendices? Will each chapter first
introduce its topic, stating what the reader must know beforehand,
what the topic is and what it is for, and then gradually explain
things in detail? Or is the book going to consist of very specific,
terse information that can only be parsed by someone who already
knows the subject?
Text like this might scare people away...
Ramiro Polla