[FFmpeg-devel] [PATCH] RealAudio 14.4K encoder
Francesco Lavra
francescolavra
Sat May 22 15:18:45 CEST 2010
On Sat, 2010-05-22 at 03:27 +0200, Michael Niedermayer wrote:
> On Sun, May 16, 2010 at 04:46:37PM +0200, Francesco Lavra wrote:
> > > > > > + /**
> > > > > > + * TODO: orthogonalize the best entry of the adaptive codebook with the
> > > > > > + * basis vectors of the first fixed codebook, and the best entry of the
> > > > > > + * first fixed codebook with the basis vectors of the second fixed codebook.
> > > > > > + */
> > > > >
> > > > > yes, also shouldn't the search be iterative instead of just one pass?
> > > >
> > > > I tried inserting several iteration runs to find the optimal entries of
> > > > the fixed codebooks, but the entries found on the second and subsequent
> > > > iterations rarely differ from the first choices, and in any case I
> > > > couldn't hear any improvement in quality, so the iterative method
> > > > doesn't seem to bring any added value.
> > >
> > > did you orthogonalize the entries?
> >
> > I didn't, but now I have.
>
> how much PSNR improvement has this brought?
See below.
> > No iterative search is performed, since in the
> > algorithm at http://focus.ti.com/lit/an/spra136/spra136.pdf there is no
> > mention of multiple iterations.
>
> are you serious?
> because some 16-year-old paper that smells a bit like a description of a
> low-complexity hardware encoder doesn't do it, you don't even try?
> hell, if we were designing video encoders like that ...
>
> don't misunderstand me here, I am perfectly OK if you don't want to try this
> because it's too much work or some other reason. but just because some
> paper doesn't mention it, uhm
Once we agree on the best approach for encoding, I'll give it a try.
See below.
> [...]
> > +/**
> > + * Quantizes a value by searching a sorted table for the element with the
> > + * nearest value
> > + *
> > + * @param value value to quantize
> > + * @param table array containing the quantization table
> > + * @param size size of the quantization table
> > + * @return index of the quantization table corresponding to the element with the
> > + * nearest value
> > + */
> > +static int quantize(int value, const int16_t *table, unsigned int size)
> > +{
> > + int error;
> > + int index;
> > + unsigned int low = 0, high = size - 1;
> > +
> > + while (1) {
>
> > + index = (low + high) >> 1;
> > + error = table[index] - value;
>
> declaration and initialization can be merged
If you meant declaring index and error inside the loop, fixed locally.
> > + if (index == low)
> > + return table[high] + error > value ? low : high;
> > + if (error > 0) {
> > + high = index;
> > + } else {
> > + low = index;
> > + }
> > + }
> > +}
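For reference, the search above as a standalone sketch, with the declarations merged into the loop as suggested (assumes `table` is sorted in ascending order):

```c
#include <stdint.h>

/* Nearest-value quantizer: binary search over an ascending table,
 * returning the index of the element closest to value. */
static int quantize(int value, const int16_t *table, unsigned int size)
{
    unsigned int low = 0, high = size - 1;

    while (1) {
        int index = (low + high) >> 1;
        int error = table[index] - value;

        if (index == low)
            /* Two candidates left; table[high] + error > value is
             * equivalent to comparing value against their midpoint. */
            return table[high] + error > value ? low : high;
        if (error > 0)
            high = index;
        else
            low = index;
    }
}
```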
> > +
> > +
>
> > +/**
> > + * Orthogonalizes a vector to another vector
> > + *
> > + * @param v vector to orthogonalize
> > + * @param u vector against which orthogonalization is performed
> > + */
> > +static void orthogonalize(float *v, const float *u)
>
> missing const
Vector v is not constant.
Or do you mean something like (float * const v, const float * const u)?
> > +{
> > + int i;
> > + float num = 0, den = 0;
> > +
> > + for (i = 0; i < BLOCKSIZE; i++) {
> > + num += v[i] * u[i];
> > + den += u[i] * u[i];
> > + }
>
> > + for (i = 0; i < BLOCKSIZE; i++)
> > + v[i] -= (num * u[i]) / den;
>
> num /= den;
> for (i = 0; i < BLOCKSIZE; i++)
> v[i] -= num * u[i];
Fixed locally.
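This is the variant with the division hoisted out of the second loop, as suggested (a sketch; BLOCKSIZE is assumed to be the 40-sample subblock length from ra144.h):

```c
#define BLOCKSIZE 40  /* subblock length, assumed as in ra144.h */

/* Gram-Schmidt step: make v orthogonal to u by subtracting the
 * projection of v onto u. The divide by the squared norm happens
 * once, outside the per-sample loop. */
static void orthogonalize(float *v, const float *u)
{
    int i;
    float num = 0, den = 0;

    for (i = 0; i < BLOCKSIZE; i++) {
        num += v[i] * u[i];   /* dot product <v, u>  */
        den += u[i] * u[i];   /* squared norm <u, u> */
    }
    num /= den;
    for (i = 0; i < BLOCKSIZE; i++)
        v[i] -= num * u[i];
}
```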
> [...]
> > +/**
> > + * Searches the adaptive codebook for the best entry and gain and removes its
> > + * contribution from input data
> > + *
> > + * @param adapt_cb array from which the adaptive codebook is extracted
> > + * @param work array used to calculate LPC-filtered vectors
> > + * @param coefs coefficients of the LPC filter
> > + * @param data input data
> > + * @return index of the best entry of the adaptive codebook
> > + */
> > +static int adaptive_cb_search(const int16_t *adapt_cb, float *work,
> > + const float *coefs, float *data)
> > +{
> > + int i, j, best_vect;
> > + float score, gain, best_score, best_gain;
> > + float exc[BLOCKSIZE];
> > +
> > + gain = best_score = 0;
> > + for (i = BLOCKSIZE / 2; i <= BUFFERSIZE; i++) {
> > + for (j = 0; j < FFMIN(BLOCKSIZE, i); j++)
> > + exc[j] = adapt_cb[BUFFERSIZE - i + j];
> > + if (i < BLOCKSIZE)
> > + for (j = 0; j < BLOCKSIZE - i; j++)
> > + exc[i + j] = adapt_cb[BUFFERSIZE - i + j];
> > + get_match_score(work, coefs, exc, NULL, NULL, data, &score, &gain);
> > + if (score > best_score) {
> > + best_score = score;
> > + best_vect = i;
> > + best_gain = gain;
> > + }
> > + }
> > + if (!best_score)
> > + return 0;
> > +
> > + /**
> > + * Re-calculate the filtered vector from the vector with maximum match score
> > + * and remove its contribution from input data.
> > + */
>
> > + for (j = 0; j < FFMIN(BLOCKSIZE, best_vect); j++)
> > + exc[j] = adapt_cb[BUFFERSIZE - best_vect + j];
> > + if (best_vect < BLOCKSIZE)
> > + for (j = 0; j < BLOCKSIZE - best_vect; j++)
> > + exc[best_vect + j] = adapt_cb[BUFFERSIZE - best_vect + j];
>
> code duplication
Will fix it if you like the floating point approach.
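One way to remove the duplication would be a small helper that builds the excitation vector for a given lag, repeating the last period when the lag is shorter than a subblock; both the search loop and the recalculation after it could then call it (a sketch; the helper name `build_adaptive_exc` is hypothetical, and BLOCKSIZE/BUFFERSIZE are assumed as in ra144.h):

```c
#include <stdint.h>

#define BLOCKSIZE  40   /* subblock length, assumed as in ra144.h          */
#define BUFFERSIZE 146  /* adaptive codebook length, assumed as in ra144.h */
#define FFMIN(a, b) ((a) > (b) ? (b) : (a))

/* Build the adaptive-codebook excitation for a given lag. For lags
 * shorter than a subblock, the most recent lag samples are repeated
 * to fill the vector. */
static void build_adaptive_exc(float *exc, const int16_t *adapt_cb, int lag)
{
    int j;

    for (j = 0; j < FFMIN(BLOCKSIZE, lag); j++)
        exc[j] = adapt_cb[BUFFERSIZE - lag + j];
    if (lag < BLOCKSIZE)
        for (j = 0; j < BLOCKSIZE - lag; j++)
            exc[lag + j] = adapt_cb[BUFFERSIZE - lag + j];
}
```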
> > + ff_celp_lp_synthesis_filterf(work, coefs, exc, BLOCKSIZE, LPC_ORDER);
>
>
>
> > + for (i = 0; i < BLOCKSIZE; i++)
> > + data[i] -= best_gain * work[i];
> > + return (best_vect - BLOCKSIZE / 2 + 1);
> > +}
> > +
> > +
>
> > +/**
> > + * Searches the two fixed codebooks for the best entry and gain
> > + *
> > + * @param work array used to calculate LPC-filtered vectors
> > + * @param coefs coefficients of the LPC filter
> > + * @param data input data
> > + * @param cba_idx index of the best entry of the adaptive codebook
> > + * @param cb1_idx pointer to variable where the index of the best entry of the
> > + * first fixed codebook is returned
> > + * @param cb2_idx pointer to variable where the index of the best entry of the
> > + * second fixed codebook is returned
> > + */
> > +static void fixed_cb_search(float *work, const float *coefs, float *data,
> > + int cba_idx, int *cb1_idx, int *cb2_idx)
> > +{
> > + int i, j, ortho_cb1;
> > + float score, gain, best_score, best_gain;
> > + float cba_vect[BLOCKSIZE], cb1_vect[BLOCKSIZE];
> > + float vect[BLOCKSIZE];
> > +
> > + /**
> > + * The filtered vector from the adaptive codebook can be retrieved from
> > + * work, because this function is called just after adaptive_cb_search().
> > + */
> > + if (cba_idx)
> > + memcpy(cba_vect, work, sizeof(cba_vect));
> > +
> > + *cb1_idx = gain = best_score = best_gain = 0;
> > + for (i = 0; i < FIXED_CB_SIZE; i++) {
> > + for (j = 0; j < BLOCKSIZE; j++)
> > + vect[j] = ff_cb1_vects[i][j];
> > + get_match_score(work, coefs, vect, cba_idx ? cba_vect: NULL, NULL, data,
> > + &score, &gain);
> > + if (score > best_score) {
> > + best_score = score;
> > + *cb1_idx = i;
> > + best_gain = gain;
> > + }
> > + }
> > +
> > + /**
> > + * Re-calculate the filtered vector from the vector with maximum match score
> > + * and remove its contribution from input data.
> > + */
> > + if (best_gain) {
> > + for (j = 0; j < BLOCKSIZE; j++)
> > + vect[j] = ff_cb1_vects[*cb1_idx][j];
> > + ff_celp_lp_synthesis_filterf(work, coefs, vect, BLOCKSIZE, LPC_ORDER);
> > + if (cba_idx)
> > + orthogonalize(work, cba_vect);
> > + for (i = 0; i < BLOCKSIZE; i++)
> > + data[i] -= best_gain * work[i];
> > + memcpy(cb1_vect, work, sizeof(cb1_vect));
> > + ortho_cb1 = 1;
> > + } else
> > + ortho_cb1 = 0;
> > +
>
> > + *cb2_idx = best_score = best_gain = 0;
> > + for (i = 0; i < FIXED_CB_SIZE; i++) {
> > + for (j = 0; j < BLOCKSIZE; j++)
> > + vect[j] = ff_cb2_vects[i][j];
> > + get_match_score(work, coefs, vect, cba_idx ? cba_vect : NULL,
> > + ortho_cb1 ? cb1_vect : NULL, data, &score, &gain);
> > + if (score > best_score) {
> > + best_score = score;
> > + *cb2_idx = i;
> > + best_gain = gain;
> > + }
> > + }
>
> duplicate
Ditto.
> > +}
> > +
> > +
> > +/**
> > + * Encode a subblock of the current frame
> > + *
> > + * @param ractx encoder context
> > + * @param sblock_data input data of the subblock
> > + * @param lpc_coefs coefficients of the LPC filter
> > + * @param rms RMS of the reflection coefficients
> > + * @param pb pointer to PutBitContext of the current frame
> > + */
> > +static void ra144_encode_subblock(RA144Context *ractx,
> > + const int16_t *sblock_data,
> > + const int16_t *lpc_coefs, unsigned int rms,
> > + PutBitContext *pb)
> > +{
> > +#define NUM_BEST_GAINS 5
> > +
> > + float data[BLOCKSIZE], work[LPC_ORDER + BLOCKSIZE];
> > + float coefs[LPC_ORDER];
> > + float zero[BLOCKSIZE], cba[BLOCKSIZE], cb1[BLOCKSIZE], cb2[BLOCKSIZE];
> > + int16_t cba_vect[BLOCKSIZE], exc[BLOCKSIZE];
> > + int cba_idx, cb1_idx, cb2_idx, gain;
> > + int i, n, m[3];
> > + float g[3];
> > + float error, best_errors[NUM_BEST_GAINS];
> > + int indexes[NUM_BEST_GAINS];
> > +
> > + for (i = 0; i < LPC_ORDER; i++) {
> > + work[i] = ractx->curr_sblock[BLOCKSIZE + i];
> > + coefs[i] = lpc_coefs[i] / 4096.0;
>
> * (1/4096.0);
> multiplies are faster than divides
Fixed locally.
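The suggested change lets the compiler fold the constant reciprocal, so each iteration does a multiply instead of a slower divide; a minimal sketch (the helper name `scale_coefs` is hypothetical):

```c
#include <stdint.h>

/* Convert fixed-point LPC coefficients (Q12) to float by multiplying
 * with the compile-time constant 1/4096 instead of dividing. */
static void scale_coefs(float *coefs, const int16_t *lpc_coefs, int order)
{
    int i;
    for (i = 0; i < order; i++)
        coefs[i] = lpc_coefs[i] * (1.0f / 4096.0f);
}
```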
> > + }
> > +
> > + /**
> > + * Calculate the zero-input response of the LPC filter and subtract it from
> > + * input data.
> > + */
> > + memset(data, 0, sizeof(data));
> > + ff_celp_lp_synthesis_filterf(work + LPC_ORDER, coefs, data, BLOCKSIZE,
> > + LPC_ORDER);
> > + for (i = 0; i < BLOCKSIZE; i++) {
> > + zero[i] = work[LPC_ORDER + i];
> > + data[i] = sblock_data[i] - zero[i];
> > + }
> > +
> > + /**
> > + * Codebook search is performed without taking into account the contribution
> > + * of the previous subblock, since it has been just subtracted from input
> > + * data.
> > + */
> > + memset(work, 0, LPC_ORDER * sizeof(*work));
> > +
> > + cba_idx = adaptive_cb_search(ractx->adapt_cb, work + LPC_ORDER, coefs,
> > + data);
> > + if (cba_idx) {
> > + /**
> > + * The filtered vector from the adaptive codebook can be retrieved from
> > + * work, see implementation of adaptive_cb_search().
> > + */
> > + memcpy(cba, work + LPC_ORDER, sizeof(cba));
> > +
> > + ff_copy_and_dup(cba_vect, ractx->adapt_cb, cba_idx + BLOCKSIZE / 2 - 1);
> > + m[0] = (ff_irms(cba_vect) * rms) >> 12;
> > + }
> > + fixed_cb_search(work + LPC_ORDER, coefs, data, cba_idx, &cb1_idx, &cb2_idx);
> > + for (i = 0; i < BLOCKSIZE; i++) {
> > + cb1[i] = ff_cb1_vects[cb1_idx][i];
> > + cb2[i] = ff_cb2_vects[cb2_idx][i];
> > + }
> > + ff_celp_lp_synthesis_filterf(work + LPC_ORDER, coefs, cb1, BLOCKSIZE,
> > + LPC_ORDER);
> > + memcpy(cb1, work + LPC_ORDER, sizeof(cb1));
> > + m[1] = (ff_cb1_base[cb1_idx] * rms) >> 8;
> > + ff_celp_lp_synthesis_filterf(work + LPC_ORDER, coefs, cb2, BLOCKSIZE,
> > + LPC_ORDER);
> > + memcpy(cb2, work + LPC_ORDER, sizeof(cb2));
> > + m[2] = (ff_cb2_base[cb2_idx] * rms) >> 8;
> > +
> > + /**
> > + * Gain quantization is performed taking the NUM_BEST_GAINS best entries
> > + * obtained from floating point data and calculating for each entry the
> > + * actual encoding error with fixed point data.
> > + */
> > + for (i = 0; i < NUM_BEST_GAINS; i++) {
> > + best_errors[i] = FLT_MAX;
> > + indexes[i] = -1;
> > + }
> > + for (n = 0; n < 256; n++) {
> > + g[1] = ((ff_gain_val_tab[n][1] * m[1]) >> ff_gain_exp_tab[n]) / 4096.0;
> > + g[2] = ((ff_gain_val_tab[n][2] * m[2]) >> ff_gain_exp_tab[n]) / 4096.0;
> > + error = 0;
> > + if (cba_idx) {
> > + g[0] = ((ff_gain_val_tab[n][0] * m[0]) >> ff_gain_exp_tab[n]) /
> > + 4096.0;
> > + for (i = 0; i < BLOCKSIZE; i++) {
> > + data[i] = zero[i] + g[0] * cba[i] + g[1] * cb1[i] +
> > + g[2] * cb2[i];
> > + error += (data[i] - sblock_data[i]) *
> > + (data[i] - sblock_data[i]);
> > + }
> > + } else {
> > + for (i = 0; i < BLOCKSIZE; i++) {
> > + data[i] = zero[i] + g[1] * cb1[i] + g[2] * cb2[i];
> > + error += (data[i] - sblock_data[i]) *
> > + (data[i] - sblock_data[i]);
> > + }
> > + }
>
> > + for (i = 0; i < NUM_BEST_GAINS; i++)
> > + if (error < best_errors[i]) {
> > + best_errors[i] = error;
> > + indexes[i] = n;
> > + break;
> > + }
>
> this does not keep the 5 best
> it only guarantees to keep the 1 best
Why? Perhaps you missed the break statement?
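For what it's worth, a trace suggests the break alone is not enough: a candidate that beats slot i overwrites it without shifting the old value down, so with the error sequence 10, 5, 8 slot 0 ends up holding 5 and the 10 is lost even though four slots are still FLT_MAX. A sorted insertion that does keep the N smallest could look like this (a sketch; the helper name `keep_best` is hypothetical):

```c
#include <float.h>

#define NUM_BEST_GAINS 5

/* Insert a candidate error into an ascending list of the
 * NUM_BEST_GAINS smallest errors seen so far, shifting worse
 * entries down so no current best entry is evicted. */
static void keep_best(float error, int n, float *best_errors, int *indexes)
{
    int i, j;

    for (i = 0; i < NUM_BEST_GAINS; i++) {
        if (error < best_errors[i]) {
            for (j = NUM_BEST_GAINS - 1; j > i; j--) {
                best_errors[j] = best_errors[j - 1];
                indexes[j]     = indexes[j - 1];
            }
            best_errors[i] = error;
            indexes[i]     = n;
            break;
        }
    }
}
```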
> you are testing your changes in terms of PSNR, aren't you?
> if not, we need to go back to the last patch and test each change
> individually.
> I very much prefer naive and slow code compared to optimized but
> untested and thus buggy code. we already have a vorbis and aac encoder
> </rant>
I did test each individual change by measuring the resulting average
encoding error. Now I have re-tested them with tiny_psnr. Here are the
results with 7 different samples.
Fixed point, without orthogonalization, with brute force gain
quantization
stddev: 849.73 PSNR: 37.74 bytes: 200320/ 200334
stddev: 983.24 PSNR: 36.48 bytes: 144000/ 144014
stddev: 835.19 PSNR: 37.89 bytes: 745280/ 745294
stddev: 3737.95 PSNR: 24.88 bytes: 5370880/ 5370880
stddev: 2605.75 PSNR: 28.01 bytes: 814400/ 814400
stddev: 3634.44 PSNR: 25.12 bytes: 432640/ 432640
stddev: 2853.26 PSNR: 27.22 bytes: 1741440/ 1741440
Floating point, without orthogonalization, with gain quantization done
the fast way
stddev: 940.92 PSNR: 36.86 bytes: 200320/ 200334
stddev: 1010.57 PSNR: 36.24 bytes: 144000/ 144014
stddev: 904.31 PSNR: 37.20 bytes: 745280/ 745294
stddev: 3753.33 PSNR: 24.84 bytes: 5370880/ 5370880
stddev: 2612.23 PSNR: 27.99 bytes: 814400/ 814400
stddev: 3638.47 PSNR: 25.11 bytes: 432640/ 432640
stddev: 2855.30 PSNR: 27.22 bytes: 1741440/ 1741440
Floating point, with orthogonalization, with gain quantization done the
fast way
stddev: 818.14 PSNR: 38.07 bytes: 200320/ 200334
stddev: 986.48 PSNR: 36.45 bytes: 144000/ 144014
stddev: 811.68 PSNR: 38.14 bytes: 745280/ 745294
stddev: 3762.86 PSNR: 24.82 bytes: 5370880/ 5370880
stddev: 2635.10 PSNR: 27.91 bytes: 814400/ 814400
stddev: 3647.02 PSNR: 25.09 bytes: 432640/ 432640
stddev: 2862.79 PSNR: 27.19 bytes: 1741440/ 1741440
Floating point, without orthogonalization, with gain quantization done
taking into account the rounding error of the 5 best entries
stddev: 902.10 PSNR: 37.22 bytes: 200320/ 200334
stddev: 988.24 PSNR: 36.43 bytes: 144000/ 144014
stddev: 854.40 PSNR: 37.70 bytes: 745280/ 745294
stddev: 3742.23 PSNR: 24.87 bytes: 5370880/ 5370880
stddev: 2611.16 PSNR: 27.99 bytes: 814400/ 814400
stddev: 3638.93 PSNR: 25.11 bytes: 432640/ 432640
stddev: 2855.00 PSNR: 27.22 bytes: 1741440/ 1741440
Floating point, with orthogonalization, with gain quantization done
taking into account the rounding error of the 5 best entries
stddev: 784.07 PSNR: 38.44 bytes: 200320/ 200334
stddev: 975.42 PSNR: 36.55 bytes: 144000/ 144014
stddev: 788.15 PSNR: 38.40 bytes: 745280/ 745294
stddev: 3757.66 PSNR: 24.83 bytes: 5370880/ 5370880
stddev: 2632.70 PSNR: 27.92 bytes: 814400/ 814400
stddev: 3652.46 PSNR: 25.08 bytes: 432640/ 432640
stddev: 2863.04 PSNR: 27.19 bytes: 1741440/ 1741440
Which approach do you prefer?