[FFmpeg-devel] [PATCH] RealAudio 14.4K encoder
Francesco Lavra
francescolavra
Sat May 22 15:18:45 CEST 2010
On Sat, 2010-05-22 at 03:27 +0200, Michael Niedermayer wrote:
> On Sun, May 16, 2010 at 04:46:37PM +0200, Francesco Lavra wrote:
> > > > > > + /**
> > > > > > + * TODO: orthogonalize the best entry of the adaptive codebook with the
> > > > > > + * basis vectors of the first fixed codebook, and the best entry of the
> > > > > > + * first fixed codebook with the basis vectors of the second fixed codebook.
> > > > > > + */
> > > > >
> > > > > yes, also shouldn't the search be iterative instead of just one pass?
> > > >
> > > > I tried inserting several iteration runs to find the optimal entries of
> > > > the fixed codebooks, but the entries found on the second and subsequent
> > > > iterations rarely differ from the first choices, and in any case I
> > > > couldn't hear any improvement in quality, so the iterative method
> > > > doesn't seem to bring any added value.
> > >
> > > did you orthogonalize the entries?
> >
> > I didn't, but now I have.
>
> how much PSNR improvement has this brought?
See below.
> > No iterative search is performed, since in the
> > algorithm at http://focus.ti.com/lit/an/spra136/spra136.pdf there is no
> > mention of multiple iterations.
>
> are you serious?
> because some 16-year-old paper that smells a bit like a description of a
> low-complexity hardware encoder doesn't do it, you don't even try?
> hell, if we were designing video encoders like that ...
>
> don't misunderstand me here, I am perfectly OK if you don't want to try this
> because it's too much work or some other reason. but just because some
> paper doesn't mention it, uhm
Once we agree on the best approach for encoding, I'll give it a try.
See below.
> [...]
> > +/**
> > + * Quantizes a value by searching a sorted table for the element with the
> > + * nearest value
> > + *
> > + * @param value value to quantize
> > + * @param table array containing the quantization table
> > + * @param size size of the quantization table
> > + * @return index of the quantization table corresponding to the element with the
> > + * nearest value
> > + */
> > +static int quantize(int value, const int16_t *table, unsigned int size)
> > +{
> > + int error;
> > + int index;
> > + unsigned int low = 0, high = size - 1;
> > +
> > + while (1) {
>
> > + index = (low + high) >> 1;
> > + error = table[index] - value;
>
> declaration and initialization can be merged
If you meant declaring index and error inside the loop, fixed locally.
> > + if (index == low)
> > + return table[high] + error > value ? low : high;
> > + if (error > 0) {
> > + high = index;
> > + } else {
> > + low = index;
> > + }
> > + }
> > +}
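For reference, the search above as a standalone sketch, with the declarations merged into the loop as suggested (assumes `table` is sorted in ascending order):

```c
#include <stdint.h>

/* Nearest-value quantizer: binary search over an ascending table,
 * returning the index of the element closest to value. */
static int quantize(int value, const int16_t *table, unsigned int size)
{
    unsigned int low = 0, high = size - 1;

    while (1) {
        int index = (low + high) >> 1;
        int error = table[index] - value;

        if (index == low)
            /* Two candidates left; table[high] + error > value is
             * equivalent to comparing value against their midpoint. */
            return table[high] + error > value ? low : high;
        if (error > 0)
            high = index;
        else
            low = index;
    }
}
```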
> > +
> > +
>
> > +/**
> > + * Orthogonalizes a vector to another vector
> > + *
> > + * @param v vector to orthogonalize
> > + * @param u vector against which orthogonalization is performed
> > + */
> > +static void orthogonalize(float *v, const float *u)
>
> missing const
Vector v is not constant.
Or do you mean something like (float * const v, const float * const u)?
> > +{
> > + int i;
> > + float num = 0, den = 0;
> > +
> > + for (i = 0; i < BLOCKSIZE; i++) {
> > + num += v[i] * u[i];
> > + den += u[i] * u[i];
> > + }
>
> > + for (i = 0; i < BLOCKSIZE; i++)
> > + v[i] -= (num * u[i]) / den;
>
> num /= den;
> for (i = 0; i < BLOCKSIZE; i++)
> v[i] -= num * u[i];
Fixed locally.
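This is the variant with the division hoisted out of the second loop, as suggested (a sketch; BLOCKSIZE is assumed to be the 40-sample subblock length from ra144.h):

```c
#define BLOCKSIZE 40  /* subblock length, assumed as in ra144.h */

/* Gram-Schmidt step: make v orthogonal to u by subtracting the
 * projection of v onto u. The divide by the squared norm happens
 * once, outside the per-sample loop. */
static void orthogonalize(float *v, const float *u)
{
    int i;
    float num = 0, den = 0;

    for (i = 0; i < BLOCKSIZE; i++) {
        num += v[i] * u[i];   /* dot product <v, u>  */
        den += u[i] * u[i];   /* squared norm <u, u> */
    }
    num /= den;
    for (i = 0; i < BLOCKSIZE; i++)
        v[i] -= num * u[i];
}
```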
> [...]
> > +/**
> > + * Searches the adaptive codebook for the best entry and gain and removes its
> > + * contribution from input data
> > + *
> > + * @param adapt_cb array from which the adaptive codebook is extracted
> > + * @param work array used to calculate LPC-filtered vectors
> > + * @param coefs coefficients of the LPC filter
> > + * @param data input data
> > + * @return index of the best entry of the adaptive codebook
> > + */
> > +static int adaptive_cb_search(const int16_t *adapt_cb, float *work,
> > + const float *coefs, float *data)
> > +{
> > + int i, j, best_vect;
> > + float score, gain, best_score, best_gain;
> > + float exc[BLOCKSIZE];
> > +
> > + gain = best_score = 0;
> > + for (i = BLOCKSIZE / 2; i <= BUFFERSIZE; i++) {
> > + for (j = 0; j < FFMIN(BLOCKSIZE, i); j++)
> > + exc[j] = adapt_cb[BUFFERSIZE - i + j];
> > + if (i < BLOCKSIZE)
> > + for (j = 0; j < BLOCKSIZE - i; j++)
> > + exc[i + j] = adapt_cb[BUFFERSIZE - i + j];
> > + get_match_score(work, coefs, exc, NULL, NULL, data, &score, &gain);
> > + if (score > best_score) {
> > + best_score = score;
> > + best_vect = i;
> > + best_gain = gain;
> > + }
> > + }
> > + if (!best_score)
> > + return 0;
> > +
> > + /**
> > + * Re-calculate the filtered vector from the vector with maximum match score
> > + * and remove its contribution from input data.
> > + */
>
> > + for (j = 0; j < FFMIN(BLOCKSIZE, best_vect); j++)
> > + exc[j] = adapt_cb[BUFFERSIZE - best_vect + j];
> > + if (best_vect < BLOCKSIZE)
> > + for (j = 0; j < BLOCKSIZE - best_vect; j++)
> > + exc[best_vect + j] = adapt_cb[BUFFERSIZE - best_vect + j];
>
> code duplication
Will fix it if you like the floating point approach.
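One way to remove the duplication would be a small helper that builds the excitation vector for a given lag, repeating the last period when the lag is shorter than a subblock; both the search loop and the recalculation after it could then call it (a sketch; the helper name `build_adaptive_exc` is hypothetical, and BLOCKSIZE/BUFFERSIZE are assumed as in ra144.h):

```c
#include <stdint.h>

#define BLOCKSIZE  40   /* subblock length, assumed as in ra144.h          */
#define BUFFERSIZE 146  /* adaptive codebook length, assumed as in ra144.h */
#define FFMIN(a, b) ((a) > (b) ? (b) : (a))

/* Build the adaptive-codebook excitation for a given lag. For lags
 * shorter than a subblock, the most recent lag samples are repeated
 * to fill the vector. */
static void build_adaptive_exc(float *exc, const int16_t *adapt_cb, int lag)
{
    int j;

    for (j = 0; j < FFMIN(BLOCKSIZE, lag); j++)
        exc[j] = adapt_cb[BUFFERSIZE - lag + j];
    if (lag < BLOCKSIZE)
        for (j = 0; j < BLOCKSIZE - lag; j++)
            exc[lag + j] = adapt_cb[BUFFERSIZE - lag + j];
}
```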
> > + ff_celp_lp_synthesis_filterf(work, coefs, exc, BLOCKSIZE, LPC_ORDER);
>
>
>
> > + for (i = 0; i < BLOCKSIZE; i++)
> > + data[i] -= best_gain * work[i];
> > + return (best_vect - BLOCKSIZE / 2 + 1);
> > +}
> > +
> > +
>
> > +/**
> > + * Searches the two fixed codebooks for the best entry and gain
> > + *
> > + * @param work array used to calculate LPC-filtered vectors
> > + * @param coefs coefficients of the LPC filter
> > + * @param data input data
> > + * @param cba_idx index of the best entry of the adaptive codebook
> > + * @param cb1_idx pointer to variable where the index of the best entry of the
> > + * first fixed codebook is returned
> > + * @param cb2_idx pointer to variable where the index of the best entry of the
> > + * second fixed codebook is returned
> > + */
> > +static void fixed_cb_search(float *work, const float *coefs, float *data,
> > + int cba_idx, int *cb1_idx, int *cb2_idx)
> > +{
> > + int i, j, ortho_cb1;
> > + float score, gain, best_score, best_gain;
> > + float cba_vect[BLOCKSIZE], cb1_vect[BLOCKSIZE];
> > + float vect[BLOCKSIZE];
> > +
> > + /**
> > + * The filtered vector from the adaptive codebook can be retrieved from
> > + * work, because this function is called just after adaptive_cb_search().
> > + */
> > + if (cba_idx)
> > + memcpy(cba_vect, work, sizeof(cba_vect));
> > +
> > + *cb1_idx = gain = best_score = best_gain = 0;
> > + for (i = 0; i < FIXED_CB_SIZE; i++) {
> > + for (j = 0; j < BLOCKSIZE; j++)
> > + vect[j] = ff_cb1_vects[i][j];
> > + get_match_score(work, coefs, vect, cba_idx ? cba_vect: NULL, NULL, data,
> > + &score, &gain);
> > + if (score > best_score) {
> > + best_score = score;
> > + *cb1_idx = i;
> > + best_gain = gain;
> > + }
> > + }
> > +
> > + /**
> > + * Re-calculate the filtered vector from the vector with maximum match score
> > + * and remove its contribution from input data.
> > + */
> > + if (best_gain) {
> > + for (j = 0; j < BLOCKSIZE; j++)
> > + vect[j] = ff_cb1_vects[*cb1_idx][j];
> > + ff_celp_lp_synthesis_filterf(work, coefs, vect, BLOCKSIZE, LPC_ORDER);
> > + if (cba_idx)
> > + orthogonalize(work, cba_vect);
> > + for (i = 0; i < BLOCKSIZE; i++)
> > + data[i] -= best_gain * work[i];
> > + memcpy(cb1_vect, work, sizeof(cb1_vect));
> > + ortho_cb1 = 1;
> > + } else
> > + ortho_cb1 = 0;
> > +
>
> > + *cb2_idx = best_score = best_gain = 0;
> > + for (i = 0; i < FIXED_CB_SIZE; i++) {
> > + for (j = 0; j < BLOCKSIZE; j++)
> > + vect[j] = ff_cb2_vects[i][j];
> > + get_match_score(work, coefs, vect, cba_idx ? cba_vect : NULL,
> > + ortho_cb1 ? cb1_vect : NULL, data, &score, &gain);
> > + if (score > best_score) {
> > + best_score = score;
> > + *cb2_idx = i;
> > + best_gain = gain;
> > + }
> > + }
>
> duplicate
Ditto.
> > +}
> > +
> > +
> > +/**
> > + * Encode a subblock of the current frame
> > + *
> > + * @param ractx encoder context
> > + * @param sblock_data input data of the subblock
> > + * @param lpc_coefs coefficients of the LPC filter
> > + * @param rms RMS of the reflection coefficients
> > + * @param pb pointer to PutBitContext of the current frame
> > + */
> > +static void ra144_encode_subblock(RA144Context *ractx,
> > + const int16_t *sblock_data,
> > + const int16_t *lpc_coefs, unsigned int rms,
> > + PutBitContext *pb)
> > +{
> > +#define NUM_BEST_GAINS 5
> > +
> > + float data[BLOCKSIZE], work[LPC_ORDER + BLOCKSIZE];
> > + float coefs[LPC_ORDER];
> > + float zero[BLOCKSIZE], cba[BLOCKSIZE], cb1[BLOCKSIZE], cb2[BLOCKSIZE];
> > + int16_t cba_vect[BLOCKSIZE], exc[BLOCKSIZE];
> > + int cba_idx, cb1_idx, cb2_idx, gain;
> > + int i, n, m[3];
> > + float g[3];
> > + float error, best_errors[NUM_BEST_GAINS];
> > + int indexes[NUM_BEST_GAINS];
> > +
> > + for (i = 0; i < LPC_ORDER; i++) {
> > + work[i] = ractx->curr_sblock[BLOCKSIZE + i];
> > + coefs[i] = lpc_coefs[i] / 4096.0;
>
> * (1/4096.0);
> multiplies are faster than divides
Fixed locally.
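The suggested change lets the compiler fold the constant reciprocal, so each iteration does a multiply instead of a slower divide; a minimal sketch (the helper name `scale_coefs` is hypothetical):

```c
#include <stdint.h>

/* Convert fixed-point LPC coefficients (Q12) to float by multiplying
 * with the compile-time constant 1/4096 instead of dividing. */
static void scale_coefs(float *coefs, const int16_t *lpc_coefs, int order)
{
    int i;
    for (i = 0; i < order; i++)
        coefs[i] = lpc_coefs[i] * (1.0f / 4096.0f);
}
```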
> > + }
> > +
> > + /**
> > + * Calculate the zero-input response of the LPC filter and subtract it from
> > + * input data.
> > + */
> > + memset(data, 0, sizeof(data));
> > + ff_celp_lp_synthesis_filterf(work + LPC_ORDER, coefs, data, BLOCKSIZE,
> > + LPC_ORDER);
> > + for (i = 0; i < BLOCKSIZE; i++) {
> > + zero[i] = work[LPC_ORDER + i];
> > + data[i] = sblock_data[i] - zero[i];
> > + }
> > +
> > + /**
> > + * Codebook search is performed without taking into account the contribution
> > + * of the previous subblock, since it has been just subtracted from input
> > + * data.
> > + */
> > + memset(work, 0, LPC_ORDER * sizeof(*work));
> > +
> > + cba_idx = adaptive_cb_search(ractx->adapt_cb, work + LPC_ORDER, coefs,
> > + data);
> > + if (cba_idx) {
> > + /**
> > + * The filtered vector from the adaptive codebook can be retrieved from
> > + * work, see implementation of adaptive_cb_search().
> > + */
> > + memcpy(cba, work + LPC_ORDER, sizeof(cba));
> > +
> > + ff_copy_and_dup(cba_vect, ractx->adapt_cb, cba_idx + BLOCKSIZE / 2 - 1);
> > + m[0] = (ff_irms(cba_vect) * rms) >> 12;
> > + }
> > + fixed_cb_search(work + LPC_ORDER, coefs, data, cba_idx, &cb1_idx, &cb2_idx);
> > + for (i = 0; i < BLOCKSIZE; i++) {
> > + cb1[i] = ff_cb1_vects[cb1_idx][i];
> > + cb2[i] = ff_cb2_vects[cb2_idx][i];
> > + }
> > + ff_celp_lp_synthesis_filterf(work + LPC_ORDER, coefs, cb1, BLOCKSIZE,
> > + LPC_ORDER);
> > + memcpy(cb1, work + LPC_ORDER, sizeof(cb1));
> > + m[1] = (ff_cb1_base[cb1_idx] * rms) >> 8;
> > + ff_celp_lp_synthesis_filterf(work + LPC_ORDER, coefs, cb2, BLOCKSIZE,
> > + LPC_ORDER);
> > + memcpy(cb2, work + LPC_ORDER, sizeof(cb2));
> > + m[2] = (ff_cb2_base[cb2_idx] * rms) >> 8;
> > +
> > + /**
> > + * Gain quantization is performed taking the NUM_BEST_GAINS best entries
> > + * obtained from floating point data and calculating for each entry the
> > + * actual encoding error with fixed point data.
> > + */
> > + for (i = 0; i < NUM_BEST_GAINS; i++) {
> > + best_errors[i] = FLT_MAX;
> > + indexes[i] = -1;
> > + }
> > + for (n = 0; n < 256; n++) {
> > + g[1] = ((ff_gain_val_tab[n][1] * m[1]) >> ff_gain_exp_tab[n]) / 4096.0;
> > + g[2] = ((ff_gain_val_tab[n][2] * m[2]) >> ff_gain_exp_tab[n]) / 4096.0;
> > + error = 0;
> > + if (cba_idx) {
> > + g[0] = ((ff_gain_val_tab[n][0] * m[0]) >> ff_gain_exp_tab[n]) /
> > + 4096.0;
> > + for (i = 0; i < BLOCKSIZE; i++) {
> > + data[i] = zero[i] + g[0] * cba[i] + g[1] * cb1[i] +
> > + g[2] * cb2[i];
> > + error += (data[i] - sblock_data[i]) *
> > + (data[i] - sblock_data[i]);
> > + }
> > + } else {
> > + for (i = 0; i < BLOCKSIZE; i++) {
> > + data[i] = zero[i] + g[1] * cb1[i] + g[2] * cb2[i];
> > + error += (data[i] - sblock_data[i]) *
> > + (data[i] - sblock_data[i]);
> > + }
> > + }
>
> > + for (i = 0; i < NUM_BEST_GAINS; i++)
> > + if (error < best_errors[i]) {
> > + best_errors[i] = error;
> > + indexes[i] = n;
> > + break;
> > + }
>
> this does not keep the 5 best
> it only guarantees to keep the 1 best
Why? Perhaps you missed the break statement?
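For what it's worth, a trace suggests the break alone is not enough: a candidate that beats slot i overwrites it without shifting the old value down, so with the error sequence 10, 5, 8 slot 0 ends up holding 5 and the 10 is lost even though four slots are still FLT_MAX. A sorted insertion that does keep the N smallest could look like this (a sketch; the helper name `keep_best` is hypothetical):

```c
#include <float.h>

#define NUM_BEST_GAINS 5

/* Insert a candidate error into an ascending list of the
 * NUM_BEST_GAINS smallest errors seen so far, shifting worse
 * entries down so no current best entry is evicted. */
static void keep_best(float error, int n, float *best_errors, int *indexes)
{
    int i, j;

    for (i = 0; i < NUM_BEST_GAINS; i++) {
        if (error < best_errors[i]) {
            for (j = NUM_BEST_GAINS - 1; j > i; j--) {
                best_errors[j] = best_errors[j - 1];
                indexes[j]     = indexes[j - 1];
            }
            best_errors[i] = error;
            indexes[i]     = n;
            break;
        }
    }
}
```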
> you are testing your changes in terms of PSNR, aren't you?
> if not, we need to go back to the last patch and test each change
> individually.
> I very much prefer naive and slow code compared to optimized but
> untested and thus buggy code. we already have a vorbis and aac encoder
> </rant>
I did test each individual change by measuring the resulting average
encoding error. Now I have re-tested them with tiny_psnr. Here are the
results with 7 different samples.
Fixed point, without orthogonalization, with brute force gain
quantization
stddev: 849.73 PSNR: 37.74 bytes: 200320/ 200334
stddev: 983.24 PSNR: 36.48 bytes: 144000/ 144014
stddev: 835.19 PSNR: 37.89 bytes: 745280/ 745294
stddev: 3737.95 PSNR: 24.88 bytes: 5370880/ 5370880
stddev: 2605.75 PSNR: 28.01 bytes: 814400/ 814400
stddev: 3634.44 PSNR: 25.12 bytes: 432640/ 432640
stddev: 2853.26 PSNR: 27.22 bytes: 1741440/ 1741440
Floating point, without orthogonalization, with gain quantization done
the fast way
stddev: 940.92 PSNR: 36.86 bytes: 200320/ 200334
stddev: 1010.57 PSNR: 36.24 bytes: 144000/ 144014
stddev: 904.31 PSNR: 37.20 bytes: 745280/ 745294
stddev: 3753.33 PSNR: 24.84 bytes: 5370880/ 5370880
stddev: 2612.23 PSNR: 27.99 bytes: 814400/ 814400
stddev: 3638.47 PSNR: 25.11 bytes: 432640/ 432640
stddev: 2855.30 PSNR: 27.22 bytes: 1741440/ 1741440
Floating point, with orthogonalization, with gain quantization done the
fast way
stddev: 818.14 PSNR: 38.07 bytes: 200320/ 200334
stddev: 986.48 PSNR: 36.45 bytes: 144000/ 144014
stddev: 811.68 PSNR: 38.14 bytes: 745280/ 745294
stddev: 3762.86 PSNR: 24.82 bytes: 5370880/ 5370880
stddev: 2635.10 PSNR: 27.91 bytes: 814400/ 814400
stddev: 3647.02 PSNR: 25.09 bytes: 432640/ 432640
stddev: 2862.79 PSNR: 27.19 bytes: 1741440/ 1741440
Floating point, without orthogonalization, with gain quantization done
taking into account the rounding error of the 5 best entries
stddev: 902.10 PSNR: 37.22 bytes: 200320/ 200334
stddev: 988.24 PSNR: 36.43 bytes: 144000/ 144014
stddev: 854.40 PSNR: 37.70 bytes: 745280/ 745294
stddev: 3742.23 PSNR: 24.87 bytes: 5370880/ 5370880
stddev: 2611.16 PSNR: 27.99 bytes: 814400/ 814400
stddev: 3638.93 PSNR: 25.11 bytes: 432640/ 432640
stddev: 2855.00 PSNR: 27.22 bytes: 1741440/ 1741440
Floating point, with orthogonalization, with gain quantization done
taking into account the rounding error of the 5 best entries
stddev: 784.07 PSNR: 38.44 bytes: 200320/ 200334
stddev: 975.42 PSNR: 36.55 bytes: 144000/ 144014
stddev: 788.15 PSNR: 38.40 bytes: 745280/ 745294
stddev: 3757.66 PSNR: 24.83 bytes: 5370880/ 5370880
stddev: 2632.70 PSNR: 27.92 bytes: 814400/ 814400
stddev: 3652.46 PSNR: 25.08 bytes: 432640/ 432640
stddev: 2863.04 PSNR: 27.19 bytes: 1741440/ 1741440
Which approach do you prefer?