[FFmpeg-devel] [PATCH] RealAudio 14.4K encoder
Michael Niedermayer
michaelni
Sun May 23 00:51:39 CEST 2010
On Sat, May 22, 2010 at 07:33:13PM +0200, Francesco Lavra wrote:
> On Sat, 2010-05-22 at 16:00 +0200, Michael Niedermayer wrote:
> > On Sat, May 22, 2010 at 03:18:45PM +0200, Francesco Lavra wrote:
> > > > > + }
> > > > > +
> > > > > + /**
> > > > > + * Calculate the zero-input response of the LPC filter and subtract it from
> > > > > + * input data.
> > > > > + */
> > > > > + memset(data, 0, sizeof(data));
> > > > > + ff_celp_lp_synthesis_filterf(work + LPC_ORDER, coefs, data, BLOCKSIZE,
> > > > > + LPC_ORDER);
> > > > > + for (i = 0; i < BLOCKSIZE; i++) {
> > > > > + zero[i] = work[LPC_ORDER + i];
> > > > > + data[i] = sblock_data[i] - zero[i];
> > > > > + }
> > > > > +
> > > > > + /**
> > > > > + * Codebook search is performed without taking into account the contribution
> > > > > + * of the previous subblock, since it has been just subtracted from input
> > > > > + * data.
> > > > > + */
> > > > > + memset(work, 0, LPC_ORDER * sizeof(*work));
> > > > > +
> > > > > + cba_idx = adaptive_cb_search(ractx->adapt_cb, work + LPC_ORDER, coefs,
> > > > > + data);
> > > > > + if (cba_idx) {
> > > > > + /**
> > > > > + * The filtered vector from the adaptive codebook can be retrieved from
> > > > > + * work, see implementation of adaptive_cb_search().
> > > > > + */
> > > > > + memcpy(cba, work + LPC_ORDER, sizeof(cba));
> > > > > +
> > > > > + ff_copy_and_dup(cba_vect, ractx->adapt_cb, cba_idx + BLOCKSIZE / 2 - 1);
> > > > > + m[0] = (ff_irms(cba_vect) * rms) >> 12;
> > > > > + }
> > > > > + fixed_cb_search(work + LPC_ORDER, coefs, data, cba_idx, &cb1_idx, &cb2_idx);
> > > > > + for (i = 0; i < BLOCKSIZE; i++) {
> > > > > + cb1[i] = ff_cb1_vects[cb1_idx][i];
> > > > > + cb2[i] = ff_cb2_vects[cb2_idx][i];
> > > > > + }
> > > > > + ff_celp_lp_synthesis_filterf(work + LPC_ORDER, coefs, cb1, BLOCKSIZE,
> > > > > + LPC_ORDER);
> > > > > + memcpy(cb1, work + LPC_ORDER, sizeof(cb1));
> > > > > + m[1] = (ff_cb1_base[cb1_idx] * rms) >> 8;
> > > > > + ff_celp_lp_synthesis_filterf(work + LPC_ORDER, coefs, cb2, BLOCKSIZE,
> > > > > + LPC_ORDER);
> > > > > + memcpy(cb2, work + LPC_ORDER, sizeof(cb2));
> > > > > + m[2] = (ff_cb2_base[cb2_idx] * rms) >> 8;
> > > > > +
> > > > > + /**
> > > > > + * Gain quantization is performed taking the NUM_BEST_GAINS best entries
> > > > > + * obtained from floating point data and calculating for each entry the
> > > > > + * actual encoding error with fixed point data.
> > > > > + */
> > > > > + for (i = 0; i < NUM_BEST_GAINS; i++) {
> > > > > + best_errors[i] = FLT_MAX;
> > > > > + indexes[i] = -1;
> > > > > + }
> > > > > + for (n = 0; n < 256; n++) {
> > > > > + g[1] = ((ff_gain_val_tab[n][1] * m[1]) >> ff_gain_exp_tab[n]) / 4096.0;
> > > > > + g[2] = ((ff_gain_val_tab[n][2] * m[2]) >> ff_gain_exp_tab[n]) / 4096.0;
> > > > > + error = 0;
> > > > > + if (cba_idx) {
> > > > > + g[0] = ((ff_gain_val_tab[n][0] * m[0]) >> ff_gain_exp_tab[n]) /
> > > > > + 4096.0;
> > > > > + for (i = 0; i < BLOCKSIZE; i++) {
> > > > > + data[i] = zero[i] + g[0] * cba[i] + g[1] * cb1[i] +
> > > > > + g[2] * cb2[i];
> > > > > + error += (data[i] - sblock_data[i]) *
> > > > > + (data[i] - sblock_data[i]);
> > > > > + }
> > > > > + } else {
> > > > > + for (i = 0; i < BLOCKSIZE; i++) {
> > > > > + data[i] = zero[i] + g[1] * cb1[i] + g[2] * cb2[i];
> > > > > + error += (data[i] - sblock_data[i]) *
> > > > > + (data[i] - sblock_data[i]);
> > > > > + }
> > > > > + }
> > > >
> > > > > + for (i = 0; i < NUM_BEST_GAINS; i++)
> > > > > + if (error < best_errors[i]) {
> > > > > + best_errors[i] = error;
> > > > > + indexes[i] = n;
> > > > > + break;
> > > > > + }
> > > >
> > > > this does not keep the 5 best
> > > > it only guarantees to keep the 1 best
> > >
> > > Why? Perhaps you missed the break statement?
> >
> > if we feed the values 9,8,7,6,5,4,3,2,1 in then
> > the list will just contain 1 afterwards
>
> Ok, now fixed as follows (j is initialized to 0 outside the main loop):
>
> if (error >= best_errors[j])
> continue;
> best_errors[j] = error;
> indexes[j] = n;
> for (i = 0; i < NUM_BEST_GAINS; i++)
> if (best_errors[i] > best_errors[j])
> j = i;
>
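The replace-the-worst scheme described above can be sketched in isolation as follows. This is only an illustration, not the code from the patch: the names `BestList`, `best_list_init` and `best_list_offer` are hypothetical, and `NUM_BEST` stands in for NUM_BEST_GAINS with the value 5 assumed from the discussion.

```c
#include <float.h>

#define NUM_BEST 5  /* stands in for NUM_BEST_GAINS; value assumed from the thread */

/* Hypothetical container for the best candidates seen so far. */
typedef struct BestList {
    float errors[NUM_BEST];
    int   indexes[NUM_BEST];
    int   worst;            /* slot currently holding the largest kept error */
} BestList;

static void best_list_init(BestList *bl)
{
    for (int i = 0; i < NUM_BEST; i++) {
        bl->errors[i]  = FLT_MAX;
        bl->indexes[i] = -1;
    }
    bl->worst = 0;
}

/* Offer candidate n with the given error. It replaces the worst kept entry
 * if it is strictly better, after which the worst slot is located again. */
static void best_list_offer(BestList *bl, float error, int n)
{
    if (error >= bl->errors[bl->worst])
        return;
    bl->errors[bl->worst]  = error;
    bl->indexes[bl->worst] = n;
    for (int i = 0; i < NUM_BEST; i++)
        if (bl->errors[i] > bl->errors[bl->worst])
            bl->worst = i;
}
```

Feeding the errors 9,8,7,6,5,4,3,2,1 into this list leaves 1 through 5 in it (in no particular order), whereas the earlier loop would keep overwriting the first slot.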
> > > > you are testing your changes in terms of PSNR, aren't you?
> > > > if not, we need to go back to the last patch and test each change
> > > > individually.
> > > > I very much prefer naive and slow code compared to optimized but
> > > > untested and thus buggy code. we already have a vorbis and aac encoder
> > > > </rant>
> > >
> > > I did test each individual change by measuring the resulting average
> > > encoding error. Now I have re-tested them with tiny_psnr. Here are the
> > > results with 7 different samples.
> > >
> > > Fixed point, without orthogonalization, with brute force gain
> > > quantization
> > > stddev: 849.73 PSNR: 37.74 bytes: 200320/ 200334
> > > stddev: 983.24 PSNR: 36.48 bytes: 144000/ 144014
> > > stddev: 835.19 PSNR: 37.89 bytes: 745280/ 745294
> > > stddev: 3737.95 PSNR: 24.88 bytes: 5370880/ 5370880
> > > stddev: 2605.75 PSNR: 28.01 bytes: 814400/ 814400
> > > stddev: 3634.44 PSNR: 25.12 bytes: 432640/ 432640
> > > stddev: 2853.26 PSNR: 27.22 bytes: 1741440/ 1741440
> > >
> > > Floating point, without orthogonalization, with gain quantization done
> > > the fast way
> > > stddev: 940.92 PSNR: 36.86 bytes: 200320/ 200334
> > > stddev: 1010.57 PSNR: 36.24 bytes: 144000/ 144014
> > > stddev: 904.31 PSNR: 37.20 bytes: 745280/ 745294
> > > stddev: 3753.33 PSNR: 24.84 bytes: 5370880/ 5370880
> > > stddev: 2612.23 PSNR: 27.99 bytes: 814400/ 814400
> > > stddev: 3638.47 PSNR: 25.11 bytes: 432640/ 432640
> > > stddev: 2855.30 PSNR: 27.22 bytes: 1741440/ 1741440
> >
> > you change 2 things relative to the previous test, this makes it
> > hard to be certain which change causes the quality loss
>
> Tested the intermediate step too, from the results below you can see
> that quality loss is due to the fast gain quantization.
>
> > > Floating point, with orthogonalization, with gain quantization done the
> > > fast way
> > > stddev: 818.14 PSNR: 38.07 bytes: 200320/ 200334
> > > stddev: 986.48 PSNR: 36.45 bytes: 144000/ 144014
> > > stddev: 811.68 PSNR: 38.14 bytes: 745280/ 745294
> > > stddev: 3762.86 PSNR: 24.82 bytes: 5370880/ 5370880
> > > stddev: 2635.10 PSNR: 27.91 bytes: 814400/ 814400
> > > stddev: 3647.02 PSNR: 25.09 bytes: 432640/ 432640
> > > stddev: 2862.79 PSNR: 27.19 bytes: 1741440/ 1741440
> >
> > some files lose quality by enabling orthogonalization, that's odd but
> > possible.
> > assuming there is no bug in the orthogonalization then you could try to
> > run the quantization with both codebooks found with and without
> > orthogonalization, this should always be better. And/or avoid codebook
> > choices that would need quantization factors that are far away from
> > available values
>
> The first 3 files are uncompressed recordings, while the last 4 files
> are RealAudio decoded samples, so statistics for the latter probably are
> not that meaningful.
> If you are wondering why PSNR values are so low for the last 4 files
> (ideally, they should approach infinity), the problem is that I couldn't
> come up with an exact method of calculating the frame energy (assuming
> one exists, because from the current decoder output I'm not sure we can
> reconstruct the encoded stream exactly as it was), so having an energy
> value different from what it ought to be influences negatively the
> codebook searches.
how far away is the correct value from what you choose?
(if it's just +-1 maybe a bruteforce search might be an option)
>
> Below are the latest results (after fixing the algorithm to find the 5
> best entries):
>
> Fixed point, without orthogonalization, with brute force gain
> quantization
> stddev: 849.73 PSNR: 37.74 bytes: 200320/ 200334
> stddev: 983.24 PSNR: 36.48 bytes: 144000/ 144014
> stddev: 835.19 PSNR: 37.89 bytes: 745280/ 745294
> stddev: 3737.95 PSNR: 24.88 bytes: 5370880/ 5370880
> stddev: 2605.75 PSNR: 28.01 bytes: 814400/ 814400
> stddev: 3634.44 PSNR: 25.12 bytes: 432640/ 432640
> stddev: 2853.26 PSNR: 27.22 bytes: 1741440/ 1741440
>
> Floating point, without orthogonalization, with brute force gain
> quantization
> stddev: 821.68 PSNR: 38.04 bytes: 200320/ 200334
> stddev: 979.00 PSNR: 36.51 bytes: 144000/ 144014
> stddev: 846.42 PSNR: 37.78 bytes: 745280/ 745294
> stddev: 3735.23 PSNR: 24.88 bytes: 5370880/ 5370880
> stddev: 2620.22 PSNR: 27.96 bytes: 814400/ 814400
> stddev: 3625.96 PSNR: 25.14 bytes: 432640/ 432640
> stddev: 2850.20 PSNR: 27.23 bytes: 1741440/ 1741440
>
> Floating point, without orthogonalization, with gain quantization done
> the fast way
> stddev: 940.92 PSNR: 36.86 bytes: 200320/ 200334
> stddev: 1010.57 PSNR: 36.24 bytes: 144000/ 144014
> stddev: 904.31 PSNR: 37.20 bytes: 745280/ 745294
> stddev: 3753.33 PSNR: 24.84 bytes: 5370880/ 5370880
> stddev: 2612.23 PSNR: 27.99 bytes: 814400/ 814400
> stddev: 3638.47 PSNR: 25.11 bytes: 432640/ 432640
> stddev: 2855.30 PSNR: 27.22 bytes: 1741440/ 1741440
>
> Floating point, without orthogonalization, with gain quantization done
> taking into account the rounding error of the 5 best entries
> stddev: 869.60 PSNR: 37.54 bytes: 200320/ 200334
> stddev: 992.83 PSNR: 36.39 bytes: 144000/ 144014
> stddev: 853.24 PSNR: 37.71 bytes: 745280/ 745294
> stddev: 3738.97 PSNR: 24.87 bytes: 5370880/ 5370880
> stddev: 2620.56 PSNR: 27.96 bytes: 814400/ 814400
> stddev: 3634.24 PSNR: 25.12 bytes: 432640/ 432640
> stddev: 2851.40 PSNR: 27.23 bytes: 1741440/ 1741440
>
> Floating point, with orthogonalization, with brute force gain
> quantization
> stddev: 768.34 PSNR: 38.62 bytes: 200320/ 200334
> stddev: 971.39 PSNR: 36.58 bytes: 144000/ 144014
> stddev: 778.60 PSNR: 38.50 bytes: 745280/ 745294
> stddev: 3753.48 PSNR: 24.84 bytes: 5370880/ 5370880
> stddev: 2622.78 PSNR: 27.95 bytes: 814400/ 814400
> stddev: 3645.04 PSNR: 25.10 bytes: 432640/ 432640
> stddev: 2861.43 PSNR: 27.20 bytes: 1741440/ 1741440
>
> Floating point, with orthogonalization, with gain quantization done the
> fast way
> stddev: 818.14 PSNR: 38.07 bytes: 200320/ 200334
> stddev: 986.48 PSNR: 36.45 bytes: 144000/ 144014
> stddev: 811.68 PSNR: 38.14 bytes: 745280/ 745294
> stddev: 3762.86 PSNR: 24.82 bytes: 5370880/ 5370880
> stddev: 2635.10 PSNR: 27.91 bytes: 814400/ 814400
> stddev: 3647.02 PSNR: 25.09 bytes: 432640/ 432640
> stddev: 2862.79 PSNR: 27.19 bytes: 1741440/ 1741440
>
> Floating point, with orthogonalization, with gain quantization done
> taking into account the rounding error of the 5 best entries
> stddev: 782.21 PSNR: 38.46 bytes: 200320/ 200334
> stddev: 975.64 PSNR: 36.54 bytes: 144000/ 144014
> stddev: 785.38 PSNR: 38.43 bytes: 745280/ 745294
> stddev: 3753.60 PSNR: 24.84 bytes: 5370880/ 5370880
> stddev: 2631.43 PSNR: 27.93 bytes: 814400/ 814400
> stddev: 3652.04 PSNR: 25.08 bytes: 432640/ 432640
> stddev: 2862.17 PSNR: 27.20 bytes: 1741440/ 1741440
>
> Disregarding the last 4 files, you can see that orthogonalization always
> leads to better performance.
> What do you suggest now?
orthogonalization is a win and should be done of course.
the 5 entry quantization needs work, there should be no quality
loss. What about 10 or 20 entries?
finding these 20 entries is possible very fast (quicksort
might be fastest here, note that with quicksort often only
one side needs to be sorted, that is if the "left" contains
at least 20 entries we know the right doesn't matter and doesn't
need to be sorted). it could also be done by radix sort, or your
variant could be kept if it's fast enough
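The one-sided quicksort idea can be sketched as below. This is a minimal illustration under assumptions, not code from the patch: `Candidate`, `swap_cand` and `partial_sort` are hypothetical names, and a plain Lomuto partition around the middle element is used for brevity. The right partition is only visited when the left partition plus the pivot do not already supply the k smallest entries.

```c
/* Hypothetical pairing of a gain table index with its encoding error. */
typedef struct Candidate {
    float error;
    int   index;
} Candidate;

static void swap_cand(Candidate *a, Candidate *b)
{
    Candidate t = *a;
    *a = *b;
    *b = t;
}

/* Rearrange v[0..n-1] so that v[0..k-1] hold the k smallest errors in
 * ascending order. Entries beyond position k-1 are left in arbitrary order. */
static void partial_sort(Candidate *v, int n, int k)
{
    while (n > 1 && k > 0) {
        float pivot;
        int store = 0;

        /* Lomuto partition: move the middle element to the end as pivot. */
        swap_cand(&v[n / 2], &v[n - 1]);
        pivot = v[n - 1].error;
        for (int i = 0; i < n - 1; i++)
            if (v[i].error < pivot)
                swap_cand(&v[i], &v[store++]);
        swap_cand(&v[store], &v[n - 1]);

        /* Now v[0..store-1] < pivot, v[store] == pivot, the rest >= pivot. */
        partial_sort(v, store, k);      /* the left side always matters */
        if (store + 1 >= k)
            return;                     /* the right side cannot contribute */

        /* Continue on the right side for the entries still missing. */
        v += store + 1;
        n -= store + 1;
        k -= store + 1;
    }
}
```

For the 256 gain entries discussed here, calling `partial_sort(v, 256, 20)` would leave the 20 lowest-error candidates at the front of the array. Worst-case behaviour is quadratic, which is unlikely to matter at this size.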
beyond that i am pretty sure quality can be improved a bit
the problem is quite simple, it's just a combination of 3 vectors of
3 "codebooks"
q[d][0]*cb0[a] + q[d][1]*cb1[b] + q[d][2]*cb2[c]
that's then run through a (linear) filter
and we need the signal that's closest (PSNR or psychoacoustic) to the
input
the variables a,b,c,d together are not that far away from bruteforceable
(29bit)
except that there's just the energy and the lpc coeffs left for the filter
that are stored.
i can't believe that this 1 pass naive parameter selection is the best that
can be done.
loren? jason? i know this codec is less than irrelevant but maybe one of
you 2 (or someone else) has an idea. This problem is somehow interesting
because of its simplicity and this may be useful for other audio encoders
too
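Because the filter is linear, filtering the weighted sum gives the same result as weighting the separately filtered vectors, so each codebook vector only needs to be filtered once per candidate index and the gains can then be searched on the filtered vectors. A small sketch of that property, assuming zero initial filter state: `synth_filter` and `combo_error` are hypothetical helpers written for this illustration (not the FFmpeg functions), and `ORDER` and `LEN` are arbitrary small values.

```c
#define ORDER 2   /* illustrative filter order, not the codec's LPC order */
#define LEN   8   /* illustrative vector length, not the codec's block size */

/* Hypothetical all-pole filter with zero initial state:
 * out[n] = in[n] - sum over k of coefs[k] * out[n - 1 - k]. */
static void synth_filter(double *out, const double *coefs, const double *in)
{
    for (int n = 0; n < LEN; n++) {
        double acc = in[n];
        for (int k = 0; k < ORDER && k < n; k++)
            acc -= coefs[k] * out[n - 1 - k];
        out[n] = acc;
    }
}

/* Squared error between a target and g[0]*f0 + g[1]*f1 + g[2]*f2, where
 * f0, f1 and f2 are vectors that have already been filtered. */
static double combo_error(const double *target, const double g[3],
                          const double *f0, const double *f1,
                          const double *f2)
{
    double err = 0.0;
    for (int i = 0; i < LEN; i++) {
        double d = target[i] - (g[0] * f0[i] + g[1] * f1[i] + g[2] * f2[i]);
        err += d * d;
    }
    return err;
}
```

If the three input vectors are mixed first and the mix is filtered, `combo_error` of that result against the individually filtered vectors with the same gains comes out as zero up to rounding, which is what makes a search over a, b, c and d on pre-filtered vectors possible in principle.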
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
The misfortune of the wise is better than the prosperity of the fool.
-- Epicurus