[FFmpeg-devel] [PATCH] RealAudio 14.4K encoder
Michael Niedermayer
michaelni
Mon May 10 12:16:06 CEST 2010
On Sat, May 08, 2010 at 08:57:39PM +0200, Francesco Lavra wrote:
> On Wed, 2010-05-05 at 18:26 +0200, Michael Niedermayer wrote:
> > On Sun, May 02, 2010 at 08:06:47PM +0200, Francesco Lavra wrote:
> > > Hi,
> > [...]
> > > + if (avctx->frame_size > 0) {
> > > + if (avctx->frame_size != NBLOCKS * BLOCKSIZE) {
> > > + av_log(avctx, AV_LOG_ERROR, "invalid block size: %d\n",
> > > + avctx->frame_size);
> > > + return -1;
> > > + }
> > > + } else {
> > > + avctx->frame_size = NBLOCKS * BLOCKSIZE;
> > > + }
> >
> > is this complexity needed?
>
> I took that piece of code from flacenc.c, but it seems I chose the worst
> file to copy from, since that check on avctx->frame_size is not done
> anywhere else. Removed.
>
[...]
> > > +/**
> > > + * Calculates match score and gain of an LPC-filtered vector with respect to
> > > + input data
> > > + * @param block array used to calculate the filtered vector
> > > + * @param coefs coefficients of the LPC filter
> > > + * @param vect original vector
> > > + * @param data input data
> > > + * @param score pointer to variable where match score is returned
> > > + * @param gain pointer to variable where gain is returned
> > > + */
> > > +static void get_match_score(int16_t *block, const int16_t *coefs,
> > > + const int16_t *vect, const int16_t *data,
> > > + float *score, int *gain)
> > > +{
> > > + float c, g;
> > > + int i;
> > > +
> > > + if (ff_celp_lp_synthesis_filter(block, coefs, vect, BLOCKSIZE, LPC_ORDER, 1,
> > > + 0x800)) {
> > > + *score = 0;
> > > + return;
> > > + }
> > > + c = g = 0;
> > > + for (i = 0; i < BLOCKSIZE; i++) {
> > > + g += block[i] * block[i];
> > > + c += data[i] * block[i];
> > > + }
> > > + if (!g || (c <= 0)) {
> >
> > the !g check is redundant
>
> Why? If a codebook vector gets zeroed by the LPC filter, g will be zero,
> and we don't want the match score to be NaN.
g can only be 0 if all block[i] are 0
if all block[i] are 0 then c is 0
>
> > [...]
> > > +
> > > +
> > > +/**
> > > + * Performs gain quantization
> > > + * @param block array used to calculate filtered vectors
> > > + * @param lpc_coefs coefficients of the LPC filter
> > > + * @param cba_vect vector containing the best entry from the adaptive codebook,
> > > + * or NULL if the adaptive codebook is not used
> > > + * @param cb1_idx index of the best entry of the first fixed codebook
> > > + * @param cb2_idx index of the best entry of the second fixed codebook
> > > + * @param rms RMS of the reflection coefficients
> > > + * @param data input data
> > > + * @return index of the best entry of the gain table
> > > + */
> > > +static int quantize_gains(int16_t *block, const int16_t *lpc_coefs,
> > > + const int16_t *cba_vect, int cb1_idx, int cb2_idx,
> > > + unsigned int rms, const int16_t* data)
> > > +{
> > > + float distance, best_distance;
> > > + int i, n, index;
> > > + unsigned int m[3];
> > > + int16_t exc[BLOCKSIZE]; /**< excitation vector */
> > > +
> > > + if (cba_vect)
> > > + m[0] = (irms(cba_vect) * rms) >> 12;
> > > + m[1] = (cb1_base[cb1_idx] * rms) >> 8;
> > > + m[2] = (cb2_base[cb2_idx] * rms) >> 8;
> > > + best_distance = -1;
> >
> > FLOAT_MAX
>
> If you meant MAXFLOAT, fixed.
i meant FLT_MAX as in the C spec
>
> > > + for (n = 0; n < 256; n++) {
> > > + distance = 0;
> > > + add_wav(exc, n, (int)cba_vect, m, cba_vect, cb1_vects[cb1_idx],
> > > + cb2_vects[cb2_idx]);
> > > + if (ff_celp_lp_synthesis_filter(block, lpc_coefs, exc, BLOCKSIZE,
> > > + LPC_ORDER, 1, 0xfff))
> > > + continue;
> > > + for (i = 0; i < BLOCKSIZE; i++)
> > > + distance += (block[i] - data[i]) * (block[i] - data[i]);
> > > + if ((distance < best_distance) || (best_distance < 0)) {
> > > + best_distance = distance;
> > > + index = n;
> > > + }
> > > + }
> >
> > id guess this could be done faster than by brute force
>
> I can't think of any algorithm which avoids searching the entire table
> without risking to miss the optimal entry;
if you ignore rounding then add_wav does a weighted sum of 3 vectors
x*A + y*B + z*C
ff_celp_lp_synthesis_filter performs a linear transformation on the result
(that could be viewed as a matrix multiplication)
M*(x*A + y*B + z*C)
and it adds the past samples padded with zero and filtered similarly
M*(x*A + y*B + z*C) + P
this can also be written as: (its just the distributive law)
x*M*A + y*M*B + z*M*C + P
that is to do something like ff_celp_lp_synthesis_filter()
1. on the past samples padded with zeros
2. on all the 3 code book vectors
but note this ff_celp_lp_synthesis_filter() must not clip values, so maybe
the float counterpart (ff_celp_lp_synthesis_filterf) would be easiest here
after that you only have to find the best x,y,z from the table
that minimize that (this can be done fast as well but lets look into this
once this optimization suceedded (or failed))
Also once you found the best x,y,z with unclipped floats, it makes
sense to run something like the current brute force loop on all
entries that scored well in the optimized case. So we do not skip
considering rounding
> however, I implemented an
> empirical method which reduces significantly the encoding time without
> audible quality degradation.
>
> >
> >
> > [...]
> > > + /**
> > > + * TODO: orthogonalize the best entry of the adaptive codebook with the
> > > + * basis vectors of the first fixed codebook, and the best entry of the
> > > + * first fixed codebook with the basis vectors of the second fixed codebook.
> > > + */
> >
> > yes, also shouldnt the search be iterative instead of just one pass?
>
> I tried inserting several iteration runs to find the optimal entries of
> the fixed codebooks, but rarely the entries found on the second and
> subsequent iterations are different from the first chioces, and in any
> case I couldn't hear any improvement in quality, so the iterative method
> doesn't seem to bring any added value.
did you orthogonalize the entries?
anyway, id like to mention that my reveiw here targets improving the encoder
and i wont reject it if some of my suggestion is more than you are willing
to try. (though of course true bugs and such must be fixed but my
suggestions for optimizations and encoder quality improvements are more
of the "would be nice to have" category)
>
> Daniel's remark about ff_ prefix of non-static symbols has been
> addressed, so here is the updated patch series:
> This first patch refactors the current code of the RealAudio decoder
> such that ra144dec.c will contain code specific to the decoder, ra144.c
> will contain code which can be shared between decoder and encoder, and
> ra144.h will contain declarations for stuff in ra144.c; this patch must
> be preceded by:
> svn mv libavcodec/ra144.c libavcodec/ra144dec.c
> svn cp libavcodec/ra144.h libavcodec/ra144.c
> The second patch adds the ff_ prefix to all non-static symbols resulting
> from patch #1.
> The third patch contains cosmetic changes deriving from patch #2.
> The fourth patch adds the function ff_subblock_synthesis() to ra144.c
> and inserts a call to that function in the decoder code, such that more
> code will be shared between decoder and encoder.
patches 1-4 are ok
> The fifth patch adds the encoder.
i will review that seperately in a moment
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Those who are too smart to engage in politics are punished by being
governed by those who are dumber. -- Plato
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100510/2a2c7121/attachment.pgp>
More information about the ffmpeg-devel
mailing list