[FFmpeg-devel] [PATCH] TwinVQ decoder

Sun Aug 2 16:30:14 CEST 2009

On Sun, Aug 02, 2009 at 03:18:02PM +0200, Vitor Sessak wrote:
> static void memset_float(float *buf, float val, int size)

Document that it is not performance relevant and thus not optimized at all.

> static void add_vec(int cont, const float *buf1, const float *buf2, float *buf3)

Same.

Maybe
>     const float *cos_tab = ff_cos_tabs[av_log2(size_s)-1];

cos_tab += 2;
>     for (i=0; i < size_s/2; i++) {
>         lpc[i]          = eval_lpc_spectrum(cos_vals,  cos_tab[4*i + 2],
>                                             mtab->n_lsp);
>         lpc[size_s-i-1] = eval_lpc_spectrum(cos_vals, -cos_tab[4*i + 2],
>                                             mtab->n_lsp);

*cos_tab and -*cos_tab
cos_tab += 4;

Or at least a tmp variable for the value of cos_tab[4*i + 2].
IMO it hurts readability a lot when the reader has to find out
by themselves if the terms are identical (the compiler probably
won't care).

>                               get_cos(i*4+2, part, cos_tab, 2*size),

I guess the answer is "no", but since cos_tab is usually accessed
with 4*i+2 as index I wonder if it would make sense to rearrange it
so you can use e.g. compact_cos_tab[i] directly...

>     for (i=step; i <= size - 2*step; i += step) {

Missing spaces.

>         if ((out[i + step] + out[i - step] >  1.95*out[i]) ||
>             (out[i - step]                 <=  out[i + step])) {

Useless ().
And how about

>         if (out[i + step] + out[i - step] >  1.95*out[i] ||
>             out[i + step]                 >=  out[i - step]) {

>             interpolate(out + i-step + 1, out[i], out[i-step], step - 1);
>             interpolate(out+i-step  +1, out[i-step/2], out[i-step  ], step/2-1);
>             interpolate(out+i-step/2+1, out[i       ], out[i-step/2], step/2-1);
>     interpolate(out +size - 2*step+1, out[size-step], out[size-2*step], step-1);

You don't place spaces particularly consistently...

> /**
>  * Rearrange the LSP coefficients so that they have a minimum distance of
>  * min_dist. This function does it exactly as described in section of 3.2.4
>  * of the G.729 specification (but interestingly is different from what the
>  * reference decoder actually does).
>  */
> static void rearrange_lsp(int order, float *lsp, float min_dist)
> {
>     int i;
>     for (i=1; i<order; i++)
>         if (lsp[i] - lsp[i-1] < min_dist) {
>             float new_prev = (lsp[i] + lsp[i-1] - min_dist) * 0.5;
>             float new_cur  = (lsp[i] + lsp[i-1] + min_dist) * 0.5;

Uh, that is not the same as the old code... Which one is right now?
This formula could be simplified to
float mindist2 = mindist * 0.5;
float avg = (lsp[i] + lsp[i-1]) * 0.5;
lsp[i-1] = avg - mindist2;
lsp[i  ] = avg + mindist2;
That wasn't possible with the formula you had before, since the new value
of lsp[i] depended on the new value of lsp[i-1];

>     /* sort lsp in ascending order. float bubble agorithm,
>        O(n) if data already sorted, O(n^2) - otherwise */
>     for (i=0; i<lp_order-1; i++)
>         for (j=i; j>=0 && lsp[j] > lsp[j+1]; j--)

Inconsistent spaces.

>     const float *cb  =  mtab->lspcodebook;
>     const float *cb2 =  mtab->lspcodebook + (1 << mtab->lsp_bit1)*mtab->n_lsp;
>     const float *cb3 =  mtab->lspcodebook + (1 << mtab->lsp_bit1)*mtab->n_lsp +
>                                             (1 << mtab->lsp_bit2)*mtab->n_lsp;

Huh?
cb2 = cb  + (1 << mtab->lsp_bit1)*mtab->n_lsp;
cb3 = cb2 + (1 << mtab->lsp_bit2)*mtab->n_lsp;

Seems nicer/simpler...

>     case 5:
>         wsize = size_m;
>         next_wsize = mtab->size;
>       break;

Indentation?

>     sizes[0] = (block_size-wsize)/2;
>     sizes[1] = wsize;
>     sizes[2] = (block_size-wsize)/2;

sizes[2] = sizes[0]; ?

I think that sizes is not really a good name anyway and making
it an array has no real advantage.

>         for (j=0; j < mtab->fmode[FT_MEDIUM].sub; j++) {
>             if (wtype == 4 && !j) {
>                 sub_wtype = 4;
>             } else if (wtype == 7 && j == mtab->fmode[FT_MEDIUM].sub-1) {
>                 sub_wtype = 7;
>             } else
>                 sub_wtype = 8;

Hm. I'd try to at least make it easier to optimize by writing it as e.g.
sub_wtype = 8;
if (!j || j == mtab->fmode[FT_MEDIUM].sub-1) {
   if (!j && wtype == 4) sub_wtype = 4;
   if ( j && wtype == 7) sub_wtype = 7;
}

Or something like that...
Not sure though.

>     for (i=0; i < fw_cb_len; i++)
>         for (j=0; j < bark_n_coef; j++) {
>             int idx = j + i*bark_n_coef;

>     int idx = 0;
>     for (i=0; i < fw_cb_len; i++)
>         for (j=0; j < bark_n_coef; j++, idx++) {

is the same and might avoid one multiply per iteration if the compiler is stupid?

>             float st = (use_hist) ?

Useless ()

>         for (j=0; j < mtab->fmode[ftype].sub; j++)
>             tctx->dsp.vector_fmul(chunk + j*block_size, tctx->tmp_buf,
>                                   block_size);

Could do
chunk += block_size;
Not sure if it is really better, it splits the complexity a bit across
two code lines...

>     init_get_bits(&gb, buf, buf_size * 8);

I haven't checked, but are you regularly checking that you don't actually
overread?

>         if ((num_blocks == 1) ||
>             (ftype == FT_LONG && num_vect % num_blocks) ||
>             (ftype != FT_LONG && num_vect & 1         ) ||
>             (i == line_len[1])) {

Quite a few () too many...