[FFmpeg-devel] Nellymoser encoder

Thu Aug 28 20:56:24 CEST 2008

On Thursday 28 August 2008 14:43:23, Michael Niedermayer wrote:
> On Thu, Aug 28, 2008 at 12:53:50PM +0200, Bartlomiej Wolowiec wrote:
> > On Thursday 28 August 2008 00:11:20, Michael Niedermayer wrote:
> > [...]
> >
> > > > +    DSPContext      dsp;
> > > > +    MDCTContext     mdct_ctx;
> > > > +    DECLARE_ALIGNED_16(float, mdct_out[NELLY_SAMPLES]);
> > > > +    DECLARE_ALIGNED_16(float, buf[2 * NELLY_SAMPLES]);     ///< sample buffer
> > > > +} NellyMoserEncodeContext;
> > > > +
> > > >
> > > > +static DECLARE_ALIGNED_16(float, sine_window[NELLY_SAMPLES]);
> > >
> > > duplicate of ff_sine_windows and sine_window from nellymoserdec
> >
> > not really, sine_window from nellymoserdec is just half of it. I
> > haven't compared efficiency, but it seems to me that vector_fmul may be
> > quicker than overlap_and_window from nellymoserdec using half of
> > sine_window. Or maybe I am missing some details...
>
> you miss vector_fmul_reverse

yes, I missed it.

> > [...]
> >
> > > > +/**
> > > > + * Searching index in table with size table_size, where
> > > > + * |val-table[best_idx]| is minimal.
> > > > + * It assumes that table elements are in increasing order and uses
> > > > + * binary search.
> > > > + */
> > > > +#define find_best_value(val, table, table_size, best_idx) \
> > > > +{ \
> > > > +    int first=0, last=table_size-1, mid; \
> > > > +    while(first<=last){ \
> > > > +        mid=(first+last)/2; \
> > > > +        if(val > table[mid]){ \
> > > > +            first = mid + 1; \
> > > > +        }else{ \
> > > > +            last = mid - 1; \
> > > > +        } \
> > > > +    } \
> > > > +    if(!first || (first!=table_size && table[first]-val < val-table[last])) \
> > > > +        best_idx = first; \
> > > > +    else \
> > > > +        best_idx = last; \
> > > > +}
> > >
> > > This can be done faster with a look up table
> > > and a single right value vs. left value check
> >
> > Ok, I may do it for ff_nelly_init_table and ff_nelly_delta_table, but I
> > don't really know how to do it for float type
> > (ff_nelly_dequantization_table)
>
> idx= LUT[lrintf(val*A+B)]
> if(fabs(val - tab[idx]) > fabs(val - tab[idx+1]))
>     idx++;
>
> with A and B being appropriate constants

I've done lookup tables.
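
For readers following along, here is one way the suggested lookup could be sketched for a float table. It is not taken from the enclosed patch. The struct `NearestLUT`, the functions, and `SKETCH_LUT_SIZE` are hypothetical names, and the sketch assumes the buckets are narrow enough that at most one midpoint between neighbouring table entries falls into any bucket, so a single neighbour comparison is sufficient.

```c
#include <assert.h>
#include <math.h>

#define SKETCH_LUT_SIZE 128   /* hypothetical; buckets must be narrower than the table spacing */

typedef struct {
    const float  *tab;        /* sorted, increasing values */
    int           n;          /* number of entries in tab */
    float         lo;         /* tab[0] */
    float         scale;      /* buckets per unit of value (the "A" constant) */
    unsigned char lut[SKETCH_LUT_SIZE];
} NearestLUT;

/* Reference search: index of the entry closest to val (first one on ties). */
static int nearest_linear(const float *tab, int n, float val)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (fabsf(val - tab[i]) < fabsf(val - tab[best]))
            best = i;
    return best;
}

/* For every bucket, store the index nearest to the bucket's lower edge. */
static void nearest_lut_init(NearestLUT *l, const float *tab, int n)
{
    assert(n >= 2 && n <= 256 && tab[n - 1] > tab[0]);
    l->tab   = tab;
    l->n     = n;
    l->lo    = tab[0];
    l->scale = SKETCH_LUT_SIZE / (tab[n - 1] - tab[0]);
    for (int k = 0; k < SKETCH_LUT_SIZE; k++)
        l->lut[k] = (unsigned char)nearest_linear(tab, n, l->lo + k / l->scale);
}

/* One table read plus a single "is the right neighbour closer?" check. */
static int nearest_lut_find(const NearestLUT *l, float val)
{
    float pos = (val - l->lo) * l->scale;
    int k, idx;

    if (pos < 0.0f)
        k = 0;
    else if (pos >= (float)SKETCH_LUT_SIZE)
        k = SKETCH_LUT_SIZE - 1;
    else
        k = (int)pos;

    idx = l->lut[k];
    if (idx + 1 < l->n &&
        fabsf(val - l->tab[idx + 1]) < fabsf(val - l->tab[idx]))
        idx++;
    return idx;
}
```

The table is filled by brute force at init time, which keeps the construction simple; only `nearest_lut_find` runs per coefficient.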

> > > > +
> > > > +/**
> > > > + * Encodes NELLY_SAMPLES samples. It assumes, that samples contains
> > > > + * 3 * NELLY_BUF_LEN values
> > > > + *  @param s               encoder context
> > > > + *  @param output          output buffer
> > > > + *  @param output_size     size of output buffer
> > > > + *  @param samples         input samples
> > > > + */
> > > > +static void encode_block(NellyMoserEncodeContext *s,
> > > > +                         unsigned char *output, int output_size, float *samples)
> > > > +{
> > > > +    PutBitContext pb;
> > > > +    int i, band, block, best_idx, power_idx = 0;
> > > > +    float power_val, power_candidate, coeff, coeff_sum;
> > > > +    int band_start, band_end;
> > > > +
> > > > +    apply_mdct(s, samples, s->mdct_out);
> > > > +    apply_mdct(s, samples + NELLY_BUF_LEN, s->mdct_out + NELLY_BUF_LEN);
> > > > +
> > > > +    init_put_bits(&pb, output, output_size * 8);
> > > > +
> > > > +    band_start = 0;
> > > > +    band_end = ff_nelly_band_sizes_table[0];
> > > > +    for (band = 0; band < NELLY_BANDS; band++) {
> > > > +        coeff_sum = 0;
> > > > +        for (i = band_start; i < band_end; i++) {
> > > >
> > > > +            for (block = 0; block < 2; block++) {
> > > > +                coeff = s->mdct_out[i + block * NELLY_BUF_LEN];
> > > > +                coeff_sum += coeff * coeff;
> > > > +            }
> > >
> > > I'd unroll that by hand to
> > > coeff_sum += s->mdct_out[i                ]*s->mdct_out[i                ]
> > >             +s->mdct_out[i + NELLY_BUF_LEN]*s->mdct_out[i + NELLY_BUF_LEN];
> > >
> > > > +        }
> > > > +        power_candidate =
> > > > +            (log(FFMAX(64.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 1))) -
> > > > +             log(64.0)) * 1024.0 / M_LN2;
> > >
> > > log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) *
> > > 1024.0 / M_LN2;
> > >
> > > also this is based on
> > > (sum(0..N) ABS(coeff)^2/N)^(1/2)
> > >
> > > it would be interesting to try
> > > C*(sum(0..N) ABS(coeff)^D/N)^(1/D) for different values of C and D
> > >
> > > maybe you could try
> > > C={0.9,1.0,1.1}
> > > D={1.9,2.0,2.1}
> > > at first and see if any improves distortion
> >
> > Hmm... How should I check distortion?
>
> listening is one option ...
>
> > I've listened to a few recordings and in
> > my opinion the differences are insignificant - sometimes D=2.0 is better,
> > sometimes D=2.3... C!=1.0 in my opinion doesn't give better results.
>
> have you tried different C for D=2.3 ?
> also what about larger D like 3.0 ?
> besides can you post a patch so others can (if they want) also experiment
> with this?

Enclosed patch contains C and D constants. 

> > > > +
> > > > +        if (band) {
> > > > +            power_candidate -= power_idx;
> > > > +            find_best_value(power_candidate, ff_nelly_delta_table, 32, best_idx);
> > > > +            put_bits(&pb, 5, best_idx);
> > > > +            power_idx += ff_nelly_delta_table[best_idx];
> > > > +        } else {
> > > > +            //base exponent
> > > > +            find_best_value(power_candidate, ff_nelly_init_table, 64, best_idx);
> > > > +            put_bits(&pb, 6, best_idx);
> > > > +            power_idx = ff_nelly_init_table[best_idx];
> > > > +        }
> > >
> > > I wish I knew how to optimally assign these values, sadly I do not.
> > > Suggestions would be welcome of course in case anyone has an idea on
> > > how to optimally select them, the tricky part is that these not only
> > > scale the signal, they also are the basis upon which the bits per band
> > > and thus encoding is selected.
> > >
> > > Still they could be made to closer match the "power_candidate" values
> > > from above using viterbi though arguably it would just be closer to a
> > > guess.
> > >
> > > An alternative may be to just retry the whole encode_block with
> > > slightly changed power_candidate values for each band and pick what ends
> > > up with the least distortion (that is, the least difference to the input
> > > signal). This should be rather easy to try ...
> >
> > slightly changed? What exactly do you mean?
>
> power_candidate += random

Ok, can I work on it once the rest of the code is in svn? I think it will
be easier.
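
A rough sketch of what "power_candidate += random" with a retry loop could look like, under heavy assumptions: the real distortion measure is still an open question in this thread, so the sketch takes it as a callback. Everything here (`search_power_candidates`, `distortion_fn`, `toy_distortion`, `SKETCH_BANDS`, the small LCG) is hypothetical, and `toy_distortion` is only a stand-in so the loop can be exercised.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_BANDS 4   /* hypothetical band count, kept small for the sketch */

/* Callback that scores one set of per-band candidates; lower is better. */
typedef double (*distortion_fn)(const float *cand, int n, void *opaque);

/* Small deterministic generator so that runs are repeatable. */
static unsigned lcg_next(unsigned *state)
{
    *state = *state * 1664525u + 1013904223u;
    return (*state >> 16) & 0xFFFFu;
}

/* Stand-in distortion: squared distance to a target vector passed via opaque.
 * A real encoder would re-encode the block and compare with the input. */
static double toy_distortion(const float *cand, int n, void *opaque)
{
    const float *target = opaque;
    double d = 0.0;
    for (int i = 0; i < n; i++)
        d += (double)(cand[i] - target[i]) * (cand[i] - target[i]);
    return d;
}

/* Random search: perturb every band by up to +-max_delta around the best set
 * found so far and keep a trial only if it lowers the distortion.
 * Returns the distortion of the candidates left in cand[]. */
static double search_power_candidates(float *cand, int n, float max_delta,
                                      int tries, distortion_fn measure,
                                      void *opaque, unsigned seed)
{
    float trial[SKETCH_BANDS];
    double best;

    assert(n > 0 && n <= SKETCH_BANDS);
    best = measure(cand, n, opaque);
    for (int t = 0; t < tries; t++) {
        double d;
        for (int i = 0; i < n; i++) {
            float u = lcg_next(&seed) / 65535.0f;          /* 0..1 */
            trial[i] = cand[i] + (2.0f * u - 1.0f) * max_delta;
        }
        d = measure(trial, n, opaque);
        if (d < best) {
            best = d;
            memcpy(cand, trial, n * sizeof(*cand));
        }
    }
    return best;
}
```

Because the unmodified candidates are scored first, the result can never be worse than the starting point under the chosen measure.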

> > And again there is the problem of how to
> > measure distortion - a plain difference of MDCT values won't give good results.
>
> Well, Kostya's psy model, if it were finished, could be used ...
>
> > > > +
> > > > +        if (power_idx >= 0) {
> > > > +            power_val = pow_table[power_idx & 0x7FF] / (1 << (power_idx >> 11));
> > > > +        } else {
> > > > +            power_val = -pow(2, -power_idx / 2048.0 - 3.0);
> > > > +        }
> > >
> > > power_idx can be <0 ?
> >
> > Yes. In this encoder code it can be, possibly in original decoder too.
>
> pow_table[power_idx & 0x7FF] / (1 << ((power_idx >> 11)+C));
>
> with appropriately changed pow_table may avoid it
>
> [...]

It's not so easy, because & doesn't work well for negative numbers - but just
adding a constant with constant%0x800==0 helps :)

-- 
Bartlomiej Wolowiec
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nellymoser2.patch
Type: text/x-diff
Size: 12234 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080828/ccc9eb59/attachment.patch>