[FFmpeg-soc] [PATCH] AMR-WB Decoder
Marcelo Galvão Póvoa
marspeoplester at gmail.com
Tue Sep 7 02:05:39 CEST 2010
On 6 September 2010 11:31, Vitor Sessak <vitor1001 at gmail.com> wrote:
> On 09/06/2010 02:46 AM, Marcelo Galvão Póvoa wrote:
>> Ok, fortunately I've found the bug!
>>
>
> [...]
>
>> So, here is how the decoder is now (I'm sending my commits to github also).
>>
>
>> +/** Get x bits in the index interval [lsb,lsb+len-1] inclusive */
>> +#define BIT_STR(x,lsb,len) (((x) >> (lsb)) & ((1 << (len)) - 1))
>> +
>> +/** Get the bit at specified position */
>> +#define BIT_POS(x, p) (((x) >> (p)) & 1)
>>
>
> If you move the #defines closer to where they are used, you improve the
> readability of your code.
>
Fixed
>> +
>> +/**
>> + * Decodes quantized ISF vectors using 36-bit indices (6K60 mode only)
>> + *
>> + * @param[in] ind Array of 5 indices
>> + * @param[out] isf_q Buffer for isf_q[LP_ORDER]
>> + *
>> + */
>> +static void decode_isf_indices_36b(uint16_t *ind, float *isf_q) {
>> + int i;
>> +
>> + for (i = 0; i < 9; i++) {
>> + isf_q[i] = dico1_isf[ind[0]][i] * (1.0f / (1 << 15));
>> + }
>> + for (i = 0; i < 7; i++) {
>> + isf_q[i + 9] = dico2_isf[ind[1]][i] * (1.0f / (1 << 15));
>> + }
>> + for (i = 0; i < 5; i++) {
>> + isf_q[i] += dico21_isf_36b[ind[2]][i] * (1.0f / (1 << 15));
>> + }
>> + for (i = 0; i < 4; i++) {
>> + isf_q[i + 5] += dico22_isf_36b[ind[3]][i] * (1.0f / (1 << 15));
>> + }
>> + for (i = 0; i < 7; i++) {
>> + isf_q[i + 9] += dico23_isf_36b[ind[4]][i] * (1.0f / (1 << 15));
>> + }
>> +}
>>
>
> I think this would be more readable if the (1.0f / (1 << 15)) rescaling
> were done in isf_add_mean_and_past().
>
Hmm, it improves readability in this part, but I think that way the
output of decode_isf_indices_36b() and the input of
isf_add_mean_and_past() would have a strange meaning, if you know what
I mean.
>> +
>> +/**
>> + * Find the pitch vector by interpolating the past excitation at the
>> + * pitch delay, which is obtained in this function
>> + *
>> + * @param[in,out] ctx The context
>> + * @param[in] amr_subframe Current subframe data
>> + * @param[in] subframe Current subframe index (0 to 3)
>> + */
>> +static void decode_pitch_vector(AMRWBContext *ctx,
>> + const AMRWBSubFrame *amr_subframe,
>> + const int subframe)
>> +{
>> + int pitch_lag_int, pitch_lag_frac;
>> + int i;
>> + float *exc = ctx->excitation;
>> + enum Mode mode = ctx->fr_cur_mode;
>> +
>> + if (mode <= MODE_8k85) {
>> + decode_pitch_lag_low(&pitch_lag_int, &pitch_lag_frac, amr_subframe->adap,
>> + &ctx->base_pitch_lag, subframe, mode);
>> + } else
>> + decode_pitch_lag_high(&pitch_lag_int, &pitch_lag_frac, amr_subframe->adap,
>> + &ctx->base_pitch_lag, subframe);
>> +
>> + ctx->pitch_lag_int = pitch_lag_int;
>>
>
>> + pitch_lag_int += (pitch_lag_frac < 0 ? -1 : 0) + (pitch_lag_frac ? 1 : 0);
>>
>
> pitch_lag_int += pitch_lag_frac ? FFSIGN(pitch_lag_frac) : 0;
>
I'm not sure I understand this correctly; see my "truth table" below:
 pitch_lag_frac | mine_RHS | your_RHS
            < 0 |        0 |       -1
            = 0 |        0 |        0
            > 0 |        1 |        1
So, the simplified form would be this?
pitch_lag_int += pitch_lag_frac > 0 ? 1 : 0;
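Just to double-check the equivalence mechanically, a throwaway test
(not patch code):

#include <assert.h>

int main(void)
{
    int frac;

    for (frac = -4; frac <= 4; frac++) {
        int original   = (frac < 0 ? -1 : 0) + (frac ? 1 : 0);
        int simplified = frac > 0 ? 1 : 0;
        assert(original == simplified);
    }
    return 0;
}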
>
>> +/**
>> + * Reduce fixed vector sparseness by smoothing with one of three IR filters
>> + * Also known as "adaptive phase dispersion"
>> + *
>> + * @param[in,out] ctx The context
>> + * @param[in,out] fixed_vector Unfiltered fixed vector
>> + * @param[out] buf Space for modified vector if necessary
>> + *
>> + * @return The potentially overwritten filtered fixed vector address
>> + */
>> +static float *anti_sparseness(AMRWBContext *ctx,
>> + float *fixed_vector, float *buf)
>> +{
>> + int ir_filter_nr;
>> +
>> + if (ctx->fr_cur_mode > MODE_8k85) // no filtering in higher modes
>> + return fixed_vector;
>> +
>> + if (ctx->pitch_gain[0] < 0.6) {
>> + ir_filter_nr = 0; // strong filtering
>> + } else if (ctx->pitch_gain[0] < 0.9) {
>> + ir_filter_nr = 1; // medium filtering
>> + } else
>> + ir_filter_nr = 2; // no filtering
>> +
>> + /* detect 'onset' */
>> + if (ctx->fixed_gain[0] > 3.0 * ctx->fixed_gain[1]) {
>> + if (ir_filter_nr < 2)
>> + ir_filter_nr++;
>> + } else {
>> + int i, count = 0;
>> +
>> + for (i = 0; i < 6; i++)
>> + if (ctx->pitch_gain[i] < 0.6)
>> + count++;
>> +
>> + if (count > 2)
>> + ir_filter_nr = 0;
>> +
>> + if (ir_filter_nr > ctx->prev_ir_filter_nr + 1)
>> + ir_filter_nr--;
>> + }
>> +
>> + /* update ir filter strength history */
>> + ctx->prev_ir_filter_nr = ir_filter_nr;
>> +
>> + ir_filter_nr += (ctx->fr_cur_mode == MODE_8k85 ? 1 : 0);
>> +
>> + if (ir_filter_nr < 2) {
>> + int i, j;
>> + const float *coef = ir_filters_lookup[ir_filter_nr];
>> +
>> + /* Circular convolution code in the reference
>> + * decoder was modified to avoid using one
>> + * extra array. The filtered vector is given by:
>> + *
>> + * c2(n) = sum(i,0,len-1){ c(i) * coef( (n - i + len) % len ) }
>> + */
>> +
>> + memset(buf, 0, sizeof(float) * AMRWB_SFR_SIZE);
>> + for (i = 0; i < AMRWB_SFR_SIZE; i++)
>
>> + if (fixed_vector[i]) {
>> + int li = AMRWB_SFR_SIZE - i;
>> +
>> + for (j = 0; j < li; j++)
>> + buf[i + j] += fixed_vector[i] * coef[j];
>> +
>> + for (j = 0; j < i; j++)
>> + buf[j] += fixed_vector[i] * coef[j + li];
>> + }
>
> Can celp_filters.c:ff_celp_circ_addf() simplify this?
>
I already gave some thought to that, but I couldn't figure out how to
map ff_celp_circ_addf() onto this, so I wrote the algorithm the
simplest way I could think of.
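Looking at it again: if ff_celp_circ_addf() implements
out[k] = in[k] + fac * lagged[(k - lag + n) % n], then each nonzero
pulse just adds a copy of the impulse response rotated by its position,
and the two inner loops might reduce to something like this (untested
sketch, not what is in the patch):

memset(buf, 0, sizeof(float) * AMRWB_SFR_SIZE);
for (i = 0; i < AMRWB_SFR_SIZE; i++)
    if (fixed_vector[i])
        ff_celp_circ_addf(buf, buf, coef, i, fixed_vector[i],
                          AMRWB_SFR_SIZE);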
>> +/**
>> + * Extrapolate an ISF vector to the 16kHz range (20th order LP),
>> + * used in mode 6k60 for the high frequency band LP filter
>> + *
>> + * @param[out] out Buffer for extrapolated isf
>> + * @param[in] isf Input isf vector
>> + */
>> +static void extrapolate_isf(float out[LP_ORDER_16k], float isf[LP_ORDER])
>> +{
>> + float diff_isf[LP_ORDER - 2], diff_mean;
>> + float *diff_hi = diff_isf - LP_ORDER + 1; // diff array for extrapolated indices
>> + float corr_lag[3];
>> + float est, scale;
>> + int i, i_max_corr;
>> +
>> + memcpy(out, isf, (LP_ORDER - 1) * sizeof(float));
>> + out[LP_ORDER_16k - 1] = isf[LP_ORDER - 1];
>> +
>> + /* Calculate the difference vector */
>> + for (i = 0; i < LP_ORDER - 2; i++)
>> + diff_isf[i] = isf[i + 1] - isf[i];
>> +
>> + diff_mean = 0.0;
>> + for (i = 2; i < LP_ORDER - 2; i++)
>> + diff_mean += diff_isf[i] / (LP_ORDER - 4);
>
> diff_mean += diff_isf[i] * (1.0f / (LP_ORDER - 4));
>
Fixed
>> + for (i = LP_ORDER - 1; i < LP_ORDER_16k - 1; i++)
>> + diff_hi[i] = scale * (out[i] - out[i - 1]);
>> +
>> + /* Stability insurance */
>> + for (i = LP_ORDER; i < LP_ORDER_16k - 1; i++)
>> + if (diff_hi[i] + diff_hi[i - 1] < 5.0) {
>> + if (diff_hi[i] > diff_hi[i - 1]) {
>> + diff_hi[i - 1] = 5.0 - diff_hi[i];
>> + } else
>> + diff_hi[i] = 5.0 - diff_hi[i - 1];
>> + }
>> +
>> + for (i = LP_ORDER - 1; i < LP_ORDER_16k - 1; i++)
>> + out[i] = out[i - 1] + diff_hi[i] * (1.0f / (1 << 15));
>
> Hmm, this hunk looks a lot like
>
> ff_sort_nearly_sorted_floats(out, LP_ORDER_16k);
> ff_set_min_dist_lsf(out, 5.0f / (1 << 15));
> for (i = LP_ORDER - 1; i < LP_ORDER_16k - 1; i++)
> out[i] = out[i - 1] + scale * (1.0f / (1 << 15)) * (out[i] - out[i-1]);
>
> or something like that.
>
Yes, but the min_dist_lsf would have to be applied between out[i] and
out[i-2], I guess, since diff_hi[i] + diff_hi[i - 1] is just
out[i] - out[i - 2] before the 1/2^15 rescaling. I can't think of a
way to simplify this hunk without changing the output.
>> +/**
>> + * Apply a 15th order filter to high-band samples
>> + * The filter characteristic depends on the given coefficients
>> + *
>> + * @param[out] out Buffer for filtered output
>> + * @param[in] fir_coef Filter coefficients
>> + * @param[in,out] mem State from last filtering (updated)
>> + * @param[in] cp_gain Compensation gain (usually the filter gain)
>> + * @param[in] in Input speech data (high-band)
>> + *
>> + * @remark It is safe to pass the same array as the in and out parameters
>> + */
>> +static void hb_fir_filter(float *out, const float fir_coef[HB_FIR_SIZE + 1],
>> + float mem[HB_FIR_SIZE], float cp_gain, const float *in)
>> +{
>> + int i, j;
>> + float data[AMRWB_SFR_SIZE_16k + HB_FIR_SIZE]; // past and current samples
>> +
>> + memcpy(data, mem, HB_FIR_SIZE * sizeof(float));
>> +
>> + for (i = 0; i < AMRWB_SFR_SIZE_16k; i++)
>> + data[i + HB_FIR_SIZE] = in[i] / cp_gain;
>> +
>> + for (i = 0; i < AMRWB_SFR_SIZE_16k; i++) {
>> + out[i] = 0.0;
>> + for (j = 0; j <= HB_FIR_SIZE; j++)
>> + out[i] += data[i + j] * fir_coef[j];
>> + }
>> +
>> + memcpy(mem, data + AMRWB_SFR_SIZE_16k, HB_FIR_SIZE * sizeof(float));
>> +}
>
> This function does too much memcpy: 30+30+80 elements. I suggest that
> you use a buffer HB_FIR_SIZE elements larger where it is called, so
> that the memcpy(data, mem, etc) is not needed.
>
The problem is that this function's input is the output from synthesis,
which is preceded by its own memory, so I cannot avoid the memcpy for
both functions. I can, however, make the memory buffer 80 elements
larger to avoid the memcpy of 30 elements. This would also solve the
problem with the memory update that I mention later, although it would
use more memory in the AMRWBContext.
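Something along these lines is what I have in mind (a rough sketch
only; the struct and helper names below are illustrative, not the
actual patch code, and cp_gain is dropped assuming it gets folded into
the coefficient table as you suggest):

#include <string.h>

#define HB_FIR_SIZE        30
#define AMRWB_SFR_SIZE_16k 80

typedef struct {
    /* [0 .. HB_FIR_SIZE-1]   last samples of the previous subframe
     * [HB_FIR_SIZE .. end-1] samples produced for the current one */
    float hb_mem[HB_FIR_SIZE + AMRWB_SFR_SIZE_16k];
} HBFilterSketch;

/* the filter reads history and new samples from one contiguous buffer,
 * so no memcpy of the memory into a local array is needed */
static void hb_fir_filter_sketch(float *out, const float *fir_coef,
                                 const float *data /* = hb_mem */)
{
    int i, j;

    for (i = 0; i < AMRWB_SFR_SIZE_16k; i++) {
        out[i] = 0.0f;
        for (j = 0; j <= HB_FIR_SIZE; j++)
            out[i] += data[i + j] * fir_coef[j];
    }
}

/* per subframe, only the HB_FIR_SIZE-sample history is moved back to
 * the front of the buffer for the next call */
static void hb_fir_update_sketch(HBFilterSketch *s)
{
    memmove(s->hb_mem, s->hb_mem + AMRWB_SFR_SIZE_16k,
            HB_FIR_SIZE * sizeof(float));
}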
> Another thing is that the scaling by cp_gain is not needed. You can
> replace it by multiplying the bpf_6_7_coef table by 4.
>
Don't you mean dividing by 4? Fixed
> Finally, I would prefer if the function contained just the inner loop
> (leaving the memory updating for the caller). This way it is simpler
> to reuse it in other codecs in the future. Since it is also a very
> good target for optimization (AMRWB spends > 30% of the decoding time
> on it), it is nice to have a simple interface the day someone decides
> to optimize it.
>
Hmm, I guess I can't do that with the function as it is now, because
the input samples I need for the memory update may already have been
overwritten by the time the function returns (if input == output).
>> +static int amrwb_decode_frame(AVCodecContext *avctx, void *data, int *data_size,
>> + AVPacket *avpkt)
>> +{
>> + AMRWBContext *ctx = avctx->priv_data;
>> + AMRWBFrame *cf = &ctx->frame;
>> + const uint8_t *buf = avpkt->data;
>> + int buf_size = avpkt->size;
>> + float *buf_out = data;
>> + float spare_vector[AMRWB_SFR_SIZE]; // extra stack space to hold result from anti-sparseness processing
>> + float fixed_gain_factor; // fixed gain correction factor (gamma)
>> + float *synth_fixed_vector; // pointer to the fixed vector that synthesis should use
>> + float synth_fixed_gain; // the fixed gain that synthesis should use
>> + float voice_fac, stab_fac; // parameters used for gain smoothing
>> + float synth_exc[AMRWB_SFR_SIZE]; // post-processed excitation for synthesis
>> + float hb_exc[AMRWB_SFR_SIZE_16k]; // excitation for the high frequency band
>> + float hb_samples[AMRWB_SFR_SIZE_16k]; // filtered high-band samples from synthesis
>> + float hb_gain;
>> + int sub, i;
>> +
>> + ctx->fr_cur_mode = unpack_bitstream(ctx, buf, buf_size);
>
> Missing a call like
>
>
> if (buf_size < expected_input_size) {
> av_log(avctx, AV_LOG_ERROR,
> "Frame too small (%d bytes). Truncated file?\n", buf_size);
> *data_size = 0;
> return buf_size;
> }
>
> To avoid segfaults for truncated files.
>
Fixed
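For reference, the expected size follows directly from the per-mode bit
allocation; a standalone sketch (the helper and table names are
illustrative, not necessarily how the patch stores them; the bit counts
are the standard AMR-WB ones and give the usual storage sizes of
18, 24, 33, 37, 41, 47, 51, 59, 61 and 6 bytes):

/* speech bits per frame for modes 6.60 ... 23.85 kbit/s, plus SID */
static const int bits_per_frame[] = {
    132, 177, 253, 285, 317, 365, 397, 461, 477, 40
};

/* packed bytes, rounded up, plus the one-byte frame header of the
 * MIME/storage format */
static int expected_frame_size(int mode)
{
    return ((bits_per_frame[mode] + 7) >> 3) + 1;
}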
>> + /* Add the low and high frequency bands */
>> + for (i = 0; i < AMRWB_SFR_SIZE_16k; i++) {
>> + // XXX: should the low-band really be upscaled by 2.0? This
>> + // way the output volume level seems doubled
>> + sub_buf[i] = (sub_buf[i] + hb_samples[i]) / 32768.0;
>
> * (1.0f / (1 << 15));
>
Fixed
--
Marcelo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: amrwb.patch
Type: application/octet-stream
Size: 167085 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-soc/attachments/20100906/970b8f3f/attachment.obj>