[FFmpeg-soc] [PATCH] AMR-WB Decoder
Vitor Sessak
vitor1001 at gmail.com
Mon Aug 30 14:31:18 CEST 2010
On 08/27/2010 11:13 PM, Marcelo Galvão Póvoa wrote:
> On 26 August 2010 18:55, Vitor Sessak<vitor1001 at gmail.com> wrote:
>
>> Marcelo Galvão Póvoa wrote:
>>
>>> Fixed excitation array incorrect length.
>>>
>>> Should now be applicable cleanly to the latest ffmpeg revision
>>> (fd151a5f8bd152c456a).
>>>
>> First look:
>>
>>
>>
[...]
>>> +/**
>>> + * Convert an ISF vector into an ISP vector
>>> + *
>>> + * @param[in] isf Isf vector
>>> + * @param[out] isp Output isp vector
>>> + * @param[in] size Isf/isp size
>>> + */
>>> +static void isf2isp(const float *isf, double *isp, int size)
>>> +{
>>> + int i;
>>> +
>>> + for (i = 0; i< size - 1; i++)
>>> + isp[i] = cos(2.0 * M_PI * isf[i]);
>>> +
>>> + isp[size - 1] = cos(4.0 * M_PI * isf[size - 1]);
>>> +}
>>>
>> Almost duplication of amrnbdec.c:lsf2lsp().
>>
>>
> Yes, but this last element doubling diverges between AMR-NB and AMR-WB
> reference sources.
>
Can't you just do "isp[size - 1] *= 2" or it is used elsewhere?+
f[i] = val * f[i - 1] + 2 * f[i - 2];
[...]
>>> + for(j = i - 1; j> 1; j--)
>>> + f[j] += f[j - 1] * val + f[j - 2];
>>> + f[1] += 0.25 * val;
>>> + }
>>> +}
>>>
>> Hmm, can't this function be replaced by ff_lsp2polyf() with some rescaling?
>>
>>
> Actually I realize now that the difference between these functions is
> just the entire output scaled down by 0.25. That was silly because I
> was scaling by 4.0 after this function. It is weird why the reference
> decoder does this process. Maybe it should worth someone checking it
> out to be sure about this.
>
I suppose you checked that after this change the output of your decoder
didn't change, no?
>>> +/**
>>> + * Convert a ISP vector to LP coefficient domain {a_k}
>>> + * Equations from TS 26.190 section 5.2.4
>>> + *
>>> + * @param[in] isp ISP vector for a subframe
>>> + * @param[out] lp LP coefficients
>>> + * @param[in] lp_half_order Half the number of LPs to construct
>>> + */
>>> +static void isp2lp(const double *isp, float *lp, int lp_half_order) {
>>> + double pa[10 + 1], qa[10 + 1];
>>> + double last_isp = isp[2 * lp_half_order - 1];
>>> + double qa_old = 0.0;
>>> + float *lp2 =&lp[2 * lp_half_order];
>>> + int i;
>>> +
>>> + if (lp_half_order> 8) { // high-band specific
>>> + lsp2polyf_16k(isp, pa, lp_half_order);
>>> + lsp2polyf_16k(isp + 1, qa, lp_half_order - 1);
>>> +
>>> + for (i = 0; i<= lp_half_order; i++)
>>> + pa[i] *= 4.0;
>>> + for (i = 0; i< lp_half_order; i++)
>>> + qa[i] *= 4.0;
>>> + } else {
>>> + ff_lsp2polyf(isp, pa, lp_half_order);
>>> + ff_lsp2polyf(isp + 1, qa, lp_half_order - 1);
>>> + }
>>> +
>>> + for (i = 1; i< lp_half_order; i++) {
>>> + double paf = (1 + last_isp) * pa[i];
>>> + double qaf = (1 - last_isp) * (qa[i] - qa_old);
>>> +
>>> + qa_old = qa[i - 1];
>>> +
>>> + lp[i] = 0.5 * (paf + qaf);
>>> + lp2[-i] = 0.5 * (paf - qaf);
>>> + }
>>> +
>>> + lp[0] = 1.0;
>>> + lp[lp_half_order] = 0.5 * (1 + last_isp) * pa[lp_half_order];
>>> + lp2[0] = last_isp;
>>> +}
>>>
>> Please double-check if this is not a duplication of sipr.c:lsp2lpc_sipr().
>>
>>
> Interesting, I believe they are the same function except for the fact
> that in AMR-WB the first LP coefficient is always 1.0 but this should
> be easily adapted by passing lp + 1 to the function and setting lp[0]
> = 1.0. Also, the LP order is different between these codecs. To reuse
> this existing function it should be moved to another file (lsp.c I
> guess) and modified. Should I do it and include in the next patch?
>
You can send directly a patch to ffmpeg-devel to move this code to lsp.c.
>>> +static void decode_pitch_vector(AMRWBContext *ctx,
>>> + const AMRWBSubFrame *amr_subframe,
>>> + const int subframe)
>>> +{
>>> + int pitch_lag_int, pitch_lag_frac;
>>> + int i;
>>> + float *exc = ctx->excitation;
>>> + enum Mode mode = ctx->fr_cur_mode;
>>> +
>>> + if (mode<= MODE_8k85) {
>>> + decode_pitch_lag_low(&pitch_lag_int,&pitch_lag_frac,
>>> amr_subframe->ada
>>> p,
>>> +&ctx->base_pitch_lag, subframe, mode);
>>> + } else
>>> + decode_pitch_lag_high(&pitch_lag_int,&pitch_lag_frac,
>>> amr_subframe->ad
>>> ap,
>>> +&ctx->base_pitch_lag, subframe);
>>> +
>>> + ctx->pitch_lag_int = pitch_lag_int;
>>> + pitch_lag_int += (pitch_lag_frac< 0 ? -1 : 0) + (pitch_lag_frac
>>> ? 1 : 0);
>>> +
>>> +
>>> + /* Calculate the pitch vector by interpolating the past excitation at
>>> the
>>> + pitch lag using a hamming windowed sinc function */
>>> + ff_acelp_interpolatef(exc, exc + 1 - pitch_lag_int,
>>> + ac_inter, 4,
>>> + pitch_lag_frac + (pitch_lag_frac> 0 ? 0 : 4),
>>> + LP_ORDER, AMRWB_SFR_SIZE + 1);
>>>
>> ac_inter is yet another hamming function. Can you check if you cannot reuse
>> acelp_vectors.c:ff_b60_sinc or sipr16kdata.h:sinc_win?
>>
>>
> Well, I cannot tell precisely if it cannot be reused but I guess so.
> The number of coefficients in the table is related to the fractional
> resolution of the interpolation: in AMR-NB and SIPR it is 4 but in
> AMR-WB it is 6. Also, I don't see similarities between the table I
> extracted from the reference and these tables.
>
Ok, so leave it like this by now.
>>> +/**
>>> + * Reduce fixed vector sparseness by smoothing with one of three IR
>>> filters
>>> + * Also known as "adaptive phase dispersion"
>>> + *
>>> + * @param[in] ctx The context
>>> + * @param[in,out] fixed_vector Unfiltered fixed vector
>>> + * @param[out] buf Space for modified vector if necessary
>>> + *
>>> + * @return The potentially overwritten filtered fixed vector address
>>> + */
>>> +static float *anti_sparseness(AMRWBContext *ctx,
>>> + float *fixed_vector, float *buf)
>>>
>> amrnbedec.c has a function with the same name. Any code can be reused?
>>
>>
> The function skeletons are similar but there are differences
> everywhere. I can think about easily reusing only about 6 lines, which
> wouldn't worth much
>
Ok, leave it as-is then.
>>> +/**
>>> + * Apply to synthesis a 2nd order high-pass filter
>>> + *
>>> + * @param[out] out Buffer for filtered output
>>> + * @param[in] hpf_coef Filter coefficients as used below
>>> + * @param[in,out] mem State from last filtering (updated)
>>> + * @param[in] in Input speech data
>>> + *
>>> + * @remark It is safe to pass the same array in in and out parameters
>>> + */
>>> +static void high_pass_filter(float *out, const float hpf_coef[2][3],
>>> + float mem[4], const float *in)
>>> +{
>>> + int i;
>>> + float *x = mem - 1, *y = mem + 2; // previous inputs and outputs
>>> +
>>> + for (i = 0; i< AMRWB_SFR_SIZE; i++) {
>>> + float x0 = in[i];
>>> +
>>> + out[i] = hpf_coef[0][0] * x0 + hpf_coef[1][0] * y[0] +
>>> + hpf_coef[0][1] * x[1] + hpf_coef[1][1] * y[1] +
>>> + hpf_coef[0][2] * x[2];
>>> +
>>> + y[1] = y[0];
>>> + y[0] = out[i];
>>> +
>>> + x[2] = x[1];
>>> + x[1] = x0;
>>> + }
>>> +}
>>>
>> acelp_filter.c:ff_acelp_apply_order_2_transfer_function()
>>
>>
> Are you sure? I can't see them as equivalent, could you explain?
>
I'll give a better look at it soon...
>>> +/**
>>> + * Upsample a signal by 5/4 ratio (from 12.8kHz to 16kHz) using
>>> + * a FIR interpolation filter. Uses past data from before *in address
>>> + *
>>> + * @param[out] out Buffer for interpolated signal
>>> + * @param[in] in Current signal data (length
>>> 0.8*o_size)
>>> + * @param[in] o_size Output signal length
>>> + */
>>> +static void upsample_5_4(float *out, const float *in, int o_size)
>>> +{
>>> + const float *in0 = in - UPS_FIR_SIZE + 1;
>>> + int i;
>>> +
>>> + for (i = 0; i< o_size; i++) {
>>> + int int_part = (4 * i) / 5;
>>> + int frac_part = (4 * i) - 5 * int_part;
>>>
>> You can break this loop in two to avoid the division:
>>
>> i = 0;
>> for (j = 0; j< o_size/5; j++)
>> for (k = 0; k< 5; k++) {
>> ....
>> i++;
>> }
>>
>>
> I've done this in not so neat way, adding 4 and calculating a simple
> remainder by 5. Tell me if can be done in a better way.
>
Does something like the following work?
int int_part = 0, frac_part = 0;
i = 0;
for (j = 0; j< o_size / 5; j++) {
out[i] = in[int_part];
frac_part = 4;
i++;
for (k = 1; k< 5; k++) {
out[i] = ff_dot_productf(in0 + int_part, upsample_fir[4 - frac_part],
UPS_MEM_SIZE);
int_part++;
frac_part--;
i++;
}
}
>>> +/**
>>> + * Generate the high-band excitation with the same energy from the lower
>>> + * one and scaled by the given gain
>>> + *
>>> + * @param[in] ctx The context
>>> + * @param[out] hb_exc Buffer for the excitation
>>> + * @param[in] synth_exc Low-band excitation used for synthesis
>>> + * @param[in] hb_gain Wanted excitation gain
>>> + */
>>> +static void scaled_hb_excitation(AMRWBContext *ctx, float *hb_exc,
>>> + const float *synth_exc, float hb_gain)
>>> +{
>>> + int i;
>>> + float energy = ff_dot_productf(synth_exc, synth_exc, AMRWB_SFR_SIZE);
>>> +
>>> + /* Generate a white-noise excitation */
>>> + for (i = 0; i< AMRWB_SFR_SIZE_16k; i++)
>>> + hb_exc[i] = 32768.0 - (uint16_t) av_lfg_get(&ctx->prng);
>>> +
>>> + ff_scale_vector_to_given_sum_of_squares(hb_exc, hb_exc, energy,
>>> + AMRWB_SFR_SIZE_16k);
>>> +
>>> + for (i = 0; i< AMRWB_SFR_SIZE_16k; i++)
>>> + hb_exc[i] *= hb_gain;
>>> +}
>>>
>> Why are you scaling it twice?
>>
>>
> The first scaling is to match the energy with the lower band part. The
> second is the high_band gain itself. This can be done in only one loop
> using ff_scale_vector_to_given_sum_of_squares() ?
>
I think you will be obliged to duplicate a little of
ff_scale_vector_to_given_sum_of_squares(), but it will make your code
faster...
>>> +/**
>>> + * Apply to high-band samples a 15th order filter
>>> + * The filter characteristic depends on the given coefficients
>>> + *
>>> + * @param[out] out Buffer for filtered output
>>> + * @param[in] fir_coef Filter coefficients
>>> + * @param[in,out] mem State from last filtering (updated)
>>> + * @param[in] cp_gain Compensation gain (usually the filter
>>> gain)
>>> + * @param[in] in Input speech data (high-band)
>>> + *
>>> + * @remark It is safe to pass the same array in in and out parameters
>>> + */
>>> +static void hb_fir_filter(float *out, const float fir_coef[HB_FIR_SIZE +
>>> 1],
>>> + float mem[HB_FIR_SIZE], float cp_gain, const
>>> float *in)
>>> +{
>>> + int i, j;
>>> + float data[AMRWB_SFR_SIZE_16k + HB_FIR_SIZE]; // past and current
>>> samples
>>> +
>>> + memcpy(data, mem, HB_FIR_SIZE * sizeof(float));
>>> +
>>> + for (i = 0; i< AMRWB_SFR_SIZE_16k; i++)
>>> + data[i + HB_FIR_SIZE] = in[i] / cp_gain;
>>> +
>>> + for (i = 0; i< AMRWB_SFR_SIZE_16k; i++) {
>>> + out[i] = 0.0;
>>> + for (j = 0; j<= HB_FIR_SIZE; j++)
>>> + out[i] += data[i + j] * fir_coef[j];
>>> + }
>>> +
>>> + memcpy(mem, data + AMRWB_SFR_SIZE_16k, HB_FIR_SIZE * sizeof(float));
>>> +}
>>>
>> I think it is cleaner (and more consistent) to do like the synthesis filter
>> and get one single buffer for output and memory...
>>
>>
> The problem is that this filter memory consists of past inputs and not
> outputs. If I would do with a single buffer I would have to copy the
> input to it in order to avoid overwriting the last filter output.
>
ok
Also
> +/**
> + * Parses a speech frame, storing data in the Context
> + *
> + * @param[in,out] ctx The context
> + * @param[in] buf Pointer to the input buffer
> + * @param[in] buf_size Size of the input buffer
> + *
> + * @return The frame mode
> + */
> +static enum Mode unpack_bitstream(AMRWBContext *ctx, const uint8_t *buf,
> + int buf_size)
> +{
> + GetBitContext gb;
> + uint16_t *data;
> +
> + init_get_bits(&gb, buf, buf_size * 8);
> +
> + /* decode frame header (1st octet) */
> + skip_bits(&gb, 1); // padding bit
> + ctx->fr_cur_mode = get_bits(&gb, 4);
> + ctx->fr_quality = get_bits1(&gb);
> + skip_bits(&gb, 2); // padding bits
> +
> + // XXX: We are using only the "MIME/storage" format
> + // used by libopencore. This format is simpler and
> + // does not carry the auxiliary information of the frame
> +
> + data = (uint16_t *)&ctx->frame;
> + memset(data, 0, sizeof(AMRWBFrame));
> + buf++;
> +
> + if (ctx->fr_cur_mode< MODE_SID) { /* Normal speech frame */
> + const uint16_t *perm = amr_bit_orderings_by_mode[ctx->fr_cur_mode];
> + int field_size;
> +
> + while ((field_size = *perm++)) {
> + int field = 0;
> + int field_offset = *perm++;
> + while (field_size--) {
> + uint16_t bit_idx = *perm++;
> + field<<= 1;
> + /* The bit index inside the byte is reversed (MSB->LSB) */
> + field |= BIT_POS(buf[bit_idx>> 3], 7 - (bit_idx& 7));
> + }
> + data[field_offset] = field;
> + }
> + }
> +
> + return ctx->fr_cur_mode;
> +}
>
Can't you reorder the bit indexes in a way it is in the LSB->MSB so you
can use just the usual bitstream functions?
-Vitor
More information about the FFmpeg-soc
mailing list