[FFmpeg-soc] [PATCH] AMR-WB Decoder
Marcelo Galvão Póvoa
marspeoplester at gmail.com
Mon Aug 30 16:44:41 CEST 2010
On 30 August 2010 09:31, Vitor Sessak <vitor1001 at gmail.com> wrote:
> On 08/27/2010 11:13 PM, Marcelo Galvão Póvoa wrote:
>>
>> On 26 August 2010 18:55, Vitor Sessak<vitor1001 at gmail.com> wrote:
>>
>>>
>>> Marcelo Galvão Póvoa wrote:
>>>
>>>>
>>>> Fixed excitation array incorrect length.
>>>>
>>>> Should now be applicable cleanly to the latest ffmpeg revision
>>>> (fd151a5f8bd152c456a).
>>>>
>>>
>>> First look:
>>>
>>>
>>>
>
> [...]
>
>>>> +/**
>>>> + * Convert an ISF vector into an ISP vector
>>>> + *
>>>> + * @param[in] isf Isf vector
>>>> + * @param[out] isp Output isp vector
>>>> + * @param[in] size Isf/isp size
>>>> + */
>>>> +static void isf2isp(const float *isf, double *isp, int size)
>>>> +{
>>>> + int i;
>>>> +
>>>> + for (i = 0; i< size - 1; i++)
>>>> + isp[i] = cos(2.0 * M_PI * isf[i]);
>>>> +
>>>> + isp[size - 1] = cos(4.0 * M_PI * isf[size - 1]);
>>>> +}
>>>>
>>>
>>> Almost duplication of amrnbdec.c:lsf2lsp().
>>>
>>>
>>
>> Yes, but this last element doubling diverges between AMR-NB and AMR-WB
>> reference sources.
>>
>
> Can't you just do "isp[size - 1] *= 2" or it is used elsewhere?
>
It's not the last element output that is doubled but the input. Since
it is a cosine, I would have to calculate it again for the last
element I guess.
> [...]
>
>>>> + for(j = i - 1; j> 1; j--)
>>>> + f[j] += f[j - 1] * val + f[j - 2];
>>>> + f[1] += 0.25 * val;
>>>> + }
>>>> +}
>>>>
>>>
>>> Hmm, can't this function be replaced by ff_lsp2polyf() with some
>>> rescaling?
>>>
>>>
>>
>> Actually I realize now that the difference between these functions is
>> just the entire output scaled down by 0.25. That was silly because I
>> was scaling by 4.0 after this function. It is weird why the reference
>> decoder does this process. Maybe it should worth someone checking it
>> out to be sure about this.
>>
>
> I suppose you checked that after this change the output of your decoder
> didn't change, no?
>
Actually I cannot compare the output bitwise because the high band
involves a random excitation, I am afraid that it is not the same for
each run (or it is?). I did several comparisons manually between the
resulting vectors using gdb and they were exactly the same. Also, I
thought about it and I'm convinced that the code does the same as
before. But it is weird, maybe someone should verify it.
>>>> +/**
>>>> + * Convert a ISP vector to LP coefficient domain {a_k}
>>>> + * Equations from TS 26.190 section 5.2.4
>>>> + *
>>>> + * @param[in] isp ISP vector for a subframe
>>>> + * @param[out] lp LP coefficients
>>>> + * @param[in] lp_half_order Half the number of LPs to construct
>>>> + */
>>>> +static void isp2lp(const double *isp, float *lp, int lp_half_order) {
>>>> + double pa[10 + 1], qa[10 + 1];
>>>> + double last_isp = isp[2 * lp_half_order - 1];
>>>> + double qa_old = 0.0;
>>>> + float *lp2 =&lp[2 * lp_half_order];
>>>> + int i;
>>>> +
>>>> + if (lp_half_order> 8) { // high-band specific
>>>> + lsp2polyf_16k(isp, pa, lp_half_order);
>>>> + lsp2polyf_16k(isp + 1, qa, lp_half_order - 1);
>>>> +
>>>> + for (i = 0; i<= lp_half_order; i++)
>>>> + pa[i] *= 4.0;
>>>> + for (i = 0; i< lp_half_order; i++)
>>>> + qa[i] *= 4.0;
>>>> + } else {
>>>> + ff_lsp2polyf(isp, pa, lp_half_order);
>>>> + ff_lsp2polyf(isp + 1, qa, lp_half_order - 1);
>>>> + }
>>>> +
>>>> + for (i = 1; i< lp_half_order; i++) {
>>>> + double paf = (1 + last_isp) * pa[i];
>>>> + double qaf = (1 - last_isp) * (qa[i] - qa_old);
>>>> +
>>>> + qa_old = qa[i - 1];
>>>> +
>>>> + lp[i] = 0.5 * (paf + qaf);
>>>> + lp2[-i] = 0.5 * (paf - qaf);
>>>> + }
>>>> +
>>>> + lp[0] = 1.0;
>>>> + lp[lp_half_order] = 0.5 * (1 + last_isp) * pa[lp_half_order];
>>>> + lp2[0] = last_isp;
>>>> +}
>>>>
>>>
>>> Please double-check if this is not a duplication of
>>> sipr.c:lsp2lpc_sipr().
>>>
>>>
>>
>> Interesting, I believe they are the same function except for the fact
>> that in AMR-WB the first LP coefficient is always 1.0 but this should
>> be easily adapted by passing lp + 1 to the function and setting lp[0]
>> = 1.0. Also, the LP order is different between these codecs. To reuse
>> this existing function it should be moved to another file (lsp.c I
>> guess) and modified. Should I do it and include in the next patch?
>>
>
> You can send directly a patch to ffmpeg-devel to move this code to lsp.c.
>
Ok, I will try it.
[...]
>
>>>> +/**
>>>> + * Upsample a signal by 5/4 ratio (from 12.8kHz to 16kHz) using
>>>> + * a FIR interpolation filter. Uses past data from before *in address
>>>> + *
>>>> + * @param[out] out Buffer for interpolated signal
>>>> + * @param[in] in Current signal data (length
>>>> 0.8*o_size)
>>>> + * @param[in] o_size Output signal length
>>>> + */
>>>> +static void upsample_5_4(float *out, const float *in, int o_size)
>>>> +{
>>>> + const float *in0 = in - UPS_FIR_SIZE + 1;
>>>> + int i;
>>>> +
>>>> + for (i = 0; i< o_size; i++) {
>>>> + int int_part = (4 * i) / 5;
>>>> + int frac_part = (4 * i) - 5 * int_part;
>>>>
>>>
>>> You can break this loop in two to avoid the division:
>>>
>>> i = 0;
>>> for (j = 0; j< o_size/5; j++)
>>> for (k = 0; k< 5; k++) {
>>> ....
>>> i++;
>>> }
>>>
>>>
>>
>> I've done this in not so neat way, adding 4 and calculating a simple
>> remainder by 5. Tell me if can be done in a better way.
>>
>
> Does something like the following work?
>
> int int_part = 0, frac_part = 0;
>
> i = 0;
> for (j = 0; j< o_size / 5; j++) {
> out[i] = in[int_part];
> frac_part = 4;
> i++;
> for (k = 1; k< 5; k++) {
> out[i] = ff_dot_productf(in0 + int_part, upsample_fir[4 -
> frac_part],
> UPS_MEM_SIZE);
> int_part++;
> frac_part--;
> i++;
> }
> }
>
Yes, I guess it works. Despite being very specific to the 5/4 case, it
should be more efficient. Fixed
[...]
>> +/**
>> + * Parses a speech frame, storing data in the Context
>> + *
>> + * @param[in,out] ctx The context
>> + * @param[in] buf Pointer to the input buffer
>> + * @param[in] buf_size Size of the input buffer
>> + *
>> + * @return The frame mode
>> + */
>> +static enum Mode unpack_bitstream(AMRWBContext *ctx, const uint8_t *buf,
>> + int buf_size)
>> +{
>> + GetBitContext gb;
>> + uint16_t *data;
>> +
>> + init_get_bits(&gb, buf, buf_size * 8);
>> +
>> + /* decode frame header (1st octet) */
>> + skip_bits(&gb, 1); // padding bit
>> + ctx->fr_cur_mode = get_bits(&gb, 4);
>> + ctx->fr_quality = get_bits1(&gb);
>> + skip_bits(&gb, 2); // padding bits
>> +
>> + // XXX: We are using only the "MIME/storage" format
>> + // used by libopencore. This format is simpler and
>> + // does not carry the auxiliary information of the frame
>> +
>> + data = (uint16_t *)&ctx->frame;
>> + memset(data, 0, sizeof(AMRWBFrame));
>> + buf++;
>> +
>> + if (ctx->fr_cur_mode< MODE_SID) { /* Normal speech frame */
>> + const uint16_t *perm =
>> amr_bit_orderings_by_mode[ctx->fr_cur_mode];
>> + int field_size;
>> +
>> + while ((field_size = *perm++)) {
>> + int field = 0;
>> + int field_offset = *perm++;
>> + while (field_size--) {
>> + uint16_t bit_idx = *perm++;
>> + field<<= 1;
>> + /* The bit index inside the byte is reversed (MSB->LSB) */
>> + field |= BIT_POS(buf[bit_idx>> 3], 7 - (bit_idx& 7));
>> + }
>> + data[field_offset] = field;
>> + }
>> + }
>> +
>> + return ctx->fr_cur_mode;
>> +}
>>
>
> Can't you reorder the bit indexes in a way it is in the LSB->MSB so you can
> use just the usual bitstream functions?
>
Which bitstream functions? If the tables were in LSB->MSB order, it
would be the same algorithm as in AMR-NB, the only difference is that
I do (7 - index) to reverse the order.
On 30 August 2010 10:00, Rob <robert.swain at gmail.com> wrote:
> On 30 August 2010 14:31, Vitor Sessak <vitor1001 at gmail.com> wrote:
>> On 08/27/2010 11:13 PM, Marcelo Galvão Póvoa wrote:
>>> On 26 August 2010 18:55, Vitor Sessak<vitor1001 at gmail.com> wrote:
>>>> Marcelo Galvão Póvoa wrote:
>
> [...]
>
>>>>> +/**
>>>>> + * Generate the high-band excitation with the same energy from the
>>>>> lower
>>>>> + * one and scaled by the given gain
>>>>> + *
>>>>> + * @param[in] ctx The context
>>>>> + * @param[out] hb_exc Buffer for the excitation
>>>>> + * @param[in] synth_exc Low-band excitation used for
>>>>> synthesis
>>>>> + * @param[in] hb_gain Wanted excitation gain
>>>>> + */
>>>>> +static void scaled_hb_excitation(AMRWBContext *ctx, float *hb_exc,
>>>>> + const float *synth_exc, float hb_gain)
>>>>> +{
>>>>> + int i;
>>>>> + float energy = ff_dot_productf(synth_exc, synth_exc,
>>>>> AMRWB_SFR_SIZE);
>>>>> +
>>>>> + /* Generate a white-noise excitation */
>>>>> + for (i = 0; i< AMRWB_SFR_SIZE_16k; i++)
>>>>> + hb_exc[i] = 32768.0 - (uint16_t) av_lfg_get(&ctx->prng);
>>>>> +
>>>>> + ff_scale_vector_to_given_sum_of_squares(hb_exc, hb_exc, energy,
>>>>> + AMRWB_SFR_SIZE_16k);
>>>>> +
>>>>> + for (i = 0; i< AMRWB_SFR_SIZE_16k; i++)
>>>>> + hb_exc[i] *= hb_gain;
>>>>> +}
>>>>
>>>> Why are you scaling it twice?
>>>
>>> The first scaling is to match the energy with the lower band part. The
>>> second is the high_band gain itself. This can be done in only one loop
>>> using ff_scale_vector_to_given_sum_of_squares() ?
>>
>> I think you will be obliged to duplicate a little of
>> ff_scale_vector_to_given_sum_of_squares(), but it will make your code
>> faster...
>
> The point is to avoid iterating over the subframe to multiply by some
> value twice by merging the two beforehand and then applying to the
> vector. Looking at the code:
>
> void ff_scale_vector_to_given_sum_of_squares(float *out, const float *in,
> float sum_of_squares, const int n)
> {
> int i;
> float scalefactor = ff_dot_productf(in, in, n);
> if (scalefactor)
> scalefactor = sqrt(sum_of_squares / scalefactor);
> for (i = 0; i < n; i++)
> out[i] = in[i] * scalefactor;
> }
>
> We want:
> scalefactor * hb_gain =
> sqrt(sum_of_squares / ff_dot_productf(in, in, n)) * hb_gain =
> sqrt(sum_of_squares * hb_gain * hb_gain / ff_dot_productf())
> hb_gain is [0.1, 1.0] according to find_hb_gain() so there's no sign
> issue
>
> So you should just be able to pass in energy * hb_gain * hb_gain I
> think. Or duplicate the code.
>
Yes, in fact it is better to pass energy * hb_gain * hb_gain. Fixed
--
Marcelo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: amrwb.patch
Type: application/octet-stream
Size: 155416 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-soc/attachments/20100830/10a06362/attachment.obj>
More information about the FFmpeg-soc
mailing list