[FFmpeg-soc] [PATCH] AMR-WB Decoder
Vitor Sessak
vitor1001 at gmail.com
Mon Aug 30 17:39:33 CEST 2010
On 08/30/2010 04:44 PM, Marcelo Galvão Póvoa wrote:
> On 30 August 2010 09:31, Vitor Sessak<vitor1001 at gmail.com> wrote:
>
>> On 08/27/2010 11:13 PM, Marcelo Galvão Póvoa wrote:
>>
>>> On 26 August 2010 18:55, Vitor Sessak<vitor1001 at gmail.com> wrote:
>>>
>>>
>>>> Marcelo Galvão Póvoa wrote:
>>>>
>>>>
>>>>> Fixed excitation array incorrect length.
>>>>>
>>>>> Should now be applicable cleanly to the latest ffmpeg revision
>>>>> (fd151a5f8bd152c456a).
>>>>>
>>>>>
>>>> First look:
>>>>
>>>>
>>>>
>>>>
>> [...]
>>
>>
>>>>> +/**
>>>>> + * Convert an ISF vector into an ISP vector
>>>>> + *
>>>>> + * @param[in] isf Isf vector
>>>>> + * @param[out] isp Output isp vector
>>>>> + * @param[in] size Isf/isp size
>>>>> + */
>>>>> +static void isf2isp(const float *isf, double *isp, int size)
>>>>> +{
>>>>> + int i;
>>>>> +
>>>>> + for (i = 0; i< size - 1; i++)
>>>>> + isp[i] = cos(2.0 * M_PI * isf[i]);
>>>>> +
>>>>> + isp[size - 1] = cos(4.0 * M_PI * isf[size - 1]);
>>>>> +}
>>>>>
>>>>>
>>>> Almost duplication of amrnbdec.c:lsf2lsp().
>>>>
>>>>
>>>>
>>> Yes, but this last element doubling diverges between AMR-NB and AMR-WB
>>> reference sources.
>>>
>>>
>> Can't you just do "isp[size - 1] *= 2" or it is used elsewhere?
>>
>>
> It's not the last element output that is doubled but the input. Since
> it is a cosine, I would have to calculate it again for the last
> element I guess.
>
I mean doubling the last element of the input before calling the
function, if it doesn't make the rest of the code more complex.
>> [...]
>>
>>
>>>>> + for(j = i - 1; j> 1; j--)
>>>>> + f[j] += f[j - 1] * val + f[j - 2];
>>>>> + f[1] += 0.25 * val;
>>>>> + }
>>>>> +}
>>>>>
>>>>>
>>>> Hmm, can't this function be replaced by ff_lsp2polyf() with some
>>>> rescaling?
>>>>
>>>>
>>>>
>>> Actually I realize now that the difference between these functions is
>>> just the entire output scaled down by 0.25. That was silly because I
>>> was scaling by 4.0 after this function. It is weird why the reference
>>> decoder does this process. Maybe it should worth someone checking it
>>> out to be sure about this.
>>>
>>>
>> I suppose you checked that after this change the output of your decoder
>> didn't change, no?
>>
>>
> Actually I cannot compare the output bitwise because the high band
> involves a random excitation, I am afraid that it is not the same for
> each run (or it is?).
Yes, you always send 1 as seed, so the prng output is always the same
(and this is a good thing), but since you use floats output might not be
bitexact. I suggest you do:
./tests/tiny_psnr output_ref.wav output_patched.wav 2
And see if stddev is ~ 0.1
[...]
> +/**
>>> + * Parses a speech frame, storing data in the Context
>>> + *
>>> + * @param[in,out] ctx The context
>>> + * @param[in] buf Pointer to the input buffer
>>> + * @param[in] buf_size Size of the input buffer
>>> + *
>>> + * @return The frame mode
>>> + */
>>> +static enum Mode unpack_bitstream(AMRWBContext *ctx, const uint8_t *buf,
>>> + int buf_size)
>>> +{
>>> + GetBitContext gb;
>>> + uint16_t *data;
>>> +
>>> + init_get_bits(&gb, buf, buf_size * 8);
>>> +
>>> + /* decode frame header (1st octet) */
>>> + skip_bits(&gb, 1); // padding bit
>>> + ctx->fr_cur_mode = get_bits(&gb, 4);
>>> + ctx->fr_quality = get_bits1(&gb);
>>> + skip_bits(&gb, 2); // padding bits
>>> +
>>> + // XXX: We are using only the "MIME/storage" format
>>> + // used by libopencore. This format is simpler and
>>> + // does not carry the auxiliary information of the frame
>>> +
>>> + data = (uint16_t *)&ctx->frame;
>>> + memset(data, 0, sizeof(AMRWBFrame));
>>> + buf++;
>>> +
>>> + if (ctx->fr_cur_mode< MODE_SID) { /* Normal speech frame */
>>> + const uint16_t *perm =
>>> amr_bit_orderings_by_mode[ctx->fr_cur_mode];
>>> + int field_size;
>>> +
>>> + while ((field_size = *perm++)) {
>>> + int field = 0;
>>> + int field_offset = *perm++;
>>> + while (field_size--) {
>>> + uint16_t bit_idx = *perm++;
>>> + field<<= 1;
>>> + /* The bit index inside the byte is reversed (MSB->LSB) */
>>> + field |= BIT_POS(buf[bit_idx>> 3], 7 - (bit_idx& 7));
>>> + }
>>> + data[field_offset] = field;
>>> + }
>>> + }
>>> +
>>> + return ctx->fr_cur_mode;
>>> +}
>>>
>>>
>> Can't you reorder the bit indexes in a way it is in the LSB->MSB so you can
>> use just the usual bitstream functions?
>>
>>
> Which bitstream functions?
Those in libavcodec/get_bits.h (get_bits(), etc).
> If the tables were in LSB->MSB order, it
> would be the same algorithm as in AMR-NB, the only difference is that
> I do (7 - index) to reverse the order.
>
Can't you change the tables in a way you can reuse the unpack function
of AMRNB?
-Vitor
More information about the FFmpeg-soc
mailing list