[FFmpeg-soc] [PATCH] AMR-WB Decoder

Marcelo Galvão Póvoa marspeoplester at gmail.com
Thu Sep 2 21:30:19 CEST 2010


Sorry about the delay to reply

On 30 August 2010 12:39, Vitor Sessak <vitor1001 at gmail.com> wrote:
> On 08/30/2010 04:44 PM, Marcelo Galvão Póvoa wrote:
>>
>> On 30 August 2010 09:31, Vitor Sessak<vitor1001 at gmail.com>  wrote:
>>
>>>
>>> On 08/27/2010 11:13 PM, Marcelo Galvão Póvoa wrote:
>>>
>>>>
>>>> On 26 August 2010 18:55, Vitor Sessak<vitor1001 at gmail.com>    wrote:
>>>>
>>>>
>>>>>
>>>>> Marcelo Galvão Póvoa wrote:
>>>>>
>>>>>
>>>>>>
>>>>>> Fixed excitation array incorrect length.
>>>>>>
>>>>>> Should now be applicable cleanly to the latest ffmpeg revision
>>>>>> (fd151a5f8bd152c456a).
>>>>>>
>>>>>>
>>>>>
>>>>> First look:
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>> [...]
>>>
>>>
>>>>>>
>>>>>> +/**
>>>>>> + * Convert an ISF vector into an ISP vector
>>>>>> + *
>>>>>> + * @param[in] isf                  Isf vector
>>>>>> + * @param[out] isp                 Output isp vector
>>>>>> + * @param[in] size                 Isf/isp size
>>>>>> + */
>>>>>> +static void isf2isp(const float *isf, double *isp, int size)
>>>>>> +{
>>>>>> +    int i;
>>>>>> +
>>>>>> +    for (i = 0; i<    size - 1; i++)
>>>>>> +        isp[i] = cos(2.0 * M_PI * isf[i]);
>>>>>> +
>>>>>> +    isp[size - 1] = cos(4.0 * M_PI * isf[size - 1]);
>>>>>> +}
>>>>>>
>>>>>>
>>>>>
>>>>> Almost duplication of amrnbdec.c:lsf2lsp().
>>>>>
>>>>>
>>>>>
>>>>
>>>> Yes, but this last element doubling diverges between AMR-NB and AMR-WB
>>>> reference sources.
>>>>
>>>>
>>>
>>> Can't you just do "isp[size - 1] *=  2" or it is used elsewhere?
>>>
>>>
>>
>> It's not the last element output that is doubled but the input. Since
>> it is a cosine, I would have to calculate it again for the last
>> element I guess.
>>
>
> I mean doubling the last element of the input before calling the function,
> if it doesn't make the rest of the code more complex.
>

Well, I only call this function in two places, so it seems to worth
reusing this way

>>> [...]
>>>
>>>
>>>>>>
>>>>>> +        for(j = i - 1; j>    1; j--)
>>>>>> +            f[j] += f[j - 1] * val + f[j - 2];
>>>>>> +        f[1] += 0.25 * val;
>>>>>> +    }
>>>>>> +}
>>>>>>
>>>>>>
>>>>>
>>>>> Hmm, can't this function be replaced by ff_lsp2polyf() with some
>>>>> rescaling?
>>>>>
>>>>>
>>>>>
>>>>
>>>> Actually I realize now that the difference between these functions is
>>>> just the entire output scaled down by 0.25. That was silly because I
>>>> was scaling by 4.0 after this function. It is weird why the reference
>>>> decoder does this process. Maybe it should worth someone checking it
>>>> out to be sure about this.
>>>>
>>>>
>>>
>>> I suppose you checked that after this change the output of your decoder
>>> didn't change, no?
>>>
>>>
>>
>> Actually I cannot compare the output bitwise because the high band
>> involves a random excitation, I am afraid that it is not the same for
>> each run (or it is?).
>
> Yes, you always send 1 as seed, so the prng output is always the same (and
> this is a good thing), but since you use floats output might not be
> bitexact. I suggest you do:
>
> ./tests/tiny_psnr output_ref.wav output_patched.wav 2
>
> And see if stddev is ~ 0.1
>

By "output_ref.wav" you mean the earlier output from my decoder or the
fixed-point reference implementation? I did both tests comparing my
recent decoder with a 1kHz sine wave to:
- 3GPP:
stddev:  340.60 PSNR: 45.68 MAXDIFF: 1130 bytes:   959360/   959360
- Mine before the reviews:
stddev:    0.00 PSNR:999.99 MAXDIFF:    0 bytes:   959360/   959360

I don't know how to analyze all these values

> [...]
>
>> +/**
>>>>
>>>> + * Parses a speech frame, storing data in the Context
>>>> + *
>>>> + * @param[in,out] ctx              The context
>>>> + * @param[in] buf                  Pointer to the input buffer
>>>> + * @param[in] buf_size             Size of the input buffer
>>>> + *
>>>> + * @return The frame mode
>>>> + */
>>>> +static enum Mode unpack_bitstream(AMRWBContext *ctx, const uint8_t
>>>> *buf,
>>>> +                                  int buf_size)
>>>> +{
>>>> +    GetBitContext gb;
>>>> +    uint16_t *data;
>>>> +
>>>> +    init_get_bits(&gb, buf, buf_size * 8);
>>>> +
>>>> +    /* decode frame header (1st octet) */
>>>> +    skip_bits(&gb, 1);  // padding bit
>>>> +    ctx->fr_cur_mode  = get_bits(&gb, 4);
>>>> +    ctx->fr_quality   = get_bits1(&gb);
>>>> +    skip_bits(&gb, 2);  // padding bits
>>>> +
>>>> +    // XXX: We are using only the "MIME/storage" format
>>>> +    // used by libopencore. This format is simpler and
>>>> +    // does not carry the auxiliary information of the frame
>>>> +
>>>> +    data = (uint16_t *)&ctx->frame;
>>>> +    memset(data, 0, sizeof(AMRWBFrame));
>>>> +    buf++;
>>>> +
>>>> +    if (ctx->fr_cur_mode<    MODE_SID) { /* Normal speech frame */
>>>> +        const uint16_t *perm =
>>>> amr_bit_orderings_by_mode[ctx->fr_cur_mode];
>>>> +        int field_size;
>>>> +
>>>> +        while ((field_size = *perm++)) {
>>>> +            int field = 0;
>>>> +            int field_offset = *perm++;
>>>> +            while (field_size--) {
>>>> +               uint16_t bit_idx = *perm++;
>>>> +               field<<= 1;
>>>> +               /* The bit index inside the byte is reversed (MSB->LSB)
>>>> */
>>>> +               field |= BIT_POS(buf[bit_idx>>    3], 7 - (bit_idx&
>>>>  7));
>>>> +            }
>>>> +            data[field_offset] = field;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    return ctx->fr_cur_mode;
>>>> +}
>>>>
>>>>
>>>
>>> Can't you reorder the bit indexes in a way it is in the LSB->MSB so you
>>> can
>>> use just the usual bitstream functions?
>>>
>>>
>>
>> Which bitstream functions?
>
> Those in libavcodec/get_bits.h (get_bits(), etc).
>

I am following the unpack_bitstream() from AMR-NB

>> If the tables were in LSB->MSB order, it
>> would be the same algorithm as in AMR-NB, the only difference is that
>> I do (7 - index) to reverse the order.
>>
>
> Can't you change the tables in a way you can reuse the unpack function of
> AMRNB?
>

I guess so, after all it's an interesting method of reordering and
assigning the variables

So, until now we have three functions that can be reused and would
need to move their code to an appropriate header file:
- sipr.c:lsp2lpc_sipr() -> lsp.h: Do you have a suggestion to a new
name for this? There is already a function called ff_acelp_lsp2lpc()

- amrnbdec.c:lsf2lsp() -> lsp.h

- internal part of amrnbdec.c:unpack_bitstream() -> get_bits.h (I guess)

I will try to create patches for these modifications and send to ffmpeg-devel

-- 
Marcelo


More information about the FFmpeg-soc mailing list