[FFmpeg-soc] [PATCH] AMR-WB Decoder

Vitor Sessak vitor1001 at gmail.com
Mon Aug 30 17:39:33 CEST 2010


On 08/30/2010 04:44 PM, Marcelo Galvão Póvoa wrote:
> On 30 August 2010 09:31, Vitor Sessak<vitor1001 at gmail.com>  wrote:
>    
>> On 08/27/2010 11:13 PM, Marcelo Galvão Póvoa wrote:
>>      
>>> On 26 August 2010 18:55, Vitor Sessak<vitor1001 at gmail.com>    wrote:
>>>
>>>        
>>>> Marcelo Galvão Póvoa wrote:
>>>>
>>>>          
>>>>> Fixed excitation array incorrect length.
>>>>>
>>>>> Should now be applicable cleanly to the latest ffmpeg revision
>>>>> (fd151a5f8bd152c456a).
>>>>>
>>>>>            
>>>> First look:
>>>>
>>>>
>>>>
>>>>          
>> [...]
>>
>>      
>>>>> +/**
>>>>> + * Convert an ISF vector into an ISP vector
>>>>> + *
>>>>> + * @param[in] isf                  Isf vector
>>>>> + * @param[out] isp                 Output isp vector
>>>>> + * @param[in] size                 Isf/isp size
>>>>> + */
>>>>> +static void isf2isp(const float *isf, double *isp, int size)
>>>>> +{
>>>>> +    int i;
>>>>> +
>>>>> +    for (i = 0; i<    size - 1; i++)
>>>>> +        isp[i] = cos(2.0 * M_PI * isf[i]);
>>>>> +
>>>>> +    isp[size - 1] = cos(4.0 * M_PI * isf[size - 1]);
>>>>> +}
>>>>>
>>>>>            
>>>> Almost duplication of amrnbdec.c:lsf2lsp().
>>>>
>>>>
>>>>          
>>> Yes, but this last element doubling diverges between AMR-NB and AMR-WB
>>> reference sources.
>>>
>>>        
>> Can't you just do "isp[size - 1] *=  2" or it is used elsewhere?
>>
>>      
> It's not the last element output that is doubled but the input. Since
> it is a cosine, I would have to calculate it again for the last
> element I guess.
>    

I mean doubling the last element of the input before calling the 
function, if it doesn't make the rest of the code more complex.

>> [...]
>>
>>      
>>>>> +        for(j = i - 1; j>    1; j--)
>>>>> +            f[j] += f[j - 1] * val + f[j - 2];
>>>>> +        f[1] += 0.25 * val;
>>>>> +    }
>>>>> +}
>>>>>
>>>>>            
>>>> Hmm, can't this function be replaced by ff_lsp2polyf() with some
>>>> rescaling?
>>>>
>>>>
>>>>          
>>> Actually I realize now that the difference between these functions is
>>> just the entire output scaled down by 0.25. That was silly because I
>>> was scaling by 4.0 after this function. It is weird why the reference
>>> decoder does this process. Maybe it should worth someone checking it
>>> out to be sure about this.
>>>
>>>        
>> I suppose you checked that after this change the output of your decoder
>> didn't change, no?
>>
>>      
> Actually I cannot compare the output bitwise because the high band
> involves a random excitation, I am afraid that it is not the same for
> each run (or it is?).

Yes, you always send 1 as seed, so the prng output is always the same 
(and this is a good thing), but since you use floats output might not be 
bitexact. I suggest you do:

./tests/tiny_psnr output_ref.wav output_patched.wav 2

And see if stddev is ~ 0.1

[...]

> +/**
>>> + * Parses a speech frame, storing data in the Context
>>> + *
>>> + * @param[in,out] ctx              The context
>>> + * @param[in] buf                  Pointer to the input buffer
>>> + * @param[in] buf_size             Size of the input buffer
>>> + *
>>> + * @return The frame mode
>>> + */
>>> +static enum Mode unpack_bitstream(AMRWBContext *ctx, const uint8_t *buf,
>>> +                                  int buf_size)
>>> +{
>>> +    GetBitContext gb;
>>> +    uint16_t *data;
>>> +
>>> +    init_get_bits(&gb, buf, buf_size * 8);
>>> +
>>> +    /* decode frame header (1st octet) */
>>> +    skip_bits(&gb, 1);  // padding bit
>>> +    ctx->fr_cur_mode  = get_bits(&gb, 4);
>>> +    ctx->fr_quality   = get_bits1(&gb);
>>> +    skip_bits(&gb, 2);  // padding bits
>>> +
>>> +    // XXX: We are using only the "MIME/storage" format
>>> +    // used by libopencore. This format is simpler and
>>> +    // does not carry the auxiliary information of the frame
>>> +
>>> +    data = (uint16_t *)&ctx->frame;
>>> +    memset(data, 0, sizeof(AMRWBFrame));
>>> +    buf++;
>>> +
>>> +    if (ctx->fr_cur_mode<    MODE_SID) { /* Normal speech frame */
>>> +        const uint16_t *perm =
>>> amr_bit_orderings_by_mode[ctx->fr_cur_mode];
>>> +        int field_size;
>>> +
>>> +        while ((field_size = *perm++)) {
>>> +            int field = 0;
>>> +            int field_offset = *perm++;
>>> +            while (field_size--) {
>>> +               uint16_t bit_idx = *perm++;
>>> +               field<<= 1;
>>> +               /* The bit index inside the byte is reversed (MSB->LSB) */
>>> +               field |= BIT_POS(buf[bit_idx>>    3], 7 - (bit_idx&    7));
>>> +            }
>>> +            data[field_offset] = field;
>>> +        }
>>> +    }
>>> +
>>> +    return ctx->fr_cur_mode;
>>> +}
>>>
>>>        
>> Can't you reorder the bit indexes in a way it is in the LSB->MSB so you can
>> use just the usual bitstream functions?
>>
>>      
> Which bitstream functions?

Those in libavcodec/get_bits.h (get_bits(), etc).

> If the tables were in LSB->MSB order, it
> would be the same algorithm as in AMR-NB, the only difference is that
> I do (7 - index) to reverse the order.
>    

Can't you change the tables in a way you can reuse the unpack function 
of AMRNB?

-Vitor


More information about the FFmpeg-soc mailing list