[FFmpeg-soc] [PATCH] AMR-WB Decoder
Vitor Sessak
vitor1001 at gmail.com
Thu Sep 2 23:11:35 CEST 2010
On 09/02/2010 09:30 PM, Marcelo Galvão Póvoa wrote:
> Sorry about the delay to reply
>
> On 30 August 2010 12:39, Vitor Sessak<vitor1001 at gmail.com> wrote:
>
>> On 08/30/2010 04:44 PM, Marcelo Galvão Póvoa wrote:
>>
>>> On 30 August 2010 09:31, Vitor Sessak<vitor1001 at gmail.com> wrote:
>>>
>>>
>>>> On 08/27/2010 11:13 PM, Marcelo Galvão Póvoa wrote:
>>>>
>>>>
>>>>> On 26 August 2010 18:55, Vitor Sessak<vitor1001 at gmail.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Marcelo Galvão Póvoa wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Fixed excitation array incorrect length.
>>>>>>>
>>>>>>> Should now be applicable cleanly to the latest ffmpeg revision
>>>>>>> (fd151a5f8bd152c456a).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> First look:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>> [...]
>>>>
>>>>
>>>>
>>>>>>> +/**
>>>>>>> + * Convert an ISF vector into an ISP vector
>>>>>>> + *
>>>>>>> + * @param[in] isf Isf vector
>>>>>>> + * @param[out] isp Output isp vector
>>>>>>> + * @param[in] size Isf/isp size
>>>>>>> + */
>>>>>>> +static void isf2isp(const float *isf, double *isp, int size)
>>>>>>> +{
>>>>>>> + int i;
>>>>>>> +
>>>>>>> + for (i = 0; i< size - 1; i++)
>>>>>>> + isp[i] = cos(2.0 * M_PI * isf[i]);
>>>>>>> +
>>>>>>> + isp[size - 1] = cos(4.0 * M_PI * isf[size - 1]);
>>>>>>> +}
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> Almost duplication of amrnbdec.c:lsf2lsp().
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>> Yes, but this last element doubling diverges between AMR-NB and AMR-WB
>>>>> reference sources.
>>>>>
>>>>>
>>>>>
>>>> Can't you just do "isp[size - 1] *= 2" or it is used elsewhere?
>>>>
>>>>
>>>>
>>> It's not the last element output that is doubled but the input. Since
>>> it is a cosine, I would have to calculate it again for the last
>>> element I guess.
>>>
>>>
>> I mean doubling the last element of the input before calling the function,
>> if it doesn't make the rest of the code more complex.
>>
>>
> Well, I only call this function in two places, so it seems to worth
> reusing this way
>
nice
>>>> [...]
>>>>
>>>>
>>>>
>>>>>>> + for(j = i - 1; j> 1; j--)
>>>>>>> + f[j] += f[j - 1] * val + f[j - 2];
>>>>>>> + f[1] += 0.25 * val;
>>>>>>> + }
>>>>>>> +}
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> Hmm, can't this function be replaced by ff_lsp2polyf() with some
>>>>>> rescaling?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>> Actually I realize now that the difference between these functions is
>>>>> just the entire output scaled down by 0.25. That was silly because I
>>>>> was scaling by 4.0 after this function. It is weird why the reference
>>>>> decoder does this process. Maybe it should worth someone checking it
>>>>> out to be sure about this.
>>>>>
>>>>>
>>>>>
>>>> I suppose you checked that after this change the output of your decoder
>>>> didn't change, no?
>>>>
>>>>
>>>>
>>> Actually I cannot compare the output bitwise because the high band
>>> involves a random excitation, I am afraid that it is not the same for
>>> each run (or it is?).
>>>
>> Yes, you always send 1 as seed, so the prng output is always the same (and
>> this is a good thing), but since you use floats output might not be
>> bitexact. I suggest you do:
>>
>> ./tests/tiny_psnr output_ref.wav output_patched.wav 2
>>
>> And see if stddev is ~ 0.1
>>
>>
> By "output_ref.wav" you mean the earlier output from my decoder or the
> fixed-point reference implementation? I did both tests comparing my
> recent decoder with a 1kHz sine wave to:
> - 3GPP:
> stddev: 340.60 PSNR: 45.68 MAXDIFF: 1130 bytes: 959360/ 959360
> - Mine before the reviews:
> stddev: 0.00 PSNR:999.99 MAXDIFF: 0 bytes: 959360/ 959360
>
> I don't know how to analyze all these values
>
stddev = 1/N sqrt(sum[i=0,...,N-1] (file1[i] - file2[i])*(file1[i] -
file2[i]))
PSNR = log(stddev)
thus,
stddev == 0 means the files are identical
stddev < 0.04 means the files are really close, just a rare +- 1 difference
stddev < 1 means the files are very close, no audible difference
But I suggest you do not test with a sine-wave, it might amplify some
problems of minor differences adding up to make a big difference.
>> [...]
>>
>>
>>> +/**
>>>
>>>>> + * Parses a speech frame, storing data in the Context
>>>>> + *
>>>>> + * @param[in,out] ctx The context
>>>>> + * @param[in] buf Pointer to the input buffer
>>>>> + * @param[in] buf_size Size of the input buffer
>>>>> + *
>>>>> + * @return The frame mode
>>>>> + */
>>>>> +static enum Mode unpack_bitstream(AMRWBContext *ctx, const uint8_t
>>>>> *buf,
>>>>> + int buf_size)
>>>>> +{
>>>>> + GetBitContext gb;
>>>>> + uint16_t *data;
>>>>> +
>>>>> + init_get_bits(&gb, buf, buf_size * 8);
>>>>> +
>>>>> + /* decode frame header (1st octet) */
>>>>> + skip_bits(&gb, 1); // padding bit
>>>>> + ctx->fr_cur_mode = get_bits(&gb, 4);
>>>>> + ctx->fr_quality = get_bits1(&gb);
>>>>> + skip_bits(&gb, 2); // padding bits
>>>>> +
>>>>> + // XXX: We are using only the "MIME/storage" format
>>>>> + // used by libopencore. This format is simpler and
>>>>> + // does not carry the auxiliary information of the frame
>>>>> +
>>>>> + data = (uint16_t *)&ctx->frame;
>>>>> + memset(data, 0, sizeof(AMRWBFrame));
>>>>> + buf++;
>>>>> +
>>>>> + if (ctx->fr_cur_mode< MODE_SID) { /* Normal speech frame */
>>>>> + const uint16_t *perm =
>>>>> amr_bit_orderings_by_mode[ctx->fr_cur_mode];
>>>>> + int field_size;
>>>>> +
>>>>> + while ((field_size = *perm++)) {
>>>>> + int field = 0;
>>>>> + int field_offset = *perm++;
>>>>> + while (field_size--) {
>>>>> + uint16_t bit_idx = *perm++;
>>>>> + field<<= 1;
>>>>> + /* The bit index inside the byte is reversed (MSB->LSB)
>>>>> */
>>>>> + field |= BIT_POS(buf[bit_idx>> 3], 7 - (bit_idx&
>>>>> 7));
>>>>> + }
>>>>> + data[field_offset] = field;
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + return ctx->fr_cur_mode;
>>>>> +}
>>>>>
>>>>>
>>>>>
>>>> Can't you reorder the bit indexes in a way it is in the LSB->MSB so you
>>>> can
>>>> use just the usual bitstream functions?
>>>>
>>>>
>>>>
>>> Which bitstream functions?
>>>
>> Those in libavcodec/get_bits.h (get_bits(), etc).
>>
>>
> I am following the unpack_bitstream() from AMR-NB
>
Sorry, I remembered wrong how unpack_bitstream() was coded in amrnb.
>>> If the tables were in LSB->MSB order, it
>>> would be the same algorithm as in AMR-NB, the only difference is that
>>> I do (7 - index) to reverse the order.
>>>
>>>
>> Can't you change the tables in a way you can reuse the unpack function of
>> AMRNB?
>>
>>
> I guess so, after all it's an interesting method of reordering and
> assigning the variables
>
> So, until now we have three functions that can be reused and would
> need to move their code to an appropriate header file:
>
Not only the header, but the actual code should be moved to the
associated .c files.
> - sipr.c:lsp2lpc_sipr() -> lsp.h: Do you have a suggestion to a new
> name for this? There is already a function called ff_acelp_lsp2lpc()
>
I suggest some AMR-centric name like ff_amrwb_lsp2lpc(). SIPR probably
copied from AMRWB and not the other way around.
> - amrnbdec.c:lsf2lsp() -> lsp.h
>
Looks reasonable. I suggest ff_acelp_lsf2lspd() as name.
> - internal part of amrnbdec.c:unpack_bitstream() -> get_bits.h (I guess)
>
I think the internal part of unpack_bitstream() is still too AMR
specific to be in such widely used file. Unless someone came up with a
better idea, I'd say to create a amr.c for code shared between both.
> I will try to create patches for these modifications and send to ffmpeg-devel
>
Great!
-Vitor
More information about the FFmpeg-soc
mailing list