[FFmpeg-soc] [PATCH] AMR-WB Decoder

Mon Sep 13 13:37:19 CEST 2010

On 12 September 2010 21:00, Vitor Sessak <vitor1001 at gmail.com> wrote:
> On 09/13/2010 12:21 AM, Marcelo Galvão Póvoa wrote:
>>
>> On 12 September 2010 14:39, Vitor Sessak<vitor1001 at gmail.com>  wrote:
>>>
>>> On 09/12/2010 04:34 PM, Marcelo Galvão Póvoa wrote:
>>>>
>>>> On 12 September 2010 06:28, Vitor Sessak<vitor1001 at gmail.com>    wrote:
>>>>>
>>>>> On 09/12/2010 12:25 AM, Marcelo Galvão Póvoa wrote:
>>>>>>
>>>>>> On 11 September 2010 18:54, Vitor Sessak<vitor1001 at gmail.com>
>>>>>>  wrote:
>>>>>>>
>>>>>>> On 09/11/2010 06:08 PM, Marcelo Galvão Póvoa wrote:
>>>>>>>>
>>>>>>>> 2010/9/11 Marcelo Galvão Póvoa<marspeoplester at gmail.com>:
>>>>>>>>>
>>>>>>>>> On 11 September 2010 12:37, Vitor Sessak<vitor1001 at gmail.com>
>>>>>>>>>  wrote:
>>>>>>>>>>
>>>>>>>>>> On 09/11/2010 04:47 PM, Marcelo Galvão Póvoa wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 9 September 2010 16:11, Vitor Sessak<vitor1001 at gmail.com>
>>>>>>>>>>>  wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 09/07/2010 02:05 AM, Marcelo Galvão Póvoa wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 6 September 2010 11:31, Vitor Sessak<vitor1001 at gmail.com>
>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 09/06/2010 02:46 AM, Marcelo Galvăo Póvoa wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [...]
>>>>>>>>>>
>>>>>>>>>>>>>> Can celp_filters.c:ff_celp_circ_addf() simplify this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I already gave some thought to that but I couldn't figure out
>>>>>>>>>>>>> how
>>>>>>>>>>>>> to
>>>>>>>>>>>>> use ff_celp_circ_addf() there. I wrote the algorithm the
>>>>>>>>>>>>> simplest
>>>>>>>>>>>>> way
>>>>>>>>>>>>> I could think of.
>>>>>>>>>>>>
>>>>>>>>>>>> Try
>>>>>>>>>>>>
>>>>>>>>>>>>        for (i = 0; i<          AMRWB_SFR_SIZE; i++)
>>>>>>>>>>>>            if (fixed_vector[i])
>>>>>>>>>>>>                ff_celp_circ_addf(buf, buf, coef, i,
>>>>>>>>>>>> fixed_vector[i],
>>>>>>>>>>>>                                  AMRWB_SFR_SIZE);
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> That's right indeed, thanks!
>>>>>>>>>>
>>>>>>>>>>> /*
>>>>>>>>>>>  * AMR wideband decoder
>>>>>>>>>>>  * Copyright (c) 2010 Marcelo Galvao Povoa
>>>>>>>>>>>  *
>>>>>>>>>>>  * This file is part of FFmpeg.
>>>>>>>>>>>  *
>>>>>>>>>>>  * FFmpeg is free software; you can redistribute it and/or
>>>>>>>>>>>  * modify it under the terms of the GNU Lesser General Public
>>>>>>>>>>>  * License as published by the Free Software Foundation; either
>>>>>>>>>>>  * version 2.1 of the License, or (at your option) any later
>>>>>>>>>>> version.
>>>>>>>>>>>  *
>>>>>>>>>>>  * FFmpeg is distributed in the hope that it will be useful,
>>>>>>>>>>>  * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>>>>>>>>>  * MERCHANTABILITY or FITNESS FOR A particular PURPOSE.  See the
>>>>>>>>>>> GNU
>>>>>>>>>>>  * Lesser General Public License for more details.
>>>>>>>>>>>  *
>>>>>>>>>>>  * You should have received a copy of the GNU Lesser General
>>>>>>>>>>> Public
>>>>>>>>>>>  * License along with FFmpeg; if not, write to the Free Software
>>>>>>>>>>>  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
>>>>>>>>>>> 02110-1301 USA
>>>>>>>>>>>  */
>>>>>>>>>>>
>>>>>>>>>>> #include "libavutil/lfg.h"
>>>>>>>>>>>
>>>>>>>>>>> #include "avcodec.h"
>>>>>>>>>>> #include "get_bits.h"
>>>>>>>>>>> #include "lsp.h"
>>>>>>>>>>> #include "celp_math.h"
>>>>>>>>>>> #include "celp_filters.h"
>>>>>>>>>>> #include "acelp_filters.h"
>>>>>>>>>>> #include "acelp_vectors.h"
>>>>>>>>>>> #include "acelp_pitch_delay.h"
>>>>>>>>>>>
>>>>>>>>>>> #include "amrwbdata.h"
>>>>>>>>>>
>>>>>>>>>> No need to include amr.h?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Oops, fixed
>>>>>>>>>
>>>>>>>>
>>>>>>>> Sorry, I forgot to define the table type before the include. Fixed
>>>>>>>
>>>>>>> Just one last think I just saw:
>>>>>>>
>>>>>>>
>>>>>>>>    ctx->fr_cur_mode = unpack_bitstream(ctx, buf, buf_size);
>>>>>>>>    expected_fr_size = ((cf_sizes_wb[ctx->fr_cur_mode] + 7)>>      3)
>>>>>>>> +
>>>>>>>> 1;
>>>>>>>>
>>>>>>>>    if (buf_size<      expected_fr_size) {
>>>>>>>>        av_log(avctx, AV_LOG_ERROR,
>>>>>>>>            "Frame too small (%d bytes). Truncated file?\n",
>>>>>>>> buf_size);
>>>>>>>>        *data_size = 0;
>>>>>>>>        return buf_size;
>>>>>>>>    }
>>>>>>>
>>>>>>> This check is not useful like that. Imagine that buf_size == 1. Then,
>>>>>>> unpack_bitstream will try to read buf[2] and will segfault. This
>>>>>>> check
>>>>>>> should be done before the call of ff_amr_bit_reorder().
>>>>>>>
>>>>>>
>>>>>> I thought about that, but how I would calculate expected_fr_size if I
>>>>>> don't know of which mode the frame is? One solution would be doing
>>>>>> this check inside ff_amr_bit_reorder() just after decoding the header.
>>>>>
>>>>> Another solution would be to get rid of the unpack_bitstrem() function
>>>>> and
>>>>> move its code to amrwb_decode_frame(). It is already almost a wrapper
>>>>> around
>>>>> ff_amr_bit_reorder(). It would also allows to avoid checking
>>>>> (ctx->fr_cur_mode<    MODE_SID) twice...
>>>>>
>>>>
>>>> Interesting, but how about instead of moving all the code to
>>>> amrwb_decode_frame(), we keep a function to decode the header since
>>>> there are more than one type of it. I wrote the simpler (and most
>>>> common I guess) "MIME/storage" header format but in case someone wants
>>>> to implement the others this way would be easier. See the patch I
>>>> attached.
>>>
>>> I like it. Just a last nitpick:
>>>
>>>> +    header_size      = decode_mime_header(ctx, buf);
>>>> +    expected_fr_size = ((cf_sizes_wb[ctx->fr_cur_mode] + 7)>>  3) + 1;
>>>> +
>>>> +    if (buf_size<  expected_fr_size) {
>>>> +        av_log(avctx, AV_LOG_ERROR,
>>>> +            "Frame too small (%d bytes). Truncated file?\n", buf_size);
>>>> +        *data_size = 0;
>>>> +        return buf_size;
>>>> +    }
>>>> +
>>>
>>>> +    if (!ctx->fr_quality || ctx->fr_cur_mode>  MODE_SID) {
>>>> +        av_log(avctx, AV_LOG_ERROR, "Encountered a bad or corrupted
>>>> frame\n");
>>>> +    }
>>>
>>> Is there any chance of decoding that frame if ctx->fr_cur_mode>
>>>  MODE_SID?
>>> Better to bail out with an error in this case.
>>>
>>>> +    if (ctx->fr_cur_mode<  MODE_SID) { /* Normal speech frame */
>>>> +        ff_amr_bit_reorder((uint16_t *)&ctx->frame, sizeof(AMRWBFrame),
>>>> +            buf + header_size,
>>>> amr_bit_orderings_by_mode[ctx->fr_cur_mode]);
>>>> +    }
>>>> +    else if (ctx->fr_cur_mode == MODE_SID) { /* Comfort noise frame */
>>>> +        av_log_missing_feature(avctx, "SID mode", 1);
>>>> +        return -1;
>>>> +    }
>>>
>>> I think it all be clearer in the following way:
>>>
>>>    header_size      = decode_mime_header(ctx, buf);
>>>    expected_fr_size = ((cf_sizes_wb[ctx->fr_cur_mode] + 7)>>  3) + 1;
>>>
>>>    if (buf_size<  expected_fr_size) {
>>>        av_log(avctx, AV_LOG_ERROR,
>>>            "Frame too small (%d bytes). Truncated file?\n", buf_size);
>>>        *data_size = 0;
>>>        return buf_size;
>>>    }
>>>
>>>    if (!ctx->fr_quality || ctx->fr_cur_mode>  MODE_SID)
>>>        av_log(avctx, AV_LOG_ERROR, "Encountered a bad or corrupted
>>> frame\n");
>>>
>>>    if (ctx->fr_cur_mode == MODE_SID) /* Comfort noise frame */
>>>        av_log_missing_feature(avctx, "SID mode", 1);
>>>
>>>    if (ctx->fr_cur_mode>= MODE_SID)
>>>        return -1;
>>>
>>>   ff_amr_bit_reorder((uint16_t *)&ctx->frame, sizeof(AMRWBFrame),
>>>        buf + header_size, amr_bit_orderings_by_mode[ctx->fr_cur_mode]);
>>>
>>> In that way, the ff_amr_bit_reorder() call is not in the middle of the
>>> error
>>> handling. Finally, after fixing this, if Rob agree, I think you can move
>>> this discussion to ffmpeg-devel.
>>>
>>
>> Fixed.
>>
>> Ok, there are still some of those "XXX" notes however, did you check them?
>
> Most of them looks like differences between the spec and the ref decoder.
> Such things happens all the time, and one should follow the ref decoder in
> those cases. About voice_factor(), I understand that you tested this

Ok, already did that

> function thoroughly when tracking that old quality-affecting bug and found
> no difference from the reference. Finally, about the truncation of the

About voice_factor(), the values it returns are not quite right, maybe
it can be improved. But the stddev of the output between using my
stddev and the one from reference is small: 0.38

> excitation, this was done in AMRNB more to reduce the differences due to our
> implementation been floating-point based while the reference being
> fixed-point. But for AMRNB those differences were causing audible artifacts.
> If you want to be sure, you can check if the stddev with the pre-encoding
> input increases if you add such a truncation (this time using the right
> shift of -190)
>

Actually, the sample I've sent you (soc_pod.wav) was already using
truncation. Testing with -190, this version is slightly better: 643.05
against 644.60 I think I'll keep it.

-- 
Marcelo