[FFmpeg-soc] [PATCH] AMR-WB Decoder

Fri Sep 17 21:49:30 CEST 2010

On 09/13/2010 01:37 PM, Marcelo Galvão Póvoa wrote:
> On 12 September 2010 21:00, Vitor Sessak<vitor1001 at gmail.com>  wrote:
>> On 09/13/2010 12:21 AM, Marcelo Galvão Póvoa wrote:
>>>
>>> On 12 September 2010 14:39, Vitor Sessak<vitor1001 at gmail.com>    wrote:
>>>>
>>>> On 09/12/2010 04:34 PM, Marcelo Galvão Póvoa wrote:
>>>>>
>>>>> On 12 September 2010 06:28, Vitor Sessak<vitor1001 at gmail.com>      wrote:
>>>>>>
>>>>>> On 09/12/2010 12:25 AM, Marcelo Galvão Póvoa wrote:
>>>>>>>
>>>>>>> On 11 September 2010 18:54, Vitor Sessak<vitor1001 at gmail.com>
>>>>>>>   wrote:
>>>>>>>>
>>>>>>>> On 09/11/2010 06:08 PM, Marcelo Galvão Póvoa wrote:
>>>>>>>>>
>>>>>>>>> 2010/9/11 Marcelo Galvão Póvoa<marspeoplester at gmail.com>:
>>>>>>>>>>
>>>>>>>>>> On 11 September 2010 12:37, Vitor Sessak<vitor1001 at gmail.com>
>>>>>>>>>>   wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 09/11/2010 04:47 PM, Marcelo Galvão Póvoa wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 9 September 2010 16:11, Vitor Sessak<vitor1001 at gmail.com>
>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 09/07/2010 02:05 AM, Marcelo Galvão Póvoa wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 6 September 2010 11:31, Vitor Sessak<vitor1001 at gmail.com>
>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 09/06/2010 02:46 AM, Marcelo Galvăo Póvoa wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [...]
>>>>>>>>>>>
>>>>>>>>>>>>>>> Can celp_filters.c:ff_celp_circ_addf() simplify this?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I already gave some thought to that but I couldn't figure out
>>>>>>>>>>>>>> how
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> use ff_celp_circ_addf() there. I wrote the algorithm the
>>>>>>>>>>>>>> simplest
>>>>>>>>>>>>>> way
>>>>>>>>>>>>>> I could think of.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Try
>>>>>>>>>>>>>
>>>>>>>>>>>>>         for (i = 0; i<            AMRWB_SFR_SIZE; i++)
>>>>>>>>>>>>>             if (fixed_vector[i])
>>>>>>>>>>>>>                 ff_celp_circ_addf(buf, buf, coef, i,
>>>>>>>>>>>>> fixed_vector[i],
>>>>>>>>>>>>>                                   AMRWB_SFR_SIZE);
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> That's right indeed, thanks!
>>>>>>>>>>>
>>>>>>>>>>>> /*
>>>>>>>>>>>>   * AMR wideband decoder
>>>>>>>>>>>>   * Copyright (c) 2010 Marcelo Galvao Povoa
>>>>>>>>>>>>   *
>>>>>>>>>>>>   * This file is part of FFmpeg.
>>>>>>>>>>>>   *
>>>>>>>>>>>>   * FFmpeg is free software; you can redistribute it and/or
>>>>>>>>>>>>   * modify it under the terms of the GNU Lesser General Public
>>>>>>>>>>>>   * License as published by the Free Software Foundation; either
>>>>>>>>>>>>   * version 2.1 of the License, or (at your option) any later
>>>>>>>>>>>> version.
>>>>>>>>>>>>   *
>>>>>>>>>>>>   * FFmpeg is distributed in the hope that it will be useful,
>>>>>>>>>>>>   * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>>>>>>>>>>   * MERCHANTABILITY or FITNESS FOR A particular PURPOSE.  See the
>>>>>>>>>>>> GNU
>>>>>>>>>>>>   * Lesser General Public License for more details.
>>>>>>>>>>>>   *
>>>>>>>>>>>>   * You should have received a copy of the GNU Lesser General
>>>>>>>>>>>> Public
>>>>>>>>>>>>   * License along with FFmpeg; if not, write to the Free Software
>>>>>>>>>>>>   * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
>>>>>>>>>>>> 02110-1301 USA
>>>>>>>>>>>>   */
>>>>>>>>>>>>
>>>>>>>>>>>> #include "libavutil/lfg.h"
>>>>>>>>>>>>
>>>>>>>>>>>> #include "avcodec.h"
>>>>>>>>>>>> #include "get_bits.h"
>>>>>>>>>>>> #include "lsp.h"
>>>>>>>>>>>> #include "celp_math.h"
>>>>>>>>>>>> #include "celp_filters.h"
>>>>>>>>>>>> #include "acelp_filters.h"
>>>>>>>>>>>> #include "acelp_vectors.h"
>>>>>>>>>>>> #include "acelp_pitch_delay.h"
>>>>>>>>>>>>
>>>>>>>>>>>> #include "amrwbdata.h"
>>>>>>>>>>>
>>>>>>>>>>> No need to include amr.h?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Oops, fixed
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Sorry, I forgot to define the table type before the include. Fixed
>>>>>>>>
>>>>>>>> Just one last think I just saw:
>>>>>>>>
>>>>>>>>
>>>>>>>>>     ctx->fr_cur_mode = unpack_bitstream(ctx, buf, buf_size);
>>>>>>>>>     expected_fr_size = ((cf_sizes_wb[ctx->fr_cur_mode] + 7)>>        3)
>>>>>>>>> +
>>>>>>>>> 1;
>>>>>>>>>
>>>>>>>>>     if (buf_size<        expected_fr_size) {
>>>>>>>>>         av_log(avctx, AV_LOG_ERROR,
>>>>>>>>>             "Frame too small (%d bytes). Truncated file?\n",
>>>>>>>>> buf_size);
>>>>>>>>>         *data_size = 0;
>>>>>>>>>         return buf_size;
>>>>>>>>>     }
>>>>>>>>
>>>>>>>> This check is not useful like that. Imagine that buf_size == 1. Then,
>>>>>>>> unpack_bitstream will try to read buf[2] and will segfault. This
>>>>>>>> check
>>>>>>>> should be done before the call of ff_amr_bit_reorder().
>>>>>>>>
>>>>>>>
>>>>>>> I thought about that, but how I would calculate expected_fr_size if I
>>>>>>> don't know of which mode the frame is? One solution would be doing
>>>>>>> this check inside ff_amr_bit_reorder() just after decoding the header.
>>>>>>
>>>>>> Another solution would be to get rid of the unpack_bitstrem() function
>>>>>> and
>>>>>> move its code to amrwb_decode_frame(). It is already almost a wrapper
>>>>>> around
>>>>>> ff_amr_bit_reorder(). It would also allows to avoid checking
>>>>>> (ctx->fr_cur_mode<      MODE_SID) twice...
>>>>>>
>>>>>
>>>>> Interesting, but how about instead of moving all the code to
>>>>> amrwb_decode_frame(), we keep a function to decode the header since
>>>>> there are more than one type of it. I wrote the simpler (and most
>>>>> common I guess) "MIME/storage" header format but in case someone wants
>>>>> to implement the others this way would be easier. See the patch I
>>>>> attached.
>>>>
>>>> I like it. Just a last nitpick:
>>>>
>>>>> +    header_size      = decode_mime_header(ctx, buf);
>>>>> +    expected_fr_size = ((cf_sizes_wb[ctx->fr_cur_mode] + 7)>>    3) + 1;
>>>>> +
>>>>> +    if (buf_size<    expected_fr_size) {
>>>>> +        av_log(avctx, AV_LOG_ERROR,
>>>>> +            "Frame too small (%d bytes). Truncated file?\n", buf_size);
>>>>> +        *data_size = 0;
>>>>> +        return buf_size;
>>>>> +    }
>>>>> +
>>>>
>>>>> +    if (!ctx->fr_quality || ctx->fr_cur_mode>    MODE_SID) {
>>>>> +        av_log(avctx, AV_LOG_ERROR, "Encountered a bad or corrupted
>>>>> frame\n");
>>>>> +    }
>>>>
>>>> Is there any chance of decoding that frame if ctx->fr_cur_mode>
>>>>   MODE_SID?
>>>> Better to bail out with an error in this case.
>>>>
>>>>> +    if (ctx->fr_cur_mode<    MODE_SID) { /* Normal speech frame */
>>>>> +        ff_amr_bit_reorder((uint16_t *)&ctx->frame, sizeof(AMRWBFrame),
>>>>> +            buf + header_size,
>>>>> amr_bit_orderings_by_mode[ctx->fr_cur_mode]);
>>>>> +    }
>>>>> +    else if (ctx->fr_cur_mode == MODE_SID) { /* Comfort noise frame */
>>>>> +        av_log_missing_feature(avctx, "SID mode", 1);
>>>>> +        return -1;
>>>>> +    }
>>>>
>>>> I think it all be clearer in the following way:
>>>>
>>>>     header_size      = decode_mime_header(ctx, buf);
>>>>     expected_fr_size = ((cf_sizes_wb[ctx->fr_cur_mode] + 7)>>    3) + 1;
>>>>
>>>>     if (buf_size<    expected_fr_size) {
>>>>         av_log(avctx, AV_LOG_ERROR,
>>>>             "Frame too small (%d bytes). Truncated file?\n", buf_size);
>>>>         *data_size = 0;
>>>>         return buf_size;
>>>>     }
>>>>
>>>>     if (!ctx->fr_quality || ctx->fr_cur_mode>    MODE_SID)
>>>>         av_log(avctx, AV_LOG_ERROR, "Encountered a bad or corrupted
>>>> frame\n");
>>>>
>>>>     if (ctx->fr_cur_mode == MODE_SID) /* Comfort noise frame */
>>>>         av_log_missing_feature(avctx, "SID mode", 1);
>>>>
>>>>     if (ctx->fr_cur_mode>= MODE_SID)
>>>>         return -1;
>>>>
>>>>    ff_amr_bit_reorder((uint16_t *)&ctx->frame, sizeof(AMRWBFrame),
>>>>         buf + header_size, amr_bit_orderings_by_mode[ctx->fr_cur_mode]);
>>>>
>>>> In that way, the ff_amr_bit_reorder() call is not in the middle of the
>>>> error
>>>> handling. Finally, after fixing this, if Rob agree, I think you can move
>>>> this discussion to ffmpeg-devel.
>>>>
>>>
>>> Fixed.
>>>
>>> Ok, there are still some of those "XXX" notes however, did you check them?
>>
>> Most of them looks like differences between the spec and the ref decoder.
>> Such things happens all the time, and one should follow the ref decoder in
>> those cases. About voice_factor(), I understand that you tested this
>
> Ok, already did that
>
>> function thoroughly when tracking that old quality-affecting bug and found
>> no difference from the reference. Finally, about the truncation of the
>
> About voice_factor(), the values it returns are not quite right, maybe
> it can be improved. But the stddev of the output between using my
> stddev and the one from reference is small: 0.38

We can not hope of anything much better than 0.38 when comparing 
fixed-point with floats...

>> excitation, this was done in AMRNB more to reduce the differences due to our
>> implementation been floating-point based while the reference being
>> fixed-point. But for AMRNB those differences were causing audible artifacts.
>> If you want to be sure, you can check if the stddev with the pre-encoding
>> input increases if you add such a truncation (this time using the right
>> shift of -190)
>>
>
> Actually, the sample I've sent you (soc_pod.wav) was already using
> truncation. Testing with -190, this version is slightly better: 643.05
> against 644.60 I think I'll keep it.

So I'm favorable to it.

Also, please post your newest patch to -devel, let's get this reviewed 
and committed :-)

-Vitor