[FFmpeg-devel] [PATCH] G.722 decoder

Mon Aug 23 11:49:50 CEST 2010

On Fri, 6 Aug 2010, Martin Storsj? wrote:

> Hi,
> 
> The G.722 decoder patch that Kenan Gillet took a far way through the 
> review process last year seemed to have stalled, I'm trying to pick it up 
> and get it committed.
> 
> On Fri, 10 Apr 2009, Michael Niedermayer wrote:
> 
> > On Thu, Apr 09, 2009 at 08:14:58AM -0700, Kenan Gillet wrote:
> > [...]
> > > + * @note For the 56000bps and 48000bps bitrates, the respectively 7 and 6 bits
> > > + *       codeword might be packed, so unpacking might be needed either
> > > + *       internally or as a separate parser.
> > 
> > do non packed cases exist anywhere in the wild?
> > do packed cases exist anywhere in the wild?
> > 
> > it seems to me that filling 30% of the bits with 0 is not particlarely
> > likely in a real world environment
> 
> According to the spec, the point of the lower bitrate versions is to be 
> able to use 1 or 2 bits per byte as an auxillary data channel. The encoder 
> always outputs the 64 kbps mode, but you can choose to overwrite the 
> lowest two bits if you want to.
> 
> The decoder is able to cope with these bits being overwritten with random 
> bits, but you get slightly better results if you tell the decoder that 
> this has happened and so it should disregard those bits.
> 
> With that in mind, as far as I can interpret the spec, it's the lowest 
> bits that should be skipped for those modes, but the source for this 
> decoder seems to think otherwise. The implementation in spandsp (that 
> has some common heritage with this code base, iirc, does things this 
> way, too.
> 
> > > +    /**
> > > +     * The band[0] and band[1] correspond respectively to the lower band and higher band.
> > > +     */
> > > +    struct G722Band {
> > 
> > > +        int16_t s_predictor;         ///< predictor output value
> > 
> > ok that makes sense
> > 
> > 
> > > +        int32_t s_zero;              ///< zero section output signal
> > 
> > what is that?
> > 
> > 
> > > +        int8_t  part_reconst_mem[2]; ///< partially reconstructed signal memory
> > 
> > and that?
> 
> Tried to clarify these doxy comments, to the best of my understanding of 
> the spec.
> 
> > > +        int16_t prev_qtzd_reconst;   ///< previous quantized reconstructed signal
> > 
> > this is not true, this uses a different dequantization table in some cases
> > i even wonder if this is a bug
> 
> No, it's explicitly stated that it should be done this way in the spec. I 
> figure it's part of the "scalability" feature - the predictor updating is 
> always done using the 4-bit dequantizer so that it evolves in the same way 
> regardless of the mode it is decoded in. Each individual output sample is 
> calculated using the more precise dequantizer if that is chosen, but the 
> predictor is always updated using the lower precision dequantizer.
> 
> > 
> > > +        int16_t pole_mem[2];         ///< second-order pole section coefficient buffer
> > > +        int32_t diff_mem[6];         ///< quantizer difference signal memory
> > > +        int16_t zero_mem[6];         ///< Seventh-order zero section coefficient buffer
> > 
> > > +        int16_t log_factor;          ///< delayed logarithmic quantizer factor
> > 
> > log what? log_e log_2 log_10 ?
> 
> log_2, clarified
> 
> > 
> > > +        int16_t scale_factor;        ///< delayed quantizer scale factor
> > > +    } band[2];
> > > +} G722Context;
> > > +
> > > +
> > > +static const int8_t sign_lookup[2] = { -1, 1 };
> > > +
> > > +static const int16_t ilb[32] = {
> > > +  2048, 2093, 2139, 2186, 2233, 2282, 2332, 2383,
> > > +  2435, 2489, 2543, 2599, 2656, 2714, 2774, 2834,
> > > +  2896, 2960, 3025, 3091, 3158, 3228, 3298, 3371,
> > > +  3444, 3520, 3597, 3676, 3756, 3838, 3922, 4008
> > > +};
> > 
> > > +static const int16_t wh[2]   = { 798, -214 };
> > 
> > 2 letters is better than 1 ... well sometimes its not much better though
> 
> Tried to give this table a better name
> 
> > > +static const int16_t qm2[4]  = { -7408, -1616,  7408,   1616 };
> > 
> > > +/**
> > > + * qm3[index] == wl[rl42[index]]
> > > + */
> > > +static const int16_t qm3[16] = {
> > > +   -60, 3042, 1198, 538, 334, 172,  58, -30,
> > > +  3042, 1198,  538, 334, 172,  58, -30, -60
> > > +};
> > > +static const int16_t qm4[16] = {
> > > +      0, -20456, -12896,  -8968, -6288,  -4240,  -2584,  -1200,
> > > +  20456,  12896,   8968,  6288,   4240,   2584,   1200,      0
> > > +};
> > 
> > what is qm?
> > i know these are dequantization tables, and at that poorly designed ones
> > (yeah the duplicate entries)
> > but i cant relate "qm" to "dequantization"
> 
> Tried to give better names to these tables, too
> 
> > > +
> > > +/**
> > > + * quadrature mirror filter (QMF) coefficients
> > > + *
> > > + * ITU-T G.722 Table 11
> > > + */
> > > +static const int16_t qmf_coeffs[12] = {
> > > +  3, -11, 12, 32, -210, 951, 3876, -805, 362, -156, 53, -11,
> > > +};
> > > +
> > > +
> > 
> > > +/**
> > > + * adaptive predictor
> > > + *
> > > + * @note On x86 using the MULL macro in a loop is slower than not using the macro.
> > > + */
> > > +static void do_adaptive_prediction(struct G722Band *band, const int cur_diff)
> > 
> > missing doxy for cur_diff
> 
> Added some sort of doxy for this parameter
> 
> > [...]
> > > +static int inline scale(const int log_factor, int shift)
> > > +{
> > > +    const int wd1 = ilb[(log_factor >> 6) & 31];
> > > +    shift -= log_factor >> 11;
> > > +    return (shift < 0 ? wd1 << -shift : wd1 >> shift) << 2;
> > > +}
> > 
> > i belive we already have some integer exp2()
> 
> Didn't find any common method to do shifts where the shift can be 
> negative - only found av_shr_i in libavutil/integer.h, that works on 
> arbitrary precision integers.
> 
> > also this one lacks documentation and a name that is related to what it 
> > does if there is a reason why common code cannot be used
> 
> Renamed it to a slightly more descriptive name
> 
> > [...]
> > > +static void apply_qmf(int16_t *prev_samples, int *xout1, int *xout2)
> > > +{
> > > +    int i;
> > > +
> > > +    *xout1 = 0;
> > > +    *xout2 = 0;
> > > +    for (i = 0;  i < 12;  i++) {
> > > +        MAC16(*xout2, prev_samples[2*i  ], qmf_coeffs[i   ]);
> > > +        MAC16(*xout1, prev_samples[2*i+1], qmf_coeffs[11-i]);
> > > +    }
> > 
> > > +    memmove(prev_samples, prev_samples + 2, 22*sizeof(prev_samples[0]));
> > 
> > please find a solution that does not need to move the array by 2 samples
> > after each 2 samples.
> 
> Changed to use a longer internal buffer for these samples so that memmoves 
> are done much more seldom (every 1024 samples).
> 
> > [...]
> > > +static int g722_decode_frame(AVCodecContext *avctx, void *data, int *data_size, AVPacket *avpkt)
> > > +{
> > > +    G722Context *c = avctx->priv_data;
> > > +    int16_t *out_buf = data;
> > > +    const uint8_t *buf = avpkt->data;
> > > +    int j, out_len = 0;
> > > +    const int shift = 8 - c->bits_per_sample;
> > > +    const int16_t *quantizer_table = qms[shift];
> > > +
> > > +    for (j = 0;  j < avpkt->size; j++) {
> > > +        const int ilow = buf[j] & (0x3F >> shift);
> > > +        const int rlow = av_clip(MULL(c->band[0].scale_factor, quantizer_table[ilow], 15) +
> > > +                                 c->band[0].s_predictor, -16384, 16383);
> > > +
> > > +        update_low_predictor(&c->band[0], ilow >> (2 - shift));
> > > +
> > > +        if (avctx->sample_rate == 16000) {
> > > +            const int ihigh = (buf[j] >> (6 - shift)) &  0x03;
> > 
> > all that looks a litte obfuscated, get_bits() seems like the proper choice
> 
> Changed to use get_bits

Ping

// Martin