[FFmpeg-devel] [PATCH] updated LGPL AC-3 decoder

Fri May 4 06:46:25 CEST 2007

Michael Niedermayer wrote:
> Hi
> 
> On Thu, May 03, 2007 at 12:42:09AM -0400, Justin Ruggles wrote:
> 
>>Hi,
>>
>>First of all, I don't have the original emails for this thread, so I'm
>>starting a new thread with the same name...sorry.
>>
>>Here is an update of the AC-3 decoder patch.  I've included most of
>>Michael's suggestions.  Also, the downmixing is much simpler now.
>>
>>I think this should be ready for merging after another review.  A
>>codec-independent channel reordering and downmixing patch can always be
>>added later for more advanced mixing.  For now, the downmixing
>>functionality mirrors that of the liba52 and dca decoders.
> 
> [...]
> 
> 
>>+/** adjustments in dB gain */
>>+#define LEVEL_ZERO              (0.0000000000000000f)
>>+#define LEVEL_ONE               (1.0000000000000000f)
>>+#define LEVEL_MINUS_3DB         (0.7071067811865476f) /* sqrt(2)/2 */
>>+#define LEVEL_MINUS_4POINT5DB   (0.5946035575013605f)
>>+#define LEVEL_MINUS_6DB         (0.5000000000000000f)
>>+#define LEVEL_MINUS_9DB         (0.3548133892335755f)
> 
> 
> are you sure these are correct?
> 
> LEVEL_MINUS_xDB 2^(-x/6)
> matches all except LEVEL_MINUS_9DB

You're right.  I was forgetting that the decibel values are an
approximation.  The above value is correct for -9dB, but it's actually
supposed to be for about -9.0309 dB, which is sqrt(2)/4.  I'll change it
and add a comment as well.

> 
> [...]
> 
> 
>>+static inline float
>>+symmetric_dequant(int code, int levels)
>>+{
>>+    float m = (code - (levels >> 1)) << 1;
>>+    return (m / levels);
>>+}
> 
> 
> as level is always a constant this could be simplified to
> 
> return (code - (levels >> 1)) * (2.0/levels);
> 
> gcc should here be able to change levels >> 1 and 2.0/levels to
> constants

I almost get it...  Is it because the function is inlined and all the
function calls use constants for levels that gcc will be smart enough to
know to treat it this way?

I can also create tables for the bap=3 and bap=5 dequantization so that
this function will only need to be called during initialization.

> 
> 
>>+
>>+static void ac3_decoder_tables_init(void)
>>+{
>>+    int i;
>>+
>>+    /* generate mantissa tables */
>>+    for(i=0; i<32; i++) {
>>+        b1_mantissas[i][0] = symmetric_dequant( i / 9     , 3);
>>+        b1_mantissas[i][1] = symmetric_dequant((i % 9) / 3, 3);
>>+        b1_mantissas[i][2] = symmetric_dequant((i % 9) % 3, 3);
>>+    }
>>+    for(i=0; i<128; i++) {
>>+        b2_mantissas[i][0] = symmetric_dequant( i / 25     , 5);
>>+        b2_mantissas[i][1] = symmetric_dequant((i % 25) / 5, 5);
>>+        b2_mantissas[i][2] = symmetric_dequant((i % 25) % 5, 5);
>>+    }
>>+    for(i=0; i<128; i++) {
>>+        b4_mantissas[i][0] = symmetric_dequant(i / 11, 11);
>>+        b4_mantissas[i][1] = symmetric_dequant(i % 11, 11);
>>+    }
>>+
>>+    /* generate dynamic range table */
>>+    for(i=0; i<256; i++) {
>>+        int v = i >> 5;
>>+        if(v > 3) v -= 7;
>>+        else v++;
>>+        dynrng_tbl[i] = powf(2.0f, v);
>>+        v = (i & 0x1F) | 0x20;
>>+        dynrng_tbl[i] *= v / 64.0f;
>>+    }
>>+
>>+    /* generate scale factors */
>>+    for(i=0; i<25; i++)
>>+        scale_factors[i] = pow(2.0, -i);
>>+
>>+    /* generate exponent tables */
>>+    for(i=0; i<128; i++) {
>>+        exp_ungroup_tbl[i][0] =  i / 25;
>>+        exp_ungroup_tbl[i][1] = (i % 25) / 5;
>>+        exp_ungroup_tbl[i][2] = (i % 25) % 5;
>>+    }
> 
> 
> all the loops with 128 iteratons could be merged though iam unsure if this
> would be a good idea, maybe its clearer if its left as it is

It's just in the init function, so it's not speed-critical.  But I could
maybe merge the 2 mantissa table loops and add comments indicating the
bap=2 vs. bap=4 table.  I think the exponents table should be separate
though.

> 
> [...]
> 
>>+/**
>>+ * Parses the 'sync info' and 'bit stream info' from the AC-3 bitstream.
>>+ * GetBitContext within AC3DecodeContext must point to start of the
>>+ * synchronized AC-3 bitstream.
>>+ */
>>+static int parse_header(AC3DecodeContext *ctx)
>>+{
>>+    AC3HeaderInfo hdr;
>>+    GetBitContext *gb = &ctx->gb;
>>+    int err, i;
>>+    float cmix, smix, smix1r;
>>+
>>+    err = ff_ac3_parse_header(gb->buffer, &hdr);
>>+    if(err)
>>+        return err;
>>+
>>+    /* get decoding parameters from header info */
>>+    ctx->crc1 = hdr.crc1;
>>+    ctx->fscod = hdr.fscod;
>>+    ctx->bit_alloc_params.fscod = hdr.fscod;
>>+    ctx->acmod = hdr.acmod;
>>+    ctx->cmixlev = hdr.cmixlev;
>>+    ctx->surmixlev = hdr.surmixlev;
>>+    ctx->lfeon = hdr.lfeon;
>>+    ctx->halfratecod = hdr.halfratecod;
>>+    ctx->bit_alloc_params.halfratecod = hdr.halfratecod;
>>+    ctx->sample_rate = hdr.sample_rate;
>>+    ctx->bit_rate = hdr.bit_rate / 1000;
>>+    ctx->nchans = hdr.channels;
>>+    ctx->nfchans = ctx->nchans - ctx->lfeon;
>>+    ctx->lfe_channel = ctx->nfchans;
>>+    ctx->cpl_channel = ctx->lfe_channel+1;
>>+    ctx->frame_size = hdr.frame_size;
>>+    ctx->output_mode = nfchans_tbl[ctx->acmod];
> 
> 
> this could be vertically aligned like:
> 
> ctx->crc1                   = hdr.crc1;
> ctx->fscod                  = hdr.fscod;
> ctx->bit_alloc_params.fscod = hdr.fscod;
> ctx->acmod                  = hdr.acmod;

ok. I do like that better.

> 
> [...]
> 
>>+static void do_bit_allocation(AC3DecodeContext *ctx, int ch)
>>+{
>>+    int start=0, end=0;
>>+
>>+    if(ch == ctx->cpl_channel) {
>>+        start = ctx->cplstrtmant;
>>+        end = ctx->cplendmant;
>>+    } else {
>>+        end = ctx->endmant[ch];
>>+    }
> 
> 
> iam tempted to suggest to set
> ctx->endmant[ ctx->cpl_channel ] = ctx->cplendmant;
> and then simplify the related code
> whats your oppinion about that?

interesting idea.  I'll give it a try.

> 
> [...]
> 
>>+    subbnd = 0;
>>+    for(i=ctx->cplstrtmant,bnd=0; i<ctx->cplendmant; bnd++) {
>>+        do {
>>+            for(j=0; j<12; j++) {
>>+                for(ch=0; ch<ctx->nfchans; ch++) {
>>+                if(ctx->chincpl[ch]) {
>>+                    ctx->transform_coeffs[ch][i] = ctx->cpl_coeffs[i] *
>>+                                                   ctx->cplco[ch][bnd];
>>+                    }
>>+                }
> 
> 
> the if() is idented wrongly

yeah, that's pretty ugly...

> 
> [...]
> 
>>+        v0 = 0.0f;
>>+        for(j=0; j<nfchans; j++) {
>>+            v0 += samples[j][i] * coef[j][0];
>>+        }
>>+        v1 = 0.0f;
>>+        for(j=0; j<nfchans; j++) {
>>+            v1 += samples[j][i] * coef[j][1];
>>+        }
> 
> 
> the 2 loops can be merged

ok.

> 
> [...]
> 
>>+    /*block switch flag */
> 
> 
> nitpick, missing space between * and b :)

thanks. it's hard to find all these.

> 
> 
>>+    memset(ctx->blksw, 0, sizeof(ctx->blksw));
>>+    for(ch=0; ch<nfchans; ch++)
>>+        ctx->blksw[ch] = get_bits1(gb);
>>+
>>+    /* dithering flag */
>>+    memset(ctx->dithflag, 0, sizeof(ctx->dithflag));
>>+    for(ch=0; ch<nfchans; ch++)
>>+        ctx->dithflag[ch] = get_bits1(gb);
> 
> 
> are all these memsets really needed?
> 

nope. I'll remove them.

> [...]
> 
>>+                    av_log(NULL, AV_LOG_ERROR, "chbwcod = %d > 60", chbwcod);
> 
> 
> please use some context, NULL is bad if there are several decoders

ok.

> [...]
> 
>>+    /* coupling exponents */
>>+    if(ctx->cplinu && ctx->expstr[ctx->cpl_channel] != EXP_REUSE) {
>>+        int cplabsexp = get_bits(gb, 4) << 1;
>>+        int grpsize = 3 << (ctx->expstr[ctx->cpl_channel] - 1);
>>+        int ngrps = (ctx->cplendmant - ctx->cplstrtmant) / grpsize;
>>+        dexps = ctx->dexps[ctx->cpl_channel];
>>+        decode_exponents(gb, ctx->expstr[ctx->cpl_channel], ngrps, cplabsexp,
>>+                         dexps + ctx->cplstrtmant);
>>+    }
>>+    /* fbw & lfe exponents */
>>+    for(ch=0; ch<ctx->nchans; ch++) {
>>+        if(ctx->expstr[ch] != EXP_REUSE) {
>>+            int ngrps = 2;
>>+            if(ch != ctx->lfe_channel) {
>>+                int grpsize = 3 << (ctx->expstr[ch] - 1);
>>+                ngrps = (ctx->endmant[ch] + grpsize - 4) / grpsize;
>>+            }
>>+            dexps = ctx->dexps[ch];
>>+            dexps[0] = get_bits(gb, 4);
>>+            decode_exponents(gb, ctx->expstr[ch], ngrps, dexps[0], dexps + 1);
>>+            if(ch != ctx->lfe_channel)
>>+                skip_bits(gb, 2); // skip gainrng
>>+        }
>>+    }
> 
> 
> maybe these can be merged?

You might not like the results, but I'll try it.  The coupling absolute
exponent and number of exponent groups are handled differently than in
the full bandwidth channels.

> 
> [...]
> 
>>+    /* snroffset */
>>+    if(get_bits1(gb)) {
>>+        /* coarse snr offset */
>>+        ctx->csnroffst = get_bits(gb, 6);
>>+
>>+        /* fine snr offsets and fast gain codes */
>>+        if(ctx->cplinu){
>>+            ctx->fsnroffst[ctx->cpl_channel] = get_bits(gb, 4);
>>+            ctx->fgaincod [ctx->cpl_channel] = get_bits(gb, 3);
>>+        }
>>+        for(ch=0; ch<ctx->nchans; ch++) {
>>+            ctx->fsnroffst[ch] = get_bits(gb, 4);
>>+            ctx->fgaincod [ch] = get_bits(gb, 3);
>>+        }
>>+        memset(ctx->bit_alloc_stages, 3, sizeof(ctx->bit_alloc_stages));
>>+    }
>>+
>>+    /* coupling leak information */
>>+    if(ctx->cplinu && get_bits1(gb)) {
>>+        ctx->bit_alloc_params.cplfleak = get_bits(gb, 3);
>>+        ctx->bit_alloc_params.cplsleak = get_bits(gb, 3);
>>+        ctx->bit_alloc_stages[ctx->cpl_channel] = FFMAX(ctx->bit_alloc_stages[ctx->cpl_channel], 2);
>>+    }
>>+
>>+    /* delta bit allocation information */
>>+    if(get_bits1(gb)) {
>>+        /* bit allocation exists (new/reuse/none) */
>>+        if(ctx->cplinu) {
>>+            ctx->deltbae[ctx->cpl_channel] = get_bits(gb, 2);
>>+        }
>>+        for(ch=0; ch<nfchans; ch++) {
>>+            ctx->deltbae[ch] = get_bits(gb, 2);
>>+        }
>>+
>>+        /* delta offset, len and bit allocation */
>>+        if(ctx->cplinu) {
>>+            ch = ctx->cpl_channel;
>>+            if(ctx->deltbae[ch] == DBA_NEW) {
>>+                ctx->deltnseg[ch] = get_bits(gb, 3);
>>+                for(seg=0; seg<=ctx->deltnseg[ch]; seg++) {
>>+                    ctx->deltoffst[ch][seg] = get_bits(gb, 5);
>>+                    ctx->deltlen[ch][seg] = get_bits(gb, 4);
>>+                    ctx->deltba[ch][seg] = get_bits(gb, 3);
>>+                }
>>+            }
>>+            ctx->bit_alloc_stages[ch] = FFMAX(ctx->bit_alloc_stages[ch], 2);
>>+        }
>>+        for(ch=0; ch<nfchans; ch++) {
>>+            if(ctx->deltbae[ch] == DBA_NEW) {
>>+                ctx->deltnseg[ch] = get_bits(gb, 3);
>>+                for(seg=0; seg<=ctx->deltnseg[ch]; seg++) {
>>+                    ctx->deltoffst[ch][seg] = get_bits(gb, 5);
>>+                    ctx->deltlen[ch][seg] = get_bits(gb, 4);
>>+                    ctx->deltba[ch][seg] = get_bits(gb, 3);
>>+                }
>>+            }
>>+            ctx->bit_alloc_stages[ch] = FFMAX(ctx->bit_alloc_stages[ch], 2);
>>+        }
> 
> 
> why not put the cpl_channel at ch=-1 position? maybe ive already asked that
> i dont remember exactly ...

It ends up making things more complex overall for the simplicity gain in
a few places.  The coupling channel is sometimes handled before full
channels, but other times it comes after the first coupled channel.
Also, there are some cases where it has to be treated differently
anyway, as with the exponent decoding.  The case above with delta bit
allocation is the only place I really dislike how the current way works.

But I'll try it out just to see what the results look like.

> 
> [...]
> 
>>+    /* apply dynamic range scaling */
>>+    for(ch=0; ch<ctx->nchans; ch++) {
>>+        float gain = 1.0f;
>>+        if(ctx->acmod == AC3_ACMOD_DUALMONO) {
>>+            gain = 2.0f * ctx->dynrng[ch];
>>+        } else {
>>+            gain = 2.0f * ctx->dynrng[0];
>>+        }
> 
> 
> hmmmm, setting gain to 1.0 and then setting it to another value, the
> initial set is useless

yeah, I tend to do that by habit to shut the compiler up...  I guess it
could be set to 2.0f and then do gain *= ctx->dynrng[XX].

> 
> [...]
> 
>>+    /* convert float to 16-bit integer */
>>+    for(ch=0; ch<ctx->out_channels; ch++) {
>>+        for(i=0; i<256; i++) {
>>+            ctx->output[ch][i] = ctx->output[ch][i] * ctx->mul_bias +
>>+                                 ctx->add_bias;
>>+        }
>>+        ctx->dsp.float_to_int16(ctx->int_output[ch], ctx->output[ch], 256);
> 
> 
> this isnt exactly ideal
> the bias should be applied in some more efficient way like multiplying some
> scaling coefficients, allthough i dunno if thats possible in AC3 ...
> 
> [...]

Well, currently the only scale factors applied besides the bias are the
dynamic range (applied to freq coefficients) and sometimes downmixing
(applied to time domain samples).  I don't really know if scaling the
frequency coefficients before the imdct yields the same results as
scaling after the transform, but I can try to test it out.  In theory it
should, but I don't always trust theory. :)

Thanks,
Justin