[FFmpeg-devel] [PATCH] updated LGPL AC-3 decoder

Mon May 7 00:49:00 CEST 2007

Michael Niedermayer wrote:
> Hi
> 
> On Thu, May 03, 2007 at 12:42:09AM -0400, Justin Ruggles wrote:
> 
>>Hi,
>>
>>First of all, I don't have the original emails for this thread, so I'm
>>starting a new thread with the same name...sorry.
>>
>>Here is an update of the AC-3 decoder patch.  I've included most of
>>Michael's suggestions.  Also, the downmixing is much simpler now.
>>
>>I think this should be ready for merging after another review.  A
>>codec-independent channel reordering and downmixing patch can always be
>>added later for more advanced mixing.  For now, the downmixing
>>functionality mirrors that of the liba52 and dca decoders.
> 
> [...]
> 
> 
>>+/** adjustments in dB gain */
>>+#define LEVEL_ZERO              (0.0000000000000000f)
>>+#define LEVEL_ONE               (1.0000000000000000f)
>>+#define LEVEL_MINUS_3DB         (0.7071067811865476f) /* sqrt(2)/2 */
>>+#define LEVEL_MINUS_4POINT5DB   (0.5946035575013605f)
>>+#define LEVEL_MINUS_6DB         (0.5000000000000000f)
>>+#define LEVEL_MINUS_9DB         (0.3548133892335755f)
> 
> 
> are you sure these are correct?
> 
> LEVEL_MINUS_xDB 2^(-x/6)
> matches all except LEVEL_MINUS_9DB

fixed.

> 
> [...]
> 
> 
>>+static inline float
>>+symmetric_dequant(int code, int levels)
>>+{
>>+    float m = (code - (levels >> 1)) << 1;
>>+    return (m / levels);
>>+}
> 
> 
> as level is always a constant this could be simplified to
> 
> return (code - (levels >> 1)) * (2.0/levels);
> 
> gcc should here be able to change levels >> 1 and 2.0/levels to
> constants

done.

> 
> 
>>+
>>+static void ac3_decoder_tables_init(void)
>>+{
>>+    int i;
>>+
>>+    /* generate mantissa tables */
>>+    for(i=0; i<32; i++) {
>>+        b1_mantissas[i][0] = symmetric_dequant( i / 9     , 3);
>>+        b1_mantissas[i][1] = symmetric_dequant((i % 9) / 3, 3);
>>+        b1_mantissas[i][2] = symmetric_dequant((i % 9) % 3, 3);
>>+    }
>>+    for(i=0; i<128; i++) {
>>+        b2_mantissas[i][0] = symmetric_dequant( i / 25     , 5);
>>+        b2_mantissas[i][1] = symmetric_dequant((i % 25) / 5, 5);
>>+        b2_mantissas[i][2] = symmetric_dequant((i % 25) % 5, 5);
>>+    }
>>+    for(i=0; i<128; i++) {
>>+        b4_mantissas[i][0] = symmetric_dequant(i / 11, 11);
>>+        b4_mantissas[i][1] = symmetric_dequant(i % 11, 11);
>>+    }
>>+
>>+    /* generate dynamic range table */
>>+    for(i=0; i<256; i++) {
>>+        int v = i >> 5;
>>+        if(v > 3) v -= 7;
>>+        else v++;
>>+        dynrng_tbl[i] = powf(2.0f, v);
>>+        v = (i & 0x1F) | 0x20;
>>+        dynrng_tbl[i] *= v / 64.0f;
>>+    }
>>+
>>+    /* generate scale factors */
>>+    for(i=0; i<25; i++)
>>+        scale_factors[i] = pow(2.0, -i);
>>+
>>+    /* generate exponent tables */
>>+    for(i=0; i<128; i++) {
>>+        exp_ungroup_tbl[i][0] =  i / 25;
>>+        exp_ungroup_tbl[i][1] = (i % 25) / 5;
>>+        exp_ungroup_tbl[i][2] = (i % 25) % 5;
>>+    }
> 
> 
> all the loops with 128 iteratons could be merged though iam unsure if this
> would be a good idea, maybe its clearer if its left as it is

partially done. kept exponent tables separate from mantissa tables.

> 
> [...]
> 
>>+/**
>>+ * Parses the 'sync info' and 'bit stream info' from the AC-3 bitstream.
>>+ * GetBitContext within AC3DecodeContext must point to start of the
>>+ * synchronized AC-3 bitstream.
>>+ */
>>+static int parse_header(AC3DecodeContext *ctx)
>>+{
>>+    AC3HeaderInfo hdr;
>>+    GetBitContext *gb = &ctx->gb;
>>+    int err, i;
>>+    float cmix, smix, smix1r;
>>+
>>+    err = ff_ac3_parse_header(gb->buffer, &hdr);
>>+    if(err)
>>+        return err;
>>+
>>+    /* get decoding parameters from header info */
>>+    ctx->crc1 = hdr.crc1;
>>+    ctx->fscod = hdr.fscod;
>>+    ctx->bit_alloc_params.fscod = hdr.fscod;
>>+    ctx->acmod = hdr.acmod;
>>+    ctx->cmixlev = hdr.cmixlev;
>>+    ctx->surmixlev = hdr.surmixlev;
>>+    ctx->lfeon = hdr.lfeon;
>>+    ctx->halfratecod = hdr.halfratecod;
>>+    ctx->bit_alloc_params.halfratecod = hdr.halfratecod;
>>+    ctx->sample_rate = hdr.sample_rate;
>>+    ctx->bit_rate = hdr.bit_rate / 1000;
>>+    ctx->nchans = hdr.channels;
>>+    ctx->nfchans = ctx->nchans - ctx->lfeon;
>>+    ctx->lfe_channel = ctx->nfchans;
>>+    ctx->cpl_channel = ctx->lfe_channel+1;
>>+    ctx->frame_size = hdr.frame_size;
>>+    ctx->output_mode = nfchans_tbl[ctx->acmod];
> 
> 
> this could be vertically aligned like:
> 
> ctx->crc1                   = hdr.crc1;
> ctx->fscod                  = hdr.fscod;
> ctx->bit_alloc_params.fscod = hdr.fscod;
> ctx->acmod                  = hdr.acmod;

done.

> 
> [...]
> 
>>+static void do_bit_allocation(AC3DecodeContext *ctx, int ch)
>>+{
>>+    int start=0, end=0;
>>+
>>+    if(ch == ctx->cpl_channel) {
>>+        start = ctx->cplstrtmant;
>>+        end = ctx->cplendmant;
>>+    } else {
>>+        end = ctx->endmant[ch];
>>+    }
> 
> 
> iam tempted to suggest to set
> ctx->endmant[ ctx->cpl_channel ] = ctx->cplendmant;
> and then simplify the related code
> whats your oppinion about that?

I took your advice on that.  Also did the same for cplstrtmant,
althought it's not used quite as much.

> 
> [...]
> 
>>+    subbnd = 0;
>>+    for(i=ctx->cplstrtmant,bnd=0; i<ctx->cplendmant; bnd++) {
>>+        do {
>>+            for(j=0; j<12; j++) {
>>+                for(ch=0; ch<ctx->nfchans; ch++) {
>>+                if(ctx->chincpl[ch]) {
>>+                    ctx->transform_coeffs[ch][i] = ctx->cpl_coeffs[i] *
>>+                                                   ctx->cplco[ch][bnd];
>>+                    }
>>+                }
> 
> 
> the if() is idented wrongly

fixed.

> 
> [...]
> 
>>+        v0 = 0.0f;
>>+        for(j=0; j<nfchans; j++) {
>>+            v0 += samples[j][i] * coef[j][0];
>>+        }
>>+        v1 = 0.0f;
>>+        for(j=0; j<nfchans; j++) {
>>+            v1 += samples[j][i] * coef[j][1];
>>+        }
> 
> 
> the 2 loops can be merged

done.

> 
> [...]
> 
>>+    /*block switch flag */
> 
> 
> nitpick, missing space between * and b :)

fixed.

> 
> 
>>+    memset(ctx->blksw, 0, sizeof(ctx->blksw));
>>+    for(ch=0; ch<nfchans; ch++)
>>+        ctx->blksw[ch] = get_bits1(gb);
>>+
>>+    /* dithering flag */
>>+    memset(ctx->dithflag, 0, sizeof(ctx->dithflag));
>>+    for(ch=0; ch<nfchans; ch++)
>>+        ctx->dithflag[ch] = get_bits1(gb);
> 
> 
> are all these memsets really needed?

removed.

> 
> [...]
> 
>>+                    av_log(NULL, AV_LOG_ERROR, "chbwcod = %d > 60", chbwcod);
> 
> 
> please use some context, NULL is bad if there are several decoders

done.

> 
> [...]
> 
>>+    /* coupling exponents */
>>+    if(ctx->cplinu && ctx->expstr[ctx->cpl_channel] != EXP_REUSE) {
>>+        int cplabsexp = get_bits(gb, 4) << 1;
>>+        int grpsize = 3 << (ctx->expstr[ctx->cpl_channel] - 1);
>>+        int ngrps = (ctx->cplendmant - ctx->cplstrtmant) / grpsize;
>>+        dexps = ctx->dexps[ctx->cpl_channel];
>>+        decode_exponents(gb, ctx->expstr[ctx->cpl_channel], ngrps, cplabsexp,
>>+                         dexps + ctx->cplstrtmant);
>>+    }
>>+    /* fbw & lfe exponents */
>>+    for(ch=0; ch<ctx->nchans; ch++) {
>>+        if(ctx->expstr[ch] != EXP_REUSE) {
>>+            int ngrps = 2;
>>+            if(ch != ctx->lfe_channel) {
>>+                int grpsize = 3 << (ctx->expstr[ch] - 1);
>>+                ngrps = (ctx->endmant[ch] + grpsize - 4) / grpsize;
>>+            }
>>+            dexps = ctx->dexps[ch];
>>+            dexps[0] = get_bits(gb, 4);
>>+            decode_exponents(gb, ctx->expstr[ch], ngrps, dexps[0], dexps + 1);
>>+            if(ch != ctx->lfe_channel)
>>+                skip_bits(gb, 2); // skip gainrng
>>+        }
>>+    }
> 
> 
> maybe these can be merged?

done.

> 
> [...]
> 
>>+    /* snroffset */
>>+    if(get_bits1(gb)) {
>>+        /* coarse snr offset */
>>+        ctx->csnroffst = get_bits(gb, 6);
>>+
>>+        /* fine snr offsets and fast gain codes */
>>+        if(ctx->cplinu){
>>+            ctx->fsnroffst[ctx->cpl_channel] = get_bits(gb, 4);
>>+            ctx->fgaincod [ctx->cpl_channel] = get_bits(gb, 3);
>>+        }
>>+        for(ch=0; ch<ctx->nchans; ch++) {
>>+            ctx->fsnroffst[ch] = get_bits(gb, 4);
>>+            ctx->fgaincod [ch] = get_bits(gb, 3);
>>+        }
>>+        memset(ctx->bit_alloc_stages, 3, sizeof(ctx->bit_alloc_stages));
>>+    }
>>+
>>+    /* coupling leak information */
>>+    if(ctx->cplinu && get_bits1(gb)) {
>>+        ctx->bit_alloc_params.cplfleak = get_bits(gb, 3);
>>+        ctx->bit_alloc_params.cplsleak = get_bits(gb, 3);
>>+        ctx->bit_alloc_stages[ctx->cpl_channel] = FFMAX(ctx->bit_alloc_stages[ctx->cpl_channel], 2);
>>+    }
>>+
>>+    /* delta bit allocation information */
>>+    if(get_bits1(gb)) {
>>+        /* bit allocation exists (new/reuse/none) */
>>+        if(ctx->cplinu) {
>>+            ctx->deltbae[ctx->cpl_channel] = get_bits(gb, 2);
>>+        }
>>+        for(ch=0; ch<nfchans; ch++) {
>>+            ctx->deltbae[ch] = get_bits(gb, 2);
>>+        }
>>+
>>+        /* delta offset, len and bit allocation */
>>+        if(ctx->cplinu) {
>>+            ch = ctx->cpl_channel;
>>+            if(ctx->deltbae[ch] == DBA_NEW) {
>>+                ctx->deltnseg[ch] = get_bits(gb, 3);
>>+                for(seg=0; seg<=ctx->deltnseg[ch]; seg++) {
>>+                    ctx->deltoffst[ch][seg] = get_bits(gb, 5);
>>+                    ctx->deltlen[ch][seg] = get_bits(gb, 4);
>>+                    ctx->deltba[ch][seg] = get_bits(gb, 3);
>>+                }
>>+            }
>>+            ctx->bit_alloc_stages[ch] = FFMAX(ctx->bit_alloc_stages[ch], 2);
>>+        }
>>+        for(ch=0; ch<nfchans; ch++) {
>>+            if(ctx->deltbae[ch] == DBA_NEW) {
>>+                ctx->deltnseg[ch] = get_bits(gb, 3);
>>+                for(seg=0; seg<=ctx->deltnseg[ch]; seg++) {
>>+                    ctx->deltoffst[ch][seg] = get_bits(gb, 5);
>>+                    ctx->deltlen[ch][seg] = get_bits(gb, 4);
>>+                    ctx->deltba[ch][seg] = get_bits(gb, 3);
>>+                }
>>+            }
>>+            ctx->bit_alloc_stages[ch] = FFMAX(ctx->bit_alloc_stages[ch], 2);
>>+        }
> 
> 
> why not put the cpl_channel at ch=-1 position? maybe ive already asked that
> i dont remember exactly ...

I tried i tout, and I think I do like it better.  It makes it slightly
harder to figure out what's going on at first look, but the code is
definitely simpler.

> 
> [...]
> 
>>+    /* apply dynamic range scaling */
>>+    for(ch=0; ch<ctx->nchans; ch++) {
>>+        float gain = 1.0f;
>>+        if(ctx->acmod == AC3_ACMOD_DUALMONO) {
>>+            gain = 2.0f * ctx->dynrng[ch];
>>+        } else {
>>+            gain = 2.0f * ctx->dynrng[0];
>>+        }
> 
> 
> hmmmm, setting gain to 1.0 and then setting it to another value, the
> initial set is useless

fixed.

> 
> [...]
> 
>>+    /* convert float to 16-bit integer */
>>+    for(ch=0; ch<ctx->out_channels; ch++) {
>>+        for(i=0; i<256; i++) {
>>+            ctx->output[ch][i] = ctx->output[ch][i] * ctx->mul_bias +
>>+                                 ctx->add_bias;
>>+        }
>>+        ctx->dsp.float_to_int16(ctx->int_output[ch], ctx->output[ch], 256);
> 
> 
> this isnt exactly ideal
> the bias should be applied in some more efficient way like multiplying some
> scaling coefficients, allthough i dunno if thats possible in AC3 ...

I changed it so that the bias is multiplied in before the IMDCT along
with the dialogue normalization and dynamic range compression.

On top of your suggestions, I also did several other changes.  I added
dialogue normalization, put in some references to the specs in various
places, fixed some bugs, simplified a few things, added more checks for
error conditions, and removed some unused variables.

I still can't figure out why the 5.1 decoding is so much slower than
liba52.  My guess is the downmixing, but it might be other things as
well.  Stereo decoding is pretty much the same speed on my system as
liba52, but 5.1 decoding takes twice as long...

-Justin

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ac3dec-1.diff
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070506/10350457/attachment.asc>