[FFmpeg-devel] [RFC] AAC Encoder

Sun Aug 17 14:59:01 CEST 2008

On Sun, Aug 17, 2008 at 02:40:19PM +0300, Kostya wrote:
> On Sat, Aug 16, 2008 at 08:35:44PM +0200, Michael Niedermayer wrote:
> > On Sat, Aug 16, 2008 at 05:31:09PM +0300, Kostya wrote:
> > > On Fri, Aug 15, 2008 at 09:05:27PM +0200, Michael Niedermayer wrote:
> > > > On Fri, Aug 15, 2008 at 07:59:52PM +0300, Kostya wrote:
> [...]
> > > > 
> > > > this will not work with the data structs of the decoder
> > > > ms_mask is 120 elements
> > > > also the new group_len is still leaving holes in the arrays, its
> > > > surely better now as it doesnt loop over the 0 elements anymore but
> > > > they are still there.
> > > > I do not see why they should be there, it does not appear that there
> > > > is any advantage in them being there ... but if iam wrong iam sure you
> > > > will explain what they are good for?
> > >  
> > > Now it will work with flat data arrays having size 128 (which is comparable).
> > > I think this should be acceptable and working with fixed offset
> > > (window_num*16 + scalefactor_band_index) is easier.
> > > 
> > > Also I must note that decoder is presented with grouping data first and
> > > decodes the rest of data basing on it.
> > > Encoder, on the other hand, has transformed coefficients first, and applies
> > > grouping to them later. So it's easier and convenient to use first window of
> > > group to hold needed scalefactors, band types, etc. than move that stuff
> > > to another windows.
> > 
> > i do not think i understand.
> > Please correct me if iam misunderstanding something but
> > the coefficients are grouped into hmm groups (what was the correct term...)
> > the encoder can choose these grouping withing some limits ...
> > each group then gets a single scale factor. Thus one cannot select
> > scalefactors prior to grouping or independant of it, it would impose
> > a unpredictable limitation
> > 
> > anyway iam not arguing for the encoder to do exactly what the decoder does ATM
> > but it must be clean, compact, optimal and decoder and encoder should match
> > each other whenever possible.
> > 
> > Could you maybe point to exactly what code would become uglier with the
> > style used by the decoder? ideally with a diff showing the uglification?
> 
> they are almost the same to the extent decoder knows all limits and can
> exploit it and my encoder have data for all windows and just marks window
> group starts.

can the decoder be changed to use the system of the encoder?
I really would prefer if both where using the same system ...
also [w*16 + i] can be written as [w][i] which is a little cleaner

[...]
> > >  
> > > > > +
> > > > > +    //convert resulting path from backward-linked list
> > > > > +    stack_len = 0;
> > > > > +    idx = max_sfb;
> > > > > +    while(idx > 0){
> > > > > +        stack[stack_len++] = idx;
> > > > > +        idx = s->path[idx].prev_idx;
> > > > > +    }
> > > > > +
> > > > > +    //perform actual band info encoding
> > > > > +    start = 0;
> > > > > +    for(i = stack_len - 1; i >= 0; i--){
> > > > > +        put_bits(&s->pb, 4, s->path[stack[i]].codebook);
> > > > > +        count = stack[i] - s->path[stack[i]].prev_idx;
> > > > 
> > > > > +        for(j = 0; j < count; j++){
> > > > > +            cpe->ch[channel].band_type[win][start] =  s->path[stack[i]].codebook;
> > > > > +            cpe->ch[channel].zeroes[win][start]    = !s->path[stack[i]].codebook;
> > > > > +            start++;
> > > > > +        }
> > > > 
> > > > memset
> > > 
> > > umm, band_type[] type is int 
> > 
> > why is it an int?
> 
> because it's enum. changed for zeroes[] though

i forgot about the enum :(

[...]
> > > +            //it turned out that all signed codebooks use the same offset for index coding
> > > +            idx += 40;
> > > +            put_bits(&s->pb, bits[idx], codes[idx]);
> > > +        }
> > > +    }
> > > +}
> > 
> > [...]
> > > +/**
> > > + * Encode scalefactors.
> > > + */
> > > +static void encode_scale_factors(AVCodecContext *avctx, AACEncContext *s, ChannelElement *cpe, int channel, int global_gain)
> > > +{
> > > +    int off = global_gain, diff;
> > > +    int i, w, wg;
> > > +
> > > +    w = 0;
> > > +    for(wg = 0; wg < cpe->ch[channel].ics.num_window_groups; wg++){
> > > +        for(i = 0; i < cpe->ch[channel].ics.max_sfb; i++){
> > > +            if(!cpe->ch[channel].zeroes[w*16 + i]){
> > > +                /* if we have encountered scale=256 it means empty band
> > > +                 * which was decided to be coded by encoder, so assign it
> > > +                 * last scalefactor value for compression efficiency
> > > +                 */
> > > +                if(cpe->ch[channel].sf_idx[w*16 + i] == 256)
> > > +                    cpe->ch[channel].sf_idx[w*16 + i] = off;
> > 
> > why is th code that selects scalefactors not simply setting it to the last
> > scale factor?
> 
> it may occur before first band with scale too 

for(i=end; i>=0; i--)
    if(sf[i] == 256) sf[i]=last;
    else             last= sf[i];

as a sideeffect s[0] will contain the "global_gain" which ive seen some
complicated code to find because the firsts may be 256

[...]
> @@ -162,6 +185,16 @@
>      MDCTContext mdct1024;                        ///< long (1024 samples) frame transform context
>      MDCTContext mdct128;                         ///< short (128 samples) frame transform context
>      DSPContext  dsp;
> +    DECLARE_ALIGNED_16(FFTSample, output[2048]); ///< temporary buffer for MDCT input coefficients
> +    int16_t* samples;                            ///< saved preprocessed input
> +

> +    int samplerate_index;                        ///< MPEG-4 samplerate index

ok

> +    const uint8_t *swb_sizes1024;                ///< scalefactor band sizes for long frame
> +    int swb_num1024;                             ///< number of scalefactor bands for long frame
> +    const uint8_t *swb_sizes128;                 ///< scalefactor band sizes for short frame
> +    int swb_num128;                              ///< number of scalefactor bands for short frame

look nearly unused

> +
> +    ChannelElement *cpe;                         ///< channel elements

ok

>      AACPsyContext psy;                           ///< psychoacoustic model context
>      int last_frame;
>  } AACEncContext;

> @@ -221,7 +254,9 @@
>  
>      s->samples = av_malloc(2 * 1024 * avctx->channels * sizeof(s->samples[0]));
>      s->cpe = av_mallocz(sizeof(ChannelElement) * aac_chan_configs[avctx->channels-1][0]);
> -    if(ff_aac_psy_init(&s->psy, avctx, AAC_PSY_3GPP, aac_chan_configs[avctx->channels-1][0], 0, s->swb_sizes1024, s->swb_num1024, s->swb_sizes128, s->swb_num128) < 0){
> +    if(ff_aac_psy_init(&s->psy, avctx, AAC_PSY_3GPP,
> +                       aac_chan_configs[avctx->channels-1][0], 0,
> +                       s->swb_sizes1024, s->swb_num1024, s->swb_sizes128, s->swb_num128) < 0){
>          av_log(avctx, AV_LOG_ERROR, "Cannot initialize selected model.\n");
>          return -1;
>      }

ok

[...]
>  /**
>   * Encode ics_info element.
>   * @see Table 4.6 (syntax of ics_info)
>   */
> -static void put_ics_info(AVCodecContext *avctx, IndividualChannelStream *info)
> +static void put_ics_info(AACEncContext *s, IndividualChannelStream *info)
>  {
> -    AACEncContext *s = avctx->priv_data;

ok

> -    int i;
> +    int wg;
>  
>      put_bits(&s->pb, 1, 0);                // ics_reserved bit
>      put_bits(&s->pb, 2, info->window_sequence[0]);
> @@ -248,15 +334,307 @@
>          put_bits(&s->pb, 1, 0);            // no prediction
>      }else{
>          put_bits(&s->pb, 4, info->max_sfb);
> -        for(i = 1; i < info->num_windows; i++)
> -            put_bits(&s->pb, 1, info->group_len[i]);
> +        for(wg = 0; wg < info->num_window_groups; wg++){
> +            if(wg)
> +                put_bits(&s->pb, 1, 0);
> +            if(info->group_len[wg] > 1)
> +                put_sbits(&s->pb, info->group_len[wg] - 1, 0xFF);
> +        }
> +    }
> +}

is the change in group_len content leading to an overall simplifiation?
Because it makes the code distinctly worse here
Also the way i understood you was that you didnt want arrays compacted
like that

[...]
> +/**
> + * Calculate the number of bits needed to code all coefficient signs in current band.
> + */
> +static int calculate_band_sign_bits(AACEncContext *s, ChannelElement *cpe, int channel,
> +                                    int win, int group_len, int start, int size)
> +{
> +    int score = 0, start2 = start;

start seems unused so start2 is redundant
also bits is a better name than score

> +    int i, w;
> +    for(w = win; w < win + group_len; w++){

w=0; w<group_len; w++

> +        for(i = start2; i < start2 + size; i++){
> +            if(cpe->ch[channel].icoefs[i])

i=0; i<size; i++
    if(cpe->ch[channel].icoefs[i + start])

> +                score++;
> +        }
> +        start2 += 128;
> +    }
> +    return score;
> +}

> +
> +/**
> + * Calculate the number of bits needed to code given band with given codebook.
> + *
> + * @param s       encoder context
> + * @param cpe     channel element
> + * @param channel channel number inside channel pair
> + * @param win     window group start number
> + * @param start   scalefactor band position in spectral coefficients
> + * @param size    scalefactor band size
> + * @param cb      codebook number
> + */
> +static int calculate_band_bits(AACEncContext *s, ChannelElement *cpe, int channel,
> +                               int win, int group_len, int start, int size, int cb)
> +{
> +    int i, j, w;
> +    int score = 0, dim, idx, start2;
> +    int range = aac_cb_info[cb].range;
> +
> +    if(range == -1) return 0;
> +    cb--;
> +    dim = cb < FIRST_PAIR_BT ? 4 : 2;
> +

> +    start2 = start;
> +    if(IS_CODEBOOK_UNSIGNED(cb)){
> +        int coef_abs[2];
> +        for(w = win; w < win + group_len; w++){
> +            for(i = start2; i < start2 + size; i += dim){
> +                idx = 0;

> +                for(j = 0; j < dim; j++){
> +                    coef_abs[j] = FFABS(cpe->ch[channel].icoefs[i+j]);
> +                    idx = idx * range + FFMIN(coef_abs[j], 16);
> +                }
> +                score += ff_aac_spectral_bits[cb][idx];
> +                for(j = 0; j < dim; j++){
> +                    if(cb == ESC_BT && coef_abs[j] > 15)
> +                        score += av_log2(coef_abs[j]) * 2 - 4 + 1;
> +                }

The loops can still be merged

> +            }
> +            start2 += 128;
> +        }
> +    }else{
> +        for(w = win; w < win + group_len; w++){
> +            for(i = start2; i < start2 + size; i += dim){
> +                idx = cpe->ch[channel].icoefs[i];
> +                for(j = 1; j < dim; j++)
> +                    idx = idx * range + cpe->ch[channel].icoefs[i+j];
> +                //it turned out that all signed codebooks use the same offset for index coding
> +                idx += 40;
> +                score += ff_aac_spectral_bits[cb][idx];
> +            }
> +            start2 += 128;
> +        }
> +    }

win seems irrelevant w can start from 0
the onlx thing from cpe used is cpe->ch[channel] thus cpe should not be
passed into this function (i suspect similar things apply to other functions)

> +    return score;
> +}

> +
> +/**
> + * Encode band info for single window group bands.
> + */
> +static void encode_window_bands_info(AACEncContext *s, ChannelElement *cpe,
> +                                     int channel, int win, int group_len)
> +{
> +    BandCodingPath path[64];
> +    int band_bits[64][12];
> +    int maxval;
> +    int w, swb, cb, start, start2, size;
> +    int i, j;
> +    const int max_sfb = cpe->ch[channel].ics.max_sfb;
> +    const int run_bits = cpe->ch[channel].ics.num_windows == 1 ? 5 : 3;
> +    const int run_esc = (1 << run_bits) - 1;
> +    int bits, sbits, idx, count;
> +    int stack[64], stack_len;
> +
> +    start = win*128;
> +    for(swb = 0; swb < max_sfb; swb++){
> +        maxval = 0;
> +        start2 = start;
> +        size = cpe->ch[channel].ics.swb_sizes[swb];
> +        if(cpe->ch[channel].zeroes[win*16 + swb])
> +            maxval = 0;
> +        else{

> +            for(w = win; w < win + group_len; w++){
> +                for(i = start2; i < start2 + size; i++){
> +                    maxval = FFMAX(maxval, FFABS(cpe->ch[channel].icoefs[i]));
> +                }
> +                start2 += 128;
> +            }

win has no effect here

> +        }
> +        sbits = calculate_band_sign_bits(s, cpe, channel, win, group_len, start, size);
> +        for(cb = 0; cb < 12; cb++){
> +            if(aac_cb_info[cb].maxval < maxval)
> +                band_bits[swb][cb] = INT_MAX;
> +            else{
> +                band_bits[swb][cb] = calculate_band_bits(s, cpe, channel, win, group_len, start, size, cb);
> +                if(IS_CODEBOOK_UNSIGNED(cb-1)){
> +                    band_bits[swb][cb] += sbits;
> +                }
> +            }
> +        }
> +        start += cpe->ch[channel].ics.swb_sizes[swb];
> +    }

> +    path[0].bits = 0;
> +    for(i = 1; i <= max_sfb; i++)
> +        path[i].bits = INT_MAX;
> +    for(i = 0; i < max_sfb; i++){
> +        for(cb = 0; cb < 12; cb++){
> +            int sum = 0;
> +            for(j = 1; j <= max_sfb - i; j++){

> +                if(band_bits[i+j-1][cb] == INT_MAX)
> +                    break;
> +                sum += band_bits[i+j-1][cb];

if(sum<0) break;

> +                bits = sum + path[i].bits + run_value_bits[cpe->ch[channel].ics.num_windows == 8][j];
> +                if(bits < path[i+j].bits){
> +                    path[i+j].bits     = bits;
> +                    path[i+j].codebook = cb;
> +                    path[i+j].prev_idx = i;
> +                }
> +            }
> +        }
> +    }
> +    assert(path[max_sfb].bits != INT_MAX);
> +
> +    //convert resulting path from backward-linked list
> +    stack_len = 0;
> +    idx = max_sfb;
> +    while(idx > 0){
> +        stack[stack_len++] = idx;
> +        idx = path[idx].prev_idx;
> +    }
> +
> +    //perform actual band info encoding
> +    start = 0;
> +    for(i = stack_len - 1; i >= 0; i--){
> +        put_bits(&s->pb, 4, path[stack[i]].codebook);
> +        count = stack[i] - path[stack[i]].prev_idx;
> +        memset(cpe->ch[channel].zeroes + win*16 + start, !path[stack[i]].codebook, count);
> +        //XXX: memset when band_type is also uint8_t
> +        for(j = 0; j < count; j++){
> +            cpe->ch[channel].band_type[win*16 + start] =  path[stack[i]].codebook;
> +            start++;
> +        }
> +        while(count >= run_esc){
> +            put_bits(&s->pb, run_bits, run_esc);
> +            count -= run_esc;
> +        }
> +        put_bits(&s->pb, run_bits, count);
> +    }
> +}

as alraedy guessed, cpe here as well is not used, just cpe->ch[channel]
and this also applies to encode_spectral_coeffs()

> +
> +/**
> + * Encode the coefficients of one scalefactor band with selected codebook.
> + */
> +static void encode_band_coeffs(AACEncContext *s, ChannelElement *cpe, int channel,
> +                               int start, int size, int cb)
> +{
> +    const uint8_t  *bits  = ff_aac_spectral_bits [cb - 1];
> +    const uint16_t *codes = ff_aac_spectral_codes[cb - 1];
> +    const int range = aac_cb_info[cb].range;
> +    const int dim = (cb < FIRST_PAIR_BT) ? 4 : 2;
> +    int i, j, idx;
> +
> +    //do not encode zero or special codebooks
> +    if(range == -1) return;
> +
> +    if(cb == ESC_BT){

> +        int coef_abs[2];
> +        for(i = start; i < start + size; i += dim){
> +            idx = 0;
> +            for(j = 0; j < dim; j++){
> +                coef_abs[j] = FFABS(cpe->ch[channel].icoefs[i+j]);
> +                idx = idx*17 + FFMIN(coef_abs[j], 16);
> +            }
> +            put_bits(&s->pb, bits[idx], codes[idx]);
> +            //output signs
> +            for(j = 0; j < dim; j++)
> +                if(cpe->ch[channel].icoefs[i+j])
> +                    put_bits(&s->pb, 1, cpe->ch[channel].icoefs[i+j] < 0);
> +            //output escape values
> +            for(j = 0; j < dim; j++)
> +                if(coef_abs[j] > 15){
> +                    int len = av_log2(coef_abs[j]);
> +
> +                    put_bits(&s->pb, len - 4 + 1, (1 << (len - 4 + 1)) - 2);
> +                    put_bits(&s->pb, len, coef_abs[j] & ((1 << len) - 1));
> +                }
> +        }

how many different values for dim can ESC_BT have?

[...]

>  /**
>   * Encode pulse data.
>   */
> -static void encode_pulses(AVCodecContext *avctx, AACEncContext *s, Pulse *pulse, int channel)
> +static void encode_pulses(AACEncContext *s, Pulse *pulse, int channel)
>  {
>      int i;
>  

ok

[...]

>   * Encode spectral coefficients processed by psychoacoustic model.
>   */
> -static void encode_spectral_coeffs(AVCodecContext *avctx, AACEncContext *s, ChannelElement *cpe, int channel)
> +static void encode_spectral_coeffs(AACEncContext *s, ChannelElement *cpe, int channel)
>  {
>      int start, i, w, w2, wg;
>  
> @@ -287,7 +706,9 @@
>                  continue;
>              }
>              for(w2 = w; w2 < w + cpe->ch[channel].ics.group_len[wg]; w2++){
> -                encode_band_coeffs(s, cpe, channel, start + w2*128, cpe->ch[channel].ics.swb_sizes[i], cpe->ch[channel].band_type[w*16 + i]);
> +                encode_band_coeffs(s, cpe, channel, start + w2*128,
> +                                   cpe->ch[channel].ics.swb_sizes[i],
> +                                   cpe->ch[channel].band_type[w*16 + i]);
>              }
>              start += cpe->ch[channel].ics.swb_sizes[i];
>          }

ok

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

While the State exists there can be no freedom; when there is freedom there
will be no State. -- Vladimir Lenin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080817/91de9ad6/attachment.pgp>