[FFmpeg-devel] [RFC] AAC Encoder
Michael Niedermayer
michaelni
Sun Aug 17 14:59:01 CEST 2008
On Sun, Aug 17, 2008 at 02:40:19PM +0300, Kostya wrote:
> On Sat, Aug 16, 2008 at 08:35:44PM +0200, Michael Niedermayer wrote:
> > On Sat, Aug 16, 2008 at 05:31:09PM +0300, Kostya wrote:
> > > On Fri, Aug 15, 2008 at 09:05:27PM +0200, Michael Niedermayer wrote:
> > > > On Fri, Aug 15, 2008 at 07:59:52PM +0300, Kostya wrote:
> [...]
> > > >
> > > > this will not work with the data structs of the decoder
> > > > ms_mask is 120 elements
> > > > also the new group_len is still leaving holes in the arrays, its
> > > > surely better now as it doesnt loop over the 0 elements anymore but
> > > > they are still there.
> > > > I do not see why they should be there, it does not appear that there
> > > > is any advantage in them being there ... but if iam wrong iam sure you
> > > > will explain what they are good for?
> > >
> > > Now it will work with flat data arrays having size 128 (which is comparable).
> > > I think this should be acceptable and working with fixed offset
> > > (window_num*16 + scalefactor_band_index) is easier.
> > >
> > > Also I must note that decoder is presented with grouping data first and
> > > decodes the rest of data basing on it.
> > > Encoder, on the other hand, has transformed coefficients first, and applies
> > > grouping to them later. So it's easier and convenient to use first window of
> > > group to hold needed scalefactors, band types, etc. than move that stuff
> > > to another windows.
> >
> > i do not think i understand.
> > Please correct me if iam misunderstanding something but
> > the coefficients are grouped into hmm groups (what was the correct term...)
> > the encoder can choose these grouping withing some limits ...
> > each group then gets a single scale factor. Thus one cannot select
> > scalefactors prior to grouping or independant of it, it would impose
> > a unpredictable limitation
> >
> > anyway iam not arguing for the encoder to do exactly what the decoder does ATM
> > but it must be clean, compact, optimal and decoder and encoder should match
> > each other whenever possible.
> >
> > Could you maybe point to exactly what code would become uglier with the
> > style used by the decoder? ideally with a diff showing the uglification?
>
> they are almost the same to the extent decoder knows all limits and can
> exploit it and my encoder have data for all windows and just marks window
> group starts.
can the decoder be changed to use the system of the encoder?
I really would prefer if both where using the same system ...
also [w*16 + i] can be written as [w][i] which is a little cleaner
[...]
> > >
> > > > > +
> > > > > + //convert resulting path from backward-linked list
> > > > > + stack_len = 0;
> > > > > + idx = max_sfb;
> > > > > + while(idx > 0){
> > > > > + stack[stack_len++] = idx;
> > > > > + idx = s->path[idx].prev_idx;
> > > > > + }
> > > > > +
> > > > > + //perform actual band info encoding
> > > > > + start = 0;
> > > > > + for(i = stack_len - 1; i >= 0; i--){
> > > > > + put_bits(&s->pb, 4, s->path[stack[i]].codebook);
> > > > > + count = stack[i] - s->path[stack[i]].prev_idx;
> > > >
> > > > > + for(j = 0; j < count; j++){
> > > > > + cpe->ch[channel].band_type[win][start] = s->path[stack[i]].codebook;
> > > > > + cpe->ch[channel].zeroes[win][start] = !s->path[stack[i]].codebook;
> > > > > + start++;
> > > > > + }
> > > >
> > > > memset
> > >
> > > umm, band_type[] type is int
> >
> > why is it an int?
>
> because it's enum. changed for zeroes[] though
i forgot about the enum :(
[...]
> > > + //it turned out that all signed codebooks use the same offset for index coding
> > > + idx += 40;
> > > + put_bits(&s->pb, bits[idx], codes[idx]);
> > > + }
> > > + }
> > > +}
> >
> > [...]
> > > +/**
> > > + * Encode scalefactors.
> > > + */
> > > +static void encode_scale_factors(AVCodecContext *avctx, AACEncContext *s, ChannelElement *cpe, int channel, int global_gain)
> > > +{
> > > + int off = global_gain, diff;
> > > + int i, w, wg;
> > > +
> > > + w = 0;
> > > + for(wg = 0; wg < cpe->ch[channel].ics.num_window_groups; wg++){
> > > + for(i = 0; i < cpe->ch[channel].ics.max_sfb; i++){
> > > + if(!cpe->ch[channel].zeroes[w*16 + i]){
> > > + /* if we have encountered scale=256 it means empty band
> > > + * which was decided to be coded by encoder, so assign it
> > > + * last scalefactor value for compression efficiency
> > > + */
> > > + if(cpe->ch[channel].sf_idx[w*16 + i] == 256)
> > > + cpe->ch[channel].sf_idx[w*16 + i] = off;
> >
> > why is th code that selects scalefactors not simply setting it to the last
> > scale factor?
>
> it may occur before first band with scale too
for(i=end; i>=0; i--)
if(sf[i] == 256) sf[i]=last;
else last= sf[i];
as a sideeffect s[0] will contain the "global_gain" which ive seen some
complicated code to find because the firsts may be 256
[...]
> @@ -162,6 +185,16 @@
> MDCTContext mdct1024; ///< long (1024 samples) frame transform context
> MDCTContext mdct128; ///< short (128 samples) frame transform context
> DSPContext dsp;
> + DECLARE_ALIGNED_16(FFTSample, output[2048]); ///< temporary buffer for MDCT input coefficients
> + int16_t* samples; ///< saved preprocessed input
> +
> + int samplerate_index; ///< MPEG-4 samplerate index
ok
> + const uint8_t *swb_sizes1024; ///< scalefactor band sizes for long frame
> + int swb_num1024; ///< number of scalefactor bands for long frame
> + const uint8_t *swb_sizes128; ///< scalefactor band sizes for short frame
> + int swb_num128; ///< number of scalefactor bands for short frame
look nearly unused
> +
> + ChannelElement *cpe; ///< channel elements
ok
> AACPsyContext psy; ///< psychoacoustic model context
> int last_frame;
> } AACEncContext;
> @@ -221,7 +254,9 @@
>
> s->samples = av_malloc(2 * 1024 * avctx->channels * sizeof(s->samples[0]));
> s->cpe = av_mallocz(sizeof(ChannelElement) * aac_chan_configs[avctx->channels-1][0]);
> - if(ff_aac_psy_init(&s->psy, avctx, AAC_PSY_3GPP, aac_chan_configs[avctx->channels-1][0], 0, s->swb_sizes1024, s->swb_num1024, s->swb_sizes128, s->swb_num128) < 0){
> + if(ff_aac_psy_init(&s->psy, avctx, AAC_PSY_3GPP,
> + aac_chan_configs[avctx->channels-1][0], 0,
> + s->swb_sizes1024, s->swb_num1024, s->swb_sizes128, s->swb_num128) < 0){
> av_log(avctx, AV_LOG_ERROR, "Cannot initialize selected model.\n");
> return -1;
> }
ok
[...]
> /**
> * Encode ics_info element.
> * @see Table 4.6 (syntax of ics_info)
> */
> -static void put_ics_info(AVCodecContext *avctx, IndividualChannelStream *info)
> +static void put_ics_info(AACEncContext *s, IndividualChannelStream *info)
> {
> - AACEncContext *s = avctx->priv_data;
ok
> - int i;
> + int wg;
>
> put_bits(&s->pb, 1, 0); // ics_reserved bit
> put_bits(&s->pb, 2, info->window_sequence[0]);
> @@ -248,15 +334,307 @@
> put_bits(&s->pb, 1, 0); // no prediction
> }else{
> put_bits(&s->pb, 4, info->max_sfb);
> - for(i = 1; i < info->num_windows; i++)
> - put_bits(&s->pb, 1, info->group_len[i]);
> + for(wg = 0; wg < info->num_window_groups; wg++){
> + if(wg)
> + put_bits(&s->pb, 1, 0);
> + if(info->group_len[wg] > 1)
> + put_sbits(&s->pb, info->group_len[wg] - 1, 0xFF);
> + }
> + }
> +}
is the change in group_len content leading to an overall simplifiation?
Because it makes the code distinctly worse here
Also the way i understood you was that you didnt want arrays compacted
like that
[...]
> +/**
> + * Calculate the number of bits needed to code all coefficient signs in current band.
> + */
> +static int calculate_band_sign_bits(AACEncContext *s, ChannelElement *cpe, int channel,
> + int win, int group_len, int start, int size)
> +{
> + int score = 0, start2 = start;
start seems unused so start2 is redundant
also bits is a better name than score
> + int i, w;
> + for(w = win; w < win + group_len; w++){
w=0; w<group_len; w++
> + for(i = start2; i < start2 + size; i++){
> + if(cpe->ch[channel].icoefs[i])
i=0; i<size; i++
if(cpe->ch[channel].icoefs[i + start])
> + score++;
> + }
> + start2 += 128;
> + }
> + return score;
> +}
> +
> +/**
> + * Calculate the number of bits needed to code given band with given codebook.
> + *
> + * @param s encoder context
> + * @param cpe channel element
> + * @param channel channel number inside channel pair
> + * @param win window group start number
> + * @param start scalefactor band position in spectral coefficients
> + * @param size scalefactor band size
> + * @param cb codebook number
> + */
> +static int calculate_band_bits(AACEncContext *s, ChannelElement *cpe, int channel,
> + int win, int group_len, int start, int size, int cb)
> +{
> + int i, j, w;
> + int score = 0, dim, idx, start2;
> + int range = aac_cb_info[cb].range;
> +
> + if(range == -1) return 0;
> + cb--;
> + dim = cb < FIRST_PAIR_BT ? 4 : 2;
> +
> + start2 = start;
> + if(IS_CODEBOOK_UNSIGNED(cb)){
> + int coef_abs[2];
> + for(w = win; w < win + group_len; w++){
> + for(i = start2; i < start2 + size; i += dim){
> + idx = 0;
> + for(j = 0; j < dim; j++){
> + coef_abs[j] = FFABS(cpe->ch[channel].icoefs[i+j]);
> + idx = idx * range + FFMIN(coef_abs[j], 16);
> + }
> + score += ff_aac_spectral_bits[cb][idx];
> + for(j = 0; j < dim; j++){
> + if(cb == ESC_BT && coef_abs[j] > 15)
> + score += av_log2(coef_abs[j]) * 2 - 4 + 1;
> + }
The loops can still be merged
> + }
> + start2 += 128;
> + }
> + }else{
> + for(w = win; w < win + group_len; w++){
> + for(i = start2; i < start2 + size; i += dim){
> + idx = cpe->ch[channel].icoefs[i];
> + for(j = 1; j < dim; j++)
> + idx = idx * range + cpe->ch[channel].icoefs[i+j];
> + //it turned out that all signed codebooks use the same offset for index coding
> + idx += 40;
> + score += ff_aac_spectral_bits[cb][idx];
> + }
> + start2 += 128;
> + }
> + }
win seems irrelevant w can start from 0
the onlx thing from cpe used is cpe->ch[channel] thus cpe should not be
passed into this function (i suspect similar things apply to other functions)
> + return score;
> +}
> +
> +/**
> + * Encode band info for single window group bands.
> + */
> +static void encode_window_bands_info(AACEncContext *s, ChannelElement *cpe,
> + int channel, int win, int group_len)
> +{
> + BandCodingPath path[64];
> + int band_bits[64][12];
> + int maxval;
> + int w, swb, cb, start, start2, size;
> + int i, j;
> + const int max_sfb = cpe->ch[channel].ics.max_sfb;
> + const int run_bits = cpe->ch[channel].ics.num_windows == 1 ? 5 : 3;
> + const int run_esc = (1 << run_bits) - 1;
> + int bits, sbits, idx, count;
> + int stack[64], stack_len;
> +
> + start = win*128;
> + for(swb = 0; swb < max_sfb; swb++){
> + maxval = 0;
> + start2 = start;
> + size = cpe->ch[channel].ics.swb_sizes[swb];
> + if(cpe->ch[channel].zeroes[win*16 + swb])
> + maxval = 0;
> + else{
> + for(w = win; w < win + group_len; w++){
> + for(i = start2; i < start2 + size; i++){
> + maxval = FFMAX(maxval, FFABS(cpe->ch[channel].icoefs[i]));
> + }
> + start2 += 128;
> + }
win has no effect here
> + }
> + sbits = calculate_band_sign_bits(s, cpe, channel, win, group_len, start, size);
> + for(cb = 0; cb < 12; cb++){
> + if(aac_cb_info[cb].maxval < maxval)
> + band_bits[swb][cb] = INT_MAX;
> + else{
> + band_bits[swb][cb] = calculate_band_bits(s, cpe, channel, win, group_len, start, size, cb);
> + if(IS_CODEBOOK_UNSIGNED(cb-1)){
> + band_bits[swb][cb] += sbits;
> + }
> + }
> + }
> + start += cpe->ch[channel].ics.swb_sizes[swb];
> + }
> + path[0].bits = 0;
> + for(i = 1; i <= max_sfb; i++)
> + path[i].bits = INT_MAX;
> + for(i = 0; i < max_sfb; i++){
> + for(cb = 0; cb < 12; cb++){
> + int sum = 0;
> + for(j = 1; j <= max_sfb - i; j++){
> + if(band_bits[i+j-1][cb] == INT_MAX)
> + break;
> + sum += band_bits[i+j-1][cb];
if(sum<0) break;
> + bits = sum + path[i].bits + run_value_bits[cpe->ch[channel].ics.num_windows == 8][j];
> + if(bits < path[i+j].bits){
> + path[i+j].bits = bits;
> + path[i+j].codebook = cb;
> + path[i+j].prev_idx = i;
> + }
> + }
> + }
> + }
> + assert(path[max_sfb].bits != INT_MAX);
> +
> + //convert resulting path from backward-linked list
> + stack_len = 0;
> + idx = max_sfb;
> + while(idx > 0){
> + stack[stack_len++] = idx;
> + idx = path[idx].prev_idx;
> + }
> +
> + //perform actual band info encoding
> + start = 0;
> + for(i = stack_len - 1; i >= 0; i--){
> + put_bits(&s->pb, 4, path[stack[i]].codebook);
> + count = stack[i] - path[stack[i]].prev_idx;
> + memset(cpe->ch[channel].zeroes + win*16 + start, !path[stack[i]].codebook, count);
> + //XXX: memset when band_type is also uint8_t
> + for(j = 0; j < count; j++){
> + cpe->ch[channel].band_type[win*16 + start] = path[stack[i]].codebook;
> + start++;
> + }
> + while(count >= run_esc){
> + put_bits(&s->pb, run_bits, run_esc);
> + count -= run_esc;
> + }
> + put_bits(&s->pb, run_bits, count);
> + }
> +}
as alraedy guessed, cpe here as well is not used, just cpe->ch[channel]
and this also applies to encode_spectral_coeffs()
> +
> +/**
> + * Encode the coefficients of one scalefactor band with selected codebook.
> + */
> +static void encode_band_coeffs(AACEncContext *s, ChannelElement *cpe, int channel,
> + int start, int size, int cb)
> +{
> + const uint8_t *bits = ff_aac_spectral_bits [cb - 1];
> + const uint16_t *codes = ff_aac_spectral_codes[cb - 1];
> + const int range = aac_cb_info[cb].range;
> + const int dim = (cb < FIRST_PAIR_BT) ? 4 : 2;
> + int i, j, idx;
> +
> + //do not encode zero or special codebooks
> + if(range == -1) return;
> +
> + if(cb == ESC_BT){
> + int coef_abs[2];
> + for(i = start; i < start + size; i += dim){
> + idx = 0;
> + for(j = 0; j < dim; j++){
> + coef_abs[j] = FFABS(cpe->ch[channel].icoefs[i+j]);
> + idx = idx*17 + FFMIN(coef_abs[j], 16);
> + }
> + put_bits(&s->pb, bits[idx], codes[idx]);
> + //output signs
> + for(j = 0; j < dim; j++)
> + if(cpe->ch[channel].icoefs[i+j])
> + put_bits(&s->pb, 1, cpe->ch[channel].icoefs[i+j] < 0);
> + //output escape values
> + for(j = 0; j < dim; j++)
> + if(coef_abs[j] > 15){
> + int len = av_log2(coef_abs[j]);
> +
> + put_bits(&s->pb, len - 4 + 1, (1 << (len - 4 + 1)) - 2);
> + put_bits(&s->pb, len, coef_abs[j] & ((1 << len) - 1));
> + }
> + }
how many different values for dim can ESC_BT have?
[...]
> /**
> * Encode pulse data.
> */
> -static void encode_pulses(AVCodecContext *avctx, AACEncContext *s, Pulse *pulse, int channel)
> +static void encode_pulses(AACEncContext *s, Pulse *pulse, int channel)
> {
> int i;
>
ok
[...]
> * Encode spectral coefficients processed by psychoacoustic model.
> */
> -static void encode_spectral_coeffs(AVCodecContext *avctx, AACEncContext *s, ChannelElement *cpe, int channel)
> +static void encode_spectral_coeffs(AACEncContext *s, ChannelElement *cpe, int channel)
> {
> int start, i, w, w2, wg;
>
> @@ -287,7 +706,9 @@
> continue;
> }
> for(w2 = w; w2 < w + cpe->ch[channel].ics.group_len[wg]; w2++){
> - encode_band_coeffs(s, cpe, channel, start + w2*128, cpe->ch[channel].ics.swb_sizes[i], cpe->ch[channel].band_type[w*16 + i]);
> + encode_band_coeffs(s, cpe, channel, start + w2*128,
> + cpe->ch[channel].ics.swb_sizes[i],
> + cpe->ch[channel].band_type[w*16 + i]);
> }
> start += cpe->ch[channel].ics.swb_sizes[i];
> }
ok
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
While the State exists there can be no freedom; when there is freedom there
will be no State. -- Vladimir Lenin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080817/91de9ad6/attachment.pgp>
More information about the ffmpeg-devel
mailing list