[FFmpeg-devel] [RFC] AAC Encoder
Kostya
kostya.shishkov
Sat Aug 16 16:31:09 CEST 2008
On Fri, Aug 15, 2008 at 09:05:27PM +0200, Michael Niedermayer wrote:
> On Fri, Aug 15, 2008 at 07:59:52PM +0300, Kostya wrote:
> > On Thu, Aug 14, 2008 at 03:38:17PM +0200, Michael Niedermayer wrote:
> > > viterbi for determining band_types ...
> > > look this isn't hard, it's not even slow in this particular case, let me explain
> > [explanation skipped]
> >
> > Hmm, I did not understand it at first but then I found out it's
> > strikingly similar to the one-pass almost-optimal LZ matching scheme.
> > So here it is with the other comments taken care of too.
> >
> > P.S. I was surprised to find out that I won't be near a computer next week
> > so I'll try to make the encoder fit for committing ASAP.
>
[...]
> > @@ -119,6 +118,34 @@
> > swb_size_128_16, swb_size_128_16, swb_size_128_8
> > };
> >
> > +#define CB_UNSIGNED 0x01 ///< coefficients are coded as absolute values
> > +#define CB_PAIRS 0x02 ///< coefficients are grouped into pairs before coding (quads by default)
> > +#define CB_ESCAPE 0x04 ///< codebook allows escapes
> > +
> > +/** spectral coefficients codebook information */
> > +static const struct {
> > + int16_t maxval; ///< maximum possible value
> > + int8_t cb_num; ///< codebook number
> > + uint8_t flags; ///< codebook features
> > +} aac_cb_info[] = {
> > + { 0, -1, CB_UNSIGNED }, // zero codebook
> > + { 1, 0, 0 },
> > + { 1, 1, 0 },
> > + { 2, 2, CB_UNSIGNED },
> > + { 2, 3, CB_UNSIGNED },
> > + { 4, 4, CB_PAIRS },
> > + { 4, 5, CB_PAIRS },
> > + { 7, 6, CB_PAIRS | CB_UNSIGNED },
> > + { 7, 7, CB_PAIRS | CB_UNSIGNED },
> > + { 12, 8, CB_PAIRS | CB_UNSIGNED },
> > + { 12, 9, CB_PAIRS | CB_UNSIGNED },
> > + { 8191, 10, CB_PAIRS | CB_UNSIGNED | CB_ESCAPE },
> > + { -1, -1, 0 }, // reserved
> > + { -1, -1, 0 }, // perceptual noise substitution
> > + { -1, -1, 0 }, // intensity out-of-phase
> > + { -1, -1, 0 }, // intensity in-phase
> > +};
>
> cb_num is useless; index-1 will work as well.
Contracted the table a bit: cb_num and flags are gone, range is stored instead.
[...]
> > +/**
> > + * Encode MS data.
> > + * @see 4.6.8.1 "Joint Coding - M/S Stereo"
> > + */
> > +static void encode_ms_info(PutBitContext *pb, ChannelElement *cpe)
> > +{
> > + int i, w, wg;
> > +
> > + put_bits(pb, 2, cpe->ms.present);
> > + if(cpe->ms.present == 1){
> > + w = 0;
> > + for(wg = 0; wg < cpe->ch[0].ics.num_window_groups; wg++){
> > + for(i = 0; i < cpe->ch[0].ics.max_sfb; i++)
> > + put_bits(pb, 1, cpe->ms.mask[w][i]);
> > + w += cpe->ch[0].ics.group_len[wg];
> > + }
> > }
> > }
>
> this will not work with the data structs of the decoder
> ms_mask is 120 elements
> also the new group_len is still leaving holes in the arrays, it's
> surely better now as it doesn't loop over the 0 elements anymore but
> they are still there.
> I do not see why they should be there, it does not appear that there
> is any advantage in them being there ... but if I am wrong I am sure you
> will explain what they are good for?
Now it works with flat data arrays of size 128 (which is comparable to 120).
I think this should be acceptable, and working with a fixed offset
(window_num*16 + scalefactor_band_index) is easier.
I should also note that the decoder is given the grouping data first and
decodes the rest of the data based on it.
The encoder, on the other hand, has the transformed coefficients first and applies
grouping to them later. So it is easier and more convenient to use the first window of
a group to hold the needed scalefactors, band types, etc. than to move that data
to other windows.
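
To make the fixed-offset layout concrete, here is a minimal sketch of the indexing described above. The helper names (band_index, group_leaders) and the two constants are made up for illustration and are not part of the patch; only the stride of 16 and the rule "the first window of a group holds the data" come from the text.

```c
#include <assert.h>

#define MAX_SFB_PER_WINDOW 16  /* stride from the message: window_num*16 + band */
#define MAX_WINDOWS         8  /* 8 * 16 = 128 flat entries */

/* Hypothetical helper: flat index of scalefactor band `sfb` in window `win`. */
static int band_index(int win, int sfb)
{
    assert(win >= 0 && win < MAX_WINDOWS);
    assert(sfb >= 0 && sfb < MAX_SFB_PER_WINDOW);
    return win * MAX_SFB_PER_WINDOW + sfb;
}

/* Hypothetical helper: record the first window of every group, which is
 * where the per-group scalefactors and band types are kept.
 * Returns the number of windows covered by all groups. */
static int group_leaders(const int *group_len, int num_groups, int *leaders)
{
    int w = 0, wg;

    for (wg = 0; wg < num_groups; wg++) {
        leaders[wg] = w;        /* data for this group lives at window w */
        w += group_len[wg];     /* the remaining windows stay unused holes */
    }
    return w;
}
```

With groups of length 3, 1 and 4 the leaders are windows 0, 3 and 4, so band 2 of the last group would sit at flat index 4*16 + 2.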
> >
> > /**
> > + * Return number of bits needed to write codebook run length value.
> > + *
> > + * @param run run length
> > + * @param bits number of bits used to code value (5 for long frames, 3 for short frames)
> > + */
> > +static av_always_inline int calculate_run_bits(int run, const int bits)
> > +{
> > + int esc = (1 << bits) - 1;
> > + return (1 + (run >= esc)) * bits;
> > +}
>
> I think a table would be simpler and faster.
done
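
As a cross-check for such a table, the cost of a run can also be counted directly from the way runs are written out (an escape value is emitted while the remaining count is at least the escape value, then the remainder). The sketch below does that; run_bits_exact and fill_run_table are hypothetical names, not functions from the patch. Note that for a run that is an exact multiple of the escape value (14 with 3-bit fields, say) this counts one more field than (1 + (run >= esc)) * bits does, so it may be worth comparing against the attached tables.

```c
#include <assert.h>

/* Hypothetical helper: bits written for a run of `run` bands when each
 * field is `bits` wide and the all-ones value acts as an escape. */
static int run_bits_exact(int run, int bits)
{
    const int esc = (1 << bits) - 1;
    int fields = 1;              /* the final (non-escape) field */

    assert(run >= 0 && bits > 0);
    while (run >= esc) {         /* same condition as the emission loop */
        run -= esc;
        fields++;
    }
    return fields * bits;
}

/* Hypothetical helper: precompute the costs for runs 0..size-1. */
static void fill_run_table(unsigned char *table, int size, int bits)
{
    int run;

    for (run = 0; run < size; run++)
        table[run] = (unsigned char)run_bits_exact(run, bits);
}
```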
> > +
> > +/**
> > + * Calculate the number of bits needed to code given band with given codebook.
> > + *
> > + * @param s encoder context
> > + * @param cpe channel element
> > + * @param channel channel number inside channel pair
> > + * @param win window group start number
> > + * @param start scalefactor band position in spectral coefficients
> > + * @param size scalefactor band size
> > + * @param cb codebook number
> > + */
> > +static int calculate_band_bits(AACEncContext *s, ChannelElement *cpe, int channel, int win, int group_len, int start, int size, int cb)
> > +{
> > + int i, j, w;
> > + int score = 0, dim, idx, start2;
> > + int range;
> > +
> > + if(!cb) return 0;
> > + cb--;
>
> > + dim = (aac_cb_info[cb].flags & CB_PAIRS) ? 2 : 4;
>
> as CB_PAIRS is never used for anything but selecting between 2 and 4, it
> would be simpler to store that or maybe drop CB_PAIRS completely and use
> the > 123 check like in the decoder
dropped, I just use the check from the decoder (and for other codebook-related stuff too)
> > + if(aac_cb_info[cb].flags & CB_UNSIGNED)
> > + range = aac_cb_info[cb].maxval + 1;
> > + else
> > + range = aac_cb_info[cb].maxval*2 + 1;
>
> it would be simpler to store range in the table i think.
no problems
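
For reference, the stored range follows directly from the rule in the quoted code: maxval + 1 for unsigned codebooks, maxval*2 + 1 for signed ones, and 17 for the escape codebook (magnitudes are clamped to 16 before indexing). A small sketch, where codebook_range is a hypothetical helper and not something from the patch:

```c
#include <assert.h>

/* Hypothetical helper: number of distinct values one coefficient can take
 * when building the vector index for a codebook. */
static int codebook_range(int maxval, int is_unsigned, int is_escape)
{
    assert(maxval >= 0);
    if (is_escape)
        return 17;  /* magnitudes 0..15 plus the escape marker 16 */
    return is_unsigned ? maxval + 1 : 2 * maxval + 1;
}
```

This gives 3, 3, 9, 8, 13 and 17 for the six codebook families, which matches the range column of the contracted table.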
> > +
> > + start2 = start;
> > + if(aac_cb_info[cb].flags & CB_ESCAPE){
> > + int coef_abs[2];
> > + for(w = win; w < win + group_len; w++){
> > + for(i = start2; i < start2 + size; i += dim){
> > + idx = 0;
> > + for(j = 0; j < dim; j++)
> > + coef_abs[j] = FFABS(cpe->ch[channel].icoefs[i+j]);
> > + for(j = 0; j < dim; j++)
> > + idx = idx*17 + FFMIN(coef_abs[j], 16);
> > + score += ff_aac_spectral_bits[cb][idx];
> > + for(j = 0; j < dim; j++)
> > + if(cpe->ch[channel].icoefs[i+j])
> > + score++;
> > + for(j = 0; j < dim; j++)
> > + if(coef_abs[j] > 15)
> > + score += av_log2(coef_abs[j]) * 2 - 4 + 1;
>
> please merge the
> for(j = 0; j < dim; j++)
> loops
> and this applies to more than just the part above
merged
> > + }
> > + start2 += 128;
> > + }
> > + }else if(aac_cb_info[cb].flags & CB_UNSIGNED){
> > + for(w = win; w < win + group_len; w++){
> > + for(i = start2; i < start2 + size; i += dim){
> > + idx = 0;
> > + for(j = 0; j < dim; j++)
> > + idx = idx * range + FFABS(cpe->ch[channel].icoefs[i+j]);
> > + score += ff_aac_spectral_bits[cb][idx];
> > + for(j = 0; j < dim; j++)
> > + if(cpe->ch[channel].icoefs[i+j])
> > + score++;
> > + }
> > + start2 += 128;
> > + }
> > + }else{
> > + for(w = win; w < win + group_len; w++){
> > + for(i = start2; i < start2 + size; i += dim){
> > + idx = 0;
>
> > + for(j = 0; j < dim; j++)
> > + idx = idx * range + cpe->ch[channel].icoefs[i+j] + aac_cb_info[cb].maxval;
> > + score += ff_aac_spectral_bits[cb][idx];
> > + }
>
> the addition of maxval can be factored out of the dim loop, its effect is just
> the addition of a constant to idx no matter what icoefs contains.
Factored out. This constant turned out to be 40 for all signed codebooks.
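
The value 40 can be checked by hand: the constant is the sum of maxval*range^j over the vector dimension, i.e. 1*(27+9+3+1) = 40 for the signed quad codebooks (maxval 1, range 3) and 4*(9+1) = 40 for the signed pair codebooks (maxval 4, range 9). A small sketch of the same calculation, with hypothetical helper names:

```c
#include <assert.h>

/* Hypothetical helper: constant added to the index of a signed codebook,
 * SUM over j of maxval * range^j for j = 0..dim-1. */
static int signed_index_offset(int maxval, int range, int dim)
{
    int offset = 0, j;

    assert(dim > 0);
    for (j = 0; j < dim; j++)
        offset = offset * range + maxval;
    return offset;
}

/* Hypothetical helper: vector index with the offset applied once,
 * outside the per-coefficient loop. */
static int vector_index(const int *coefs, int range, int dim, int offset)
{
    int idx = 0, j;

    for (j = 0; j < dim; j++)
        idx = idx * range + coefs[j];
    return idx + offset;
}
```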
[...]
> > + s->path[0].bits = 0;
> > + for(i = 1; i <= max_sfb; i++)
> > + s->path[i].bits = INT_MAX;
> > + for(i = 0; i < max_sfb; i++){
> > + for(j = 1; j <= max_sfb - i; j++){
> > + bits = INT_MAX;
> > + ccb = 0;
> > + for(cb = 0; cb < 12; cb++){
> > + int sum = 0;
> > + for(k = 0; k < j; k++){
> > + if(s->band_bits[i + k][cb] == INT_MAX){
> > + sum = INT_MAX;
> > + break;
> > + }
> > + sum += s->band_bits[i + k][cb];
> > + }
> > + if(sum < bits){
> > + bits = sum;
> > + ccb = cb;
> > + }
> > + }
> > + assert(bits != INT_MAX);
> > + bits += s->path[i].bits + calculate_run_bits(j, run_bits);
> > + if(bits < s->path[i+j].bits){
> > + s->path[i+j].bits = bits;
> > + s->path[i+j].codebook = ccb;
> > + s->path[i+j].prev_idx = i;
> > + }
> > + }
> > + }
>
> hmm this is doing a loop more than it should ...
> (note code below ignores [-1] and INT_MAX+a issues)
>
> s->path[-1].bits= 0;
> for(i = 0; i < max_sfb; i++){
> s->path[i].bits= INT_MAX;
> for(cb = 0; cb < 12; cb++){
> int sum=0;
> for(k = 0; k <= i; k++){
> sum += s->band_bits[i - k][cb];
> sum2= sum + calculate_run_bits(k, run_bits) + s->path[i-k-1].bits;
> if(sum2 < s->path[i].bits){
> s->path[i].bits= sum2;
> s->path[i].codebook= cb;
> s->path[i].prev_idx= i - k - 1;
> }else if(sum2 - s->path[i].bits > THRESHOLD) // early termination to skip impossible cases
> break;
> }
> }
> }
I can't see a significant difference between them, except that your code
searches paths backward instead of forward and accumulates the cost per
codebook, so the sum is updated incrementally instead of being fully
recalculated (which I should adopt).
Left as is for now.
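
For illustration, here is a rough sketch of how the forward search could look with the per-codebook running sum adopted. It is not the code from the patch: PathPoint, run_cost and search_band_path are hypothetical names, the run cost is a stand-in formula, and the codebook loop is moved outside the run-length loop so that the sum can be extended band by band.

```c
#include <assert.h>
#include <limits.h>

#define NUM_CB    12  /* codebooks 0..11, as in the loop bound above */
#define MAX_BANDS 64

/* One point of the backward-linked path (modelled on BandCodingPath). */
typedef struct PathPoint {
    int prev_idx;  /* band index where the last run starts */
    int codebook;  /* codebook used for that run */
    int bits;      /* cheapest cost found for bands [0, index) */
} PathPoint;

/* Stand-in run cost: one field per escape value plus a final field. */
static int run_cost(int run, int run_bits)
{
    const int esc = (1 << run_bits) - 1;
    return (run / esc + 1) * run_bits;
}

/* Forward search; band_bits[b][cb] == INT_MAX marks an unusable codebook.
 * Returns the cost of the best path, also stored in path[nbands].bits. */
static int search_band_path(int band_bits[][NUM_CB], int nbands,
                            int run_bits, PathPoint *path)
{
    int i, j, cb;

    assert(nbands > 0 && nbands < MAX_BANDS);
    path[0].bits = 0;
    path[0].prev_idx = 0;
    path[0].codebook = 0;
    for (i = 1; i <= nbands; i++)
        path[i].bits = INT_MAX;

    for (i = 0; i < nbands; i++) {
        if (path[i].bits == INT_MAX)
            continue;                      /* start point not reachable */
        for (cb = 0; cb < NUM_CB; cb++) {
            int sum = 0;                   /* extended by one band per step */
            for (j = 1; i + j <= nbands; j++) {
                int cost;
                if (band_bits[i + j - 1][cb] == INT_MAX)
                    break;                 /* longer runs are impossible too */
                sum += band_bits[i + j - 1][cb];
                cost = path[i].bits + sum + run_cost(j, run_bits);
                if (cost < path[i + j].bits) {
                    path[i + j].bits     = cost;
                    path[i + j].codebook = cb;
                    path[i + j].prev_idx = i;
                }
            }
        }
    }
    return path[nbands].bits;
}
```

Breaking out of the inner loop at the first unusable band also replaces the sum = INT_MAX special case, since every longer run with that codebook contains the same band.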
> > +
> > + //convert resulting path from backward-linked list
> > + stack_len = 0;
> > + idx = max_sfb;
> > + while(idx > 0){
> > + stack[stack_len++] = idx;
> > + idx = s->path[idx].prev_idx;
> > + }
> > +
> > + //perform actual band info encoding
> > + start = 0;
> > + for(i = stack_len - 1; i >= 0; i--){
> > + put_bits(&s->pb, 4, s->path[stack[i]].codebook);
> > + count = stack[i] - s->path[stack[i]].prev_idx;
>
> > + for(j = 0; j < count; j++){
> > + cpe->ch[channel].band_type[win][start] = s->path[stack[i]].codebook;
> > + cpe->ch[channel].zeroes[win][start] = !s->path[stack[i]].codebook;
> > + start++;
> > + }
>
> memset
umm, band_type[] elements are ints, so memset() would not work here
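
A short sketch of why the element type matters: memset() writes the same byte into every byte of the array, so on an int array it only produces the intended value for patterns such as 0. Both helpers below are hypothetical and only serve as a demonstration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: plain loop, correct for any value. */
static void fill_ints(int *dst, int value, int count)
{
    int i;

    assert(count >= 0);
    for (i = 0; i < count; i++)
        dst[i] = value;
}

/* Hypothetical helper: memset-based fill, only correct when every byte of
 * the desired int equals (unsigned char)value, e.g. for value 0. */
static void fill_ints_memset(int *dst, int value, int count)
{
    assert(count >= 0);
    memset(dst, value, (size_t)count * sizeof(*dst));
}
```

With 4-byte ints, fill_ints_memset(a, 5, n) leaves 0x05050505 in each element rather than 5, which is why the loop stays unless band_type[] becomes a byte array.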
> > + while(count >= run_esc){
> > + put_bits(&s->pb, run_bits, run_esc);
> > + count -= run_esc;
> > + }
> > + put_bits(&s->pb, run_bits, count);
> > + }
> > +}
> > +
>
> > +/**
> > + * Encode one scalefactor band with selected codebook.
> > + */
> > +static void encode_band_coeffs(AACEncContext *s, ChannelElement *cpe, int channel, int start, int size, int cb)
> > +{
> > + const uint8_t *bits = ff_aac_spectral_bits [aac_cb_info[cb].cb_num];
> > + const uint16_t *codes = ff_aac_spectral_codes[aac_cb_info[cb].cb_num];
> > + const int dim = (aac_cb_info[cb].flags & CB_PAIRS) ? 2 : 4;
> > + int i, j, idx, range;
> > +
> > + if(!bits) return;
> > +
> > + if(aac_cb_info[cb].flags & CB_UNSIGNED)
> > + range = aac_cb_info[cb].maxval + 1;
> > + else
> > + range = aac_cb_info[cb].maxval*2 + 1;
> > +
> > + if(aac_cb_info[cb].flags & CB_ESCAPE){
> > + int coef_abs[2];
> > + for(i = start; i < start + size; i += dim){
> > + idx = 0;
>
> > + for(j = 0; j < dim; j++)
> > + coef_abs[j] = FFABS(cpe->ch[channel].icoefs[i+j]);
> > + for(j = 0; j < dim; j++)
> > + idx = idx*17 + FFMIN(coef_abs[j], 16);
>
> the loops can be merged
merged
> > + put_bits(&s->pb, bits[idx], codes[idx]);
> > + //output signs
> > + for(j = 0; j < dim; j++)
> > + if(cpe->ch[channel].icoefs[i+j])
> > + put_bits(&s->pb, 1, cpe->ch[channel].icoefs[i+j] < 0);
> > + //output escape values
> > + for(j = 0; j < dim; j++)
> > + if(coef_abs[j] > 15){
> > + int len = av_log2(coef_abs[j]);
> > +
> > + put_bits(&s->pb, len - 4 + 1, (1 << (len - 4 + 1)) - 2);
> > + put_bits(&s->pb, len, coef_abs[j] & ((1 << len) - 1));
> > + }
> > + }
> > + }else if(aac_cb_info[cb].flags & CB_UNSIGNED){
> > + for(i = start; i < start + size; i += dim){
> > + idx = 0;
> > + for(j = 0; j < dim; j++)
> > + idx = idx * range + FFABS(cpe->ch[channel].icoefs[i+j]);
> > + put_bits(&s->pb, bits[idx], codes[idx]);
> > + //output signs
> > + for(j = 0; j < dim; j++)
> > + if(cpe->ch[channel].icoefs[i+j])
> > + put_bits(&s->pb, 1, cpe->ch[channel].icoefs[i+j] < 0);
> > + }
> > + }else{
> > + for(i = start; i < start + size; i += dim){
>
> > + idx = 0;
> > + for(j = 0; j < dim; j++)
> > + idx = idx * range + cpe->ch[channel].icoefs[i+j] + aac_cb_info[cb].maxval;
>
> the add maxval can be factored out
> SUM(0,i,dim) maxval*range^i is a constant
factored out that += 40 thing
> [...]
>
> > +/**
> > + * Encode scalefactors.
> > + */
> > +static void encode_scale_factors(AVCodecContext *avctx, AACEncContext *s, ChannelElement *cpe, int channel, int global_gain)
> > +{
> > + int off = global_gain, diff;
> > + int i, w, wg;
> > +
> > + w = 0;
> > + for(wg = 0; wg < cpe->ch[channel].ics.num_window_groups; wg++){
> > + for(i = 0; i < cpe->ch[channel].ics.max_sfb; i++){
> > + if(!cpe->ch[channel].zeroes[w][i]){
>
> > + if(cpe->ch[channel].sf_idx[w][i] == 256) cpe->ch[channel].sf_idx[w][i] = off;
>
> what is 256 ?
> and please write
> if(condition)
> statement;
>
> its more readable than
> if(condition) statement;
> when condition and statement are complex
done
[...]
> > + init_put_bits(&s->pb, frame, buf_size*8);
> > + if(avctx->frame_number==1 && !(avctx->flags & CODEC_FLAG_BITEXACT)){
> > + put_bitstream_info(avctx, s, LIBAVCODEC_IDENT);
> > + }
>
> this still does not look like it is stored in extradata and neither is it
> repeated.
now it's repeated every 256 frames (but I would still prefer a less conspicuous marking of the file)
> [...]
>
> ill review psychoacoustics ASAP, yes i need little pauses :)
So here's the encoder part; the psy model will be sent separately.
> [...]
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Let us carefully observe those good qualities wherein our enemies excel us
> and endeavor to excel them, by avoiding what is faulty, and imitating what
> is excellent in them. -- Plutarch
-------------- next part --------------
--- /home/kst/cvs-get/ffmpeg/libavcodec/aacenc.c 2008-08-16 14:53:38.000000000 +0300
+++ aacenc.c 2008-08-16 13:48:45.000000000 +0300
@@ -118,6 +118,50 @@
swb_size_128_16, swb_size_128_16, swb_size_128_8
};
+#define CB_UNSIGNED 0x01 ///< coefficients are coded as absolute values
+#define CB_PAIRS 0x02 ///< coefficients are grouped into pairs before coding (quads by default)
+#define CB_ESCAPE 0x04 ///< codebook allows escapes
+
+/** spectral coefficients codebook information */
+static const struct {
+ int16_t maxval; ///< maximum possible value
+ int8_t range; ///< value used in vector calculation
+} aac_cb_info[] = {
+ { 0, -1 }, // zero codebook
+ { 1, 3 },
+ { 1, 3 },
+ { 2, 3 },
+ { 2, 3 },
+ { 4, 9 },
+ { 4, 9 },
+ { 7, 8 },
+ { 7, 8 },
+ { 12, 13 },
+ { 12, 13 },
+ { 8191, 17 },
+ { -1, -1 }, // reserved
+ { -1, -1 }, // perceptual noise substitution
+ { -1, -1 }, // intensity out-of-phase
+ { -1, -1 }, // intensity in-phase
+};
+
+/** bits needed to code codebook run value for long windows */
+static const uint8_t run_value_bits_long[64] = {
+ 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
+ 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 10,
+ 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
+ 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 15
+};
+
+/** bits needed to code codebook run value for short windows */
+static const uint8_t run_value_bits_short[16] = {
+ 3, 3, 3, 3, 3, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 9
+};
+
+static const uint8_t* run_value_bits[2] = {
+ run_value_bits_long, run_value_bits_short
+};
+
/** default channel configurations */
static const uint8_t aac_chan_configs[6][5] = {
{1, TYPE_SCE}, // 1 channel - single channel element
@@ -129,6 +173,15 @@
};
/**
+ * structure used in optimal codebook search
+ */
+typedef struct BandCodingPath {
+ int prev_idx; ///< pointer to the previous path point
+ int codebook; ///< codebook for coding band run
+ int bits; ///< number of bits needed to code given number of bands
+} BandCodingPath;
+
+/**
* AAC encoder context
*/
typedef struct {
@@ -136,6 +189,20 @@
MDCTContext mdct1024; ///< long (1024 samples) frame transform context
MDCTContext mdct128; ///< short (128 samples) frame transform context
DSPContext dsp;
+ DECLARE_ALIGNED_16(FFTSample, output[2048]); ///< temporary buffer for MDCT input coefficients
+ int16_t* samples; ///< saved preprocessed input
+
+ int samplerate_index; ///< MPEG-4 samplerate index
+ const uint8_t *swb_sizes1024; ///< scalefactor band sizes for long frame
+ int swb_num1024; ///< number of scalefactor bands for long frame
+ const uint8_t *swb_sizes128; ///< scalefactor band sizes for short frame
+ int swb_num128; ///< number of scalefactor bands for short frame
+
+ ChannelElement *cpe; ///< channel elements
+ AACPsyContext psy; ///< psychoacoustic model context
+ int last_frame;
+ BandCodingPath path[64]; ///< auxiliary data needed for optimal band info coding
+ int band_bits[64][12]; ///< bits needed to encode each band with each codebook
} AACEncContext;
/**
@@ -203,6 +270,55 @@
return 0;
}
+static void apply_window_and_mdct(AVCodecContext *avctx, AACEncContext *s, ChannelElement *cpe, short *audio, int channel)
+{
+ int i, j, k;
+ const float * lwindow = cpe->ch[channel].ics.use_kb_window[0] ? ff_aac_kbd_long_1024 : ff_sine_1024;
+ const float * swindow = cpe->ch[channel].ics.use_kb_window[0] ? ff_aac_kbd_short_128 : ff_sine_128;
+ const float * pwindow = cpe->ch[channel].ics.use_kb_window[1] ? ff_aac_kbd_short_128 : ff_sine_128;
+
+ if (cpe->ch[channel].ics.window_sequence[0] != EIGHT_SHORT_SEQUENCE) {
+ memcpy(s->output, cpe->ch[channel].saved, sizeof(float)*1024);
+ if(cpe->ch[channel].ics.window_sequence[0] == LONG_STOP_SEQUENCE){
+ memset(s->output, 0, sizeof(s->output[0]) * 448);
+ for(i = 448; i < 576; i++)
+ s->output[i] = cpe->ch[channel].saved[i] * pwindow[i - 448];
+ for(i = 576; i < 704; i++)
+ s->output[i] = cpe->ch[channel].saved[i];
+ }
+ if(cpe->ch[channel].ics.window_sequence[0] != LONG_START_SEQUENCE){
+ j = channel;
+ for (i = 0; i < 1024; i++, j += avctx->channels){
+ s->output[i+1024] = audio[j] * lwindow[1024 - i - 1];
+ cpe->ch[channel].saved[i] = audio[j] * lwindow[i];
+ }
+ }else{
+ j = channel;
+ for(i = 0; i < 448; i++, j += avctx->channels)
+ s->output[i+1024] = audio[j];
+ for(i = 448; i < 576; i++, j += avctx->channels)
+ s->output[i+1024] = audio[j] * swindow[576 - i - 1];
+ memset(s->output+1024+576, 0, sizeof(s->output[0]) * 448);
+ j = channel;
+ for(i = 0; i < 1024; i++, j += avctx->channels)
+ cpe->ch[channel].saved[i] = audio[j];
+ }
+ ff_mdct_calc(&s->mdct1024, cpe->ch[channel].coeffs, s->output);
+ }else{
+ j = channel;
+ for (k = 0; k < 1024; k += 128) {
+ for(i = 448 + k; i < 448 + k + 256; i++)
+ s->output[i - 448 - k] = (i < 1024) ? cpe->ch[channel].saved[i] : audio[channel + (i-1024)*avctx->channels] / 512.0;
+ s->dsp.vector_fmul (s->output, k ? swindow : pwindow, 128);
+ s->dsp.vector_fmul_reverse(s->output+128, s->output+128, swindow, 128);
+ ff_mdct_calc(&s->mdct128, cpe->ch[channel].coeffs + k, s->output);
+ }
+ j = channel;
+ for(i = 0; i < 1024; i++, j += avctx->channels)
+ cpe->ch[channel].saved[i] = audio[j];
+ }
+}
+
/**
* Encode ics_info element.
* @see Table 4.6 (syntax of ics_info)
@@ -210,7 +326,7 @@
static void put_ics_info(AVCodecContext *avctx, IndividualChannelStream *info)
{
AACEncContext *s = avctx->priv_data;
- int i;
+ int wg;
put_bits(&s->pb, 1, 0); // ics_reserved bit
put_bits(&s->pb, 2, info->window_sequence[0]);
@@ -220,8 +336,295 @@
put_bits(&s->pb, 1, 0); // no prediction
}else{
put_bits(&s->pb, 4, info->max_sfb);
- for(i = 1; i < info->num_windows; i++)
- put_bits(&s->pb, 1, info->group_len[i]);
+ for(wg = 0; wg < info->num_window_groups; wg++){
+ if(wg)
+ put_bits(&s->pb, 1, 0);
+ if(info->group_len[wg] > 1)
+ put_sbits(&s->pb, info->group_len[wg] - 1, 0xFF);
+ }
+ }
+}
+
+/**
+ * Encode MS data.
+ * @see 4.6.8.1 "Joint Coding - M/S Stereo"
+ */
+static void encode_ms_info(PutBitContext *pb, ChannelElement *cpe)
+{
+ int i, w, wg;
+
+ put_bits(pb, 2, cpe->ms_mode);
+ if(cpe->ms_mode == 1){
+ w = 0;
+ for(wg = 0; wg < cpe->ch[0].ics.num_window_groups; wg++){
+ for(i = 0; i < cpe->ch[0].ics.max_sfb; i++)
+ put_bits(pb, 1, cpe->ms_mask[w + i]);
+ w += cpe->ch[0].ics.group_len[wg]*16;
+ }
+ }
+}
+
+/**
+ * Calculate the number of bits needed to code given band with given codebook.
+ *
+ * @param s encoder context
+ * @param cpe channel element
+ * @param channel channel number inside channel pair
+ * @param win window group start number
+ * @param start scalefactor band position in spectral coefficients
+ * @param size scalefactor band size
+ * @param cb codebook number
+ */
+static int calculate_band_bits(AACEncContext *s, ChannelElement *cpe, int channel, int win, int group_len, int start, int size, int cb)
+{
+ int i, j, w;
+ int score = 0, dim, idx, start2;
+ int range = aac_cb_info[cb].range;
+
+ if(range == -1) return 0; //zero and special codebooks have range -1 in aac_cb_info
+ cb--;
+ dim = cb < FIRST_PAIR_BT ? 4 : 2;
+
+ start2 = start;
+ if(cb == ESC_BT){
+ int coef_abs[2];
+ for(w = win; w < win + group_len; w++){
+ for(i = start2; i < start2 + size; i += dim){
+ idx = 0;
+ for(j = 0; j < dim; j++){
+ coef_abs[j] = FFABS(cpe->ch[channel].icoefs[i+j]);
+ idx = idx*17 + FFMIN(coef_abs[j], 16);
+ }
+ score += ff_aac_spectral_bits[cb][idx];
+ for(j = 0; j < dim; j++)
+ if(cpe->ch[channel].icoefs[i+j])
+ score++;
+ for(j = 0; j < dim; j++)
+ if(coef_abs[j] > 15)
+ score += av_log2(coef_abs[j]) * 2 - 4 + 1;
+ }
+ start2 += 128;
+ }
+ }else if(IS_CODEBOOK_UNSIGNED(cb)){
+ for(w = win; w < win + group_len; w++){
+ for(i = start2; i < start2 + size; i += dim){
+ idx = 0;
+ for(j = 0; j < dim; j++)
+ idx = idx * range + FFABS(cpe->ch[channel].icoefs[i+j]);
+ score += ff_aac_spectral_bits[cb][idx];
+ for(j = 0; j < dim; j++)
+ if(cpe->ch[channel].icoefs[i+j])
+ score++;
+ }
+ start2 += 128;
+ }
+ }else{
+ for(w = win; w < win + group_len; w++){
+ for(i = start2; i < start2 + size; i += dim){
+ idx = 0;
+ for(j = 0; j < dim; j++)
+ idx = idx * range + cpe->ch[channel].icoefs[i+j];
+ //it turned out that all signed codebooks use the same offset for index coding
+ idx += 40;
+ score += ff_aac_spectral_bits[cb][idx];
+ }
+ start2 += 128;
+ }
+ }
+ return score;
+}
+
+/**
+ * Encode band info for single window group bands.
+ */
+static void encode_window_bands_info(AACEncContext *s, ChannelElement *cpe, int channel, int win, int group_len){
+ int maxval;
+ int w, swb, cb, ccb, start, start2, size;
+ int i, j, k;
+ const int max_sfb = cpe->ch[channel].ics.max_sfb;
+ const int run_bits = cpe->ch[channel].ics.num_windows == 1 ? 5 : 3;
+ const int run_esc = (1 << run_bits) - 1;
+ int bits, idx, count;
+ int stack[64], stack_len;
+
+ start = win*128;
+ for(swb = 0; swb < max_sfb; swb++){
+ maxval = 0;
+ start2 = start;
+ size = cpe->ch[channel].ics.swb_sizes[swb];
+ if(cpe->ch[channel].zeroes[win*16 + swb])
+ maxval = 0;
+ else{
+ for(w = win; w < win + group_len; w++){
+ for(i = start2; i < start2 + size; i++){
+ maxval = FFMAX(maxval, FFABS(cpe->ch[channel].icoefs[i]));
+ }
+ start2 += 128;
+ }
+ }
+ for(cb = 0; cb < 12; cb++){
+ if(aac_cb_info[cb].maxval < maxval)
+ s->band_bits[swb][cb] = INT_MAX;
+ else
+ s->band_bits[swb][cb] = calculate_band_bits(s, cpe, channel, win, group_len, start, size, cb);
+ }
+ start += cpe->ch[channel].ics.swb_sizes[swb];
+ }
+ s->path[0].bits = 0;
+ for(i = 1; i <= max_sfb; i++)
+ s->path[i].bits = INT_MAX;
+ for(i = 0; i < max_sfb; i++){
+ for(j = 1; j <= max_sfb - i; j++){
+ bits = INT_MAX;
+ ccb = 0;
+ for(cb = 0; cb < 12; cb++){
+ int sum = 0;
+ for(k = 0; k < j; k++){
+ if(s->band_bits[i + k][cb] == INT_MAX){
+ sum = INT_MAX;
+ break;
+ }
+ sum += s->band_bits[i + k][cb];
+ }
+ if(sum < bits){
+ bits = sum;
+ ccb = cb;
+ }
+ }
+ assert(bits != INT_MAX);
+ bits += s->path[i].bits + run_value_bits[cpe->ch[channel].ics.num_windows == 8][j];
+ if(bits < s->path[i+j].bits){
+ s->path[i+j].bits = bits;
+ s->path[i+j].codebook = ccb;
+ s->path[i+j].prev_idx = i;
+ }
+ }
+ }
+
+ //convert resulting path from backward-linked list
+ stack_len = 0;
+ idx = max_sfb;
+ while(idx > 0){
+ stack[stack_len++] = idx;
+ idx = s->path[idx].prev_idx;
+ }
+
+ //perform actual band info encoding
+ start = 0;
+ for(i = stack_len - 1; i >= 0; i--){
+ put_bits(&s->pb, 4, s->path[stack[i]].codebook);
+ count = stack[i] - s->path[stack[i]].prev_idx;
+ for(j = 0; j < count; j++){
+ cpe->ch[channel].band_type[win*16 + start] = s->path[stack[i]].codebook;
+ cpe->ch[channel].zeroes[win*16 + start] = !s->path[stack[i]].codebook;
+ start++;
+ }
+ while(count >= run_esc){
+ put_bits(&s->pb, run_bits, run_esc);
+ count -= run_esc;
+ }
+ put_bits(&s->pb, run_bits, count);
+ }
+}
+
+/**
+ * Encode one scalefactor band with selected codebook.
+ */
+static void encode_band_coeffs(AACEncContext *s, ChannelElement *cpe, int channel, int start, int size, int cb)
+{
+ const uint8_t *bits = ff_aac_spectral_bits [cb - 1];
+ const uint16_t *codes = ff_aac_spectral_codes[cb - 1];
+ const int range = aac_cb_info[cb].range;
+ const int dim = (cb < FIRST_PAIR_BT) ? 4 : 2;
+ int i, j, idx;
+
+ //do not encode zero or special codebooks
+ if(range == -1) return;
+
+ if(cb == ESC_BT){
+ int coef_abs[2];
+ for(i = start; i < start + size; i += dim){
+ idx = 0;
+ for(j = 0; j < dim; j++){
+ coef_abs[j] = FFABS(cpe->ch[channel].icoefs[i+j]);
+ idx = idx*17 + FFMIN(coef_abs[j], 16);
+ }
+ put_bits(&s->pb, bits[idx], codes[idx]);
+ //output signs
+ for(j = 0; j < dim; j++)
+ if(cpe->ch[channel].icoefs[i+j])
+ put_bits(&s->pb, 1, cpe->ch[channel].icoefs[i+j] < 0);
+ //output escape values
+ for(j = 0; j < dim; j++)
+ if(coef_abs[j] > 15){
+ int len = av_log2(coef_abs[j]);
+
+ put_bits(&s->pb, len - 4 + 1, (1 << (len - 4 + 1)) - 2);
+ put_bits(&s->pb, len, coef_abs[j] & ((1 << len) - 1));
+ }
+ }
+ }else if(IS_CODEBOOK_UNSIGNED(cb)){
+ for(i = start; i < start + size; i += dim){
+ idx = 0;
+ for(j = 0; j < dim; j++)
+ idx = idx * range + FFABS(cpe->ch[channel].icoefs[i+j]);
+ put_bits(&s->pb, bits[idx], codes[idx]);
+ //output signs
+ for(j = 0; j < dim; j++)
+ if(cpe->ch[channel].icoefs[i+j])
+ put_bits(&s->pb, 1, cpe->ch[channel].icoefs[i+j] < 0);
+ }
+ }else{
+ for(i = start; i < start + size; i += dim){
+ idx = 0;
+ for(j = 0; j < dim; j++)
+ idx = idx * range + cpe->ch[channel].icoefs[i+j];
+ //it turned out that all signed codebooks use the same offset for index coding
+ idx += 40;
+ put_bits(&s->pb, bits[idx], codes[idx]);
+ }
+ }
+}
+
+/**
+ * Encode scalefactor band coding type.
+ */
+static void encode_band_info(AVCodecContext *avctx, AACEncContext *s, ChannelElement *cpe, int channel)
+{
+ int w, wg;
+
+ w = 0;
+ for(wg = 0; wg < cpe->ch[channel].ics.num_window_groups; wg++){
+ encode_window_bands_info(s, cpe, channel, w, cpe->ch[channel].ics.group_len[wg]);
+ w += cpe->ch[channel].ics.group_len[wg];
+ }
+}
+
+/**
+ * Encode scalefactors.
+ */
+static void encode_scale_factors(AVCodecContext *avctx, AACEncContext *s, ChannelElement *cpe, int channel, int global_gain)
+{
+ int off = global_gain, diff;
+ int i, w, wg;
+
+ w = 0;
+ for(wg = 0; wg < cpe->ch[channel].ics.num_window_groups; wg++){
+ for(i = 0; i < cpe->ch[channel].ics.max_sfb; i++){
+ if(!cpe->ch[channel].zeroes[w*16 + i]){
+ /* scale=256 marks an empty band that the encoder
+ * nevertheless decided to code, so give it the
+ * last scalefactor value for compression efficiency
+ */
+ if(cpe->ch[channel].sf_idx[w*16 + i] == 256)
+ cpe->ch[channel].sf_idx[w*16 + i] = off;
+ diff = cpe->ch[channel].sf_idx[w*16 + i] - off + SCALE_DIFF_ZERO;
+ if(diff < 0 || diff > 120) av_log(avctx, AV_LOG_ERROR, "Scalefactor difference is too big to be coded\n");
+ off = cpe->ch[channel].sf_idx[w*16 + i];
+ put_bits(&s->pb, ff_aac_scalefactor_bits[diff], ff_aac_scalefactor_code[diff]);
+ }
+ }
+ w += cpe->ch[channel].ics.group_len[wg];
}
}
@@ -244,6 +647,46 @@
}
/**
+ * Encode temporal noise shaping data.
+ */
+static void encode_tns_data(AVCodecContext *avctx, AACEncContext *s, ChannelElement *cpe, int channel)
+{
+ int i, w;
+
+ put_bits(&s->pb, 1, cpe->ch[channel].tns.present);
+ if(!cpe->ch[channel].tns.present) return;
+ if(cpe->ch[channel].ics.window_sequence[0] == EIGHT_SHORT_SEQUENCE){
+ for(w = 0; w < cpe->ch[channel].ics.num_windows; w++){
+ put_bits(&s->pb, 1, cpe->ch[channel].tns.n_filt[w]);
+ if(!cpe->ch[channel].tns.n_filt[w]) continue;
+ put_bits(&s->pb, 1, cpe->ch[channel].tns.coef_res[w] - 3);
+ put_bits(&s->pb, 4, cpe->ch[channel].tns.length[w][0]);
+ put_bits(&s->pb, 3, cpe->ch[channel].tns.order[w][0]);
+ if(cpe->ch[channel].tns.order[w][0]){
+ put_bits(&s->pb, 1, cpe->ch[channel].tns.direction[w][0]);
+ put_bits(&s->pb, 1, cpe->ch[channel].tns.coef_compress[w][0]);
+ for(i = 0; i < cpe->ch[channel].tns.order[w][0]; i++)
+ put_bits(&s->pb, cpe->ch[channel].tns.coef_len[w][0], cpe->ch[channel].tns.coef[w][0][i]);
+ }
+ }
+ }else{
+ put_bits(&s->pb, 1, cpe->ch[channel].tns.n_filt[0]);
+ if(!cpe->ch[channel].tns.n_filt[0]) return;
+ put_bits(&s->pb, 1, cpe->ch[channel].tns.coef_res[0] - 3);
+ for(w = 0; w < cpe->ch[channel].tns.n_filt[0]; w++){
+ put_bits(&s->pb, 6, cpe->ch[channel].tns.length[0][w]);
+ put_bits(&s->pb, 5, cpe->ch[channel].tns.order[0][w]);
+ if(cpe->ch[channel].tns.order[0][w]){
+ put_bits(&s->pb, 1, cpe->ch[channel].tns.direction[0][w]);
+ put_bits(&s->pb, 1, cpe->ch[channel].tns.coef_compress[0][w]);
+ for(i = 0; i < cpe->ch[channel].tns.order[0][w]; i++)
+ put_bits(&s->pb, cpe->ch[channel].tns.coef_len[0][w], cpe->ch[channel].tns.coef[0][w][i]);
+ }
+ }
+ }
+}
+
+/**
* Encode spectral coefficients processed by psychoacoustic model.
*/
static void encode_spectral_coeffs(AVCodecContext *avctx, AACEncContext *s, ChannelElement *cpe, int channel)
@@ -254,12 +697,12 @@
for(wg = 0; wg < cpe->ch[channel].ics.num_window_groups; wg++){
start = 0;
for(i = 0; i < cpe->ch[channel].ics.max_sfb; i++){
- if(cpe->ch[channel].zeroes[w][i]){
+ if(cpe->ch[channel].zeroes[w*16 + i]){
start += cpe->ch[channel].ics.swb_sizes[i];
continue;
}
for(w2 = w; w2 < w + cpe->ch[channel].ics.group_len[wg]; w2++){
- encode_band_coeffs(s, cpe, channel, start + w2*128, cpe->ch[channel].ics.swb_sizes[i], cpe->ch[channel].band_type[w][i]);
+ encode_band_coeffs(s, cpe, channel, start + w2*128, cpe->ch[channel].ics.swb_sizes[i], cpe->ch[channel].band_type[w*16 + i]);
}
start += cpe->ch[channel].ics.swb_sizes[i];
}
@@ -268,6 +711,39 @@
}
/**
+ * Encode one channel of audio data.
+ */
+static int encode_individual_channel(AVCodecContext *avctx, ChannelElement *cpe, int channel)
+{
+ AACEncContext *s = avctx->priv_data;
+ int g, w, wg;
+ int global_gain = 0;
+
+ //determine global gain as standard recommends - the first scalefactor value
+ w = 0;
+ for(wg = 0; wg < cpe->ch[channel].ics.num_window_groups; wg++){
+ for(g = 0; g < cpe->ch[channel].ics.max_sfb; g++){
+ if(!cpe->ch[channel].zeroes[w + g]){
+ global_gain = cpe->ch[channel].sf_idx[w + g];
+ break;
+ }
+ }
+ if(global_gain) break;
+ w += cpe->ch[channel].ics.group_len[wg]*16;
+ }
+
+ put_bits(&s->pb, 8, global_gain);
+ if(!cpe->common_window) put_ics_info(avctx, &cpe->ch[channel].ics);
+ encode_band_info(avctx, s, cpe, channel);
+ encode_scale_factors(avctx, s, cpe, channel, global_gain);
+ encode_pulses(avctx, s, &cpe->ch[channel].pulse, channel);
+ encode_tns_data(avctx, s, cpe, channel);
+ put_bits(&s->pb, 1, 0); //ssr
+ encode_spectral_coeffs(avctx, s, cpe, channel);
+ return 0;
+}
+
+/**
* Write some auxiliary information about the created AAC file.
*/
static void put_bitstream_info(AVCodecContext *avctx, AACEncContext *s, const char *name)
@@ -287,6 +763,80 @@
put_bits(&s->pb, 12 - padbits, 0);
}
+static int aac_encode_frame(AVCodecContext *avctx,
+ uint8_t *frame, int buf_size, void *data)
+{
+ AACEncContext *s = avctx->priv_data;
+ int16_t *samples = s->samples, *samples2, *la;
+ ChannelElement *cpe;
+ int i, j, chans, tag, start_ch;
+ const uint8_t *chan_map = aac_chan_configs[avctx->channels-1];
+ int chan_el_counter[4];
+
+ if(s->last_frame)
+ return 0;
+ if(data){
+ if((s->psy.flags & PSY_MODEL_NO_PREPROC) == PSY_MODEL_NO_PREPROC){
+ memcpy(s->samples + 1024 * avctx->channels, data, 1024 * avctx->channels * sizeof(s->samples[0]));
+ }else{
+ start_ch = 0;
+ samples2 = s->samples + 1024 * avctx->channels;
+ for(i = 0; i < chan_map[0]; i++){
+ tag = chan_map[i+1];
+ chans = tag == TYPE_CPE ? 2 : 1;
+ ff_aac_psy_preprocess(&s->psy, (uint16_t*)data + start_ch, samples2 + start_ch, i, tag);
+ start_ch += chans;
+ }
+ }
+ }
+ if(!avctx->frame_number){
+ memcpy(s->samples, s->samples + 1024 * avctx->channels, 1024 * avctx->channels * sizeof(s->samples[0]));
+ return 0;
+ }
+
+ init_put_bits(&s->pb, frame, buf_size*8);
+ if((avctx->frame_number & 0xFF)==1 && !(avctx->flags & CODEC_FLAG_BITEXACT)){
+ put_bitstream_info(avctx, s, LIBAVCODEC_IDENT);
+ }
+ start_ch = 0;
+ memset(chan_el_counter, 0, sizeof(chan_el_counter));
+ for(i = 0; i < chan_map[0]; i++){
+ tag = chan_map[i+1];
+ chans = tag == TYPE_CPE ? 2 : 1;
+ cpe = &s->cpe[i];
+ samples2 = samples + start_ch;
+ la = samples2 + 1024 * avctx->channels + start_ch;
+ if(!data) la = NULL;
+ ff_aac_psy_suggest_window(&s->psy, samples2, la, i, tag, cpe);
+ for(j = 0; j < chans; j++){
+ apply_window_and_mdct(avctx, s, cpe, samples2, j);
+ }
+ ff_aac_psy_analyze(&s->psy, i, tag, cpe);
+ put_bits(&s->pb, 3, tag);
+ put_bits(&s->pb, 4, chan_el_counter[tag]++);
+ if(chans == 2){
+ put_bits(&s->pb, 1, cpe->common_window);
+ if(cpe->common_window){
+ put_ics_info(avctx, &cpe->ch[0].ics);
+ encode_ms_info(&s->pb, cpe);
+ }
+ }
+ for(j = 0; j < chans; j++){
+ encode_individual_channel(avctx, cpe, j);
+ }
+ start_ch += chans;
+ }
+
+ put_bits(&s->pb, 3, TYPE_END);
+ flush_put_bits(&s->pb);
+ avctx->frame_bits = put_bits_count(&s->pb);
+
+ if(!data)
+ s->last_frame = 1;
+ memcpy(s->samples, s->samples + 1024 * avctx->channels, 1024 * avctx->channels * sizeof(s->samples[0]));
+ return put_bits_count(&s->pb)>>3;
+}
+
static av_cold int aac_encode_end(AVCodecContext *avctx)
{
AACEncContext *s = avctx->priv_data;