[FFmpeg-devel] [RFC] AAC Encoder
Kostya
kostya.shishkov
Tue Aug 12 07:33:15 CEST 2008
On Mon, Aug 11, 2008 at 08:03:49PM +0200, Michael Niedermayer wrote:
> On Mon, Aug 11, 2008 at 12:18:03PM +0300, Kostya wrote:
> > Here's my implementation of AAC encoder.
> >
> > Since AAC decoder is not fully in SVN yet, I'm omitting data structures
> > and declarations used from it, I'll sort it after decoder commit.
> > The main difference is that psy model fills data structures and encoder
> > writes bitstream with them, so several structures have additional members.
> >
> > I've also stripped out model based on 3GPP 26.403 to make review easier.
>
[...]
> > +/** default channel configurations */
> > +static const uint8_t aac_chan_configs[6][5] = {
> > + {1, ID_SCE}, // 1 channel - single channel element
> > + {1, ID_CPE}, // 2 channels - channel pair
> > + {2, ID_SCE, ID_CPE}, // 3 channels - center + stereo
> > + {3, ID_SCE, ID_CPE, ID_SCE}, // 4 channels - front center + stereo + back center
> > + {3, ID_SCE, ID_CPE, ID_CPE}, // 5 channels - front center + stereo + back stereo
> > + {4, ID_SCE, ID_CPE, ID_CPE, ID_LFE}, // 6 channels - front center + stereo + back stereo + LFE
> > +};
>
> can this maybe be used to simplify the decoder?
No, the decoder relies on a totally different model of representing channels
in order to accept custom channel configurations.
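For reference, the table encodes the element list per channel count, with the first entry giving the number of syntactic elements that follow. A minimal sketch of how such a table can be walked (the helper and its use are illustrative, not from the patch):

```c
#include <stdint.h>

enum { ID_SCE = 0, ID_CPE = 1, ID_LFE = 3 }; /* syntactic element types */

/* first entry = number of elements, rest = element types, as in the patch */
static const uint8_t aac_chan_configs[6][5] = {
    {1, ID_SCE},
    {1, ID_CPE},
    {2, ID_SCE, ID_CPE},
    {3, ID_SCE, ID_CPE, ID_SCE},
    {3, ID_SCE, ID_CPE, ID_CPE},
    {4, ID_SCE, ID_CPE, ID_CPE, ID_LFE},
};

/* count how many PCM channels a configuration covers:
 * SCE and LFE carry one channel each, CPE carries two */
int config_channels(int chans)
{
    const uint8_t *cfg = aac_chan_configs[chans - 1];
    int i, total = 0;
    for (i = 1; i <= cfg[0]; i++)
        total += (cfg[i] == ID_CPE) ? 2 : 1;
    return total;
}
```

An encoder loop would iterate the same way, emitting one element header per entry and feeding it one or two channels of coefficients.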
[...]
> > +/**
> > + * Perform windowing and MDCT.
> > + */
> > +static void analyze(AVCodecContext *avctx, AACEncContext *s, ChannelElement *cpe, short *audio, int channel)
> > +{
> > + int i, j, k;
> > + const float * lwindow = cpe->ch[channel].ics.use_kb_window[0] ? ff_aac_kbd_long_1024 : ff_aac_sine_long_1024;
> > + const float * swindow = cpe->ch[channel].ics.use_kb_window[0] ? ff_aac_kbd_short_128 : ff_aac_sine_short_128;
> > + const float * pwindow = cpe->ch[channel].ics.use_kb_window[1] ? ff_aac_kbd_short_128 : ff_aac_sine_short_128;
> > +
> > + if (cpe->ch[channel].ics.window_sequence[0] != EIGHT_SHORT_SEQUENCE) {
> > + memcpy(s->output, cpe->ch[channel].saved, sizeof(float)*1024);
> > + if(cpe->ch[channel].ics.window_sequence[0] == LONG_STOP_SEQUENCE){
> > + memset(s->output, 0, sizeof(s->output[0]) * 448);
> > + for(i = 448; i < 576; i++)
> > + s->output[i] = cpe->ch[channel].saved[i] * pwindow[i - 448];
> > + for(i = 576; i < 704; i++)
> > + s->output[i] = cpe->ch[channel].saved[i];
> > + }
> > + if(cpe->ch[channel].ics.window_sequence[0] != LONG_START_SEQUENCE){
> > + j = channel;
> > + for (i = 0; i < 1024; i++, j += avctx->channels){
> > + s->output[i+1024] = audio[j] / 512.0 * lwindow[1024 - i - 1];
> > + cpe->ch[channel].saved[i] = audio[j] / 512.0 * lwindow[i];
> > + }
> > + }else{
> > + j = channel;
> > + for(i = 0; i < 448; i++, j += avctx->channels)
> > + s->output[i+1024] = audio[j] / 512.0;
> > + for(i = 448; i < 576; i++, j += avctx->channels)
> > + s->output[i+1024] = audio[j] / 512.0 * swindow[576 - i - 1];
> > + memset(s->output+1024+576, 0, sizeof(s->output[0]) * 448);
> > + j = channel;
> > + for(i = 0; i < 1024; i++, j += avctx->channels)
> > + cpe->ch[channel].saved[i] = audio[j] / 512.0;
> > + }
> > + ff_mdct_calc(&s->mdct1024, cpe->ch[channel].coeffs, s->output, s->tmp);
> > + }else{
> > + j = channel;
> > + for (k = 0; k < 1024; k += 128) {
> > + for(i = 448 + k; i < 448 + k + 256; i++)
> > + s->output[i - 448 - k] = (i < 1024) ? cpe->ch[channel].saved[i] : audio[channel + (i-1024)*avctx->channels] / 512.0;
> > + s->dsp.vector_fmul (s->output, k ? swindow : pwindow, 128);
> > + s->dsp.vector_fmul_reverse(s->output+128, s->output+128, swindow, 128);
> > + ff_mdct_calc(&s->mdct128, cpe->ch[channel].coeffs + k, s->output, s->tmp);
> > + }
> > + j = channel;
>
> > + for(i = 0; i < 1024; i++, j += avctx->channels)
> > + cpe->ch[channel].saved[i] = audio[j] / 512.0;
>
> cant all this rescaling be done at somewhere else? window / quant/dequant?
It's funny you say that; the 3GPP reference decoder merges rescaling into quantization.
I think I will do the same.
[...]
> > + }while(w < cpe->ch[channel].ics.num_windows && cpe->ch[channel].ics.group_len[w]);
> > + }
> > + if(score < best){
> > + best = score;
> > + bestcb = cb;
> > + }
> > + }
> > + return bestcb;
> > +}
>
> This code does not seem to consider the distortion
> please use rate distortion not just the number of bits!
Umm, why?
By this point the psy model has produced the coefficients to encode, and the
encoder just tries to write them in the minimum number of bits.
It's a lossless operation.
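A minimal sketch of that lossless selection: try every candidate codebook, total the bits it would spend on the band's coefficients, and keep the cheapest. The cost callback here is hypothetical (in the real encoder it would sum `ff_aac_spectral_bits` entries); a negative return means the codebook cannot represent these values at all.

```c
/* pick the codebook that codes coefs[] in the fewest bits */
int pick_codebook(int (*bits_for_cb)(int cb, const int *coefs, int len),
                  const int *coefs, int len, int ncb)
{
    int cb, best = -1, best_bits = 1 << 30;
    for (cb = 0; cb < ncb; cb++) {
        int bits = bits_for_cb(cb, coefs, len);
        if (bits >= 0 && bits < best_bits) {
            best_bits = bits;
            best = cb;
        }
    }
    return best;
}

/* toy cost model for illustration: codebook cb spends cb+1 bits per
 * coefficient, but codebook 0 only represents values in -1..1 */
int toy_bits(int cb, const int *coefs, int len)
{
    int i;
    for (i = 0; i < len; i++)
        if (cb == 0 && (coefs[i] > 1 || coefs[i] < -1))
            return -1;
    return (cb + 1) * len;
}
```

Since the coefficients themselves are fixed before this step, no distortion term changes between candidates, which is why plain bit counting suffices here.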
> > +
> > +/**
> > + * Encode one scalefactor band with selected codebook.
> > + */
> > +static void encode_codebook(AACEncContext *s, ChannelElement *cpe, int channel, int start, int size, int cb)
>
> hmmmmmmm
> encode_coeffs
> encode_spectral
> encode_codebook_indexes
> ...
> but not encode_codebook
I will rename it
> > +{
> > + const uint8_t *bits = ff_aac_spectral_bits[aac_cb_info[cb].cb_num];
> > + const uint16_t *codes = ff_aac_spectral_codes[aac_cb_info[cb].cb_num];
>
> vertical align
>
>
> > + const int dim = (aac_cb_info[cb].flags & CB_PAIRS) ? 2 : 4;
> > + int i, j, idx;
> > +
>
> > + if(!bits || !codes) return;
>
> one of these is redundant i guess
>
>
> [...]
> > +/**
> > + * Encode information about codebooks used for scalefactor bands coding.
> > + */
> > +static void encode_section_data(AVCodecContext *avctx, AACEncContext *s, ChannelElement *cpe, int channel)
>
> encode_band_types() i suspect ?
Yes, but it's called section_data in the standard.
I will pick a clearer name, though.
> > +{
> > + int i, w;
> > + int bits = cpe->ch[channel].ics.num_windows == 1 ? 5 : 3;
> > + int esc = (1 << bits) - 1;
> > + int count;
> > +
> > + for(w = 0; w < cpe->ch[channel].ics.num_windows; w++){
> > + if(cpe->ch[channel].ics.group_len[w]) continue;
> > + count = 0;
> > + for(i = 0; i < cpe->ch[channel].ics.max_sfb; i++){
> > + if(!i || cpe->ch[channel].band_type[w][i] != cpe->ch[channel].band_type[w][i-1]){
> > + if(count){
> > + while(count >= esc){
> > + put_bits(&s->pb, bits, esc);
> > + count -= esc;
> > + }
> > + put_bits(&s->pb, bits, count);
> > + }
> > + put_bits(&s->pb, 4, cpe->ch[channel].band_type[w][i]);
> > + count = 1;
> > + }else
> > + count++;
>
> The code that selects the band types should really already know the number
> of identical band types.
> And i think that should be determined with the viterbi algorithm.
> Its pretty trivial to encode each with several bands (thats already done
> as far as i can see) and then find the global optimal path through the
> table using the viterbi algorithm.
I will do that separately, as it would complicate the code;
for now it deserves a TODO comment.
> > + }
> > + if(count){
> > + while(count >= esc){
> > + put_bits(&s->pb, bits, esc);
> > + count -= esc;
> > + }
> > + put_bits(&s->pb, bits, count);
> > + }
> > + }
> > +}
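For clarity, the run-length scheme used twice above can be factored out: a run is written as zero or more all-ones escape values followed by a remainder below the escape, and the decoder sums fields until one is smaller than the escape. A sketch with the emitted field values collected into an array instead of going through put_bits (helper name is illustrative):

```c
/* Encode a run length as escape-coded fields of the given width.
 * Writes the field values into out[] and returns how many were used. */
int encode_run(int count, int bits, int *out)
{
    int esc = (1 << bits) - 1;
    int n = 0;
    while (count >= esc) {   /* emit all-ones escapes for each full chunk */
        out[n++] = esc;
        count -= esc;
    }
    out[n++] = count;        /* remainder terminates the run */
    return n;
}
```

Note that a run exactly equal to the escape value still needs a trailing zero field, which the loop above produces naturally.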
>
> > +
> > +/**
> > + * Encode scalefactors.
> > + */
> > +static void encode_scale_factor_data(AVCodecContext *avctx, AACEncContext *s, ChannelElement *cpe, int channel)
>
> s/_data/s/
>
>
> > +{
> > + int off = cpe->ch[channel].mixing_gain, diff;
> > + int i, w;
> > +
> > + for(w = 0; w < cpe->ch[channel].ics.num_windows; w++){
> > + if(cpe->ch[channel].ics.group_len[w]) continue;
> > + for(i = 0; i < cpe->ch[channel].ics.max_sfb; i++){
> > + if(!cpe->ch[channel].zeroes[w][i]){
> > + diff = cpe->ch[channel].sf_idx[w][i] - off + SCALE_DIFF_ZERO;
>
> > + if(diff < 0 || diff > 120) av_log(avctx, AV_LOG_ERROR, "Scalefactor difference is too big to be coded\n");
>
> assert() ?
Is it recommended to use assert() here?
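For reference, the check above guards the scalefactor delta coding: each scalefactor is sent as a difference from the running offset, biased by SCALE_DIFF_ZERO (60 in AAC) so the Huffman-coded index lands in 0..120. A small sketch of just that bias arithmetic:

```c
#define SCALE_DIFF_ZERO 60  /* bias so that a zero delta maps to index 60 */

/* biased index actually coded in the bitstream */
int sf_diff_index(int sf, int prev_sf)
{
    return sf - prev_sf + SCALE_DIFF_ZERO;
}

/* the scalefactor Huffman table only covers indexes 0..120,
 * i.e. deltas of -60..+60 */
int sf_diff_valid(int idx)
{
    return idx >= 0 && idx <= 120;
}
```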
[...]
> > + .capabilities = CODEC_CAP_SMALL_LAST_FRAME,
> > +};
>
> I am not sure; maybe this needs CODEC_CAP_DELAY as well
Seems so.
[...]
> > +
> > +//borrowed from aac.c
> > +static float pow2sf_tab[340];
>
> duplicate
will get rid of it
[...]
> > +static av_always_inline float lowpass_iir_filter(LPFilterState *s, const float *coeffs, float in)
> > +{
> > + memmove(s->x, s->x + 1, sizeof(s->x) - sizeof(s->x[0]));
> > + memmove(s->y, s->y + 1, sizeof(s->y) - sizeof(s->y[0]));
>
> no we will not do memmoves in av_always_inline functions
loop then
> > + s->x[IIR_ORDER] = in * coeffs[1];
> > + //FIXME: made only for 4th order filter
> > + s->y[IIR_ORDER] = (s->x[0] + s->x[4])*1 + (s->x[1] + s->x[3])*4 + s->x[2]*6
> > + + coeffs[2]*s->y[0] + coeffs[3]*s->y[1] + coeffs[4]*s->y[2] + coeffs[5]*s->y[3];
> > + return s->y[IIR_ORDER];
> > +}
>
> the lowpass code belongs in a seperate file and patch and should probably
> be applied outside the codec. Unless of course it uses some feedback from the
> psy model ...
> besides lowpass, things should be run through a highpass filter to remove
> the DC component, its inaudible and expensive to represent after MDCT transform
OK, I will try to prepare such a patch.
Any pointers on how to remove DC? channel->coeffs[ch][0] = 0.0f seems enough to me.
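Zeroing the first MDCT coefficient attenuates DC per block; a time-domain alternative (a textbook one-pole DC blocker, not something from the patch) is y[n] = x[n] - x[n-1] + R * y[n-1] with R close to 1:

```c
/* one-pole DC-blocking highpass filter state */
typedef struct { float x1, y1; } DCBlock;

/* y[n] = x[n] - x[n-1] + R * y[n-1]; R = 0.995 puts the corner
 * frequency well below the audible range at common sample rates */
float dc_block(DCBlock *s, float in)
{
    float out = in - s->x1 + 0.995f * s->y1;
    s->x1 = in;
    s->y1 = out;
    return out;
}

/* helper: feed n samples of constant DC and return the last output;
 * for a working DC blocker this decays toward zero */
float dc_block_settle(float dc, int n)
{
    DCBlock s = {0.0f, 0.0f};
    float out = 0.0f;
    int i;
    for (i = 0; i < n; i++)
        out = dc_block(&s, dc);
    return out;
}
```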
> [...]
>
> > +void ff_aac_psy_preprocess(AACPsyContext *ctx, int16_t *audio, int16_t *dest, int tag, int type)
> > +{
> > + int chans = type == ID_CPE ? 2 : 1;
> > + const int chstride = ctx->avctx->channels;
> > + int i, ch;
> > + float t[2];
> > +
>
> > + if(chans == 1 || (ctx->flags & PSY_MODEL_NO_PREPROC) == PSY_MODEL_NO_PREPROC){
> > + for(ch = 0; ch < chans; ch++){
> > + for(i = 0; i < 1024; i++){
> > + dest[i * chstride + ch] = audio[i * chstride + ch];
> > + }
> > + }
>
> memcpy()
>
>
> > + }else{
> > + for(i = 0; i < 1024; i++){
> > + if(ctx->flags & PSY_MODEL_NO_ST_ATT){
> > + for(ch = 0; ch < 2; ch++)
> > + t[ch] = audio[i * chstride + ch];
>
> > + }else{
> > + t[0] = audio[i * chstride + 0] * (0.5 + ctx->stereo_att) + audio[i * chstride + 1] * (0.5 - ctx->stereo_att);
> > + t[1] = audio[i * chstride + 0] * (0.5 - ctx->stereo_att) + audio[i * chstride + 1] * (0.5 + ctx->stereo_att);
> > + }
>
> this stereo attenuation looks like it should be in a filter before the codec
Logically it is, as it's applied to the raw audio before the encoder uses it,
but I will gladly move that code to libavfilter when possible.
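For reference, the attenuation above is a symmetric crossfeed: att = 0.5 leaves the channels untouched, and att = 0 collapses both to their average (full mono). A sketch of the per-sample operation:

```c
/* crossfeed each channel into the other by (0.5 - att),
 * as in the ff_aac_psy_preprocess() code above */
void stereo_att(float *l, float *r, float att)
{
    float nl = *l * (0.5f + att) + *r * (0.5f - att);
    float nr = *l * (0.5f - att) + *r * (0.5f + att);
    *l = nl;
    *r = nr;
}
```

Narrowing the stereo image this way trades stereo separation for fewer bits spent on channel differences.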
[...]
> > + AAC_NB_PSY_MODELS ///< total number of psychoacoustic models
> > +};
>
> make it clear that this is not part of the API due to binary compatibility
> when a new model is added
ok
> > +
> > +enum AACPsyModelMode{
> > + PSY_MODE_CBR, ///< follow bitrate as closely as possible
> > + PSY_MODE_ABR, ///< try to achieve bitrate but actual bitrate may differ significantly
> > + PSY_MODE_QUALITY, ///< try to achieve set quality instead of bitrate
> > +};
>
> please use the bitrate_tolerance parameter for closeness of filesize matching
> and the buffer_size / max and min bitrates for strict CBR
> It might also be interesting to consider using ratecontrol.c
I will try; the actual code for it is in the model anyway.
> > +
> > +#define PSY_MODEL_MODE_MASK 0x0000000F ///< bit fields for storing mode (CBR, ABR, VBR)
> > +#define PSY_MODEL_NO_PULSE 0x00000010 ///< disable pulse searching
> > +#define PSY_MODEL_NO_SWITCH 0x00000020 ///< disable window switching
> > +#define PSY_MODEL_NO_ST_ATT 0x00000040 ///< disable stereo attenuation
> > +#define PSY_MODEL_NO_LOWPASS 0x00000080 ///< disable low-pass filtering
> > +
> > +#define PSY_MODEL_NO_PREPROC (PSY_MODEL_NO_ST_ATT | PSY_MODEL_NO_LOWPASS)
> > +
> > +#define PSY_MODEL_MODE(a) ((a) & PSY_MODEL_MODE_MASK)
> > +
> > +/**
> > + * context used by psychoacoustic model
> > + */
> > +typedef struct AACPsyContext {
> > + AVCodecContext *avctx;
>
> > + DSPContext dsp;
>
> for what does the psy model use dsp ?
> Iam not very happy about this being duplicated between encoder and psy model
Nowhere; I will drop it.
> > +
> > + int flags;
> > + int window_type[2];
> > + int window_shape[2];
> > + const uint8_t *bands1024;
> > + int num_bands1024;
> > + const uint8_t *bands128;
> > + int num_bands128;
> > +
> > + const struct AACPsyModel *model;
> > + void* model_priv_data;
> > +
> > + float stereo_att;
> > + int cutoff;
> > + void* lp_state;
> > +}AACPsyContext;
>
> document the fields please, also true for all other structs where fields
> do not have an immediately obvious meaning.
No problem.
I'll send actual patches later.
> [...]
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Concerning the gods, I have no means of knowing whether they exist or not
> or of what sort they may be, because of the obscurity of the subject, and
> the brevity of human life -- Protagoras