[FFmpeg-devel] [RFC] AAC Encoder
Kostya
kostya.shishkov
Sun Aug 17 13:46:47 CEST 2008
On Sat, Aug 16, 2008 at 11:06:33PM +0200, Michael Niedermayer wrote:
> On Sat, Aug 16, 2008 at 06:00:39PM +0300, Kostya wrote:
[...]
> > > [...]
> > > > /**
> > > > * Quantize one coefficient.
> > > > * @return absolute value of the quantized coefficient
> > > > * @see 3GPP TS26.403 5.6.2 "Scalefactor determination"
> > > > */
> > > > static av_always_inline int quant(float coef, const float Q)
> > > > {
> > > > return av_clip((int)(pow(fabsf(coef) * Q, 0.75) + 0.4054), 0, 8191);
> > > > }
> > >
> > > converting float to int by casting is rather slow on x86
> > > also i do not see why the cliping against 0 is done
> > >
> > > and where does the 0.4054 come from? How has this value been selected?
> >
> > ask 3GPP folks, in their spec (there's a reference in the comment above)
> > it's also called MAGIC_NUMBER.
>
> ideg
> morons
> anyway, its
> 1.0 - 0.5^0.75
>
> and i seriously doubt this is optimal in the psychoacoustic sense or any
> rate distortion sense.
> It IS optimal in the "least squares distortion but i dont care about the bits"
> sense
> please add a note that this constant needs to be finetuned with listening
> tests or some more solid math!
>
>
> >
> > as for clipping, it seemed more logical than applying FFMIN()
>
> speaking of cliping, can this even overflow 8191? and if so is it even
> correct to clip?
> most signals do not like being cliped randomly
values > 8191 can't be coded with AAC codebook
[...]
> > Per my understanding, 8 short windows sequence is used for better preserving
> > of sudden change of signal. And the change decision is implemented after
> > the same 3GPP spec.
>
> thats also my understanding, but i have my doubts about the actual way
> that is used to detect such changes.
> I suspect that doing the windowd MDCT for both cases and
> picking the one which has a lower sum of absolute coeffs would work better.
> of course this would absolutely NEED a listening test as i may be dead wrong
> iam just applying what i know form video coding to audio ...
well, I know a bit too and looks like nothing helps from other fields,
well, almost.
[...]
> > > > for(ch = 0; ch < chans; ch++){
> > > > prev_scale = -1;
> > > > for(w = 0; w < cpe->ch[ch].ics.num_windows; w++){
> > > > for(g = 0; g < cpe->ch[ch].ics.num_swb; g++){
> > > > g2 = w*16 + g;
> > >
> > > > cpe->ch[ch].zeroes[w][g] = pch->band[ch][g2].thr >= pch->band[ch][g2].energy;
> > >
> > > how much quality is lost compared to full RD decission ? its just a matter of
> > > checking how many bits this would need which is likely negligible speed wise.
> > > (assuming you can unentangle the threshold check into a distortion
> > > computation)
> >
> > well, energy < threshold means resulting band will be zero anyway,
> > and without that check weird values for perceptual entropy start
> > to appear
>
> iam perfectly fine with making bands that would quantize to all zero,
> "zero bands" but this again isnt optimal because a band with just a single
> +1 coeff almost certainly would do better as "zero band" as well.
that's a question too. But for that threshold-manipulating tricks are used
> Also what about these noise bands? i think they are never used ...
not yet.
> >
> > > > if(cpe->ch[ch].zeroes[w][g]) continue;
> > > > //spec gives constant for lg() but we scaled it for log2()
> > > > cpe->ch[ch].sf_idx[w][g] = (int)(2.66667 * (log2(6.75*pch->band[ch][g2].thr) - log2(pch->band[ch][g2].ffac)));
> > >
> > > 2.666... * log2(6.75*pch->band[ch][g2].thr / pch->band[ch][g2].ffac)
> > >
> > >
> > >
> > > > if(prev_scale != -1)
> > > > cpe->ch[ch].sf_idx[w][g] = av_clip(cpe->ch[ch].sf_idx[w][g], prev_scale - SCALE_MAX_DIFF, prev_scale + SCALE_MAX_DIFF);
> > > > prev_scale = cpe->ch[ch].sf_idx[w][g];
> > > > }
> > > > }
> > > > }
> > > > break;
> > > > case PSY_MODE_QUALITY:
> > > > for(ch = 0; ch < chans; ch++){
> > > > start = 0;
> > > > for(w = 0; w < cpe->ch[ch].ics.num_windows; w++){
> > > > for(g = 0; g < cpe->ch[ch].ics.num_swb; g++){
> > > > g2 = w*16 + g;
> > > > if(pch->band[ch][g2].thr >= pch->band[ch][g2].energy){
> > > > cpe->ch[ch].sf_idx[w][g] = 0;
> > > > cpe->ch[ch].zeroes[w][g] = 1;
> > > > }else{
> > > > cpe->ch[ch].zeroes[w][g] = 0;
> > > > cpe->ch[ch].sf_idx[w][g] = (int)(2.66667 * (log2(6.75*pch->band[ch][g2].thr) - log2(pch->band[ch][g2].ffac)));
> > > > while(cpe->ch[ch].sf_idx[ch][g] > 3){
> > > > float dist = calc_distortion(cpe->ch[ch].coeffs + start, cpe->ch[ch].ics.swb_sizes[g], SCALE_ONE_POS + cpe->ch[ch].sf_idx[ch][g]);
> > > > if(dist < pch->band[ch][g2].thr) break;
> > > > cpe->ch[ch].sf_idx[ch][g] -= 3;
> > > > }
> > > > }
> > >
> > > looks rather similar to the previous cases
> >
> > partially - it determines primary scalefactor index from threshold
> > in the same way, what it does to thresholds before that and index
> > after that differs.
>
> if possible please factorize the code
>
>
> [...]
> > > >
> > >
> > > > #ifndef CONFIG_HARDCODED_TABLES
> > > > for (i = 0; i < 316; i++)
> > > > ff_aac_pow2sf_tab[i] = pow(2, (i - 200)/4.);
> > > > #endif /* CONFIG_HARDCODED_TABLES */
> > >
> > > this is likely duplicated
> >
> > it is not. When table is not hardcoded, it should be initialized.
>
> so ff_aac_pow2sf_tab is used just by the psychoacoustic model?
in encoder code, yes
> >
> > > [...]
> > > > void ff_aac_psy_preprocess(AACPsyContext *ctx, int16_t *audio, int16_t *dest, int tag, int type)
> > > > {
> > > > int chans = type == ID_CPE ? 2 : 1;
> > > > const int chstride = ctx->avctx->channels;
> > > > int i, ch;
> > > > float t[2];
> > > >
> > > > if(chans == 1){
> > >
> > > > for(ch = 0; ch < chans; ch++){
> > > > for(i = 0; i < 1024; i++){
> > > > dest[i * chstride + ch] = audio[i * chstride + ch];
> > > > }
> > > > }
> > >
> > > memcpy
> >
> > no, it copies with gaps
>
> and chans=1 which makes the loop rather bloated
ignore preprocess code for now
> [...]
> > > > }
> > > > for(ch = 0; ch < 2; ch++)
> > > > dest[i * chstride + ch] = av_clip_int16(t[ch]);
> > >
> > > we have optimized code for converting float to int16, please use it
> >
> >
> > I may have overlooked some stuff for now but that's because it's too
> > hot here to think properly.
>
> come to austra we have 13 ?C here ATM in 2 days its probably 30 again though
will this mail be enough to get a visa? :)
> ill review the psy model ASAP, just wanted to reply with the comments above
> first
actual patch will be in reply to the next mail
> [...]
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> In a rich man's house there is no place to spit but his face.
> -- Diogenes of Sinope
More information about the ffmpeg-devel
mailing list