[FFmpeg-devel] [RFC] AAC Encoder

Mon Aug 18 13:54:54 CEST 2008

Michael Niedermayer wrote:
>>static av_always_inline int quant(float coef, const float Q)
>>{
>>    return av_clip((int)(pow(fabsf(coef) * Q, 0.75) + 0.4054), 0, 8191);
>>}
> 
> 
> converting float to int by casting is rather slow on x86
> also i do not see why the cliping against 0 is done
> 
> and where does the 0.4054 come from? How has this value been selected?

It's a magic number to compensate for the fact that we want the 
quantization noise to be minimal for the actual x value while we are in 
fact quantizing x^0.75.
I don't remember how it was determined, but the exact same 0.4054f value 
is used within LAME:
http://lame.cvs.sourceforge.net/lame/lame/libmp3lame/takehiro.c?&view=markup

>>            switch(last_window_sequence){
>>            case ONLY_LONG_SEQUENCE:
>>                win[ch] = switch_to_eight ? LONG_START_SEQUENCE : ONLY_LONG_SEQUENCE;
>>                grouping[ch] = 0;
>>                break;
>>            case LONG_START_SEQUENCE:
>>                win[ch] = EIGHT_SHORT_SEQUENCE;
>>                grouping[ch] = pch->next_grouping[ch];
>>                break;
>>            case LONG_STOP_SEQUENCE:
>>                win[ch] = ONLY_LONG_SEQUENCE;
>>                grouping[ch] = 0;
>>                break;
>>            case EIGHT_SHORT_SEQUENCE:
>>                win[ch] = switch_to_eight ? EIGHT_SHORT_SEQUENCE : LONG_STOP_SEQUENCE;
>>                grouping[ch] = switch_to_eight ? pch->next_grouping[ch] : 0;
>>                break;
>>            }
>>            pch->next_grouping[ch] = window_grouping[attack_n];
>>        }
> 
> 
> How much quality is lost by using this compared to RD optimal switching?

Very little. Block switching decisions are relatively easy in the vast 
majority of cases. Moreover, an RD decision is hard to apply here: a 
psy-model working in the frequency domain (and thus providing the 
masking threshold on which the perceptual distortion computation is 
based) is not that good at determining the time-domain smearing that 
results from the wrong window size.
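
For reference, the transition rule in the quoted switch can be isolated as a
small pure function. This is a sketch that mirrors the quoted code; the name
next_window_sequence() and the enum definition are assumptions made for this
note (the patch has its own definitions), and grouping is left out.

```c
/* Window sequence names as used in the quoted code; the numeric values
 * here are only for this sketch. */
enum WindowSequence {
    ONLY_LONG_SEQUENCE,
    LONG_START_SEQUENCE,
    EIGHT_SHORT_SEQUENCE,
    LONG_STOP_SEQUENCE
};

/* Hypothetical helper: window sequence for the current frame, given the
 * previous frame's sequence and whether short windows are requested. */
static enum WindowSequence next_window_sequence(enum WindowSequence last,
                                                int switch_to_eight)
{
    switch (last) {
    case ONLY_LONG_SEQUENCE:
        return switch_to_eight ? LONG_START_SEQUENCE : ONLY_LONG_SEQUENCE;
    case LONG_START_SEQUENCE:
        return EIGHT_SHORT_SEQUENCE;
    case LONG_STOP_SEQUENCE:
        return ONLY_LONG_SEQUENCE;
    case EIGHT_SHORT_SEQUENCE:
        return switch_to_eight ? EIGHT_SHORT_SEQUENCE : LONG_STOP_SEQUENCE;
    }
    return ONLY_LONG_SEQUENCE; /* not reached for valid input */
}
```

With this rule, short windows are entered from long windows only through
LONG_START_SEQUENCE and are left only through LONG_STOP_SEQUENCE.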

>>        //determine scalefactors - 5.6.2 "Scalefactor determination"
>>        for(ch = 0; ch < chans; ch++){
>>            prev_scale = -1;
>>            for(w = 0; w < cpe->ch[ch].ics.num_windows; w++){
>>                for(g = 0; g < cpe->ch[ch].ics.num_swb; g++){
>>                    g2 = w*16 + g;
> 
> 
>>                    cpe->ch[ch].zeroes[w][g] = pch->band[ch][g2].thr >= pch->band[ch][g2].energy;
> 
> 
> how much quality is lost compared to full RD decission ? its just a matter of
> checking how many bits this would need which is likely negligible speed wise.
> (assuming you can unentangle the threshold check into a distortion
>  computation)

Quite a bit, and that is why the 3GPP doc recommends using this 
scalefactor determination as a starting point for a "full search" of the 
sf value, i.e. direct computation is just a heuristic to speed up the 
scalefactor determination. (The nonlinear quantizer used for the coeffs 
hinders a potential "full direct" scalefactor computation.)

-- 
Gabriel Bouvigne
www.mp3-tech.org
personal page: http://gabriel.mp3-tech.org