[FFmpeg-devel] [RFC] AAC Encoder, now more optimal
Kostya
kostya.shishkov
Sat Sep 6 17:36:16 CEST 2008
On Sat, Sep 06, 2008 at 03:13:54AM +0200, Michael Niedermayer wrote:
> On Fri, Sep 05, 2008 at 04:13:58PM +0300, Kostya wrote:
[...]
> ok, so only optimization related comments
>
> [...]
>
> > /**
> > * Encode band info for single window group bands.
> > */
> > static void encode_window_bands_info(AACEncContext *s, SingleChannelElement *sce,
> > int win, int group_len)
> > {
> > BandCodingPath path[64];
> > int band_bits[64][12];
> > int w, swb, cb, start, start2, size;
> > int i, j;
> > const int max_sfb = sce->ics.max_sfb;
> > const int run_bits = sce->ics.num_windows == 1 ? 5 : 3;
> > const int run_esc = (1 << run_bits) - 1;
> > int bits, sbits, idx, count;
> > int stack[64], stack_len;
> >
> > start = win*128;
> > for(swb = 0; swb < max_sfb; swb++){
> > int maxval = 0;
> > start2 = start;
> > size = sce->ics.swb_sizes[swb];
> > if(sce->zeroes[win*16 + swb]){
> > maxval = 0;
> > }else{
> > for(w = 0; w < group_len; w++){
> > for(i = start2; i < start2 + size; i++){
> > maxval = FFMAX(maxval, FFABS(sce->icoefs[i]));
> > }
> > start2 += 128;
> > }
> > }
> > sbits = calculate_band_sign_bits(s, sce, group_len, start, size);
> > for(cb = 0; cb < 12; cb++){
> > if(aac_cb_info[cb].maxval < maxval){
> > band_bits[swb][cb] = INT_MAX;
> > }else{
> > band_bits[swb][cb] = calculate_band_bits(s, sce, group_len, start, size, cb);
> > if(IS_CODEBOOK_UNSIGNED(cb-1)){
> > band_bits[swb][cb] += sbits;
> > }
> > }
> > }
> > start += sce->ics.swb_sizes[swb];
> > }
> > path[0].bits = 0;
> > for(i = 1; i <= max_sfb; i++)
> > path[i].bits = INT_MAX;
>
> > for(i = 0; i < max_sfb; i++){
> > for(cb = 0; cb < 12; cb++){
> > int sum = 0;
> > for(j = 1; j <= max_sfb - i; j++){
> > if(band_bits[i+j-1][cb] == INT_MAX)
> > break;
> > sum += band_bits[i+j-1][cb];
> > bits = sum + path[i].bits + run_value_bits[sce->ics.num_windows == 8][j];
> > if(bits < path[i+j].bits){
> > path[i+j].bits = bits;
> > path[i+j].codebook = cb;
> > path[i+j].prev_idx = i;
> > }
> > }
> > }
> > }
>
> below should be faster, but _very_ important, check that it still finds the
> global optimum, it should. That said, always check all optimizations you do
> to ensure they do not change anything unexpected. debuging such things later
> can be very time consuming.
>
> for(i = 0; i < max_sfb; i++){
> for(cb = 0; cb < 12; cb++){
> int r= last[cb].run;
> if(run_value_bits[r] == run_value_bits[r+1]){
> last[cb].bits += band_bits[i][cb];
> last[cb].run++;
> bits = band_bits[i][cb] + path[i-1].bits + run_value_bits[1];
> if(bits < last[cb].bits){
> last[cb].bits= bits;
> last[cb].run= run;
> }
> }else{
> int sum = 0;
> last[cb].bits = INT_MAX;
> for(run = 1; run<=i+1; run++){
> sum += band_bits[i-run+1][cb];
> bits = sum + path[i-run].bits + run_value_bits[run];
> if(bits < last[cb].bits){
> last[cb].bits= bits;
> last[cb].run= run;
> }
> }
> }
> if(last[cb].bits < path[i].bits){
> path[i].bits= last[cb].bits;
> path[i].codebook= cb;
> path[i].run= last[cb].run;
> }
> }
> }
>
> also this should be working with rate distortion values not just bits.
IMO it should encode losslessly at this stage, so no rate distortion.
While this seems a bit faster it's significantly worse - on my test sample it's
222809 bytes against 209792 with current code.
> [...]
> > typedef struct TrellisPath {
> > float cost;
> > int prev;
> > int min_val;
> > int max_val;
> > } TrellisPath;
> >
>
> > static void search_for_quantizers(AACEncContext *s, ChannelElement *cpe, int channels)
> > {
> > int q, ch, w, w2, g, start = 0;
> > int i;
> > int qcoeffs[128];
> > int idx;
> > TrellisPath paths[256*121];
> > int bandaddr[121];
> > const float lambda = 5e-7f;
> > int minq;
> > float mincost;
> >
>
> > for(ch = 0; ch < channels; ch++){
>
> isnt that channels loop prettier outside the function?
I was intending to embed M/S coding decision hewe in some way.
Any tips on that?
> the rest of this function looks pretty messy and IMHO needs some cleanup
> before it can be optimized, optimiztaions will add more mess and that would
> not be good for a unction that should be about 1/4 the size it is.
> one obvious speed issue is running cbrt() and pow() in the very innermost
> loop though.
Since it's partially optimized already...
> Also as it is based on approximate quantization errors and bits we cannot
> try heuristics, as we have no way to test what effect the heuristics have
> thats because we lack a optimal but slow reference.
>From my debugging I can say that while bits needed to code band are
slowly decreasing in monotone way, distortion tends to vary significantly
inside quantizer groups of four (i.e. 2^0, 2^1/4, 2^2/4 and 2^3/4),
i.e. it may be minimal for quantizers X mod 4 == 3, so it's a bit hard to predict.
> a very simple optimization though that could be tried is to simple pick the
> so far best node and only consider it and all nodes that are less than 19
> bits worse, because the max scalefactor diff VLC has 19 bits the others
> cant be on the optimal path.
>
> [...]
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> I have never wished to cater to the crowd; for what I know they do not
> approve, and what they approve I do not know. -- Epicurus
More information about the ffmpeg-devel
mailing list