[FFmpeg-devel] AAC decoder round 6

Sun Aug 10 15:13:26 CEST 2008

On Sun, Aug 10, 2008 at 12:31:31PM +0100, Robert Swain wrote:
> 2008/8/9 Michael Niedermayer <michaelni at gmx.at>:
> > On Sat, Aug 09, 2008 at 12:25:09PM +0100, Robert Swain wrote:
> >> $subj
> > [...]
> >
> >> +const uint8_t ff_aac_num_swb_1024[] = {
> >> +    41, 41, 47, 49, 49, 51, 47, 47, 43, 43, 43, 40
> >> +};
> >> +
> >> +const uint8_t ff_aac_num_swb_128[] = {
> >> +    12, 12, 12, 14, 14, 14, 15, 15, 15, 15, 15, 15
> >> +};
> >> +
> >
> > dont these 2 belong to the swb offset tables? if so i think they should
> > be in the same file unless iam missing something either way these 2 are
> > ok [with static if they are only used by 1 file]
> 
> Kostya wanted these two tables but not the rest of the swb tables so I
> made these shared and the others not. I can move them if you wish.

please add a note or move the tables

> 
[...]
> 
> > [...]
> >> +
> >> +/**
> >> + * Decode GA "General Audio" specific configuration; reference: table 4.1.
> >> + *
> >> + * @return  Returns error status. 0 - OK, !0 - error
> >> + */
> >> +static int decode_ga_specific_config(AACContext * ac, GetBitContext * gb, int channel_config) {
> >> +    enum ChannelPosition new_che_pos[4][MAX_ELEM_ID];
> >> +    int extension_flag, ret;
> >> +
> >> +    if(get_bits1(gb)) {  // frameLengthFlag
> >> +        av_log(ac->avccontext, AV_LOG_ERROR, "960/120 MDCT window is not supported.\n");
> >
> > is this "update to latest svn; sample welcome or patch welcome" ?
> > i think we have a function to print such messages, i dont remember what its
> > name was though
> 
> Is that a joke? I don't really understand what you're trying to point
> out. It's more "These files don't really occur in the wild and we
> haven't written any support for them."

[FFmpeg-devel] [PATCH 1/4] add a generic function to lavc to log messages about missing features.

I approved the patch but it apparently was not applied

[...]
> >> +                ics->group_len[ics->num_window_groups-1]++;
> >> +            } else {
> >> +                ics->num_window_groups++;
> >> +                ics->group_len[ics->num_window_groups-1] = 1;
> >> +            }
> >> +        }
> >
> >
> >> +        ics->swb_offset =     swb_offset_128[ac->m4ac.sampling_index];
> >> +        ics->num_swb    = ff_aac_num_swb_128[ac->m4ac.sampling_index];
> >> +    } else {
> >> +        ics->max_sfb    =                              get_bits(gb, 6);
> >> +        ics->swb_offset =     swb_offset_1024[ac->m4ac.sampling_index];
> >> +        ics->num_swb    = ff_aac_num_swb_1024[ac->m4ac.sampling_index];
> >> +    }
> >
> > is there something that prevents sampling_index from being >=12 ?
> > a quick look at ff_mpeg4audio_get_config() indicates that it can at least
> > be 15.
> 
> It's checked in decode_pce() but that is only called if the channel
> config changes. I've added another check after the call to
> ff_mpeg4audio_get_config() so now it is checked in both places where a
> sampling index is parsed.
> 
> >> +
> >> +    if(ics->max_sfb > ics->num_swb) {
> >> +        av_log(ac->avccontext, AV_LOG_ERROR,
> >> +            "Number of scalefactor bands in group (%d) exceeds limit (%d).\n",
> >> +            ics->max_sfb, ics->num_swb);
> >> +        ics->max_sfb = 0;
> >> +        ics->num_swb = 0;
> >> +        return -1;
> >> +    }
> >> +
> >
> >> +    if (ics->window_sequence[0] == EIGHT_SHORT_SEQUENCE) {
> >> +        ics->num_windows   = 8;
> >> +        ics->tns_max_bands = tns_max_bands_128[ac->m4ac.sampling_index];
> >> +    } else {
> >> +        ics->num_windows   = 1;
> >> +        ics->tns_max_bands = tns_max_bands_1024[ac->m4ac.sampling_index];
> >> +        if (get_bits1(gb)) {
> >> +            av_log(ac->avccontext, AV_LOG_ERROR,
> >> +                   "Predictor bit set but LTP is not supported.\n");
> >> +            return -1;
> >> +        }
> >> +    }
> >
> > why is this split from the first part?
> 
> It was split either side of the refactored max_sfb/num_swb validation.
> I can merge them but then if the max_sfb/num_swb check fails, should
> these values also be zeroed?

The way its done currently can leave the num_windows / tns_max_bands in
any state that was valid in the past combined with the other fields
in any independant state that was not valid now.
I did not check if this can cause problems ...
But if its left the code needs to be reviewed for possible security issues
related to invalid combinations of the fields.

[...]
> > [...]
> >
> >> @@ -330,10 +790,286 @@
> >>  }
> >>
> >>  /**
> >> - * Parse Spectral Band Replication extension data; reference: table 4.55.
> >> + * Dequantize and scale spectral data; reference: 4.6.3.3.
> >>   *
> >> + * @param   icoef       array of quantized spectral data
> >> + * @param   band_type   array of the used band type
> >> + * @param   sf          array of scalefactors or intensity stereo positions
> >> + * @param   coef        array of dequantized, scaled spectral data
> >> + */
> >> +static void dequant(AACContext * ac, float coef[1024], const int icoef[1024], float sf[120],
> >> +        const IndividualChannelStream * ics, enum BandType band_type[120]) {
> >> +    const uint16_t * offsets = ics->swb_offset;
> >> +    const int c = 1024/ics->num_window_groups;
> >> +    int g, i, group, k, idx = 0;
> >> +
> >> +    for (g = 0; g < ics->num_window_groups; g++) {
> >> +        memset(coef + g * 128 + offsets[ics->max_sfb], 0, sizeof(float)*(c - offsets[ics->max_sfb]));
> >> +        for (i = 0; i < ics->max_sfb; i++, idx++) {
> >> +            if (band_type[idx] == NOISE_BT) {
> >> +                const float scale = sf[idx] / ((offsets[i+1] - offsets[i]) * PNS_MEAN_ENERGY);
> >> +                for (group = 0; group < ics->group_len[g]; group++) {
> >> +                    for (k = offsets[i]; k < offsets[i+1]; k++)
> >> +                        coef[group*128+k] = lcg_random(&ac->random_state) * scale;
> >> +                }
> >> +            } else if (band_type[idx] != INTENSITY_BT && band_type[idx] != INTENSITY_BT2) {
> >> +                for (group = 0; group < ics->group_len[g]; group++) {
> >> +                    for (k = offsets[i]; k < offsets[i+1]; k++) {
> >> +                        coef[group*128+k] = ivquant(icoef[group*128+k]) * sf[idx];
> >> +                    }
> >> +                }
> >> +            }
> >> +        }
> >> +        coef  += ics->group_len[g]*128;
> >> +        icoef += ics->group_len[g]*128;
> >> +    }
> >> +}
> >
> > it might be better to add 128 just outside the innermost loop to avoid
> > the "group*128 +" inside
> 
> They would need to be reset the end of each iteration over i. I moved
> the group loops inside the conditions recently as I recall. What do
> you suggest? Add 128 to coef and icoef through each iteration over
> group and then reset?

I suggest you benchmark it, if it makes no speed difference they can stay
as they are.

[...]
> >> +    }
> >> +    domain = get_bits1(gb);
> >> +
> >> +    if (is_indep_coup) {
> >> +        coup->coupling_point = AFTER_IMDCT;
> >> +    } else if(domain) {
> >> +        coup->coupling_point = BETWEEN_TNS_AND_IMDCT;
> >> +    } else
> >> +        coup->coupling_point = BEFORE_TNS;
> >
> > :/ i thought the bits were at least stored together, what morons designed
> > this!?
> >
> > coup->coupling_point= get_bits1(gb);
> > ...
> > coup->coupling_point+= 2*get_bits1(gb);
> >
> > still is a possibility though that is cleaner
> > btw, what is the 4th possibility for?
> 
> is_indep_coup = 1 overrides whatever domain is so the 4th case is
> redundant. Though theoretically, domain must be 1 if is_indep_coup is
> 1. is_indep_coup = 1 and domain = 0 is theoretically invalid.
> 
> coup->coupling_point = 2*get_bits1(gb);
> ...
> coup->coupling_point += get_bits1(gb) && !coup->coupling_point;
> ?

hmm, this seems unneccesarily complex
and the invalid value should be checked for with av_log&return -1

[...]
> >> +
> >> +static int aac_decode_frame(AVCodecContext * avccontext, void * data, int * data_size, const uint8_t * buf, int buf_size) {
> >> +    AACContext * ac = avccontext->priv_data;
> >> +    GetBitContext gb;
> >> +    enum RawDataBlockType elem_type;
> >> +    int err, elem_id;
> >> +
> >> +    init_get_bits(&gb, buf, buf_size*8);
> >> +
> >> +    // parse
> >> +    while ((elem_type = get_bits(&gb, 3)) != TYPE_END) {
> >> +        elem_id = get_bits(&gb, 4);
> >> +        err = -1;
> >> +        switch (elem_type) {
> >> +
> >> +        case TYPE_SCE:
> >> +            if(!ac->che[TYPE_SCE][elem_id]) {
> >> +                if(elem_id == 1 && ac->che[TYPE_LFE][0]) {
> >> +                    /* Some streams incorrectly code 5.1 audio as SCE[0] CPE[0] CPE[1] SCE[1]
> >> +                       instead of SCE[0] CPE[0] CPE[0] LFE[0].
> >> +                       If we seem to have encountered such a stream,
> >> +                       transfer the LFE[0] element to SCE[1] */
> >> +                    ac->che[TYPE_SCE][elem_id] = ac->che[TYPE_LFE][0];
> >> +                    ac->che[TYPE_LFE][0] = NULL;
> >> +                } else
> >> +                    break;
> >> +            }
> >> +            err = decode_ics(ac, &ac->che[TYPE_SCE][elem_id]->ch[0], &gb, 0, 0);
> >> +            break;
> >> +
> >> +        case TYPE_CPE:
> >> +            if (ac->che[TYPE_CPE][elem_id])
> >> +                err = decode_cpe(ac, &gb, elem_id);
> >> +            break;
> >> +
> >> +        case TYPE_FIL:
> >> +            if (elem_id == 15)
> >> +                elem_id += get_bits(&gb, 8) - 1;
> >> +            while (elem_id > 0)
> >> +                elem_id -= decode_extension_payload(ac, &gb, elem_id);
> >> +            err = 0; /* FIXME */
> >> +            break;
> >> +
> >> +        case TYPE_PCE:
> >> +        {
> >> +            enum ChannelPosition new_che_pos[4][MAX_ELEM_ID];
> >> +            memset(new_che_pos, 0, 4 * MAX_ELEM_ID * sizeof(new_che_pos[0][0]));
> >> +            if((err = decode_pce(ac, new_che_pos, &gb)))
> >> +                break;
> >> +            err = output_configure(ac, ac->che_pos, new_che_pos);
> >> +            break;
> >> +        }
> >> +
> >> +        case TYPE_DSE:
> >> +            skip_data_stream_element(&gb);
> >> +            err = 0;
> >> +            break;
> >> +
> >> +        case TYPE_CCE:
> >> +            if (ac->che[TYPE_CCE][elem_id])
> >> +                err = decode_cce(ac, &gb, elem_id);
> >> +            break;
> >> +
> >> +        case TYPE_LFE:
> >> +            if (ac->che[TYPE_LFE][elem_id])
> >> +                err = decode_ics(ac, &ac->che[TYPE_LFE][elem_id]->ch[0], &gb, 0, 0);
> >> +            break;
> >
> > these checks could be factorized out
> > if(elem_type < C && !ac->che[elem_type][elem_id])
> >    return -1;
> 
> The TYPE_SCE case has something slightly different so it will have to be:
> 
> if(elem_type && elem_type < TYPE_DSE && !ac->che[elem_type][elem_id])
>     return -1;

TYPE_SCE can be dealt with before the error check.

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Asymptotically faster algorithms should always be preferred if you have
asymptotical amounts of data
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080810/5647e5bc/attachment.pgp>