[FFmpeg-devel] AAC decoder round 6

Robert Swain robert.swain
Sun Aug 10 21:27:47 CEST 2008


2008/8/10 Michael Niedermayer <michaelni at gmx.at>:
> On Sun, Aug 10, 2008 at 12:31:31PM +0100, Robert Swain wrote:
>> 2008/8/9 Michael Niedermayer <michaelni at gmx.at>:
>> > On Sat, Aug 09, 2008 at 12:25:09PM +0100, Robert Swain wrote:
>> >> $subj
>> > [...]
>> >
>> >> +const uint8_t ff_aac_num_swb_1024[] = {
>> >> +    41, 41, 47, 49, 49, 51, 47, 47, 43, 43, 43, 40
>> >> +};
>> >> +
>> >> +const uint8_t ff_aac_num_swb_128[] = {
>> >> +    12, 12, 12, 14, 14, 14, 15, 15, 15, 15, 15, 15
>> >> +};
>> >> +
>> >
>> > dont these 2 belong to the swb offset tables? if so i think they should
>> > be in the same file unless iam missing something either way these 2 are
>> > ok [with static if they are only used by 1 file]
>>
>> Kostya wanted these two tables but not the rest of the swb tables so I
>> made these shared and the others not. I can move them if you wish.
>
> please add a note or move the tables

A note saying that they're going to be used by other files? Isn't that
obvious from them being in the file containing shared declarations?
None of the other tables in aactab.h have any notes saying that
they're shared.

> [...]
>>
>> > [...]
>> >> +
>> >> +/**
>> >> + * Decode GA "General Audio" specific configuration; reference: table 4.1.
>> >> + *
>> >> + * @return  Returns error status. 0 - OK, !0 - error
>> >> + */
>> >> +static int decode_ga_specific_config(AACContext * ac, GetBitContext * gb, int channel_config) {
>> >> +    enum ChannelPosition new_che_pos[4][MAX_ELEM_ID];
>> >> +    int extension_flag, ret;
>> >> +
>> >> +    if(get_bits1(gb)) {  // frameLengthFlag
>> >> +        av_log(ac->avccontext, AV_LOG_ERROR, "960/120 MDCT window is not supported.\n");
>> >
>> > is this "update to latest svn; sample welcome or patch welcome" ?
>> > i think we have a function to print such messages, i dont remember what its
>> > name was though
>>
>> Is that a joke? I don't really understand what you're trying to point
>> out. It's more "These files don't really occur in the wild and we
>> haven't written any support for them."
>
> [FFmpeg-devel] [PATCH 1/4] add a generic function to lavc to log messages about missing features.
>
> I approved the patch but it apparently was not applied

Hmm, OK. I didn't know about this. It can be changed as soon as it is
committed. There are other unsupported features that can also use
this.

> [...]
>> >> +                ics->group_len[ics->num_window_groups-1]++;
>> >> +            } else {
>> >> +                ics->num_window_groups++;
>> >> +                ics->group_len[ics->num_window_groups-1] = 1;
>> >> +            }
>> >> +        }
>> >
>> >
>> >> +        ics->swb_offset =     swb_offset_128[ac->m4ac.sampling_index];
>> >> +        ics->num_swb    = ff_aac_num_swb_128[ac->m4ac.sampling_index];
>> >> +    } else {
>> >> +        ics->max_sfb    =                              get_bits(gb, 6);
>> >> +        ics->swb_offset =     swb_offset_1024[ac->m4ac.sampling_index];
>> >> +        ics->num_swb    = ff_aac_num_swb_1024[ac->m4ac.sampling_index];
>> >> +    }
>> >
>> > is there something that prevents sampling_index from being >=12 ?
>> > a quick look at ff_mpeg4audio_get_config() indicates that it can at least
>> > be 15.
>>
>> It's checked in decode_pce() but that is only called if the channel
>> config changes. I've added another check after the call to
>> ff_mpeg4audio_get_config() so now it is checked in both places where a
>> sampling index is parsed.
>>
>> >> +
>> >> +    if(ics->max_sfb > ics->num_swb) {
>> >> +        av_log(ac->avccontext, AV_LOG_ERROR,
>> >> +            "Number of scalefactor bands in group (%d) exceeds limit (%d).\n",
>> >> +            ics->max_sfb, ics->num_swb);
>> >> +        ics->max_sfb = 0;
>> >> +        ics->num_swb = 0;
>> >> +        return -1;
>> >> +    }
>> >> +
>> >
>> >> +    if (ics->window_sequence[0] == EIGHT_SHORT_SEQUENCE) {
>> >> +        ics->num_windows   = 8;
>> >> +        ics->tns_max_bands = tns_max_bands_128[ac->m4ac.sampling_index];
>> >> +    } else {
>> >> +        ics->num_windows   = 1;
>> >> +        ics->tns_max_bands = tns_max_bands_1024[ac->m4ac.sampling_index];
>> >> +        if (get_bits1(gb)) {
>> >> +            av_log(ac->avccontext, AV_LOG_ERROR,
>> >> +                   "Predictor bit set but LTP is not supported.\n");
>> >> +            return -1;
>> >> +        }
>> >> +    }
>> >
>> > why is this split from the first part?
>>
>> It was split either side of the refactored max_sfb/num_swb validation.
>> I can merge them but then if the max_sfb/num_swb check fails, should
>> these values also be zeroed?
>
> The way its done currently can leave the num_windows / tns_max_bands in
> any state that was valid in the past combined with the other fields
> in any independant state that was not valid now.
> I did not check if this can cause problems ...
> But if its left the code needs to be reviewed for possible security issues
> related to invalid combinations of the fields.

Hmm. I'm tempted to memset ics to 0 in the error cases.

>> > [...]
>> >
>> >> @@ -330,10 +790,286 @@
>> >>  }
>> >>
>> >>  /**
>> >> - * Parse Spectral Band Replication extension data; reference: table 4.55.
>> >> + * Dequantize and scale spectral data; reference: 4.6.3.3.
>> >>   *
>> >> + * @param   icoef       array of quantized spectral data
>> >> + * @param   band_type   array of the used band type
>> >> + * @param   sf          array of scalefactors or intensity stereo positions
>> >> + * @param   coef        array of dequantized, scaled spectral data
>> >> + */
>> >> +static void dequant(AACContext * ac, float coef[1024], const int icoef[1024], float sf[120],
>> >> +        const IndividualChannelStream * ics, enum BandType band_type[120]) {
>> >> +    const uint16_t * offsets = ics->swb_offset;
>> >> +    const int c = 1024/ics->num_window_groups;
>> >> +    int g, i, group, k, idx = 0;
>> >> +
>> >> +    for (g = 0; g < ics->num_window_groups; g++) {
>> >> +        memset(coef + g * 128 + offsets[ics->max_sfb], 0, sizeof(float)*(c - offsets[ics->max_sfb]));
>> >> +        for (i = 0; i < ics->max_sfb; i++, idx++) {
>> >> +            if (band_type[idx] == NOISE_BT) {
>> >> +                const float scale = sf[idx] / ((offsets[i+1] - offsets[i]) * PNS_MEAN_ENERGY);
>> >> +                for (group = 0; group < ics->group_len[g]; group++) {
>> >> +                    for (k = offsets[i]; k < offsets[i+1]; k++)
>> >> +                        coef[group*128+k] = lcg_random(&ac->random_state) * scale;
>> >> +                }
>> >> +            } else if (band_type[idx] != INTENSITY_BT && band_type[idx] != INTENSITY_BT2) {
>> >> +                for (group = 0; group < ics->group_len[g]; group++) {
>> >> +                    for (k = offsets[i]; k < offsets[i+1]; k++) {
>> >> +                        coef[group*128+k] = ivquant(icoef[group*128+k]) * sf[idx];
>> >> +                    }
>> >> +                }
>> >> +            }
>> >> +        }
>> >> +        coef  += ics->group_len[g]*128;
>> >> +        icoef += ics->group_len[g]*128;
>> >> +    }
>> >> +}
>> >
>> > it might be better to add 128 just outside the innermost loop to avoid
>> > the "group*128 +" inside
>>
>> They would need to be reset the end of each iteration over i. I moved
>> the group loops inside the conditions recently as I recall. What do
>> you suggest? Add 128 to coef and icoef through each iteration over
>> group and then reset?
>
> I suggest you benchmark it, if it makes no speed difference they can stay
> as they are.

OK. On a Core Duo 1.83GHz, 5 repetitions interleaved...

Unpatched:
real	0m8.697s
user	0m8.052s
sys	0m0.494s

real	0m8.813s
user	0m8.221s
sys	0m0.491s

real	0m8.686s
user	0m8.072s
sys	0m0.482s

real	0m8.491s
user	0m7.950s
sys	0m0.463s

real	0m8.405s
user	0m7.871s
sys	0m0.461s

Patched (see attached - 20080810-1943-increment_pointers_dont_mul.diff):
real	0m8.732s
user	0m8.129s
sys	0m0.483s

real	0m8.864s
user	0m8.275s
sys	0m0.487s

real	0m8.653s
user	0m8.062s
sys	0m0.473s

real	0m8.149s
user	0m7.664s
sys	0m0.441s

real	0m8.794s
user	0m8.149s
sys	0m0.489s

> [...]
>> >> +    }
>> >> +    domain = get_bits1(gb);
>> >> +
>> >> +    if (is_indep_coup) {
>> >> +        coup->coupling_point = AFTER_IMDCT;
>> >> +    } else if(domain) {
>> >> +        coup->coupling_point = BETWEEN_TNS_AND_IMDCT;
>> >> +    } else
>> >> +        coup->coupling_point = BEFORE_TNS;
>> >
>> > :/ i thought the bits were at least stored together, what morons designed
>> > this!?
>> >
>> > coup->coupling_point= get_bits1(gb);
>> > ...
>> > coup->coupling_point+= 2*get_bits1(gb);
>> >
>> > still is a possibility though that is cleaner
>> > btw, what is the 4th possibility for?
>>
>> is_indep_coup = 1 overrides whatever domain is so the 4th case is
>> redundant. Though theoretically, domain must be 1 if is_indep_coup is
>> 1. is_indep_coup = 1 and domain = 0 is theoretically invalid.
>>
>> coup->coupling_point = 2*get_bits1(gb);
>> ...
>> coup->coupling_point += get_bits1(gb) && !coup->coupling_point;
>> ?
>
> hmm, this seems unneccesarily complex
> and the invalid value should be checked for with av_log&return -1

It's not invalid in that it should be checked and cause an error. It's
just that the value of domain is ignored when is_indep_coup is set. I
said it was theoretically invalid to have is_indep_coup = 1 and domain
= 0 as is_indep_coup => coupling occurs after imdct => domain = 1. But
in practice, domain is ignored when is_indep_coup = 1.

So, it's just whether you prefer the if() {} else if() {} else {} or
what I suggested above. Or something else.

> [...]
>> >> +
>> >> +static int aac_decode_frame(AVCodecContext * avccontext, void * data, int * data_size, const uint8_t * buf, int buf_size) {
>> >> +    AACContext * ac = avccontext->priv_data;
>> >> +    GetBitContext gb;
>> >> +    enum RawDataBlockType elem_type;
>> >> +    int err, elem_id;
>> >> +
>> >> +    init_get_bits(&gb, buf, buf_size*8);
>> >> +
>> >> +    // parse
>> >> +    while ((elem_type = get_bits(&gb, 3)) != TYPE_END) {
>> >> +        elem_id = get_bits(&gb, 4);
>> >> +        err = -1;
>> >> +        switch (elem_type) {
>> >> +
>> >> +        case TYPE_SCE:
>> >> +            if(!ac->che[TYPE_SCE][elem_id]) {
>> >> +                if(elem_id == 1 && ac->che[TYPE_LFE][0]) {
>> >> +                    /* Some streams incorrectly code 5.1 audio as SCE[0] CPE[0] CPE[1] SCE[1]
>> >> +                       instead of SCE[0] CPE[0] CPE[0] LFE[0].
>> >> +                       If we seem to have encountered such a stream,
>> >> +                       transfer the LFE[0] element to SCE[1] */
>> >> +                    ac->che[TYPE_SCE][elem_id] = ac->che[TYPE_LFE][0];
>> >> +                    ac->che[TYPE_LFE][0] = NULL;
>> >> +                } else
>> >> +                    break;
>> >> +            }
>> >> +            err = decode_ics(ac, &ac->che[TYPE_SCE][elem_id]->ch[0], &gb, 0, 0);
>> >> +            break;
>> >> +
>> >> +        case TYPE_CPE:
>> >> +            if (ac->che[TYPE_CPE][elem_id])
>> >> +                err = decode_cpe(ac, &gb, elem_id);
>> >> +            break;
>> >> +
>> >> +        case TYPE_FIL:
>> >> +            if (elem_id == 15)
>> >> +                elem_id += get_bits(&gb, 8) - 1;
>> >> +            while (elem_id > 0)
>> >> +                elem_id -= decode_extension_payload(ac, &gb, elem_id);
>> >> +            err = 0; /* FIXME */
>> >> +            break;
>> >> +
>> >> +        case TYPE_PCE:
>> >> +        {
>> >> +            enum ChannelPosition new_che_pos[4][MAX_ELEM_ID];
>> >> +            memset(new_che_pos, 0, 4 * MAX_ELEM_ID * sizeof(new_che_pos[0][0]));
>> >> +            if((err = decode_pce(ac, new_che_pos, &gb)))
>> >> +                break;
>> >> +            err = output_configure(ac, ac->che_pos, new_che_pos);
>> >> +            break;
>> >> +        }
>> >> +
>> >> +        case TYPE_DSE:
>> >> +            skip_data_stream_element(&gb);
>> >> +            err = 0;
>> >> +            break;
>> >> +
>> >> +        case TYPE_CCE:
>> >> +            if (ac->che[TYPE_CCE][elem_id])
>> >> +                err = decode_cce(ac, &gb, elem_id);
>> >> +            break;
>> >> +
>> >> +        case TYPE_LFE:
>> >> +            if (ac->che[TYPE_LFE][elem_id])
>> >> +                err = decode_ics(ac, &ac->che[TYPE_LFE][elem_id]->ch[0], &gb, 0, 0);
>> >> +            break;
>> >
>> > these checks could be factorized out
>> > if(elem_type < C && !ac->che[elem_type][elem_id])
>> >    return -1;
>>
>> The TYPE_SCE case has something slightly different so it will have to be:
>>
>> if(elem_type && elem_type < TYPE_DSE && !ac->che[elem_type][elem_id])
>>     return -1;
>
> TYPE_SCE can be dealt with before the error check.

Do you just mean:

if(elem_type == TYPE_SCE && !ac->che[TYPE_SCE][elem_id] &&
        elem_id = 1 && ac->che[TYPE_LFE][0]) {
    ac->che[TYPE_SCE][elem_id] = ac->che[TYPE_LFE][0];
    ac->che[TYPE_LFE][0] = NULL;
}
if(elem_type < TYPE_DSE && !ac->che[elem_type][elem_id])
    return -1;

...or something like that?

Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 20080810-1943-increment_pointers_dont_mul.diff
Type: text/x-diff
Size: 7463 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080810/3827bfee/attachment.diff>



More information about the ffmpeg-devel mailing list