[FFmpeg-devel] AAC decoder round 5

Fri Aug 8 00:53:39 CEST 2008

2008/8/7 Michael Niedermayer <michaelni at gmx.at>:
> On Thu, Aug 07, 2008 at 04:04:44PM +0100, Robert Swain wrote:
>> 2008/8/6 Michael Niedermayer <michaelni at gmx.at>:
>> > On Wed, Aug 06, 2008 at 12:32:32AM +0100, Robert Swain wrote:
>> >> $subj
>> >>
>> >> Best regards,
>> >> Rob
>> >
>> >> Index: Changelog
>> >> ===================================================================
>> >> --- Changelog (revision 14623)
>> >> +++ Changelog (working copy)
>> >> @@ -128,6 +128,7 @@
>> >>  - Motion Pixels MVI demuxer
>> >>  - removed animated GIF decoder/demuxer
>> >>  - D-Cinema audio muxer
>> >> +- AAC decoder
>> >>
>> >>  version 0.4.9-pre1:
>> >>
>> >
>> > ok
>>
>> You can ignore/I can not submit the build system and documentation
>> changes. I won't commit them until the whole thing has been committed.
>
> You could commit the doc changes :)

I'll wait. Hopefully it won't be too much longer. ;)

>> > [...]
>> >
>> >> Index: libavcodec/aactab.c
>> >> ===================================================================
>> >> --- libavcodec/aactab.c       (revision 14625)
>> >> +++ libavcodec/aactab.c       (working copy)
>>
>> [...]
>>
>> >> +DECLARE_ALIGNED(16, float,  ff_aac_kbd_long_1024[1024]);
>> >> +DECLARE_ALIGNED(16, float,  ff_aac_kbd_short_128[128]);
>> >> +DECLARE_ALIGNED(16, float, ff_aac_sine_long_1024[1024]);
>> >> +DECLARE_ALIGNED(16, float, ff_aac_sine_short_128[128]);
>> >> +
>> >> +const uint8_t ff_aac_num_swb_1024[] = {
>> >> +    41, 41, 47, 49, 49, 51, 47, 47, 43, 43, 43, 40
>> >> +};
>> >> +
>> >> +const uint8_t ff_aac_num_swb_128[] = {
>> >> +    12, 12, 12, 14, 14, 14, 15, 15, 15, 15, 15, 15
>> >> +};
>> >> +
>> >>  const uint32_t ff_aac_scalefactor_code[121] = {
>> >>      0x3ffe8, 0x3ffe6, 0x3ffe7, 0x3ffe5, 0x7fff5, 0x7fff1, 0x7ffed, 0x7fff6,
>> >>      0x7ffee, 0x7ffef, 0x7fff0, 0x7fffc, 0x7fffd, 0x7ffff, 0x7fffe, 0x7fff7,
>> >
>> >
>> >> @@ -795,4 +809,90 @@
>> >>       4064.0312908,  4074.6805676,  4085.3368071,  4096.0000000,
>> >>  };
>> >>
>> >> +/* [ 0, 255] scale factor decoding when using C dsp.float_to_int16
>> >> + * [60, 315] scale factor decoding when using SIMD dsp.float_to_int16
>> >> + * [45, 300] intensity stereo position decoding mapped in reverse order i.e. 0->300, 1->299, ..., 254->46, 255->45
>> >> + */
>> >
>> > not doxygen compat, no description of what the table actually contains.
>>
>> Comment quoted above improved. Does your statement apply to all of the
>> above or just the comment for pow2sf_tab[]?
>
> i just meant pow2sf_tab but the ff_aac_num_swb_128/1024 could also benefit
> from a little doxy, the windows are obvious at least to me so i dont care
> if they get a doxy or not.

I'll put that swb/sfb documentation next to all the *swb* arrays plus
a comment about what the offsets are and so on. I'll just have a look
through and see what I think is non-obvious.

> [...]
>> > [...]
>> >> +{
>> >> +    /* Pre-mixed down-mix outputs are not available. */
>> >> +    newpcs->mono_mixdown_tag   = -1;
>> >> +    newpcs->stereo_mixdown_tag = -1;
>> >> +
>> >> +    if(channels < 1 || channels > 7) {
>> >> +        av_log(ac->avccontext, AV_LOG_ERROR, "invalid default channel configuration (%d channels)\n",
>> >> +               channels);
>> >> +        return -1;
>> >> +    }
>> >> +
>> >> +    /* default channel configurations:
>> >> +     *
>> >> +     * 1ch : front center (mono)
>> >> +     * 2ch : L + R (stereo)
>> >> +     * 3ch : front center + L + R
>> >> +     * 4ch : front center + L + R + back center
>> >> +     * 5ch : front center + L + R + back stereo
>> >> +     * 6ch : front center + L + R + back stereo + LFE
>> >
>> >> +     * 7ch : front center + L + R + outer front left + outer front right + back stereo + LFE
>> >
>> > its still 8, so channels/num_channels is no longer an appropriate name for
>> > the vars used as its not 7 channels.
>>
>> channel_configuration_index?
>
> id prefer channel_config, its shorter

Right, me too. While changing it, I noticed the variable passed into
decode_ga_specific_config() and hence set_pce_to_defaults() is taken
from ac->m4ac.chan_config... :)

>> > [...]
>> >> +/**
>> >> + * Decode an individual_channel_stream payload; reference: table 4.44.
>> >> + *
>> >> + * @param   common_window   Channels have independent [0], or shared [1], Individual Channel Stream information.
>> >> + * @param   scale_flag      scalable [1] or non-scalable [0] AAC (Unused until scalable AAC is implemented.)
>> >> + * @return  Returns error status. 0 - OK, !0 - error
>> >> + */
>> >> +static int decode_ics(AACContext * ac, SingleChannelElement * sce, GetBitContext * gb, int common_window, int scale_flag) {
>> >> +    int icoeffs[1024];
>> >> +    Pulse pulse;
>> >> +    TemporalNoiseShaping * tns = &sce->tns;
>> >> +    IndividualChannelStream * ics = &sce->ics;
>> >> +    float * out = sce->coeffs;
>> >> +    int global_gain, pulse_present = 0;
>> >> +
>> >
>> >> +    pulse.num_pulse = 0;
>> >> +    pulse.start     = 0;
>> >
>> > what effect does start have when num_pulse=0 ?
>>
>> These are assigned only to silence a GCC warning about them possibly
>> being uninitialised when they always will be in the cases that they're
>
>> used. I could add a comment to this effect.
>
> please do, or ill forget and complain about it in every 3rd review

OK.

> [...]
>> > [...]
>> >> +
>> >> +/**
>> >> + * channel coupling transformation interface
>> >> + *
>> >> + * @param   index   index into coupling gain array
>> >> + * @param   apply_coupling_method   pointer to (in)dependent coupling function
>> >> + */
>> >> +static void apply_channel_coupling(AACContext * ac, ChannelElement * cc,
>> >> +        void (*apply_coupling_method)(AACContext * ac, SingleChannelElement * sce, ChannelElement * cc, int index))
>> >> +{
>> >> +    int c;
>> >> +    int index = 0;
>> >> +    ChannelCoupling * coup = &cc->coup;
>> >> +    for (c = 0; c <= coup->num_coupled; c++) {
>> >> +        if (     !coup->is_cpe[c] && ac->che[ID_SCE][coup->tag_select[c]]) {
>> >> +            apply_coupling_method(ac, &ac->che[ID_SCE][coup->tag_select[c]]->ch[0], cc, index++);
>> >> +        } else if(coup->is_cpe[c] && ac->che[ID_CPE][coup->tag_select[c]]) {
>> >
>> >> +            if (!coup->l[c] && !coup->r[c]) {
>> >> +                apply_coupling_method(ac, &ac->che[ID_CPE][coup->tag_select[c]]->ch[0], cc, index);
>> >> +                apply_coupling_method(ac, &ac->che[ID_CPE][coup->tag_select[c]]->ch[1], cc, index++);
>> >> +            }
>> >
>> > The struct docs say r/l are "apply gain to r/l channel" this contradicts the
>> > 0/0 case. which is correct and which is wrong?
>>
>> In the spec, for cc_l[] it says:
>>
>> "one bit indicating that a list of gain_element values is applied to the left
>> channel of a channel pair."
>>
>> The code that uses it implies that description isn't strictly true. In
>> the !l[] && !r[] case, there is a shared list of gain_element values.
>> So l[] and r[] indicate the presence of a channel-specific list of
>> gain_element values. I'll change the comments.
>
> if(coup->ch_select!=2){
>    apply_coupling_method(ac, &ac->che[ID_CPE][coup->tag_select[c]]->ch[0], cc, index);
>    if(coup->ch_select!=0)
>        index++;
> }
> if(coup->ch_select!=1)
>    apply_coupling_method(ac, &ac->che[ID_CPE][coup->tag_select[c]]->ch[1], cc, index++);

I don't know if it was what was intended but having ch_select[c] =
l[c] + 2*r[c]; would make this work. Then ch_select[c] == 0 for both
0, 1 for l[c] is 1, 2 for r[c] is 1, and 3 for both 1. It saves 1 line
and 2 calls to apply_coupling_method(). Applied.

>> > [...]
>> >> +
>> >> +/**
>> >> + * Conduct matrix mix-down and float to int16 conversion.
>> >> + *
>> >> + * @param   data        pointer to output data
>> >> + * @param   data_size   output data size in bytes
>> >> + * @return  Returns error status. 0 - OK, !0 - error
>> >> + */
>> >> +static int mixdown_and_convert_to_int16(AVCodecContext * avccontext, uint16_t * data, int * data_size) {
>> >> +    AACContext * ac = avccontext->priv_data;
>> >> +    int i;
>> >> +    float *c, *l, *r, *sl, *sr, *out;
>> >> +
>> >> +    if (!ac->is_saved) {
>> >> +        ac->is_saved = 1;
>> >> +        *data_size = 0;
>> >> +        return 0;
>> >> +    }
>> >> +
>> >> +    i = 1024 * avccontext->channels * sizeof(int16_t);
>> >> +    if(*data_size < i) {
>> >> +        av_log(avccontext, AV_LOG_ERROR,
>> >> +               "Output buffer too small (%d) or trying to output too many samples (%d) for this frame.\n",
>> >> +               *data_size, i);
>> >> +        return -1;
>> >> +    }
>> >> +    *data_size = i;
>> >> +
>> >> +    if(ac->mm[MIXDOWN_CENTER]) {
>> >> +        /* matrix mix-down */
>> >> +        l   = ac->mm[MIXDOWN_FRONT ]->ch[0].ret;
>> >> +        r   = ac->mm[MIXDOWN_FRONT ]->ch[1].ret;
>> >> +        c   = ac->mm[MIXDOWN_CENTER]->ch[0].ret;
>> >> +        sl  = ac->mm[MIXDOWN_BACK  ]->ch[0].ret;
>> >> +        sr  = ac->mm[MIXDOWN_BACK  ]->ch[1].ret;
>> >> +        out = ac->interleaved_output;
>> >> +
>> >> +        // XXX dsputil-ize
>> >> +        if(avccontext->channels == 2) {
>> >> +            if(ac->pcs.pseudo_surround) {
>> >> +                for(i = 0; i < 1024; i++) {
>> >> +                    *out++ = *l++ + *c   - *sl   - *sr   + ac->add_bias;
>> >> +                    *out++ = *r++ + *c++ + *sl++ + *sr++ - ac->add_bias * 3;
>> >> +                }
>> >> +            } else {
>> >> +                for(i = 0; i < 1024; i++) {
>> >> +                    *out++ = *l++ + *c   + *sl++ - ac->add_bias * 2;
>> >> +                    *out++ = *r++ + *c++ + *sr++ - ac->add_bias * 2;
>> >> +                }
>> >> +            }
>> >> +
>> >> +        } else {
>> >> +            assert(avccontext->channels == 1);
>> >> +            for(i = 0; i < 1024; i++) {
>> >> +                *out++ = *l++ + *r++ + *c++ + *sl++ + *sr++ - ac->add_bias * 4;
>> >> +            }
>> >> +        }
>> >> +
>> >> +        ac->dsp.float_to_int16(data, ac->interleaved_output, 1024 * avccontext->channels);
>> >> +    } else {
>> >> +        ac->dsp.float_to_int16_interleave(data, (const float **)ac->output_data, 1024, avccontext->channels);
>> >> +    }
>> >> +
>> >> +    return 0;
>> >> +}
>> >
>> > mixdown should be done prior to the IMDCT when possible and the IMDCT skipped
>> > for channels that are not needed, or _ALL_ mixdown code should be removed
>> > from the AAC decoder, as in that case mixdown can be done outside of the
>> > decoder equally well and cleaner.
>>
>> I think doing pre-IMDCT mix-down will be complicated because of the
>> different windows and their overlaps. As a consequence of these issues
>> the optimisation may or may not be worth it. I think the overlapping
>> simplifications I had planned when transitioning to using imdct_half()
>> may help so I should probably do those first.
>>
>> *** In the spec they advise against using the matrix mix-down method
>> so I think all this matrix mix-down code should be dropped in favour
>> of generic channel mixing either pre-IMDCT or post decoding, unless
>> someone knows of a good reason why it should be kept. I think Andreas
>> implemented this code but I'm not sure.
>
> also droping it will make it hit svn sooner ...

I'll remove it from SoC. Patch attached
(20080807-2351-remove_mixdown.diff). After this I'll reindent, remove
ProgramConfig, change the code accordingly, rename variables and
functions accordingly and I also spotted that che_type[][] should be
renamed to che_pos[][] or something similar as the enum was changed
from ChannelType to ChannelPosition.

>> >> +
>> >> +
>> >> +static int aac_decode_frame(AVCodecContext * avccontext, void * data, int * data_size, const uint8_t * buf, int buf_size) {
>> >> +    AACContext * ac = avccontext->priv_data;
>> >> +    GetBitContext gb;
>> >> +    enum RawDataBlockID id;
>> >> +    int err, tag;
>> >> +
>> >> +    init_get_bits(&gb, buf, buf_size*8);
>> >> +
>> >> +    // parse
>> >> +    while ((id = get_bits(&gb, 3)) != ID_END) {
>> >> +        tag = get_bits(&gb, 4);
>> >> +        err = -1;
>> >> +        switch (id) {
>> >> +
>> >
>> >> +        case ID_SCE:
>> >> +            if(!ac->che[ID_SCE][tag]) {
>> >
>> > this id / tag naming confuses the hell out of me ...
>> > Both values together identify the "thing". the id here is the
>> > type (channel pair / single channel / ...)
>>
>> Why is it confusing? As you say, the id is the type and the tag is
>> (and I know you don't like this but I'm going to use it anyway because
>> I'm happy with it) the instance of that type.
>
> Well iam not a native englishman ...
> but to me id is something by which a specific individual or instance can
> uniquely be identified. This isnt true here ...
>
> That is, the number on the license plate of a car is an ID,
> Rolls Royce Phantom II is not an ID its the type or model.

Hmm, I agree. In the spec it uses ID but something like element_type
and element_id instead of id and tag respectively would be better,
wouldn't you agree? I suppose identifier is a better word than
instance in this case, I've just not thought of using it as the other
variable was id.

>> > [...]
>> >> Index: libavcodec/aac.h
>> >> ===================================================================
>> >> --- libavcodec/aac.h  (revision 14624)
>> >> +++ libavcodec/aac.h  (working copy)
>> >> @@ -42,8 +44,63 @@
>> >>          ff_aac_spectral_codes[num], sizeof(ff_aac_spectral_codes[num][0]), sizeof(ff_aac_spectral_codes[num][0]), \
>> >>          size);
>> >>
>> >> +#define MAX_CHANNELS 64
>> >
>> > ok as long as its not causing huge arrays, like if there was a use like
>> > array[MAX_FRAME][MAX_TAGS][MAX_BANDS][MAX_CHANNELS] ...
>>
>> Please define huge.
>
> huge being 10k for the context IMHO, that is if you can safe 10k by
> making MAX_CHANNELS a realistic number please do.
> if it just safes 500 bytes its not worth it if it makes the code more
> complex ...

OK, I'll have a check. Are you mainly concerned about the main
AACContext rather than the individual channel elements? Of course it's
good to minimise them all as long as they don't cause obfuscated code
(though it shouldn't hurt to reduce MAX_CHANNELS I wouldn't have
thought).

>> > [...]
>> >> +/**
>> >> + * M/S joint channel coding
>> >> + */
>> >> +typedef struct {
>> >> +    int present;
>> >> +    uint8_t mask[8][64];
>> >> +} MidSideStereo;
>> >
>> > cant this use the channel coupling struct and code? Its doing the same thing
>> > i think
>>
>> "coupling channel elements provide two functionalities: First,
>> coupling channels may be used to implement generalized intensity
>> stereo coding where channel spectra can be shared across channel
>> boundaries. Second, coupling channels may be used to dynamically
>> perform a downmix of one sound object into the stereo image."
>>
>> I don't think they quite do the same thing.
>
> I didnt mean there is a semantic relation just that what they do is
> quite similar and that it may but does not have to be a good idea
> to convert MS to coupling structs on reading.

I thought it was a strange suggestion, anyway, I'm happy to leave
ms_mask[][] as it is now as long as you are.

> [...]
>> >> +    int num_coupled;       ///< number of target elements
>> >
>> >> +    int is_cpe[9];         ///< Set if target is an CPE (otherwise it's an SCE).
>> >
>> > Maybe some code could be simplified if these where ID_CPE/ID_SCE ?
>>
>> Do you mean if the array values were ID_CPE/ID_SCE?
>
> yes
>
>> I don't think it
>> would allow any simplifications.
>
> hmm

It could be simplified if the value of ch_select[] were 'fudged' to
make it work (increment the index) for the ID_SCE case which is
equivalent to the l[c] = 1, r[c] = 0 case so ch_select[c] would have
to be set to 1 for it to work. Is that acceptable? If so the code in
apply_channel_coupling() would look like the attached patch
(20080807-2314-is_cpe_alteration.diff). diffstat says:

 aac.c |   31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

>> >> +    int tag_select[9];     ///< element tag index
>> >> +    int l[9];              ///< Apply gain to left channel of a CPE.
>> >> +    int r[9];              ///< Apply gain to right channel of a CPE.
>> >
>> > do these arrays need 9 or 8 elements?
>>
>> 9
>
> how can a 9th element be written in there?
> from what i remember it looked like that it was limited to 8, but i might
> have been wrong...

8. :) I saw var = get_bits(gb, 3) and thought "8" and the loop uses <=
var so I thought 9. Sorry, my mistake. :)

>> > [...]
>> >> Index: libavcodec/aacdectab.h
>> >> ===================================================================
>> >> --- libavcodec/aacdectab.h    (revision 0)
>> >> +++ libavcodec/aacdectab.h    (revision 0)

[...]

>> > Btw, what does swb stand for? it should be mentioned somewhere in a doxy
>>
>> scalefactor window band - term for scalefactor bands within a window,
>> given in Table 4.110 to Table 4.128.
>>
>> scalefactor band - term for scalefactor band within a group. In case
>> of EIGHT_SHORT_SEQUENCE and grouping a scalefactor band may contain
>> several scalefactor window bands of corresponding frequency. For all
>> other window_sequences scalefactor bands and scalefactor window bands
>> are identical.
>>
>> I think all that information is useful but where would be a good place
>> to put it?
>
> hmm, i do not know but anywhere is better than nowhere

As mentioned, I'll put it before the *swb* tables.

Regards,
Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 20080807-2314-is_cpe_alteration.diff
Type: text/x-diff
Size: 2466 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080807/a1d810d7/attachment.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 20080807-2351-remove_mixdown.diff
Type: text/x-diff
Size: 10059 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080807/a1d810d7/attachment-0001.diff>