[FFmpeg-devel] AAC-Main (round 2)

Fri Nov 14 02:04:07 CET 2008

On Mon, Nov 10, 2008 at 4:32 PM, Alex Converse <alex.converse at gmail.com> wrote:
> On Sat, Nov 8, 2008 at 2:03 PM, Alex Converse <alex.converse at gmail.com>
> wrote:
>>
>> On Sat, Nov 8, 2008 at 12:48 PM, Michael Niedermayer <michaelni at gmx.at>
>> wrote:
>>>
>>> On Fri, Nov 07, 2008 at 07:20:45PM -0500, Alex Converse wrote:
>>> > Hi,
>>> >
>>> > Attached are a series of patches that implements AAC Main in FFmpeg.
>>> > These
>>> > are very similar to the first round. The only major change was avoiding
>>> > unnecessary type punning.
>>> >
>>> > While AAC-Main is rarely used, faad2 supports it, flash claims to
>>> > support it
>>> > (I haven't tested this), and we claim to support it but do not.
>>> >
>>> > Notes:
>>> > 1) Frequency domain prediction is described only in ISO/IEC 13818-7 not
>>> > in
>>> > 14496-3.
>>>
>>> > 2) The prediction operation uses 16-bit floats, using 32-bit floats
>>> > does not
>>> > give adequate accuracy so emulation routines to round to 16-bit are
>>> > included.
>>>
>>> > 3) As only 16-bit floats are required it could be possible to store the
>>> > prediction state with half the memory but I'm not sure how to approach
>>> > that
>>> > situation without resorting to IEEE type punning.
>>> >
>>> > Regards,
>>> >
>>> > Alex Converse
>>>
>>> [...]
>>> > +static void reset_predictor_group(PredictorState * ps, int group_num)
>>> > {
>>> > +    int i;
>>> > +    if (group_num)
>>> > +        for (i = group_num-1; i < MAX_PREDICTORS; i+=30)
>>> > +            reset_predict_state(&ps[i]);
>>> > +}
>>>
>>> i think it would be clearer if the if() was moved out of the function
>>
>> ok
>>
>>>
>>> [...]
>>> > @@ -786,6 +837,95 @@ static int decode_spectrum_and_dequant(AACContext
>>> > * ac, float coef[1024], GetBit
>>> >      return 0;
>>> >  }
>>> >
>>> > +static av_always_inline float flt16_round(float pf) {
>>> > +    int exp;
>>> > +    pf = frexp(pf, &exp);
>>> > +    pf = ldexp(roundf(ldexp(pf, 8)), exp-8);
>>> > +    return pf;
>>> > +}
>>> > +
>>> > +static av_always_inline float flt16_even(float pf) {
>>> > +    int exp;
>>> > +    pf = frexpf(pf, &exp);
>>> > +    pf = ldexp(rintf(ldexp(pf, 8)), exp-8);
>>> > +    return pf;
>>> > +}
>>> > +
>>> > +static av_always_inline float flt16_trunc(float pf) {
>>> > +    int exp;
>>> > +    pf = frexpf(pf, &exp);
>>> > +    pf = ldexp(truncf(ldexp(pf, 8)), exp-8);
>>> > +    return pf;
>>> > +}
>>> > +
>>>
>>> are these faster or slower than the code suggested in the spec?
>>
>> The code suggested in the spec uses type puns for flt16_round and
>> flt16_trunc.
>>
>> flt16_even as suggested in the spec casts back and forth between float,
>> double, and int, uses floating point multiplies and divides rather than
>> ldexpf (which can take advantage of IEEE-754 on sane platforms), and uses
>> two ifs. I haven't benchmarked it, but I'm pretty sure mine is faster.
>>>
>>>
>>> > +static void predict(AACContext * ac, PredictorState * ps, float* coef,
>>> > int output_enable) {
>>>
>>> > +    const float a = 0.953125;
>>>
>>> 61.0/64 (could be in a comment too if the literal float is prefered ...)
>>>
>>>
>>> > +    const float alpha = 0.90625;
>>>
>>> 29.0/32
>>
>>
>> ok, do you have any idea why those values were chosen? I suppose I could
>> try to hunt down some of the original literature.
>>>
>>>
>>> > +
>>> > +    float e0, e1;
>>> > +    float pv;
>>> > +
>>> > +    float k1, k2;
>>> > +
>>> > +    if (ps->var0 <= 1)
>>> > +        k1 = 0;
>>> > +    else
>>> > +        k1 = ps->cor0*flt16_even(a/ps->var0);
>>> > +
>>> > +    if (ps->var1 <= 1)
>>> > +        k2 = 0;
>>> > +    else
>>> > +        k2 = ps->cor1*flt16_even(a/ps->var1);
>>> > +
>>> > +    pv = k1*ps->r0 + k2*ps->r1;
>>> > +    pv = flt16_round(pv);
>>> > +    if (output_enable)
>>> > +        *coef += pv/-1024;
>>> > +
>>> > +    e0 = *coef*-1024;
>
> This -1024 constant is no good. It's tied to MMX dsputil.
>
>>>
>>> > +    e1 = e0-k1*ps->r0;
>>> > +
>>>
>>> > +    ps->cor1 = alpha*ps->cor1 + ps->r1*e1;
>>> > +    ps->var1 = alpha*ps->var1 + 0.5 * (ps->r1*ps->r1 + e1*e1);
>>> > +    ps->cor0 = alpha*ps->cor0 + ps->r0*e0;
>>> > +    ps->var0 = alpha*ps->var0 + 0.5 * (ps->r0*ps->r0 + e0*e0);
>>> > +
>>> > +    ps->r1 = a*(ps->r0-k1*e0);
>>> > +    ps->r0 = a*e0;
>>> > +
>>>
>>> > +    ps->r0   = flt16_trunc(ps->r0);
>>> > +    ps->r1   = flt16_trunc(ps->r1);
>>> > +    ps->cor0 = flt16_trunc(ps->cor0);
>>> > +    ps->cor1 = flt16_trunc(ps->cor1);
>>> > +    ps->var0 = flt16_trunc(ps->var0);
>>> > +    ps->var1 = flt16_trunc(ps->var1);
>>>
>>> the flt16_trunc() calls could be done when the values are calculated
>>
>> I had left that there in case the consensus was that we wanted to store
>> thes
>
> In addition to the changes discussed, I also fixed the non-MMX dsputil
> scaling
>

In light of suggestions from the floating point help thread, I have
yet another new version that uses IEEE-754 type puns on x86 for the
rounding functions. Callgrind shows these to be significantly faster
than the frexp/ldexp versions. I only enabled them on x86 because I
have no means to test or benchmark on any other platform.
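
For anyone reading without the attachment, here is a minimal sketch of
what pun-based rounding to 16 significant bits (sign, 8-bit exponent,
7 stored mantissa bits) can look like. It is an illustration and not
necessarily the code in the attached patch. It assumes float is
IEEE-754 binary32, uses memcpy for the pun, and the helper names
float_bits/bits_float are made up for this example. Infinities and
NaNs get no special handling.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is IEEE-754 binary32. memcpy performs the pun so the
 * sketch stays within the C aliasing rules. Helper names are hypothetical. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return u;
}

static float bits_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}

/* Round to nearest, ties away from zero: add half an ulp of the 16-bit
 * format to the magnitude, then clear the low 16 bits. */
static float flt16_round(float pf)
{
    return bits_float((float_bits(pf) + 0x00008000U) & 0xFFFF0000U);
}

/* Round to nearest, ties to even: add just under half an ulp, plus one
 * more if the lowest kept bit is set, then clear the low 16 bits. */
static float flt16_even(float pf)
{
    uint32_t u = float_bits(pf);
    u += 0x00007FFFU + ((u >> 16) & 1U);
    return bits_float(u & 0xFFFF0000U);
}

/* Truncate toward zero: simply clear the low 16 bits. */
static float flt16_trunc(float pf)
{
    return bits_float(float_bits(pf) & 0xFFFF0000U);
}
```

Because the format is sign-magnitude, adding to the raw bits always
grows the magnitude, so the same code works for negative inputs.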

In addition, the reset function now explicitly uses floats.
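
As a rough sketch of the reset path with explicit float literals and
with the if() moved out to the caller as suggested in the review: the
struct layout, the reset values (r and cor to 0, var to 1) and
MAX_PREDICTORS = 672 are my assumptions for illustration and may not
match the attached patch exactly.

```c
#define MAX_PREDICTORS 672 /* assumed value, for illustration only */

/* Field names follow predict() in the quoted patch; layout is assumed. */
typedef struct PredictorState {
    float cor0, cor1;
    float var0, var1;
    float r0, r1;
} PredictorState;

/* Assumed reset values: zero state and correlation, unit variance.
 * With var == 1, predict() takes the (var <= 1) branch and uses k = 0. */
static void reset_predict_state(PredictorState *ps)
{
    ps->r0   = 0.0f;
    ps->r1   = 0.0f;
    ps->cor0 = 0.0f;
    ps->cor1 = 0.0f;
    ps->var0 = 1.0f;
    ps->var1 = 1.0f;
}

/* group_num is expected in 1..30; the caller skips the call when it is 0. */
static void reset_predictor_group(PredictorState *ps, int group_num)
{
    int i;
    for (i = group_num - 1; i < MAX_PREDICTORS; i += 30)
        reset_predict_state(&ps[i]);
}
```

The caller would then read something like
"if (group) reset_predictor_group(ps, group);".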

[...]

Regards,
Alex Converse
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 00-aac-predict.diff
Type: text/x-diff
Size: 9670 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20081113/838a94b4/attachment.diff>