[FFmpeg-devel] AAC-Main (round 2)
Alex Converse
alex.converse
Fri Nov 14 02:04:07 CET 2008
On Mon, Nov 10, 2008 at 4:32 PM, Alex Converse <alex.converse at gmail.com> wrote:
> On Sat, Nov 8, 2008 at 2:03 PM, Alex Converse <alex.converse at gmail.com>
> wrote:
>>
>> On Sat, Nov 8, 2008 at 12:48 PM, Michael Niedermayer <michaelni at gmx.at>
>> wrote:
>>>
>>> On Fri, Nov 07, 2008 at 07:20:45PM -0500, Alex Converse wrote:
>>> > Hi,
>>> >
>>> > Attached are a series of patches that implements AAC Main in FFmpeg.
>>> > These
>>> > are very similar to the first round. The only major change was avoiding
>>> > unnecessary type punning.
>>> >
>>> > While AAC-Main is rarely used, faad2 supports it, flash claims to
>>> > support it
>>> > (I haven't tested this), and we claim to support it but do not.
>>> >
>>> > Notes:
>>> > 1) Frequency domain prediction is described only in ISO/IEC 13818-7 not
>>> > in
>>> > 14496-3.
>>>
>>> > 2) The prediction operation uses 16-bit floats, using 32-bit floats
>>> > does not
>>> > give adequate accuracy so emulation routines to round to 16-bit are
>>> > included.
>>>
>>> > 3) As only 16-bit floats are required it could be possible to store the
>>> > prediction state with half the memory but I'm not sure how to approach
>>> > that
>>> > situation without resorting to IEEE type punning.
>>> >
>>> > Regards,
>>> >
>>> > Alex Converse
>>>
>>> [...]
>>> > +static void reset_predictor_group(PredictorState * ps, int group_num)
>>> > {
>>> > + int i;
>>> > + if (group_num)
>>> > + for (i = group_num-1; i < MAX_PREDICTORS; i+=30)
>>> > + reset_predict_state(&ps[i]);
>>> > +}
>>>
>>> i think it would be clearer if the if() was moved out of the function
>>
>> ok
>>
>>>
>>> [...]
>>> > @@ -786,6 +837,95 @@ static int decode_spectrum_and_dequant(AACContext
>>> > * ac, float coef[1024], GetBit
>>> > return 0;
>>> > }
>>> >
>>> > +static av_always_inline float flt16_round(float pf) {
>>> > + int exp;
>>> > + pf = frexp(pf, &exp);
>>> > + pf = ldexp(roundf(ldexp(pf, 8)), exp-8);
>>> > + return pf;
>>> > +}
>>> > +
>>> > +static av_always_inline float flt16_even(float pf) {
>>> > + int exp;
>>> > + pf = frexpf(pf, &exp);
>>> > + pf = ldexp(rintf(ldexp(pf, 8)), exp-8);
>>> > + return pf;
>>> > +}
>>> > +
>>> > +static av_always_inline float flt16_trunc(float pf) {
>>> > + int exp;
>>> > + pf = frexpf(pf, &exp);
>>> > + pf = ldexp(truncf(ldexp(pf, 8)), exp-8);
>>> > + return pf;
>>> > +}
>>> > +
>>>
>>> are these faster or slower than the code suggested in the spec?
>>
>> The code suggested in the spec uses type puns for flt16_round and
>> flt16_trunc.
>>
>> flt16_even as suggested in the spec casts back and forth between float,
>> double, and int, uses floating point multiplies and divides rather than
>> ldexpf (which can take advantage of IEEE-754 on sane platforms), and uses
>> two ifs. I haven't benchmarked it, but I'm pretty sure mine is faster.
>>>
>>>
>>> > +static void predict(AACContext * ac, PredictorState * ps, float* coef,
>>> > int output_enable) {
>>>
>>> > + const float a = 0.953125;
>>>
>>> 61.0/64 (could be in a comment too if the literal float is prefered ...)
>>>
>>>
>>> > + const float alpha = 0.90625;
>>>
>>> 29.0/32
>>
>>
>> ok, do you have any idea why those values were chosen? I suppose I could
>> try to hunt down some of the original literature.
>>>
>>>
>>> > +
>>> > + float e0, e1;
>>> > + float pv;
>>> > +
>>> > + float k1, k2;
>>> > +
>>> > + if (ps->var0 <= 1)
>>> > + k1 = 0;
>>> > + else
>>> > + k1 = ps->cor0*flt16_even(a/ps->var0);
>>> > +
>>> > + if (ps->var1 <= 1)
>>> > + k2 = 0;
>>> > + else
>>> > + k2 = ps->cor1*flt16_even(a/ps->var1);
>>> > +
>>> > + pv = k1*ps->r0 + k2*ps->r1;
>>> > + pv = flt16_round(pv);
>>> > + if (output_enable)
>>> > + *coef += pv/-1024;
>>> > +
>>> > + e0 = *coef*-1024;
>
> This -1024 constant is no good. It's tied to MMX dsputil.
>
>>>
>>> > + e1 = e0-k1*ps->r0;
>>> > +
>>>
>>> > + ps->cor1 = alpha*ps->cor1 + ps->r1*e1;
>>> > + ps->var1 = alpha*ps->var1 + 0.5 * (ps->r1*ps->r1 + e1*e1);
>>> > + ps->cor0 = alpha*ps->cor0 + ps->r0*e0;
>>> > + ps->var0 = alpha*ps->var0 + 0.5 * (ps->r0*ps->r0 + e0*e0);
>>> > +
>>> > + ps->r1 = a*(ps->r0-k1*e0);
>>> > + ps->r0 = a*e0;
>>> > +
>>>
>>> > + ps->r0 = flt16_trunc(ps->r0);
>>> > + ps->r1 = flt16_trunc(ps->r1);
>>> > + ps->cor0 = flt16_trunc(ps->cor0);
>>> > + ps->cor1 = flt16_trunc(ps->cor1);
>>> > + ps->var0 = flt16_trunc(ps->var0);
>>> > + ps->var1 = flt16_trunc(ps->var1);
>>>
>>> the flt16_trunc() calls could be done when the values are calculated
>>
>> I had left that there in case the consensus was that we wanted to store
>> thes
>
> In addition to the changes discussed, I also fixed the non-MMX dsputil
> scaling
>
In light of suggestions from the floating point help thread, I have
yet another new version that uses IEEE-754 puns on x86 for the
rounding functions. Callgrind shows these to be significantly faster.
I only enabled them on x86 because I have no means to test or
benchmark on any other platform.
In addition the reset function now explicitly uses floats.
[...]
Regards,
Alex Converse
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 00-aac-predict.diff
Type: text/x-diff
Size: 9670 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20081113/838a94b4/attachment.diff>
More information about the ffmpeg-devel
mailing list