[FFmpeg-devel] [PATCH] NellyMoser audio decoder v2

Wed Oct 10 11:00:14 CEST 2007

Here's the comments for the Nellymoser update patch.

Michael Niedermayer wrote:
> Hi
>
> On Mon, Oct 01, 2007 at 02:25:59PM +0200, Benjamin Larsson wrote:
>   
>> Benjamin Larsson wrote:
>>     
>>> [...]
>>> Hi, I had some time yesterday and looked closer at the code. While there 
>>> is no doubt that the code is a TDAC implementation it doesn't seem like 
>>> the code can be replaced by the mdct code in ffmpeg. So I guess that the 
>>> fft replacement is what can be clean up for now. I'll sort out the table 
>>> generation for the rotations and the window.
>>>       
>
> if its not IMDCT then what is it?
>
>
> [...]
>
>
>   
>> +static const int ff_band_sizes_table[23] = {
>> +2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 8, 9, 10, 12, 14, 15
>> +};
>>     
>
> fits in uint8_t
>
>
>   

Fixed.

>> +
>> +static const float ff_nelly_signal_table[64] = {
>> +0.1250000000, 0.1249623969, 0.1248494014, 0.1246612966, 0.1243980974,
>> +0.1240599006, 0.1236471012, 0.1231596991, 0.1225982010, 0.1219628006,
>> +0.1212539002, 0.1204719990, 0.1196174994, 0.1186909974, 0.1176929995,
>> +0.1166241020, 0.1154849008, 0.1142762005, 0.1129987016, 0.1116530001,
>> +0.1102401987, 0.1087609008, 0.1072160974, 0.1056066975, 0.1039336994,
>> +0.1021981016, 0.1004009023, 0.0985433012, 0.0966262966, 0.0946511030,
>> +0.0926188976, 0.0905309021, 0.0883883014, 0.0861926004, 0.0839449018,
>> +0.0816465989, 0.0792991966, 0.0769039020, 0.0744623989, 0.0719759986,
>> +0.0694463030, 0.0668746978, 0.0642627999, 0.0616123006, 0.0589246005,
>> +0.0562013984, 0.0534444004, 0.0506552011, 0.0478353985, 0.0449868999,
>> +0.0421111993, 0.0392102003, 0.0362856016, 0.0333391018, 0.0303725004,
>> +0.0273876991, 0.0243862998, 0.0213702004, 0.0183412991, 0.0153013002,
>> +0.0122520998, 0.0091955997, 0.0061335000, 0.0030677000
>> +};
>>     
>
> this is cos(i/128.0*M_PI)/8
> which should be mentioned in a comment at least, also the name should be
> changed to reflect what it is
>
>   

Fixed.

> [...]
>   
>> +static void antialias(float *buf, float *audio)
>> +{
>> +    int i, end, mid_hi, mid_lo;
>> +    float a, b, c, d, e, f;
>> +
>> +    end = NELLY_BUF_LEN-1;
>> +    mid_hi = NELLY_BUF_LEN/2;
>> +    mid_lo = mid_hi-1;
>> +
>> +    for (i = 0; i < NELLY_BUF_LEN/4; i++) {
>> +        a = buf[end-2*i];
>> +        b = buf[2*i];
>> +        c = buf[2*i+1];
>> +        d = buf[end-2*i-1];
>> +
>> +        e = tcos[i];
>> +        f = tsin[i];
>> +
>> +        audio[2*i] = b*e-a*f;
>> +        audio[2*i+1] = -(a*e+b*f);
>> +
>> +        a = tsin[mid_lo-i];
>> +        b = tcos[mid_lo-i];
>> +
>> +        audio[end-2*i-1] = b*d-a*c;
>> +        audio[end-2*i] = -(b*c+a*d);
>> +    }
>>     
>
> i think it would be more readable if instead of efab
> the tcos/tsin would be stored in variables with the names s and c or
> wr and wi maybe?
>
> [...]
>   

Rewritten (compacted).

>   
>> +        audio[bot] = audio[mid_up]*t[bot]+state[bot]*t[top];
>> +        audio[top] = state[bot]*t[bot]-audio[mid_up]*t[top];
>> +        state[bot] = -audio[mid_down];
>> +
>> +        audio[mid_down] = s_top*t[mid_down]+state[mid_down]*t[mid_up];
>> +        audio[mid_up] = state[mid_down]*t[mid_down]-s_top*t[mid_up];
>> +        state[mid_down] = -s_bot;
>>     
>
> audio[bot] =  audio[mid_up]*t[bot]+state[bot   ]*t[top];
> audio[top] =  state[bot   ]*t[bot]-audio[mid_up]*t[top];
> state[bot] = -audio[mid_down];
>
> audio[mid_down] =  s_top         *t[mid_down]+state[mid_down]*t[mid_up];
> audio[mid_up  ] = state[mid_down]*t[mid_down]-s_top          *t[mid_up];
> state[mid_down] = -s_bot;
>
>
> [...]
>   

Fixed.

>> +    bitsum = sum_bits(sbuf, shift_saved, off);
>> +
>> +    if (bitsum != NELLY_DETAIL_BITS) {
>> +        shift = 0;
>> +        diff = bitsum - NELLY_DETAIL_BITS;
>> +
>> +        while (FFABS(diff) <= 16383) {
>> +            shift++;
>> +            diff *= 2;
>> +        }
>> +
>> +        diff = (diff * NELLY_BASE_OFF) >> 15;
>> +        shift = shift_saved-(NELLY_BASE_SHIFT+shift-15);
>> +
>> +        diff = signed_shift(diff, shift);
>> +
>> +        for (j = 1; j < 20; j++) {
>> +            tmp = off;
>> +            off += diff;
>> +            last_bitsum = bitsum;
>> +
>> +            bitsum = sum_bits(sbuf, shift_saved, off);
>> +
>> +            if ((bitsum-NELLY_DETAIL_BITS) * (last_bitsum-NELLY_DETAIL_BITS) <= 0)
>> +                break;
>> +        }
>> +
>> +        if (bitsum != NELLY_DETAIL_BITS) {
>> +            if (bitsum > NELLY_DETAIL_BITS) {
>> +                big_off = off;
>> +                off = tmp;
>> +                big_bitsum=bitsum;
>> +                small_bitsum=last_bitsum;
>> +            } else {
>> +                big_off = tmp;
>> +                big_bitsum=last_bitsum;
>> +                small_bitsum=bitsum;
>> +            }
>> +
>> +            while (bitsum != NELLY_DETAIL_BITS && j <= 19) {
>> +                diff = (big_off+off)>>1;
>> +                bitsum = sum_bits(sbuf, shift_saved, diff);
>> +                if (bitsum > NELLY_DETAIL_BITS) {
>> +                    big_off=diff;
>> +                    big_bitsum=bitsum;
>> +                } else {
>> +                    off = diff;
>> +                    small_bitsum=bitsum;
>> +                }
>> +                j++;
>> +            }
>>     
>
> this is a binary search and i think the variables could be named slightly
> better and the whole can be simplified alot unless its buggy (=not a
> working binary search)
> and maybe it should be moved into its own function
>
>
>   

Looked at it but I don't really understand what it does. Help cleaning 
this up would be appreciated.

>> +
>> +            if (abs(big_bitsum-NELLY_DETAIL_BITS) >=
>> +                abs(small_bitsum-NELLY_DETAIL_BITS)) {
>>     
>
> are the abs needed? isnt the sign of these expresions always the same?
>
> [...]
>   

They are needed.

>> +        bitsum = NELLY_DETAIL_BITS;
>> +        while (i < NELLY_FILL_LEN) {
>> +            bits[i] = 0;
>> +            i++;
>> +        }
>> +    }
>>     
>
> the bitsum= here makes no sense, its not used
>
>
> [...]
>   

Removed.

>> +        antialias(buf, aptr);
>> +        ff_fft_permute(&fftc, (FFTComplex*)aptr);
>> +        ff_fft_calc(&fftc, (FFTComplex*)aptr);
>> +
>> +         for (j = 0; j<64; j++) {
>> +             ((FFTComplex*)aptr)[j].im = -((FFTComplex*)aptr)[j].im;
>> +         }
>> +        complex2signal(aptr);
>>     
>
> indention looks odd
>
>
> [...]
>   

Removed.

>> +void nelly_util_floats2shorts(float audio[256], short shorts[256])
>> +{
>> +    int i;
>> +
>> +    for (i = 0; i < 256; i++) {
>> +        if (audio[i] >= 32767.0)
>> +            shorts[i] = 32767;
>> +        else if (audio[i] <= -32768.0)
>> +            shorts[i] = -32768;
>> +        else
>> +            shorts[i] = (short)audio[i];
>> +    }
>> +}
>>     
>
> shouldnt this be using the optimized dsp code?
>
> [...]
>   

Fixed.

>> +    while (remaining_size >= NELLY_BLOCK_LEN) {
>> +        nelly_decode_block(s, buf, s->float_buf);
>> +        nelly_util_floats2shorts(s->float_buf, data);
>> +        data += NELLY_SAMPLE_SIZE;
>> +        *data_size += NELLY_SAMPLE_SIZE;
>> +        buf += NELLY_BLOCK_LEN;
>> +        remaining_size -= NELLY_BLOCK_LEN;
>> +    }
>>     
>
> this looks like theres a AVParser missing to split it into packets
>
>
> [...]
>   

Rewritten.

MvH
Benjamin Larsson