[FFmpeg-devel] [PATCH] NellyMoser audio decoder v2
Benjamin Larsson
banan
Tue Oct 2 10:44:46 CEST 2007
Hi.
Michael Niedermayer wrote:
> Hi
>
> On Mon, Oct 01, 2007 at 02:25:59PM +0200, Benjamin Larsson wrote:
>
>> Benjamin Larsson wrote:
>>
>>> [...]
>>> Hi, I had some time yesterday and looked closer at the code. While there
>>> is no doubt that the code is a TDAC implementation it doesn't seem like
>>> the code can be replaced by the mdct code in ffmpeg. So I guess that the
>>> fft replacement is what can be clean up for now. I'll sort out the table
>>> generation for the rotations and the window.
>>>
>
> if its not IMDCT then what is it?
>
In the flash specs that are available they write it is just plain
Fourier transform. And from reading the code they use a 50% overlap and
the overlap window is a sine window (same as in most mdct based codecs).
But the antialiasing stuff suggest that there is some other thing also.
It looks a bit like the qdm2 transformation.
http://www.lisog.org/projekte/supported_projects/bounty-programm/nellymoserasaocodec-bounty.pdf
That pdf contains the same amount of info as the flash specs. But they
mention huffman here which I'm doubting is true.
Another source of info is this old email:
http://article.gmane.org/gmane.comp.video.ffmpeg.user/9232/match=nellymoser
>
> [...]
>
>
>
>> +static const int ff_band_sizes_table[23] = {
>> +2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 8, 9, 10, 12, 14, 15
>> +};
>>
>
> fits in uint8_t
>
>
I'll fix.
>
>> +
>> +static const float ff_nelly_signal_table[64] = {
>> +0.1250000000, 0.1249623969, 0.1248494014, 0.1246612966, 0.1243980974,
>> +0.1240599006, 0.1236471012, 0.1231596991, 0.1225982010, 0.1219628006,
>> +0.1212539002, 0.1204719990, 0.1196174994, 0.1186909974, 0.1176929995,
>> +0.1166241020, 0.1154849008, 0.1142762005, 0.1129987016, 0.1116530001,
>> +0.1102401987, 0.1087609008, 0.1072160974, 0.1056066975, 0.1039336994,
>> +0.1021981016, 0.1004009023, 0.0985433012, 0.0966262966, 0.0946511030,
>> +0.0926188976, 0.0905309021, 0.0883883014, 0.0861926004, 0.0839449018,
>> +0.0816465989, 0.0792991966, 0.0769039020, 0.0744623989, 0.0719759986,
>> +0.0694463030, 0.0668746978, 0.0642627999, 0.0616123006, 0.0589246005,
>> +0.0562013984, 0.0534444004, 0.0506552011, 0.0478353985, 0.0449868999,
>> +0.0421111993, 0.0392102003, 0.0362856016, 0.0333391018, 0.0303725004,
>> +0.0273876991, 0.0243862998, 0.0213702004, 0.0183412991, 0.0153013002,
>> +0.0122520998, 0.0091955997, 0.0061335000, 0.0030677000
>> +};
>>
>
> this is cos(i/128.0*M_PI)/8
> which should be mentioned in a comment at least, also the name should be
> changed to reflect what it is
>
Thanks for that. I saw the pattern but couldn't figure out the formula.
>
> [...]
>
>> +static void antialias(float *buf, float *audio)
>> +{
>> + int i, end, mid_hi, mid_lo;
>> + float a, b, c, d, e, f;
>> +
>> + end = NELLY_BUF_LEN-1;
>> + mid_hi = NELLY_BUF_LEN/2;
>> + mid_lo = mid_hi-1;
>> +
>> + for (i = 0; i < NELLY_BUF_LEN/4; i++) {
>> + a = buf[end-2*i];
>> + b = buf[2*i];
>> + c = buf[2*i+1];
>> + d = buf[end-2*i-1];
>> +
>> + e = tcos[i];
>> + f = tsin[i];
>> +
>> + audio[2*i] = b*e-a*f;
>> + audio[2*i+1] = -(a*e+b*f);
>> +
>> + a = tsin[mid_lo-i];
>> + b = tcos[mid_lo-i];
>> +
>> + audio[end-2*i-1] = b*d-a*c;
>> + audio[end-2*i] = -(b*c+a*d);
>> + }
>>
>
> i think it would be more readable if instead of efab
> the tcos/tsin would be stored in variables with the names s and c or
> wr and wi maybe?
>
I was thinking of rerolling the loops. Then it should be fairly trivial
to skip the usage of the a-f vars.
> [...]
>
>
>> + audio[bot] = audio[mid_up]*t[bot]+state[bot]*t[top];
>> + audio[top] = state[bot]*t[bot]-audio[mid_up]*t[top];
>> + state[bot] = -audio[mid_down];
>> +
>> + audio[mid_down] = s_top*t[mid_down]+state[mid_down]*t[mid_up];
>> + audio[mid_up] = state[mid_down]*t[mid_down]-s_top*t[mid_up];
>> + state[mid_down] = -s_bot;
>>
>
> audio[bot] = audio[mid_up]*t[bot]+state[bot ]*t[top];
> audio[top] = state[bot ]*t[bot]-audio[mid_up]*t[top];
> state[bot] = -audio[mid_down];
>
> audio[mid_down] = s_top *t[mid_down]+state[mid_down]*t[mid_up];
> audio[mid_up ] = state[mid_down]*t[mid_down]-s_top *t[mid_up];
> state[mid_down] = -s_bot;
>
>
> [...]
>
>> + bitsum = sum_bits(sbuf, shift_saved, off);
>> +
>> + if (bitsum != NELLY_DETAIL_BITS) {
>> + shift = 0;
>> + diff = bitsum - NELLY_DETAIL_BITS;
>> +
>> + while (FFABS(diff) <= 16383) {
>> + shift++;
>> + diff *= 2;
>> + }
>> +
>> + diff = (diff * NELLY_BASE_OFF) >> 15;
>> + shift = shift_saved-(NELLY_BASE_SHIFT+shift-15);
>> +
>> + diff = signed_shift(diff, shift);
>> +
>> + for (j = 1; j < 20; j++) {
>> + tmp = off;
>> + off += diff;
>> + last_bitsum = bitsum;
>> +
>> + bitsum = sum_bits(sbuf, shift_saved, off);
>> +
>> + if ((bitsum-NELLY_DETAIL_BITS) * (last_bitsum-NELLY_DETAIL_BITS) <= 0)
>> + break;
>> + }
>> +
>> + if (bitsum != NELLY_DETAIL_BITS) {
>> + if (bitsum > NELLY_DETAIL_BITS) {
>> + big_off = off;
>> + off = tmp;
>> + big_bitsum=bitsum;
>> + small_bitsum=last_bitsum;
>> + } else {
>> + big_off = tmp;
>> + big_bitsum=last_bitsum;
>> + small_bitsum=bitsum;
>> + }
>> +
>> + while (bitsum != NELLY_DETAIL_BITS && j <= 19) {
>> + diff = (big_off+off)>>1;
>> + bitsum = sum_bits(sbuf, shift_saved, diff);
>> + if (bitsum > NELLY_DETAIL_BITS) {
>> + big_off=diff;
>> + big_bitsum=bitsum;
>> + } else {
>> + off = diff;
>> + small_bitsum=bitsum;
>> + }
>> + j++;
>> + }
>>
>
> this is a binary search and i think the variables could be named slightly
> better and the whole can be simplified alot unless its buggy (=not a
> working binary search)
> and maybe it should be moved into its own function
>
>
I'll look into it.
>
>> +
>> + if (abs(big_bitsum-NELLY_DETAIL_BITS) >=
>> + abs(small_bitsum-NELLY_DETAIL_BITS)) {
>>
>
> are the abs needed? isnt the sign of these expresions always the same?
>
>
I'll look into it.
> [...]
>
>> + bitsum = NELLY_DETAIL_BITS;
>> + while (i < NELLY_FILL_LEN) {
>> + bits[i] = 0;
>> + i++;
>> + }
>> + }
>>
>
> the bitsum= here makes no sense, its not used
>
>
I'll look into it.
> [...]
>
>> + antialias(buf, aptr);
>> + ff_fft_permute(&fftc, (FFTComplex*)aptr);
>> + ff_fft_calc(&fftc, (FFTComplex*)aptr);
>> +
>> + for (j = 0; j<64; j++) {
>> + ((FFTComplex*)aptr)[j].im = -((FFTComplex*)aptr)[j].im;
>> + }
>> + complex2signal(aptr);
>>
>
> indention looks odd
>
It's just me who can't figure out what values to negate in complex2signal.
>
> [...]
>
>> +void nelly_util_floats2shorts(float audio[256], short shorts[256])
>> +{
>> + int i;
>> +
>> + for (i = 0; i < 256; i++) {
>> + if (audio[i] >= 32767.0)
>> + shorts[i] = 32767;
>> + else if (audio[i] <= -32768.0)
>> + shorts[i] = -32768;
>> + else
>> + shorts[i] = (short)audio[i];
>> + }
>> +}
>>
>
> shouldnt this be using the optimized dsp code?
>
Indeed it will.
> [...]
>
>> + while (remaining_size >= NELLY_BLOCK_LEN) {
>> + nelly_decode_block(s, buf, s->float_buf);
>> + nelly_util_floats2shorts(s->float_buf, data);
>> + data += NELLY_SAMPLE_SIZE;
>> + *data_size += NELLY_SAMPLE_SIZE;
>> + buf += NELLY_BLOCK_LEN;
>> + remaining_size -= NELLY_BLOCK_LEN;
>> + }
>>
>
> this looks like theres a AVParser missing to split it into packets
>
I'll look into it. According to the docs there is 1,2 or 4 frames worth
of data in every block.
MvH
Benjamin Larsson
More information about the ffmpeg-devel
mailing list