[FFmpeg-devel] [PATCH] ALAC Encoder

Ramiro Polla ramiro.polla
Wed Aug 20 07:21:56 CEST 2008


Hi Michael,

On Mon, Aug 18, 2008 at 11:01 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Mon, Aug 18, 2008 at 09:38:53AM -0300, Ramiro Polla wrote:
>> Hi,
>>
>> On Sun, Aug 17, 2008 at 12:55 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> > On Sun, Aug 17, 2008 at 11:17:27AM -0300, Ramiro Polla wrote:
>> >> On Sun, Aug 17, 2008 at 10:15 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> >> > On Sun, Aug 17, 2008 at 09:09:00AM +0530, Jai Menon wrote:
>> >> >> Hi,
>> >> >>
>> >> >> On Sunday 17 Aug 2008 8:05:14 am Michael Niedermayer wrote:
>> >> >> > On Sun, Aug 17, 2008 at 04:14:43AM +0530, Jai Menon wrote:
>> >> > [...]
>> >> >> > > +static void alac_stereo_decorrelation(AlacEncodeContext *s)
>> >> >> > > +{
>> >> >> > > +    int32_t *left = s->sample_buf[0], *right = s->sample_buf[1];
>> >> >> > > +    int32_t tmp;
>> >> >> > > +    int i;
>> >> >> > > +
>> >> >> > > +    for(i=0; i<s->avctx->frame_size; i++) {
>> >> >> > > +        tmp = left[i];
>> >> >> > > +        left[i] = (tmp + right[i]) >> 1;
>> >> >> > > +        right[i] = tmp - right[i];
>> >> >> > > +    }
>> >> >> > >
>> >> >> > > +    s->interlacing_leftweight = 1;
>> >> >> > > +    s->interlacing_shift = 1;
>> >> >> >
>> >> >> > i do not believe this is optimal
>> >> >> >
>> >> >>
>> >> >> It may not be optimal in the sense that I do not adaptively select the
>> >> >> decorrelation scheme, but this is just the first iteration which aims at
>> >> >> getting a basic encoder into svn. And it is better than doing no
>> >> >> decorrelation. I did initially try out an adaptive approach, but the difference
>> >> >> in compression wasn't that great. I'm looking into how this can be done in a
>> >> >> better manner. Until then, I was hoping we could go with this.
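For reference, the fixed transform in the quoted patch is a mid/side split. It stays lossless because L + R and L - R have the same parity, so the bit dropped by the shift can be recovered from the side channel. A minimal sketch in C follows. ms_encode, ms_decode and floor_half are hypothetical names. The inverse is an assumption chosen to match interlacing_leftweight = 1 and interlacing_shift = 1; it is not decoder code quoted in this thread. Samples are assumed to be at most 24 bits wide, so L - R fits in an int32_t.

```c
#include <assert.h>
#include <stdint.h>

/* floor(x / 2) without relying on >> of a negative value,
 * which C leaves implementation-defined. */
static int32_t floor_half(int64_t x)
{
    return (int32_t)(x >= 0 ? x / 2 : -((-x + 1) / 2));
}

/* Forward transform as in the quoted patch: left -> mid, right -> side.
 * Assumes samples of at most 24 bits so that l - r cannot overflow. */
static void ms_encode(int32_t *left, int32_t *right, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        int32_t l = left[i], r = right[i];
        left[i]  = floor_half((int64_t)l + r);  /* mid  = (L + R) >> 1 */
        right[i] = l - r;                       /* side = L - R        */
    }
}

/* Assumed inverse for leftweight = 1, shift = 1:
 * R = mid - (side >> 1), then L = R + side. */
static void ms_decode(int32_t *left, int32_t *right, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        int32_t mid = left[i], side = right[i];
        int32_t r = mid - floor_half(side);
        left[i]  = r + side;
        right[i] = r;
    }
}
```

floor_half only sidesteps the implementation-defined right shift of negative values; on common compilers it agrees with `>> 1`. For L = 5, R = 2 the encoder stores mid = 3 and side = 3, and the decoder recovers R = 3 - 1 = 2 and L = 2 + 3 = 5.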
>> >> >
>> >> > see the pca.c/h i posted in a reply to ramiro a few days ago
>> >> > it might be worth a try ...
>> >>
>> >> Speaking of that... I haven't finished integrating it in MLP (I'm
>> >> working on some other stuff atm), but it seems to be what I need.
>> >
>> >> Could you get it cleaned up and committed like you suggested?
>> >
>> > done
>>
>> I can get it working in my tests, but not in MLP =(
>>
>> I take something like this (I'll name it here s[channels][samples]):
>>             [samples]
>> [channel 0] 0 1 2 3 4 5 6 7 8
>> [channel 1] 0 1 2 3 4 5 6 7 8
>> [noise   0] 0 1 2 3 4 5 6 7 8
>> [noise   1] 0 1 2 3 4 5 6 7 8
>>
>> Pass it through the pca, <num_channels> samples at a time
>> pca_add(s[0][0], s[1][0], s[2][0], s[3][0])
>> pca_add(s[0][1], s[1][1], s[2][1], s[3][1])
>> ...
>> pca_add(s[0][x], s[1][x], s[2][x], s[3][x])
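Each pca_add call feeds one time instant, that is, one value per channel. If the PCA is taken over the channels' covariance matrix, an accumulator only needs running sums and sums of products. The sketch below uses hypothetical names (CovAcc, cov_init, cov_add, cov_finish). It is not the interface of the pca.c mentioned earlier in the thread, which is not shown here.

```c
#include <assert.h>

#define MAX_CH 8   /* hypothetical upper bound on channels */

/* Running sums needed for a covariance matrix (hypothetical type). */
typedef struct {
    int n;                          /* number of channels */
    long count;                     /* number of time instants added */
    double sum[MAX_CH];             /* per-channel sums */
    double sum2[MAX_CH][MAX_CH];    /* sums of products v[i] * v[j] */
} CovAcc;

static void cov_init(CovAcc *c, int n)
{
    assert(n > 0 && n <= MAX_CH);
    *c = (CovAcc){ .n = n };        /* zeroes every sum */
}

/* Add one time instant: v[ch] is the sample of channel ch at that time. */
static void cov_add(CovAcc *c, const double *v)
{
    c->count++;
    for (int i = 0; i < c->n; i++) {
        c->sum[i] += v[i];
        for (int j = 0; j < c->n; j++)
            c->sum2[i][j] += v[i] * v[j];
    }
}

/* cov[i][j] = mean(v[i] * v[j]) - mean(v[i]) * mean(v[j]) */
static void cov_finish(const CovAcc *c, double cov[MAX_CH][MAX_CH])
{
    assert(c->count > 0);
    for (int i = 0; i < c->n; i++)
        for (int j = 0; j < c->n; j++)
            cov[i][j] = c->sum2[i][j] / c->count
                      - (c->sum[i] / c->count) * (c->sum[j] / c->count);
}
```

Whether pca.c removes the mean this way is not stated in the thread. Removing it simply keeps a DC offset from dominating the first eigenvector.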
>>
>> Solve the pca and get the eigenvectors (I'll name it here as e[][])
>>
>> Multiply them both to a new buffer (s2[channels][samples])
>> s2[0][0] = s[0][0] * e[0][0] + s[1][0] * e[0][1] +
>>            s[2][0] * e[0][2] + s[3][0] * e[0][3]
>> s2[0][1] = s[0][1] * e[0][0] + s[1][1] * e[0][1] +
>>            s[2][1] * e[0][2] + s[3][1] * e[0][3]
>> ...
>> s2[0][x] = s[0][x] * e[0][0] + s[1][x] * e[0][1] +
>>            s[2][x] * e[0][2] + s[3][x] * e[0][3]
>>
>> s2[1][0] = s[0][0] * e[1][0] + s[1][0] * e[1][1] +
>>            s[2][0] * e[1][2] + s[3][0] * e[1][3]
>> s2[1][1] = s[0][1] * e[1][0] + s[1][1] * e[1][1] +
>>            s[2][1] * e[1][2] + s[3][1] * e[1][3]
>> ...
>> s2[1][x] = s[0][x] * e[1][0] + s[1][x] * e[1][1] +
>>            s[2][x] * e[1][2] + s[3][x] * e[1][3]
>>
>> In this case I'd get some values in s2[0][x], and 0 for s2[i>0][x],
>> since all channels are equal.
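The multiplication written out here is a per-instant matrix-vector product. Row k of e[][] is the k-th eigenvector, and s2[k][t] is the projection of the samples at time t onto it. The following sketch uses a hypothetical pca_project() in double precision.

```c
#include <assert.h>

#define MAX_CH 8   /* hypothetical upper bound on channels */

/* s2[k][t] = sum over c of s[c][t] * e[k][c]
 * (row k of e is the k-th eigenvector). */
static void pca_project(int n, int len, double *s[],
                        double e[MAX_CH][MAX_CH], double *s2[])
{
    assert(n > 0 && n <= MAX_CH && len >= 0);
    for (int k = 0; k < n; k++)
        for (int t = 0; t < len; t++) {
            double acc = 0;
            for (int c = 0; c < n; c++)
                acc += s[c][t] * e[k][c];
            s2[k][t] = acc;
        }
}
```

Consider two identical channels, with e[][] holding (1, 1)/sqrt(2) and (1, -1)/sqrt(2). In that case s2[1][] comes out exactly zero and s2[0][] is sqrt(2) times the input, which matches the all-equal case described above.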
>>
>> I then multiply again by the eigenvectors (with something transposed
>> in the middle, I forgot which), and get s[][] again.
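The "something transposed" can be pinned down. Eigenvectors of a symmetric matrix can be chosen orthonormal, and for an orthonormal e[][] the inverse is its transpose, so s[c][t] = sum over k of s2[k][t] * e[k][c]. The following sketch uses a hypothetical pca_unproject() and assumes e[][] is orthonormal.

```c
#include <assert.h>

#define MAX_CH 8   /* hypothetical upper bound on channels */

/* s[c][t] = sum over k of s2[k][t] * e[k][c]: multiply by e transposed.
 * Only a true inverse when the rows of e are orthonormal. */
static void pca_unproject(int n, int len, double *s2[],
                          double e[MAX_CH][MAX_CH], double *s[])
{
    assert(n > 0 && n <= MAX_CH && len >= 0);
    for (int c = 0; c < n; c++)
        for (int t = 0; t < len; t++) {
            double acc = 0;
            for (int k = 0; k < n; k++)
                acc += s2[k][t] * e[k][c];
            s[c][t] = acc;
        }
}
```

This round trip is exact only up to floating-point rounding. On its own, it does not give the bit-exact reconstruction that a lossless codec needs.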
>>
>> But MLP doesn't have the intermediate s2 buffer. It overwrites s[][]
>> directly. From my stereo samples, it usually only infers s[1][] out of
>> s[0,2,3][] (2, 3 being noise channels). So there's only one lossless
>> matrix for channel 1. It works like:
>> s[1][0] = s[0][0] * e[1][0] + s[1][0] * e[1][1] +
>>           s[2][0] * e[1][2] + s[3][0] * e[1][3]
>> s[1][1] = s[0][1] * e[1][0] + s[1][1] * e[1][1] +
>>           s[2][1] * e[1][2] + s[3][1] * e[1][3]
>> ...
>> s[1][x] = s[0][x] * e[1][0] + s[1][x] * e[1][1] +
>>           s[2][x] * e[1][2] + s[3][x] * e[1][3]
>>
>> If there were more matrices, it would always use the previously
>> overwritten values.
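The in-place behaviour described here can be modelled as a list of matrices. Each matrix rewrites one output channel from the current values of all channels, so a later matrix reads what an earlier one wrote. The sketch below uses hypothetical names (InPlaceMatrix, apply_matrices) and plain integer coefficients, and it ignores MLP's actual fixed-point coefficient format and rounding.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CH 8   /* hypothetical upper bound on channels */

/* One in-place matrix: channel out_ch is overwritten with
 * sum over c of coeff[c] * s[c][t] (hypothetical type). */
typedef struct {
    int out_ch;
    int32_t coeff[MAX_CH];
} InPlaceMatrix;

static void apply_matrices(int n, int len, int32_t *s[],
                           const InPlaceMatrix *m, int count)
{
    assert(n > 0 && n <= MAX_CH && len >= 0);
    for (int k = 0; k < count; k++)
        for (int t = 0; t < len; t++) {
            int64_t acc = 0;
            for (int c = 0; c < n; c++)
                acc += (int64_t)m[k].coeff[c] * s[c][t];
            /* later matrices read this overwritten value */
            s[m[k].out_ch][t] = (int32_t)acc;
        }
}
```

For example, start with channels (3, 5). A first matrix writing channel 1 = ch0 + ch1 stores 8, and a second matrix writing channel 0 = -ch0 + ch1 then reads that 8 and stores 5.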
>>
>> Can I still achieve this with PCA?
>
> yes
>
> there are many things that can be tried ...
> first is to simply just decorrelate only one channel
> and leave the others as is, this should be easy and similar to the
> other encoder
>
> s[r] = s[0]e[0]/e[r] + s[1]e[1]/e[r] + ...+ s[n]e[n]/e[r]
>
> e here is the eigenvector with the smallest eigenvalue
> r is chosen so that e[r] is largest

I'm still stuck on this. I couldn't even get the first one (which
should work like the other encoder).

If I divide e[n]/e[r], I have to use different coeff values for the
encoder and for the decoder. The matrix coeffs in the encoder and the
decoder aren't exactly the same I suppose...

Also in your example you give only one index for each value. What
channel does s[n] relate to, and what row is e[r]?

Could you be a little more verbose (or more C-ish =) on how to achieve this?
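One reading of Michael's formula follows; it is an interpretation, not something confirmed in the thread. Take s[i] to be the sample of channel i at one instant, e[] the eigenvector with the smallest eigenvalue, and e[r] its r-th component. Under that reading, the new s[r] is s[r] plus the sum over i != r of s[i]*e[i]/e[r], which equals (e . s)/e[r] and tends to be small. The step can be written as an integer lifting step: the encoder adds a rounded prediction built only from the other channels, and the decoder subtracts the same prediction. The encoder and decoder then share the same quantized coefficients, and the step is exactly invertible whatever rounding the prediction uses. The sketch below works under those assumptions. lift_coeffs, lift_pred, lift_encode, lift_decode, FRAC_BITS and MAX_CH are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CH 8      /* hypothetical upper bound on channels */
#define FRAC_BITS 14  /* hypothetical coefficient precision */

static double abs_d(double x) { return x < 0 ? -x : x; }

/* Choose r as the index of the largest |e[i]| and quantize
 * c[i] = e[i] / e[r] to FRAC_BITS fractional bits (c[r] unused). */
static int lift_coeffs(int n, const double *e, int32_t *c)
{
    int r = 0;
    assert(n > 0 && n <= MAX_CH);
    for (int i = 1; i < n; i++)
        if (abs_d(e[i]) > abs_d(e[r]))
            r = i;
    for (int i = 0; i < n; i++) {
        double q = e[i] / e[r] * (1 << FRAC_BITS);
        c[i] = (int32_t)(q >= 0 ? q + 0.5 : q - 0.5);  /* round to nearest */
    }
    return r;
}

/* Integer prediction of channel r at time t from channels i != r only,
 * so the decoder can recompute it exactly. The >> of a negative value is
 * arithmetic on common compilers; any fixed rounding works as long as
 * encoder and decoder share this function. */
static int32_t lift_pred(int n, int r, const int32_t *c, int32_t *s[], int t)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        if (i != r)
            acc += (int64_t)c[i] * s[i][t];
    return (int32_t)(acc >> FRAC_BITS);
}

/* Encoder: s[r] becomes approximately (e . s) / e[r]. */
static void lift_encode(int n, int len, int r, const int32_t *c, int32_t *s[])
{
    for (int t = 0; t < len; t++)
        s[r][t] += lift_pred(n, r, c, s, t);
}

/* Decoder: same coefficients and prediction, subtracted instead of added. */
static void lift_decode(int n, int len, int r, const int32_t *c, int32_t *s[])
{
    for (int t = 0; t < len; t++)
        s[r][t] -= lift_pred(n, r, c, s, t);
}
```

Take e = (1, -2, 1), which gives r = 1, and let channel 1 be exactly the average of channels 0 and 2. Every encoded sample of channel 1 then becomes 0, and decoding restores the original values.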

> second, let's assume E is our eigenvector matrix with which we want to
> transform the channels, S is a vector of the samples of all channels at a
> single time
> what we want is
> E*S
>
> and when we decorrelated one channel (as for example in "first" above)
> then we have applied a linear transform to S that is S1=L*S (L being a matrix
> with one row of the e[x]/e[r] coeffs and otherwise the identity matrix)
> this would make our original goal of E*S look like E*L^-1*S1, now
> we can multiply the left side out to get E1*S1 with E1=(E*L^-1)
> and then repeat "first" with the next eigenvector.
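Here is a worked instance of this algebra for three channels with r = 0, an arbitrary choice for illustration. It writes S = (s_0, s_1, s_2)^T and takes e = (e_0, e_1, e_2) to be the eigenvector used in "first":

```latex
% n = 3 channels, r = 0; S = (s_0, s_1, s_2)^T, e = (e_0, e_1, e_2)
L = \begin{pmatrix} 1 & e_1/e_0 & e_2/e_0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\qquad
S_1 = L S = \begin{pmatrix} (e_0 s_0 + e_1 s_1 + e_2 s_2)/e_0 \\ s_1 \\ s_2 \end{pmatrix}

L^{-1} = \begin{pmatrix} 1 & -e_1/e_0 & -e_2/e_0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\qquad
E S = E L^{-1} L S = (E L^{-1}) S_1 = E_1 S_1

e^{\mathsf T} L^{-1} = (e_0,\; 0,\; 0)
```

The last line shows that the row of E_1 belonging to e keeps only e_0, at column r. That component is therefore already carried by the transformed channel, and only the remaining rows still need work.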
>
> note1 L^-1 is just the identity matrix with one row replaced by e[x]/e[r]
> and signs flipped on all but one element of that row if i didn't make an
> error, so this should all be rather trivial and fast 1-2 line loops in C
> there's not even any need to ever store L or L^-1 as matrices, it's just
> convenient to think of them that way ...
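Multiplying out E*L^-1 column by column gives the short loop that note1 predicts. Column r of E is unchanged, and every other column j loses E[row][r]*e[j]/e[r]. The sketch below uses a hypothetical absorb_lifting(). It assumes E is a fixed-size row-major array and that e does not alias E, since E is updated in place.

```c
#include <assert.h>

#define MAX_CH 8   /* hypothetical upper bound on channels */

/* E <- E * L^-1, where L^-1 is the identity except row r, which has 1 at
 * column r and -e[j]/e[r] elsewhere. Column r is never modified, so it can
 * be read while the other columns are updated. e must not point into E. */
static void absorb_lifting(int n, double E[MAX_CH][MAX_CH], const double *e, int r)
{
    assert(n > 0 && n <= MAX_CH && r >= 0 && r < n && e[r] != 0);
    for (int row = 0; row < n; row++)
        for (int j = 0; j < n; j++)
            if (j != r)
                E[row][j] -= E[row][r] * e[j] / e[r];
}
```

After the call, the row of E that held e is zero except at column r. E1*S1 then equals E*S, where S1 is S with entry r replaced by (e . S)/e[r].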
>
> note2 i do not know which order is the best when more than 1 channel is
> decorrelated like this, but it does matter, different order leads to
> different scalings for the channels due to preserving of losslessness

Ramiro Polla



