[FFmpeg-devel] [PATCH] ALAC Encoder
Michael Niedermayer
michaelni
Mon Aug 18 16:35:14 CEST 2008
On Mon, Aug 18, 2008 at 04:01:22PM +0200, Michael Niedermayer wrote:
> On Mon, Aug 18, 2008 at 09:38:53AM -0300, Ramiro Polla wrote:
> > Hi,
> >
> > On Sun, Aug 17, 2008 at 12:55 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > > On Sun, Aug 17, 2008 at 11:17:27AM -0300, Ramiro Polla wrote:
> > >> On Sun, Aug 17, 2008 at 10:15 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > >> > On Sun, Aug 17, 2008 at 09:09:00AM +0530, Jai Menon wrote:
> > >> >> Hi,
> > >> >>
> > >> >> On Sunday 17 Aug 2008 8:05:14 am Michael Niedermayer wrote:
> > >> >> > On Sun, Aug 17, 2008 at 04:14:43AM +0530, Jai Menon wrote:
> > >> > [...]
> > >> >> > > +static void alac_stereo_decorrelation(AlacEncodeContext *s)
> > >> >> > > +{
> > >> >> > > + int32_t *left = s->sample_buf[0], *right = s->sample_buf[1];
> > >> >> > > + int32_t tmp;
> > >> >> > > + int i;
> > >> >> > > +
> > >> >> > > + for(i=0; i<s->avctx->frame_size; i++) {
> > >> >> > > + tmp = left[i];
> > >> >> > > + left[i] = (tmp + right[i]) >> 1;
> > >> >> > > + right[i] = tmp - right[i];
> > >> >> > > + }
> > >> >> > >
> > >> >> > > + s->interlacing_leftweight = 1;
> > >> >> > > + s->interlacing_shift = 1;
> > >> >> >
> > >> >> > i do not believe this is optimal
> > >> >> >
> > >> >>
> > >> >> It may not be optimal in the sense that I do not adaptively select the
> > >> >> decorrelation scheme, but this is just the first iteration which aims at
> > >> >> getting a basic encoder into svn. And it is better than doing no
> > >> >> decorrelation. I did initially try out an adaptive approach but the difference
> > >> >> in compression wasn't that great. I'm looking into how this can be done in a
> > >> >> better manner. Till then, I was hoping we could go with this.
> > >> >
> > >> > see the pca.c/h i posted in a reply to ramiro a few days ago
> > >> > it might be worth a try ...
> > >>
> > >> Speaking of that... I haven't finished integrating it in MLP (I'm
> > >> working on some other stuff atm), but it seems to be what I need.
> > >
> > >> Could you get it cleaned up and committed like you suggested?
> > >
> > > done
> >
> > I can get it working in my tests, but not in MLP =(
> >
> > I take something like this (I'll name it here s[channels][samples]):
> > [samples]
> > [channel 0] 0 1 2 3 4 5 6 7 8
> > [channel 1] 0 1 2 3 4 5 6 7 8
> > [noise 0] 0 1 2 3 4 5 6 7 8
> > [noise 1] 0 1 2 3 4 5 6 7 8
> >
> > Pass it through the pca, <num_channels> samples at a time
> > pca_add(s[0][0], s[1][0], s[2][0], s[3][0])
> > pca_add(s[0][1], s[1][1], s[2][1], s[3][1])
> > ...
> > pca_add(s[0][x], s[1][x], s[2][x], s[3][x])
> >
> > Solve the pca and get the eigenvectors (I'll name it here as e[][])
> >
> > Multiply them both to a new buffer (s2[channels][samples])
> > s2[0][0] = s[0][0] * e[0][0] + s[1][0] * e[0][1] +
> > s[2][0] * e[0][2] + s[3][0] * e[0][3]
> > s2[0][1] = s[0][1] * e[0][0] + s[1][1] * e[0][1] +
> > s[2][1] * e[0][2] + s[3][1] * e[0][3]
> > ...
> > s2[0][x] = s[0][x] * e[0][0] + s[1][x] * e[0][1] +
> > s[2][x] * e[0][2] + s[3][x] * e[0][3]
> >
> > s2[1][0] = s[0][0] * e[1][0] + s[1][0] * e[1][1] +
> > s[2][0] * e[1][2] + s[3][0] * e[1][3]
> > s2[1][1] = s[0][1] * e[1][0] + s[1][1] * e[1][1] +
> > s[2][1] * e[1][2] + s[3][1] * e[1][3]
> > ...
> > s2[1][x] = s[0][x] * e[1][0] + s[1][x] * e[1][1] + s[2][x] * e[1][2] +
> > s[3][x] * e[1][3]
> >
> > In this case I'd get some values in s2[0][x], and 0 for s2[i>0][x],
> > since all channels are equal.
> >
> > I then multiply again by the eigenvectors (with something transposed
> > in the middle, I forgot which), and get s[][] again.
> >
> > But MLP doesn't have the intermediate s2 buffer. It overwrites s[][]
> > directly. From my stereo samples, it usually only infers s[1][] out of
> > s[0,2,3][] (2, 3 being noise channels). So there's only one lossless
> > matrix for channel 1. It works like:
> > s[1][0] = s[0][0] * e[1][0] + s[1][0] * e[1][1] +
> > s[2][0] * e[1][2] + s[3][0] * e[1][3]
> > s[1][1] = s[0][1] * e[1][0] + s[1][1] * e[1][1] +
> > s[2][1] * e[1][2] + s[3][1] * e[1][3]
> > ...
> > s[1][x] = s[0][x] * e[1][0] + s[1][x] * e[1][1] +
> > s[2][x] * e[1][2] + s[3][x] * e[1][3]
> >
> > If there were more matrices, it would always use the previously
> > overwritten values.
> >
> > Can I still achieve this with PCA?
>
> yes
>
> there are many things that can be tried ...
> first is to simply just decorrelate only one channel
> and leave the others as is, this should be easy and similar to the
> other encoder
>
> s[r] = s[0]e[0]/e[r] + s[1]e[1]/e[r] + ...+ s[n]e[n]/e[r]
>
> e here is the eigenvector with the smallest eigenvalue
> r is chosen so that e[r] is largest
>
> second, let's assume E is our eigenvector matrix with which we want to
> transform the channels, S is a vector of the samples of all channels at a
> single time
> what we want is
> E*S
>
> and when we decorrelated one channel (as for example in "first" above)
> then we have applied a linear transform to S that is S1=L*S (L being a matrix
> with one row of the e[x]/e[r] coeffs and otherwise the identity matrix)
> this would make our original goal of E*S look like E*L^-1*S1, now
> we can multiply the left side out to get E1*S1 with E1=(E*L^-1)
> and then repeat "first" with the next eigenvector.
>
> note1 L^-1 is just the identity matrix with one row replaced by e[x]/e[r]
> and signs flipped on all but one element of that row if i didn't make an
> error, so this should all be rather trivial and fast 1-2 line loops in C
> there's not even any need to ever store L or L^-1 as matrices, it's just
> convenient to think of them that way ...
>
> note2 i do not know which order is the best when more than 1 channel is
> decorrelated like this, but it does matter, different order leads to
> different scalings for the channels due to preserving of losslessness
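A rough sketch of the "first" step quoted above, in C. It assumes the
eigenvector e with the smallest eigenvalue is already available as doubles,
picks r by largest magnitude of e[r], and quantizes e[i]/e[r] to fixed point.
COEFF_BITS and all function names are made up for illustration; none of this
is taken from the MLP encoder or from pca.c. Only channel r is modified, and
the prediction is computed from untouched channels, so the inverse is exact.

```c
#include <stdint.h>

#define COEFF_BITS 14 /* assumed fixed-point precision, chosen for illustration */

/* Choose r with the largest |e[r]| and store c[i] = e[i]/e[r] in fixed point.
 * c[r] is set to 0 because channel r keeps an implicit coefficient of 1. */
static int make_coeffs(const double *e, int n, int32_t *c)
{
    int i, r = 0;

    for (i = 1; i < n; i++) {
        double a = e[i] < 0 ? -e[i] : e[i];
        double b = e[r] < 0 ? -e[r] : e[r];
        if (a > b)
            r = i;
    }
    for (i = 0; i < n; i++) {
        double v = i == r ? 0.0 : e[i] / e[r] * (1 << COEFF_BITS);
        c[i] = (int32_t)(v < 0 ? v - 0.5 : v + 0.5); /* round to nearest */
    }
    return r;
}

/* Weighted sum of all channels except r at time t, scaled back to integers.
 * Relies on >> of a negative value being an arithmetic shift, as FFmpeg does. */
static int32_t predict_from_others(int32_t **s, const int32_t *c,
                                   int n, int r, int t)
{
    int64_t acc = 0;
    int i;

    for (i = 0; i < n; i++)
        if (i != r)
            acc += (int64_t)s[i][t] * c[i];
    return (int32_t)(acc >> COEFF_BITS);
}

/* Encoder side: s[r] = s[r] + sum_{i != r} s[i] * e[i]/e[r] */
static void decorrelate_channel(int32_t **s, const int32_t *c,
                                int n, int r, int len)
{
    int t;

    for (t = 0; t < len; t++)
        s[r][t] += predict_from_others(s, c, n, r, t);
}

/* Decoder side: the other channels are unchanged, so the same prediction
 * can be recomputed and subtracted to restore s[r] exactly. */
static void recorrelate_channel(int32_t **s, const int32_t *c,
                                int n, int r, int len)
{
    int t;

    for (t = 0; t < len; t++)
        s[r][t] -= predict_from_others(s, c, n, r, t);
}
```

Repeating this for further eigenvectors would need the E1=(E*L^-1) update
described above, which this sketch does not attempt.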
note, if the above is too hard i am also fine with you just trying a few fixed
decorrelation combinations like i suggested for jai,
like (r,l), (r-l,r), (r-l,l), ((r+l)>>1,r-l)
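A minimal sketch of trying those fixed combinations and keeping the cheapest
one. The cost used here (sum of absolute values of both output channels) is
only an assumed stand-in for the real bit count after prediction and entropy
coding, and the enum and function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

enum {
    DECORR_NONE,     /* r, l            */
    DECORR_DIFF_R,   /* r-l, r          */
    DECORR_DIFF_L,   /* r-l, l          */
    DECORR_MID_SIDE, /* (r+l)>>1, r-l   */
    DECORR_NB
};

/* Apply one of the fixed combinations to a single stereo sample pair. */
static void decorr_pair(int mode, int32_t l, int32_t r, int32_t *a, int32_t *b)
{
    switch (mode) {
    case DECORR_DIFF_R:   *a = r - l;        *b = r;     break;
    case DECORR_DIFF_L:   *a = r - l;        *b = l;     break;
    case DECORR_MID_SIDE: *a = (r + l) >> 1; *b = r - l; break;
    default:              *a = r;            *b = l;     break;
    }
}

/* Return the mode with the smallest summed magnitude over the frame.
 * Ties go to the lowest-numbered mode. */
static int pick_decorrelation(const int32_t *left, const int32_t *right, int len)
{
    int mode, i, best = DECORR_NONE;
    uint64_t best_cost = UINT64_MAX;

    for (mode = 0; mode < DECORR_NB; mode++) {
        uint64_t cost = 0;

        for (i = 0; i < len; i++) {
            int32_t a, b;

            decorr_pair(mode, left[i], right[i], &a, &b);
            cost += (uint64_t)llabs((long long)a) + (uint64_t)llabs((long long)b);
        }
        if (cost < best_cost) {
            best_cost = cost;
            best      = mode;
        }
    }
    return best;
}
```

A real encoder would more likely compare the encoded frame size for each
mode; the magnitude sum is just cheap enough to show the selection loop.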
also it might be worth applying a high pass filter before feeding samples
to PCA, like s[0][0] - 2*s[0][1] + s[0][2], as the low frequency stuff likely
has only a small effect on the bitrate and it might be better to optimize the
decorrelation just based on the more bitrate relevant high frequency parts.
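The high pass above is just a second difference along time, so a sketch could
look like the following. The function name is hypothetical, and it assumes
16 or 24 bit samples so the int32_t arithmetic cannot overflow. The filtered
copy would only be used to estimate the decorrelation; the actual samples
being coded stay unfiltered.

```c
#include <stdint.h>

/* out[i] = in[i] - 2*in[i+1] + in[i+2] for one channel.
 * Writes len-2 values (none if len < 3) and returns how many were written. */
static int highpass_second_diff(const int32_t *in, int len, int32_t *out)
{
    int i;

    for (i = 0; i + 2 < len; i++)
        out[i] = in[i] - 2 * in[i + 1] + in[i + 2];
    return i;
}
```

A constant or linearly rising input comes out as all zeros, which is the
point: slow trends no longer dominate what the PCA sees.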
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Everything should be made as simple as possible, but not simpler.
-- Albert Einstein