[FFmpeg-devel] [RFC] AAC Encoder

Mon Aug 18 09:01:56 CEST 2008

On Sun, Aug 17, 2008 at 04:41:39PM +0200, Michael Niedermayer wrote:
[...]
> > > anyway before the AAC encoder can reach svn it MUST be significantly improved
> > > in terms of optimality as well as code cleanliness
> > > basically everything should be RD optimal unless either a faster and equally
> > > good heuristic exists or the RD optimal code is too slow.
> > 
> > well, that requires developing a much better RD-aware psy model.
> 
> Are you saying that what you implemented is much worse than what's possible?
> Well, if you are saying that then I'll believe it, so please implement the best
> ;)

Well, the goal was an AAC-LC bitstream writer with a basic psychoacoustic
framework; an optimal RD encoder and world domination are side tasks.
I'm not against it, but it will be implemented during the FFmpeg Autumn of Patching.

> Besides, a psychoacoustic model IMHO produces perceptual weights, either per
> band or per coefficient.
> Everything else should be done per RD theory.
> Now in principle other decisions could also be made in a psychoacoustically
> aware way, but as we have seen from the quantization this is clearly not the
> case in the current model.
> What the current model does is it calculates these weights (in the form of
> scale factors) and the rest has absolutely nothing to do with psychoacoustics;
> it's just a trivial reference quantizer, trivial M/S selection based on better
> decorrelation, trivial IIR filter based short window selection [and this one
> is even suboptimal in its own way as it limits itself to 9 out of 128
> groupings].
> The scalefactors from the psy model should be usable as RD factors for
> weighting between rate and distortion. I am pretty sure a relation like
> lambda = A*sf^B  with A and B constants should be more than good enough
> for our purposes; it is for MPEG-4 ASP. I guess Loren or Dark Shikari can
> comment on what relation is commonly used in H.264 between the quantization
> factor (NOT QP, which has a log scale) and lambda?
> 
> And to preempt the question about the values of A and B, they can be found
> simply by comparing a RD based encoder (which selects scalefactors based on
> the lambda values) to the 3gpp model, A and B should be selected so that
> both encoders choose most similar scalefactors

Any paper references besides mp3-tech.org would be good.

> i will review the patch later, you have plenty of ideas to work on left as
> far as i can see.

I will try to get enough sleep this week to work on it with a fresh head.

> [...]
> -- 
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> 
> Those who are too smart to engage in politics are punished by being
> governed by those who are dumber. -- Plato