[FFmpeg-devel] AAC psychoacoustic model suggestions?

Thu Jun 19 19:51:46 CEST 2008

Kostya wrote:
> I know such words as ATH, Bark, GPsycho, AoTuV and
> ISO 13818-7 Annex C.
> 
> Can you give more tips/suggestions/whatever on
> psychoacoustic model implementations worth trying.
> I can have several models implemented, so good ideas won't
> be thrown away :)

I'd strongly suggest you check the 3GPP AAC reference code and its 
associated docs:
http://www.3gpp.org/ftp/Specs/html-info/26-series.htm
It is quite a clean encoder (compared to messy ones like Lame), and the 
docs provide some good introductions.

You might notice that this encoder doesn't bother with tonality 
estimation. While this is relatively unusual, it allows a simpler 
model and reduces the risk of errors caused by heuristic failures. 
I'd suggest you also skip tonality estimation at first.

Regarding ISO 13818-7, you should know that while it suggests a psy 
model, it is far from describing a GOOD psy model. You can read it and 
understand it, but a direct algorithm-to-code transcription would 
probably be a waste of time.

Regarding Lame, we switched from the initial GPsycho model to NSPsytune 
a few years ago. The main differences between the two are:
* Tonality estimation: GPsycho uses a predictability measure, while 
NSPsytune uses spectral flatness
* NSPsytune uses additive masking. I'd suggest you not bother with 
additive masking, which is full of potential traps
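
To make the spectral flatness idea concrete, here is a minimal sketch of 
the textbook measure (geometric mean over arithmetic mean of the power 
spectrum in a band). It is only an illustration, not the NSPsytune code: 
the function name and the small floor `eps` are assumptions made for the 
example.

```python
import math

def spectral_flatness(power):
    """Geometric mean / arithmetic mean of a band's power spectrum.

    Close to 1.0 for noise-like (flat) bands, close to 0.0 for tonal ones.
    """
    eps = 1e-12  # assumed floor, only there to avoid log(0)
    n = len(power)
    log_mean = sum(math.log(p + eps) for p in power) / n
    arith_mean = sum(power) / n + eps
    return math.exp(log_mean) / arith_mean

# Hypothetical test bands: one flat, one dominated by a single line.
flat_band = [1.0] * 16
tonal_band = [1e-6] * 15 + [1.0]
print(spectral_flatness(flat_band))
print(spectral_flatness(tonal_band))
```

A flat band gives a value near 1 and a band with one dominant line gives 
a value near 0, so the result can be mapped to a tonality index if you 
decide to add one later.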

It seems to me that what you would have to implement is (unordered list):
* Spreading (even a simple one)
* Computation of quantization error compared to masking, in order to use 
it within the potential quantization loop (a bit of perceptual RD?)
* LR or MS decision
* Block switching decision (beware: you should do it in advance for the 
next frame in order to know the block type of the current frame)
* A lowpass would not hurt
* probably some kind of dropout/spectral hole prevention
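
As a rough illustration of the first two items, here is a sketch of a 
very simple spreading step and of a noise-to-mask ratio per band. 
Everything in it is an assumption made for the example: the function 
names, the slopes (15 dB per band upward, 30 dB per band downward), the 
fixed 20 dB signal-to-mask offset, the use of a maximum rather than a 
sum, and the idea that bands are roughly one Bark wide. It is not taken 
from the 3GPP code or from Lame.

```python
def spread(energies, up_db=15.0, down_db=30.0):
    """Spread per-band energies to neighbouring bands (max of contributions).

    up_db / down_db: assumed attenuation per band towards higher / lower
    frequencies; illustrative values only.
    """
    up = 10 ** (-up_db / 10)
    down = 10 ** (-down_db / 10)
    out = list(energies)
    # Masking leaking towards higher bands.
    for b in range(1, len(out)):
        out[b] = max(out[b], out[b - 1] * up)
    # Masking leaking towards lower bands.
    for b in range(len(out) - 2, -1, -1):
        out[b] = max(out[b], out[b + 1] * down)
    return out

def masking_thresholds(energies, ath, snr_db=20.0):
    """Spread energy lowered by an assumed fixed offset, floored by the ATH.

    ath: per-band absolute threshold energies, assumed to be > 0.
    """
    scale = 10 ** (-snr_db / 10)
    return [max(s * scale, a) for s, a in zip(spread(energies), ath)]

def noise_to_mask(coefs, quantized, band_edges, thresholds):
    """Quantization noise energy divided by the threshold, per band."""
    ratios = []
    for b, thr in enumerate(thresholds):
        lo, hi = band_edges[b], band_edges[b + 1]
        noise = sum((x - q) ** 2 for x, q in zip(coefs[lo:hi], quantized[lo:hi]))
        ratios.append(noise / thr)
    return ratios

# Hypothetical 4-band frame with all the energy in band 1.
thr = masking_thresholds([0.0, 100.0, 0.0, 0.0], ath=[1e-3] * 4)
print(thr)
```

Under this model a ratio above 1 means the quantization noise in that 
band exceeds the masking threshold, so a quantization loop could try to 
keep the ratios below 1, or minimize the worst one when the bit budget 
does not allow that.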

Of course, there are many more things that could also be implemented 
later (multichannel, PNS, ...)

-- 
Gabriel Bouvigne
www.mp3-tech.org
personal page: http://gabriel.mp3-tech.org