[FFmpeg-devel] [RFC] Generic psychoacoustic model interface

Wed Aug 27 15:05:45 CEST 2008

On Wed, Aug 27, 2008 at 05:21:51PM +0600, Alexander E. Patrakov wrote:
> Kostya wrote:
> > Here's my first attempt to define codec-agnostic psy model.
> > Here's an interface for it. I'm not sure about AC3, but
> > it should be possible to use it with DCA, Vorbis,
> > MPEG Audio Layers I-III and NBC, maybe WMA too.
> > In case somebody codes an implementation, of course.
> > Personally I plan to make my encoder use it backed with
> > already implemented 3GPP model.
> 
> 1) The general issue of using _any_ psychoacoustical model with HD (>48 kHz) 
> audio. How is the whole spectrum supposed to be split into bands? I.e., with 
> 192 kHz sampling rate (think DCA with proprietary extensions), are you really 
> sure to split the whole 96 kHz spectrum into just 128 equal subbands?

a) Gabriel said high sampling rates are mostly for better time resolution, not
frequency resolution, so coefficients will tend to gather at the beginning of
the spectrum, leaving the high bands empty.
b) Who said equal? Look at AAC, for example: low-frequency bands are narrow
(usually 4 coefficients), while high-frequency bands are wider (32 or even 96
coefficients).
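
To illustrate point b), here is a rough sketch of how non-uniform band widths
could be turned into band start offsets. The width table and the function name
are made up for this mail; the widths are not a real AAC scalefactor band table
and none of this is part of the proposed interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical band widths in spectral coefficients: narrow at low
 * frequencies, wide at high ones. Illustrative values only, not a real
 * AAC scalefactor band table. */
static const unsigned char example_band_widths[] = {
    4, 4, 4, 4, 8, 8, 16, 16, 32, 32
};

/* Fill offsets[0..num_bands] so that offsets[i] is the index of the first
 * coefficient of band i and offsets[num_bands] is the total number of
 * coefficients covered. Returns that total. The caller must provide room
 * for num_bands + 1 entries. */
static int band_offsets_from_widths(const unsigned char *widths,
                                    int num_bands, int *offsets)
{
    int i, pos = 0;

    assert(widths && offsets && num_bands >= 0);
    for (i = 0; i < num_bands; i++) {
        offsets[i] = pos;
        pos       += widths[i];
    }
    offsets[num_bands] = pos;
    return pos;
}
```

With the table above, the ten bands cover 128 coefficients, but the first four
bands together span only 16 of them, so the low frequencies get much finer
resolution than an equal split would give.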

> 2) In FFPsyContext, the distinction between only _two_ frame types (long and 
> short) is hard-coded. For some codecs, this model makes no sense. E.g. for 
> DCA, a subframe always contains 4 subsubframes each corresponding to 256 PCM 
> samples (but the synthesis FIR is 512 taps long). One can either say "this is 
> a common scale factor for all 4 subsubframes", or define a transient location 
> at one subsubframe boundary and say "here is the scale factor before the 
> transient, and here it is after the transient". This doesn't really map into 
> the above model.
> 
> Moreover, who (codec or psy model) is responsible for transient detection (and 
> for non-DCA codecs, choice of short vs long blocks)?

That's the window switching decision; I think the psy model may even suggest a
sequence of window sizes for WMA.
And what you described is called grouping in AAC, so it should fit here.
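
A minimal sketch of what I mean by grouping, assuming a single transient
position per frame. The function name and the convention for "no transient"
are hypothetical and not part of the proposed interface: blocks before the
transient share one set of scale factors, blocks after it share another.

```c
#include <assert.h>

/* Split num_blocks consecutive blocks (e.g. 4 DCA subsubframes or 8 AAC
 * short windows) into groups that share scale factors.
 * transient_pos is the index of the first block after the transient; any
 * value <= 0 or >= num_blocks is taken to mean "no transient" (an
 * assumption of this sketch).
 * group_len must have room for 2 entries. Returns the number of groups. */
static int group_lengths_from_transient(int num_blocks, int transient_pos,
                                        int *group_len)
{
    assert(num_blocks > 0 && group_len);

    if (transient_pos <= 0 || transient_pos >= num_blocks) {
        /* no transient inside the frame: one common group */
        group_len[0] = num_blocks;
        return 1;
    }
    group_len[0] = transient_pos;              /* blocks before the transient */
    group_len[1] = num_blocks - transient_pos; /* blocks after the transient  */
    return 2;
}
```

For a DCA subframe with a transient at the boundary after the first
subsubframe this gives groups of 1 and 3 blocks; without a transient it gives
a single group of 4, which corresponds to the "common scale factor for all 4
subsubframes" case.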

> 3) The whole "scalefactor band lengths for long frame" business assumes 
> non-overlapping (or almost non-overlapping) bands. This is simply not the 
> case for DCA. For DCA, each subband (i.e., the entity for which one can 
> specify a scale factor [ignoring transients here]) except the first and the 
> last, has a bell-shaped form, and subbands overlap in half. I.e. something 
> like this ASCII art attempts to depict:
> 
> .
>     .
>         .
>          .
> ,        .
>     ,  .
>     .  ,
> .       ,
> _       ,
>     _  ,
>     ,  _
> ,       _
>         _
>       _
>     _
> _

That looks a lot like an AAC sequence of 8 short windows.
I think when the time comes, we'll be able to adapt the interface for DCA.

> -- 
> Alexander E. Patrakov