[FFmpeg-devel] [RFC] AAC Encoder
Michael Niedermayer
michaelni
Thu Aug 14 23:42:44 CEST 2008
On Thu, Aug 14, 2008 at 08:48:42AM +0300, Kostya wrote:
> On Wed, Aug 13, 2008 at 04:44:18PM +0200, Michael Niedermayer wrote:
> > On Wed, Aug 13, 2008 at 04:42:56PM +0300, Kostya wrote:
> > > On Wed, Aug 13, 2008 at 02:57:50PM +0200, Michael Niedermayer wrote:
> > [...]
> > >
> > > > > 3. based on psy model suggestions, encoder performs windowing and MDCT
> > > >
> > > > ok
> > > >
> > > >
> > > > > 4. encoder feeds coefficients to psy model
> > > > > 5. psy model by some magic determines scalefactors and use them to convert
> > > > > coefficients into integer form
> > > > > 6. encoder encodes obtained scalefactors and integer coefficients
> > > > >
> > > > > There are 11 codebooks for AAC, each designed to code either pairs or quads
> > > > > of values with sign coded separately or incorporated into value,
> > > > > each has a maximum value limit.
> > > > > While it's feasible to find the best encoding (like take raw coeff, quantize
> > > > > it and round up or down, then see which vector takes less bits), I feel
> > > > > it would be too slow.
> > > >
> > > > thats fine, you already have the fast variant implemented i do not suggest
> > > > that to be removed, what we need is a high quality variant. The encoder should
> > > > be better than other encoders ...
> > > > Also as the max value you mentioned is another example of where your code
> > > > fails fatally, a single +3 that would sound nearly as good when encoded as +2
> > > > could force a less efficient code book to be choosen. Also the +3 could be
> > > > encoded as a pulse, i dont remember if your code optimally choose between
> > > > pulse and normal codebook encodings?
> > >
> > > not optimally, unfortunately, but it can search for pulses and encode them
> > >
> > > in any case, here's a new encoder version
> >
> > please commit the parts ive ok-ed and/or send a patch without them
>
> done (there were okayed parts only in aacenc.c)
[...]
psy model review below
> /*
> * AAC encoder psychoacoustic model
> * Copyright (C) 2008 Konstantin Shishkov
> *
> * This file is part of FFmpeg.
> *
> * FFmpeg is free software; you can redistribute it and/or
> * modify it under the terms of the GNU Lesser General Public
> * License as published by the Free Software Foundation; either
> * version 2.1 of the License, or (at your option) any later version.
> *
> * FFmpeg is distributed in the hope that it will be useful,
> * but WITHOUT ANY WARRANTY; without even the implied warranty of
> * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> * Lesser General Public License for more details.
> *
> * You should have received a copy of the GNU Lesser General Public
> * License along with FFmpeg; if not, write to the Free Software
> * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> */
>
> #ifndef FFMPEG_AACPSY_H
> #define FFMPEG_AACPSY_H
>
> #include "avcodec.h"
> #include "aac.h"
> #include "lowpass.h"
>
> enum AACPsyModelType{
> AAC_PSY_NULL, ///< do nothing with frequencies
> AAC_PSY_NULL8, ///< do nothing with frequencies but work with short windows
> AAC_PSY_3GPP, ///< model following recommendations from 3GPP TS 26.403
>
> AAC_NB_PSY_MODELS ///< total number of psychoacoustic models, since it's not a part of the ABI new models can be added freely
> };
ok
>
> enum AACPsyModelMode{
> PSY_MODE_CBR, ///< follow bitrate as closely as possible
> PSY_MODE_ABR, ///< try to achieve bitrate but actual bitrate may differ significantly
> PSY_MODE_QUALITY, ///< try to achieve set quality instead of bitrate
> };
>
> #define PSY_MODEL_MODE_MASK 0x0000000F ///< bit fields for storing mode (CBR, ABR, VBR)
please use bitrate tolerance/bitrate/max/min bitrate/buffer size/...
from AVCodecContext for selecting the mode
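
A minimal sketch of what deriving the mode from the rate control settings
could look like. Everything here is an assumption for illustration: the
DemoRateParams struct only stands in for the relevant AVCodecContext fields,
DEMO_FLAG_QSCALE is a hypothetical "fixed quality" flag, and the rule used
to tell CBR from ABR is just one plausible heuristic (bitrate tolerance is
left out for brevity).

```c
#include <assert.h>

/* Stand-in for the few AVCodecContext fields consulted here (assumption). */
typedef struct DemoRateParams {
    int flags;          /* encoder flags */
    int bit_rate;       /* target bitrate */
    int rc_max_rate;    /* maximum bitrate, 0 if unset */
    int rc_min_rate;    /* minimum bitrate, 0 if unset */
    int rc_buffer_size; /* rate control buffer size, 0 if unset */
} DemoRateParams;

#define DEMO_FLAG_QSCALE 0x0002 /* hypothetical "fixed quality" flag */

enum DemoPsyMode { DEMO_MODE_CBR, DEMO_MODE_ABR, DEMO_MODE_QUALITY };

/* Pick a mode from the rate parameters instead of a separate mode field. */
static enum DemoPsyMode demo_select_mode(const DemoRateParams *p)
{
    assert(p);
    if (p->flags & DEMO_FLAG_QSCALE)
        return DEMO_MODE_QUALITY;
    /* min == max == target with a buffer set: treat as constant bitrate */
    if (p->rc_buffer_size > 0 &&
        p->rc_max_rate == p->bit_rate &&
        p->rc_min_rate == p->bit_rate)
        return DEMO_MODE_CBR;
    return DEMO_MODE_ABR;
}
```

With something like this the psy model would need no mode bits of its own.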
> #define PSY_MODEL_NO_PULSE 0x00000010 ///< disable pulse searching
> #define PSY_MODEL_NO_SWITCH 0x00000020 ///< disable window switching
> #define PSY_MODEL_NO_ST_ATT 0x00000040 ///< disable stereo attenuation
> #define PSY_MODEL_NO_LOWPASS 0x00000080 ///< disable low-pass filtering
How does the user pass these to the codec?
I suspect through AVCodecContext; if so, the above would be redundant and
unneeded, as AVCodecContext is available to the psy model.
Also, I think that the choice of how to encode a coefficient, that is, as a
pulse or not, is not a psychoacoustic question but one of entropy coding:
"whichever way needs fewer bits has better RD"
>
> #define PSY_MODEL_NO_PREPROC (PSY_MODEL_NO_ST_ATT | PSY_MODEL_NO_LOWPASS)
>
> #define PSY_MODEL_MODE(a) ((a) & PSY_MODEL_MODE_MASK)
>
> /**
> * context used by psychoacoustic model
> */
> typedef struct AACPsyContext {
> AVCodecContext *avctx; ///< encoder context
>
> int flags; ///< model flags
> const uint8_t *bands1024; ///< scalefactor band sizes for long (1024 samples) frame
> int num_bands1024; ///< number of scalefactor bands for long frame
> const uint8_t *bands128; ///< scalefactor band sizes for short (128 samples) frame
> int num_bands128; ///< number of scalefactor bands for short frame
This is a little AAC specific, but then it's called AACPsyContext,
so I am not sure. Is the code supposed to be a generic psychoacoustic model
or AAC specific?
[...]
> /**
> * Convert coefficients to integers.
> * @return sum of coefficients
> * @see 3GPP TS26.403 5.6.2 "Scalefactor determination"
> */
> static inline int convert_coeffs(float *in, int *out, int size, int scale_idx)
this should be called quantize_coeffs,
and scale_idx should be replaced by a quantization factor.
> {
> int i, sign, sum = 0;
> for(i = 0; i < size; i++){
> sign = in[i] > 0.0;
> out[i] = (int)(pow(FFABS(in[i]) * ff_aac_pow2sf_tab[200 - scale_idx + SCALE_ONE_POS - SCALE_DIV_512], 0.75) + 0.4054);
fabs() instead of FFABS()
> out[i] = av_clip(out[i], 0, 8191);
> sum += out[i];
> if(sign) out[i] = -out[i];
> }
> return sum;
> }
>
> static inline float unquant(int q, int scale_idx){
> return (FFABS(q) * cbrt(q*1.0)) * ff_aac_pow2sf_tab[200 + scale_idx - SCALE_ONE_POS];
> }
Also, please replace scale_idx by a factor; repeatedly doing these lookups is
likely inefficient, and it is inflexible with respect to non-AAC codecs.
> static inline float calc_distortion(float *c, int size, int scale_idx)
> {
> int i;
> int q;
> float coef, unquant, sum = 0.0f;
> for(i = 0; i < size; i++){
> coef = FFABS(c[i]);
> q = (int)(pow(FFABS(coef) * ff_aac_pow2sf_tab[200 - scale_idx + SCALE_ONE_POS - SCALE_DIV_512], 0.75) + 0.4054);
> q = av_clip(q, 0, 8191);
> unquant = (q * cbrt(q)) * ff_aac_pow2sf_tab[200 + scale_idx - SCALE_ONE_POS + SCALE_DIV_512];
> sum += (coef - unquant) * (coef - unquant);
> }
> return sum;
> }
I think this and the previous functions have some common code that can be
factored out
[...]
> static void psy_null8_process(AACPsyContext *apc, int tag, int type, ChannelElement *cpe)
> {
> int start;
> int w, ch, g, i;
> int chans = type == ID_CPE ? 2 : 1;
>
> //detect M/S
> if(chans > 1 && cpe->common_window){
> start = 0;
> for(w = 0; w < cpe->ch[0].ics.num_windows; w++){
> for(g = 0; g < cpe->ch[0].ics.num_swb; g++){
> float diff = 0.0f;
>
> for(i = 0; i < cpe->ch[0].ics.swb_sizes[g]; i++)
> diff += fabs(cpe->ch[0].coeffs[start+i] - cpe->ch[1].coeffs[start+i]);
> cpe->ms.mask[w][g] = diff == 0.0;
> }
> }
> }
The mid/side bits should also ideally be determined by encoding both ways
and choosing by rate distortion.
The above really looks a little lame; one should at least calculate either
bits or distortion and choose based on that if both are not ...
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
it is not once nor twice but times without number that the same ideas make
their appearance in the world. -- Aristotle