[FFmpeg-devel] [RFC] Generic psychoacoustic model interface
Michael Niedermayer
michaelni
Thu Aug 28 22:36:57 CEST 2008
On Thu, Aug 28, 2008 at 08:10:26PM +0300, Kostya wrote:
> On Wed, Aug 27, 2008 at 04:33:17PM +0200, Michael Niedermayer wrote:
> > On Wed, Aug 27, 2008 at 11:35:20AM +0300, Kostya wrote:
> > > Here's my first attempt to define codec-agnostic psy model.
> > > Here's an interface for it. I'm not sure about AC3, but
> > > it should be possible to use it with DCA, Vorbis,
> > > MPEG Audio Layers I-III and NBC, maybe WMA too.
> > > In case somebody codes an implementation, of course.
> > > Personally I plan to make my encoder use it backed with
> > > already implemented 3GPP model.
> >
> > [...]
> > > /**
> > > * windowing related information
> > > */
> > > typedef struct FFWindowInfo{
> > > int window_type[2]; ///< window type (short/long/transitional, etc.) - current and previous
> > > int window_shape; ///< window shape (sine/KBD/whatever)
> >
> > > void *additional_info; ///< codec-dependent window information
> >
> > passing opaque data from psy to encoder is not clean, it requires
> > both to maintain a "hidden" compatible API
>
> Of course, unless we can decide on what will be needed for all encoders.
Whenever an encoder needs something that isn't there, it can be added.
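Here is a minimal sketch of what "adding it when needed" could look like. It assumes an AAC-style encoder that needs window grouping for short frames. Instead of an opaque `additional_info` pointer, the data becomes explicit, documented members that both the psy model and every encoder can see. `FFWindowInfoSketch`, `num_windows`, `grouping` and `sketch_set_grouping` are hypothetical names, not part of the proposed API.

```c
#include <string.h>

/* Hypothetical variant of FFWindowInfo: codec-specific data is exposed as
 * explicit fields instead of a void *additional_info blob. */
#define FF_PSY_MAX_WINDOWS 8

typedef struct FFWindowInfoSketch {
    int window_type[2];                 /* current and previous window type */
    int window_shape;                   /* sine/KBD/... */
    int num_windows;                    /* 1 for long frames, e.g. 8 for AAC short frames */
    int grouping[FF_PSY_MAX_WINDOWS];   /* 1 = this window starts a new group */
} FFWindowInfoSketch;

/* Split a frame into num_windows windows and start a new group every
 * group_len windows. Returns the number of groups, or -1 on bad input. */
static int sketch_set_grouping(FFWindowInfoSketch *wi, int num_windows, int group_len)
{
    int i, groups = 0;

    if (num_windows < 1 || num_windows > FF_PSY_MAX_WINDOWS || group_len < 1)
        return -1;
    memset(wi->grouping, 0, sizeof(wi->grouping));
    wi->num_windows = num_windows;
    for (i = 0; i < num_windows; i++) {
        if (i % group_len == 0) {
            wi->grouping[i] = 1;
            groups++;
        }
    }
    return groups;
}
```

Because the field is part of the shared struct, a second encoder that needs grouping reads the same member. There is no private agreement about what an opaque pointer holds.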
[...]
> /*
> * audio encoder psychoacoustic model
> * Copyright (C) 2008 Konstantin Shishkov
> *
> * This file is part of FFmpeg.
> *
> * FFmpeg is free software; you can redistribute it and/or
> * modify it under the terms of the GNU Lesser General Public
> * License as published by the Free Software Foundation; either
> * version 2.1 of the License, or (at your option) any later version.
> *
> * FFmpeg is distributed in the hope that it will be useful,
> * but WITHOUT ANY WARRANTY; without even the implied warranty of
> * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> * Lesser General Public License for more details.
> *
> * You should have received a copy of the GNU Lesser General Public
> * License along with FFmpeg; if not, write to the Free Software
> * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> */
>
> #ifndef FFMPEG_AACPSY_H
> #define FFMPEG_AACPSY_H
>
> #include "avcodec.h"
>
> /** maximum possible number of bands */
> #define MAX_BANDS 128
ok
[...]
> /**
> * windowing related information
> */
> typedef struct FFWindowInfo{
> int window_type[2]; ///< window type (short/long/transitional, etc.) - current and previous
How is this "transitional" type going to work with many different frame lengths?
Is there one transitional type, or N*N of them?
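One possible answer is sketched below, loosely modeled on how Vorbis shapes each window half from the sizes of the neighbouring blocks. The window is described by the previous, current and next frame lengths instead of an enumerated "transitional" type. Each half's overlap is then derived from the shorter adjacent frame, so no N*N table of named types is needed. `WindowLengths` and the helpers are hypothetical.

```c
/* Hypothetical description of a window by frame lengths rather than by an
 * enumerated "transitional" window type. */
typedef struct WindowLengths {
    int prev_len;   /* length of the previous frame */
    int cur_len;    /* length of the current frame */
    int next_len;   /* length of the next frame (requires lookahead) */
} WindowLengths;

static int min_int(int a, int b) { return a < b ? a : b; }

/* The slope on each half of the current window can be no longer than the
 * shorter of the two frames that overlap there. */
static int left_overlap(const WindowLengths *w)
{
    return min_int(w->prev_len, w->cur_len);
}

static int right_overlap(const WindowLengths *w)
{
    return min_int(w->cur_len, w->next_len);
}
```

With this representation, adding a new frame length adds no new window types. The shape follows from the lengths.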
> int window_shape; ///< window shape (sine/KBD/whatever)
> void *additional_info; ///< codec-dependent window information, should be consistent between encoder and psy model
> }FFWindowInfo;
>
> /**
> * context used by psychoacoustic model
> */
> typedef struct FFPsyContext{
> AVCodecContext *avctx; ///< encoder context
>
> FFPsyBand bands[MAX_BANDS]; ///< frame bands information
> FFWindowInfo *win_info; ///< frame window info
>
> const uint8_t *bands; ///< scalefactor band sizes for possible fram sizes
fram? (typo for "frame")
> const int *num_bands; ///< number of scalefactor bands for possible frame sizes
> const uint8_t *short_bands; ///< scalefactor band sizes for short frame
> int num_short_bands; ///< number of scalefactor bands for short frame
This looks a little odd and inconsistent; why the special short_bands?
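The sketch below shows a more uniform layout, under the assumption that short frames are simply another supported frame length. Each frame length gets its own band table, so there is no separate `short_bands`/`num_short_bands` pair. `BandLayout`, `layout_find` and `layout_coverage` are hypothetical names for illustration.

```c
#include <stdint.h>

/* Hypothetical uniform layout: one band-width table per supported frame
 * length, with short frames treated as just another entry. */
typedef struct BandLayout {
    int num_lens;                  /* number of supported frame lengths */
    const int *frame_len;          /* frame_len[i]: samples in frame type i */
    const uint8_t *const *bands;   /* bands[i]: band widths for frame type i */
    const int *num_bands;          /* num_bands[i]: entries in bands[i] */
} BandLayout;

/* Return the table index for a frame of the given length, or -1. */
static int layout_find(const BandLayout *l, int len)
{
    int i;
    for (i = 0; i < l->num_lens; i++)
        if (l->frame_len[i] == len)
            return i;
    return -1;
}

/* Sum of band widths in table i; a consistent table covers frame_len[i]. */
static int layout_coverage(const BandLayout *l, int i)
{
    int j, sum = 0;
    for (j = 0; j < l->num_bands[i]; j++)
        sum += l->bands[i][j];
    return sum;
}
```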
>
> void* model_priv_data; ///< psychoacoustic model implementation private data
> }FFPsyContext;
>
> /**
> * Initialize psychoacoustic model.
> *
> * @param ctx model context
> * @param avctx codec context
> * @param bands scalefactor band lengths for all frame lengths
> * @param num_bands number of scalefactor bands for all frame lengths
> *
> * @return zero if successful, a negative value if not
> */
> int ff_psy_init(FFPsyContext *ctx, AVCodecContext *avctx,
> const uint8_t **bands, const int* num_bands);
Isn't that missing the number of entries in num_bands?
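A minimal sketch of an init function that takes the entry count explicitly. It assumes a stand-in context rather than the real `FFPsyContext`/`AVCodecContext`. `PsyCtxSketch`, `psy_init_sketch` and `SKETCH_MAX_BANDS` are hypothetical. The limit mirrors `MAX_BANDS` above.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_BANDS 128   /* mirrors MAX_BANDS in the proposed header */

/* Stand-in for the table-related part of FFPsyContext (hypothetical). */
typedef struct PsyCtxSketch {
    int num_lens;              /* entries in bands[] and num_bands[] */
    const uint8_t **bands;
    const int *num_bands;
} PsyCtxSketch;

/* num_lens says how many entries bands[] and num_bands[] hold, so the
 * model never has to guess the array lengths. Returns 0 or -1. */
static int psy_init_sketch(PsyCtxSketch *ctx, int num_lens,
                           const uint8_t **bands, const int *num_bands)
{
    int i;

    if (!ctx || num_lens <= 0 || !bands || !num_bands)
        return -1;
    for (i = 0; i < num_lens; i++)
        if (!bands[i] || num_bands[i] <= 0 || num_bands[i] > SKETCH_MAX_BANDS)
            return -1;
    ctx->num_lens  = num_lens;
    ctx->bands     = bands;
    ctx->num_bands = num_bands;
    return 0;
}
```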
>
> /**
> * Suggest window sequence for channel.
> *
> * @param ctx model context
> * @param audio samples for the current frame
> * @param la lookahead samples (NULL when unavailable)
> * @param channel number of channel element to analyze
> * @param prev_type previous window type
> *
> * @return suggested window information in a structure
> */
> FFWindowInfo ff_psy_suggest_window(FFPsyContext *ctx,
> const int16_t *audio, const int16_t *la,
> int channel, int prev_type);
>
> /**
> * Get psychoacoustic model suggestion about coding two bands as M/S
> */
> enum FFPsyMSDecision ff_psy_suggest_ms(FFPsyContext *ctx, FFPsyBand *left, FFPsyBand *right);
I am a little unsure about this one, but I am not objecting...
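To make the M/S suggestion concrete, here is a deliberately simple heuristic. It is not the 3GPP model's method, and it works on raw coefficients because the `FFPsyBand` fields are elided above. The heuristic suggests M/S for a band when the side signal carries much less energy than the mid signal. `MSDecisionSketch`, `suggest_ms_sketch` and the `max_side_ratio` parameter are hypothetical.

```c
/* Hypothetical decision values; the real enum FFPsyMSDecision is elided. */
enum MSDecisionSketch { MS_SKETCH_LR = 0, MS_SKETCH_MS = 1 };

/* Suggest M/S for one band when the channels are strongly correlated,
 * i.e. side energy is below max_side_ratio times mid energy. */
static enum MSDecisionSketch suggest_ms_sketch(const float *left, const float *right,
                                               int len, float max_side_ratio)
{
    float e_mid = 0.0f, e_side = 0.0f;
    int i;

    for (i = 0; i < len; i++) {
        float m = 0.5f * (left[i] + right[i]);
        float s = 0.5f * (left[i] - right[i]);
        e_mid  += m * m;
        e_side += s * s;
    }
    if (e_mid <= 0.0f)
        return MS_SKETCH_LR;   /* no mid energy (silence or anti-phase): keep L/R */
    return e_side < max_side_ratio * e_mid ? MS_SKETCH_MS : MS_SKETCH_LR;
}
```

A real model would weigh the cost against the masking thresholds of both representations. The energy ratio only illustrates where such a decision would plug into the interface.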
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
I count him braver who overcomes his desires than him who conquers his
enemies for the hardest victory is over self. -- Aristotle