[FFmpeg-devel] Nellymoser encoder
Michael Niedermayer
michaelni
Fri Aug 29 22:36:10 CEST 2008
On Fri, Aug 29, 2008 at 08:55:23PM +0200, Bartlomiej Wolowiec wrote:
> Friday 29 August 2008 15:54:32 Michael Niedermayer wrote:
> > On Fri, Aug 29, 2008 at 03:11:59PM +0200, Bartlomiej Wolowiec wrote:
> > > Friday 29 August 2008 00:02:36 Michael Niedermayer wrote:
> > > > > +#define LUT_init_add -3134
> > > > > +#define LUT_init_size 31355 + LUT_init_add
> > > > > +static int LUT_init_table[LUT_init_size];
> > > >
> > > > I do not believe that the table needs to be that large.
> > > >
> > > > > +
> > > > > +#define LUT_delta_add 11725
> > > > > +#define LUT_delta_size 12975 + LUT_delta_add
> > > > > +static int LUT_delta_table[LUT_delta_size];
> > > > > +
> > > > > +#define LUT_dequantization_mul 128.0
> > > > > +#define LUT_dequantization_add LUT_dequantization_mul * 2.7
> > > > > +#define LUT_dequantization_size (int)(LUT_dequantization_mul * 2.5 + LUT_dequantization_add)
> > > > > +#define LUT_dequantization_maxbits 6
> > > > > +static int LUT_dequantization_table[LUT_dequantization_maxbits][LUT_dequantization_size];
> > > >
> > > > Neither do I believe that this one needs to be that large.
> > > > Besides, they can both be uint8_t instead of int.
> > > >
> > > > And the tables for fewer bits don't need to be as large as the largest one.
> > >
> > > Ok, I've tried to change the sizes of these arrays. Unfortunately, now I have
> > > a problem, because I don't know how to allocate memory for
> > > LUT_dequantization_table simply and in a thread-safe way.
> >
> > Drop all the messy stuff and the problems will disappear.
>
> Ok, I cleaned it up significantly. Now it looks much better.
Yes, I also like it much more now.
>
> > > > > +
> > > > > +void apply_mdct(NellyMoserEncodeContext *s, float *in, float *coefs)
> > > > > +{
> > > > > + DECLARE_ALIGNED_16(float, in_buff[NELLY_SAMPLES]);
> > > > > +
> > > > > + memcpy(&in_buff[0], &in[0], NELLY_SAMPLES * sizeof(float));
> > > > > + s->dsp.vector_fmul(in_buff, ff_sine_128, NELLY_BUF_LEN);
> > > > > + s->dsp.vector_fmul_reverse(in_buff + NELLY_BUF_LEN, in_buff + NELLY_BUF_LEN, ff_sine_128, NELLY_BUF_LEN);
> > > > > +
> > > > > + ff_mdct_calc(&s->mdct_ctx, coefs, in_buff);
> > > > > +}
> > > >
> > > > The data is copied once in encode_frame and twice here.
> > > > There is no need to copy the data 3 times.
> > > > vector_fmul can be used with a single memcpy to get the data into any
> > > > destination, and vector_fmul_reverse doesn't even need 1 memcpy, so
> > > > overall a single memcpy is enough.
> > >
> > > I hope you meant something similar to my solution.
> >
> > No, you still do 2 memcpy() calls, but now the code is really messy as well.
> >
> > What you should do is, for each block of samples you get from the user:
> > 1. apply one half of the window onto it with vector_fmul_reverse, with
> > some internal buffer as the destination;
> > 2. memcpy into the 2nd destination and apply the other half of the
> > window onto it with vector_fmul;
> > 3. run the mdct as appropriate on the internal buffers.
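A minimal sketch of the three steps above, in plain C so it compiles outside libavcodec. vector_fmul() and vector_fmul_reverse() are local stand-ins assumed to match the DSPContext semantics (dst[i] *= src[i], and dst[i] = src0[i] * src1[len - 1 - i]); BUF_LEN stands in for NELLY_BUF_LEN, and WindowState and window_block() are hypothetical names. The MDCT itself is left to the caller.

```c
#include <string.h>

#define BUF_LEN 128 /* stands in for NELLY_BUF_LEN */

/* Local stand-ins, assumed to behave like DSPContext.vector_fmul(_reverse) */
static void vector_fmul(float *dst, const float *src, int len)
{
    int i;
    for (i = 0; i < len; i++)
        dst[i] *= src[i];
}

static void vector_fmul_reverse(float *dst, const float *src0,
                                const float *src1, int len)
{
    int i;
    for (i = 0; i < len; i++)
        dst[i] = src0[i] * src1[len - 1 - i];
}

/* Two ping-pong MDCT input buffers (like bufsel in the patch);
 * must be zero-initialised before the first call. */
typedef struct WindowState {
    float buf[2][2 * BUF_LEN];
    int sel;
} WindowState;

/* Windows one block of BUF_LEN new samples x with the half window sine[]
 * and returns the completed MDCT input [rising(prev) | falling(x)]. */
static const float *window_block(WindowState *st, const float *x,
                                 const float *sine)
{
    float *cur  = st->buf[st->sel];
    float *next = st->buf[1 - st->sel];

    /* 1. falling half read straight from the user's samples: no copy */
    vector_fmul_reverse(cur + BUF_LEN, x, sine, BUF_LEN);
    /* 2. the single memcpy, then the rising half for the next window */
    memcpy(next, x, BUF_LEN * sizeof(*x));
    vector_fmul(next, sine, BUF_LEN);

    st->sel = 1 - st->sel;
    return cur; /* 3. the caller runs the MDCT on this buffer */
}
```

Per 256-sample frame the encoder would call window_block() twice and run the MDCT on each returned buffer, so the memcpy in step 2 is the only copy. The very first window has a zero rising half because the state starts zeroed.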
>
> Hmm, I considered it, but I don't understand exactly what I should change...
> In the code I copy data two times:
> a) in encode_frame - I convert int16_t to float and copy data to s->buf - I
> need to do it somewhere because vector_fmul requires float *. Additionally,
> part of the data is needed for the next call of encode_frame
> b) in apply_mdct - here I think that some additional buffer space is needed.
> If I understood correctly I have to get rid of a), but how do I access the
> old data when the next call of encode_frame is performed, and how do I call
> vector_fmul on int16_t?
Have you tried setting AVCodec.sample_fmts to SAMPLE_FMT_FLT?
I think FFmpeg should support this already. If it does not work, then we can
keep int16 for now, which would imply more copying.
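If SAMPLE_FMT_FLT does not work and the input stays int16, one option (an assumption here, not something proposed in the thread) is to let the int16-to-float conversion double as the copy and write both window halves in the same loop. convert_and_window() is a hypothetical helper; sine[] is the 128-entry half window and BUF_LEN stands in for NELLY_BUF_LEN.

```c
#include <stdint.h>

#define BUF_LEN 128 /* stands in for NELLY_BUF_LEN */

/* Hypothetical int16 path: converting each sample to float is itself a
 * copy, so both window halves are produced in the same pass.
 * cur_second_half receives the falling half of the current MDCT input,
 * next_first_half the rising half of the following one. */
static void convert_and_window(float *cur_second_half, float *next_first_half,
                               const int16_t *x, const float *sine)
{
    int i;
    for (i = 0; i < BUF_LEN; i++) {
        float s = x[i];
        cur_second_half[i] = s * sine[BUF_LEN - 1 - i];
        next_first_half[i] = s * sine[i];
    }
}
```

The trade-off is that this scalar loop cannot use the SIMD vector_fmul routines, so whether it beats a conversion followed by the DSP calls would have to be measured.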
[...]
> Index: libavcodec/nellymoserenc.c
> ===================================================================
> --- libavcodec/nellymoserenc.c (wersja 0)
> +++ libavcodec/nellymoserenc.c (wersja 0)
> @@ -0,0 +1,294 @@
> +/*
> + * Nellymoser encoder
> + * This code is developed as part of Google Summer of Code 2008 Program.
> + *
> + * Copyright (c) 2008 Bartlomiej Wolowiec
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +/**
> + * @file nellymoserenc.c
> + * Nellymoser encoder
> + * by Bartlomiej Wolowiec
> + *
> + * Generic codec information: libavcodec/nellymoserdec.c
> + *
> + * Some information also from: http://www1.mplayerhq.hu/ASAO/ASAO.zip
> + * (Copyright Joseph Artsimovich and UAB "DKD")
> + *
> + * for more information about nellymoser format, visit:
> + * http://wiki.multimedia.cx/index.php?title=Nellymoser
> + */
> +
> +#include "nellymoser.h"
> +#include "avcodec.h"
> +#include "dsputil.h"
> +
> +#define BITSTREAM_WRITER_LE
> +#include "bitstream.h"
> +
> +#define POW_TABLE_SIZE (1<<11)
> +#define POW_TABLE_OFFSET 3
> +
> +typedef struct NellyMoserEncodeContext {
> + AVCodecContext *avctx;
> + int last_frame;
ok (that is, all the code from the empty line to here can be committed)
> + int bufsel;
> + int have_saved;
> + DSPContext dsp;
> + MDCTContext mdct_ctx;
> + DECLARE_ALIGNED_16(float, mdct_out[NELLY_SAMPLES]);
> + DECLARE_ALIGNED_16(float, buf[2][3 * NELLY_BUF_LEN]); ///< sample buffer
> +} NellyMoserEncodeContext;
> +
> +static float pow_table[POW_TABLE_SIZE]; ///< -pow(2, -i / 2048.0 - 3.0);
> +
> +static const uint8_t sf_lut[96] = {
> + 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4,
> + 5, 5, 5, 6, 7, 7, 8, 8, 9, 10, 11, 11, 12, 13, 13, 14,
> + 15, 15, 16, 17, 17, 18, 19, 19, 20, 21, 22, 22, 23, 24, 25, 26,
> + 27, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 37, 38, 39, 40,
> + 41, 41, 42, 43, 44, 45, 45, 46, 47, 48, 49, 50, 51, 52, 52, 53,
> + 54, 55, 55, 56, 57, 57, 58, 59, 59, 60, 60, 60, 61, 61, 61, 62,
> +};
> +
> +static const uint8_t sf_delta_lut[78] = {
> + 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4,
> + 4, 5, 5, 5, 6, 6, 7, 7, 8, 8, 9, 10, 10, 11, 11, 12,
> + 13, 13, 14, 15, 16, 17, 17, 18, 19, 19, 20, 21, 21, 22, 22, 23,
> + 23, 24, 24, 25, 25, 25, 26, 26, 26, 26, 27, 27, 27, 27, 27, 28,
> + 28, 28, 28, 28, 28, 29, 29, 29, 29, 29, 29, 29, 29, 30,
> +};
> +
> +static const uint8_t quant_lut[230] = {
> + 0,
> +
> + 0, 1, 2,
> +
> + 0, 1, 2, 3, 4, 5, 6,
> +
> + 0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11, 11,
> + 12, 13, 13, 13, 14,
> +
> + 0, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 8,
> + 8, 9, 10, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
> + 22, 23, 23, 24, 24, 25, 25, 26, 26, 27, 27, 28, 28, 29, 29, 29,
> + 30,
> +
> + 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3,
> + 4, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 8, 8, 9, 9, 9,
> + 10, 10, 11, 11, 11, 12, 12, 13, 13, 13, 13, 14, 14, 14, 15, 15,
> + 15, 15, 16, 16, 16, 17, 17, 17, 18, 18, 18, 19, 19, 20, 20, 20,
> + 21, 21, 22, 22, 23, 23, 24, 25, 26, 26, 27, 28, 29, 30, 31, 32,
> + 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 42, 43, 44, 44, 45, 45,
> + 46, 47, 47, 48, 48, 49, 49, 50, 50, 50, 51, 51, 51, 52, 52, 52,
> + 53, 53, 53, 54, 54, 54, 55, 55, 55, 56, 56, 56, 57, 57, 57, 57,
> + 58, 58, 58, 58, 59, 59, 59, 59, 60, 60, 60, 60, 60, 61, 61, 61,
> + 61, 61, 61, 61, 62,
> +};
> +
> +static const float quant_lut_mul[7] = { 0.0, 0.0, 2.0, 2.0, 5.0, 12.0, 36.6 };
> +static const float quant_lut_add[7] = { 0.0, 0.0, 2.0, 7.0, 21.0, 56.0, 157.0 };
> +static const uint8_t quant_lut_offset[8] = { 0, 0, 1, 4, 11, 32, 81, 230 };
ok (yes, all the tables can be committed)
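The per-bit-count layout of quant_lut can be checked against quant_lut_offset: the sub-tables for 1 to 6 bits hold 1, 3, 7, 21, 49 and 149 entries, 230 in total, which is what the earlier request for smaller tables at fewer bits amounts to. The accessor below is a sketch that assumes the elided quantisation code indexes the offsets by bit count; quant_subtable() is a hypothetical name.

```c
#include <stdint.h>

/* From the patch: start of each per-bit-count sub-table inside quant_lut */
static const uint8_t quant_lut_offset[8] = { 0, 0, 1, 4, 11, 32, 81, 230 };

/* Returns the sub-table of quant_lut used for a coefficient coded with
 * `bits` bits (1..6) and stores its length in *len. */
static const uint8_t *quant_subtable(const uint8_t *quant_lut, int bits,
                                     int *len)
{
    *len = quant_lut_offset[bits + 1] - quant_lut_offset[bits];
    return quant_lut + quant_lut_offset[bits];
}
```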
> +
> +void apply_mdct(NellyMoserEncodeContext *s)
> +{
> + DECLARE_ALIGNED_16(float, in_buff[NELLY_SAMPLES]);
> +
> + memcpy(in_buff, s->buf[s->bufsel], NELLY_BUF_LEN * sizeof(float));
> + s->dsp.vector_fmul(in_buff, ff_sine_128, NELLY_BUF_LEN);
> + s->dsp.vector_fmul_reverse(in_buff + NELLY_BUF_LEN, s->buf[s->bufsel] + NELLY_BUF_LEN, ff_sine_128,
> + NELLY_BUF_LEN);
> + ff_mdct_calc(&s->mdct_ctx, s->mdct_out, in_buff);
> +
> + s->dsp.vector_fmul(s->buf[s->bufsel] + NELLY_BUF_LEN, ff_sine_128, NELLY_BUF_LEN);
> + s->dsp.vector_fmul_reverse(s->buf[s->bufsel] + 2 * NELLY_BUF_LEN, s->buf[1 - s->bufsel], ff_sine_128,
> + NELLY_BUF_LEN);
> + ff_mdct_calc(&s->mdct_ctx, s->mdct_out + NELLY_BUF_LEN, s->buf[s->bufsel] + NELLY_BUF_LEN);
> +}
> +
> +static av_cold int encode_init(AVCodecContext *avctx)
> +{
> + NellyMoserEncodeContext *s = avctx->priv_data;
> + int i;
> +
> + if (avctx->channels != 1) {
> + av_log(avctx, AV_LOG_ERROR, "Nellymoser supports only 1 channel\n");
> + return -1;
> + }
ok
> +
> + if (avctx->sample_rate != 8000 && avctx->sample_rate != 11025 &&
> + avctx->sample_rate != 22050 && avctx->sample_rate != 44100) {
> + av_log(avctx, AV_LOG_ERROR, "Nellymoser works only with 8000, 11025, 22050 and 44100 sample rate\n");
> + return -1;
> + }
Maybe this check could be limited to the normal (and stricter) strict_std_compliance levels.
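One reading of this suggestion, sketched as a hypothetical helper: the four standard rates are always accepted, and other rates only when the user lowers strict_std_compliance below the normal level. The enum values are assumed to mirror the FF_COMPLIANCE_* constants in avcodec.h; in encode_init() the second argument would be avctx->strict_std_compliance.

```c
/* Hypothetical stand-ins for FF_COMPLIANCE_* in avcodec.h (values assumed) */
enum {
    COMPLIANCE_EXPERIMENTAL = -2,
    COMPLIANCE_INOFFICIAL   = -1,
    COMPLIANCE_NORMAL       =  0,
    COMPLIANCE_STRICT       =  1
};

/* Returns 1 if the encoder should accept sample_rate. Standard Nellymoser
 * rates are always accepted; other rates only below normal compliance. */
static int nelly_rate_allowed(int sample_rate, int strict_std_compliance)
{
    static const int rates[] = { 8000, 11025, 22050, 44100 };
    unsigned i;

    for (i = 0; i < sizeof(rates) / sizeof(rates[0]); i++)
        if (sample_rate == rates[i])
            return 1;
    return strict_std_compliance < COMPLIANCE_NORMAL;
}
```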
> +
> + avctx->frame_size = NELLY_SAMPLES;
> + s->avctx = avctx;
> + ff_mdct_init(&s->mdct_ctx, 8, 0);
> + dsputil_init(&s->dsp, avctx);
> +
> + /* Generate overlap window */
> + ff_sine_window_init(ff_sine_128, 128);
> + for (i = 0; i < POW_TABLE_SIZE; i++)
> + pow_table[i] = -pow(2, -i / 2048.0 - 3.0 + POW_TABLE_OFFSET);
> +
> + return 0;
> +}
ok
> +
> +static av_cold int encode_end(AVCodecContext *avctx)
> +{
> + NellyMoserEncodeContext *s = avctx->priv_data;
> +
> + ff_mdct_end(&s->mdct_ctx);
> + return 0;
> +}
ok
> +
> +#define find_best(val, table, LUT, LUT_add, LUT_size) \
> + best_idx = \
> + LUT[av_clip ((((int)val) >> 8) + LUT_add, 0, LUT_size - 1)]; \
> + if (abs(val - table[best_idx]) > abs(val - table[best_idx + 1])) \
> + best_idx++;
(int)some_float is slow; lrintf() should be faster.
Also, if val is a float instead of an int, then fabs() may actually be better
than abs().
> +
> +/**
> + * Encodes NELLY_SAMPLES samples. It assumes, that samples contains 3 * NELLY_BUF_LEN values
> + * @param s encoder context
> + * @param output output buffer
> + * @param output_size size of output buffer
> + */
> +static void encode_block(NellyMoserEncodeContext *s, unsigned char *output, int output_size)
> +{
> + PutBitContext pb;
> + int i, band, block, best_idx, power_idx = 0;
> + float power_val, power_candidate, coeff, coeff_sum;
> + int band_start, band_end;
> + float pows[NELLY_FILL_LEN];
> + int bits[NELLY_BUF_LEN];
> +
> + const float C = 1.0;
> + const float D = 2.0;
> +
> + apply_mdct(s);
> +
> + init_put_bits(&pb, output, output_size * 8);
> +
> + band_start = 0;
> + band_end = ff_nelly_band_sizes_table[0];
> + for (band = 0; band < NELLY_BANDS; band++) {
> + coeff_sum = 0;
> + for (i = band_start; i < band_end; i++) {
> + //coeff_sum += s->mdct_out[i ] * s->mdct_out[i ]
> + // + s->mdct_out[i + NELLY_BUF_LEN] * s->mdct_out[i + NELLY_BUF_LEN];
> + coeff_sum += pow(fabs(s->mdct_out[i]), D) + pow(fabs(s->mdct_out[i + NELLY_BUF_LEN]), D);
> + }
> + power_candidate =
> + //log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) * 1024.0 / M_LN2;
> + C * log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) * 1024.0 / log(D);
> +
> + if (band) {
> + power_candidate -= power_idx;
> + find_best(power_candidate, ff_nelly_delta_table, sf_delta_lut, 37, 78);
> + put_bits(&pb, 5, best_idx);
> + power_idx += ff_nelly_delta_table[best_idx];
> + } else {
> + //base exponent
> + find_best(power_candidate, ff_nelly_init_table, sf_lut, -20, 96);
> + put_bits(&pb, 6, best_idx);
> + power_idx = ff_nelly_init_table[best_idx];
> + }
The power_idx/best_idx values could still be chosen with a Viterbi search.
It is somewhat similar to (and simpler than) our Viterbi/trellis ADPCM
encoder.
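A minimal Viterbi sketch over the running exponent, using toy tables: band 0 takes an absolute value from the init table, each later band adds one delta-table entry, and the search minimises the summed squared error against per-band targets. The squared-error cost, the value range and all names are assumptions for illustration. With the real tables, the LUT bounds quoted at the top of the thread suggest power_idx covers thousands of values, so a practical version would probably prune states as the ADPCM trellis does.

```c
#include <float.h>

#define MAX_BANDS 8     /* toy limit; the encoder uses NELLY_BANDS */
#define VMIN (-64)      /* hypothetical range of reachable power_idx values */
#define VMAX 64
#define NSTATES (VMAX - VMIN + 1)

/* State = current power_idx. Writes the chosen table indices to idx[]
 * (1 <= nbands <= MAX_BANDS) and returns the total squared error,
 * or -1 if no path stays inside [VMIN, VMAX]. */
static float viterbi_exponents(const float *target, int nbands,
                               const int *init_table, int ninit,
                               const int *delta_table, int ndelta, int *idx)
{
    float cost[MAX_BANDS][NSTATES];
    int from[MAX_BANDS][NSTATES];   /* previous state on the best path */
    int choice[MAX_BANDS][NSTATES]; /* table index that reached the state */
    int b, s, i, best = -1;

    for (b = 0; b < nbands; b++)
        for (s = 0; s < NSTATES; s++)
            cost[b][s] = FLT_MAX;

    for (i = 0; i < ninit; i++) {   /* band 0: absolute values */
        int v = init_table[i];
        float e = target[0] - v;
        if (v >= VMIN && v <= VMAX && e * e < cost[0][v - VMIN]) {
            cost[0][v - VMIN]   = e * e;
            choice[0][v - VMIN] = i;
        }
    }
    for (b = 1; b < nbands; b++)    /* later bands: deltas */
        for (s = 0; s < NSTATES; s++) {
            if (cost[b - 1][s] == FLT_MAX)
                continue;           /* state not reachable */
            for (i = 0; i < ndelta; i++) {
                int v = s + VMIN + delta_table[i];
                float e = target[b] - v, c;
                if (v < VMIN || v > VMAX)
                    continue;
                c = cost[b - 1][s] + e * e;
                if (c < cost[b][v - VMIN]) {
                    cost[b][v - VMIN]   = c;
                    from[b][v - VMIN]   = s;
                    choice[b][v - VMIN] = i;
                }
            }
        }
    for (s = 0; s < NSTATES; s++)
        if (cost[nbands - 1][s] < FLT_MAX &&
            (best < 0 || cost[nbands - 1][s] < cost[nbands - 1][best]))
            best = s;
    if (best < 0)
        return -1.0f;
    {
        float total = cost[nbands - 1][best];
        for (b = nbands - 1; b >= 0; b--) { /* backtrack */
            idx[b] = choice[b][best];
            if (b > 0)
                best = from[b][best];
        }
        return total;
    }
}
```

With init table {0, 10, 20}, delta table {-2, 0, 2} and targets {4, 12}, the greedy per-band choice picks 0 (error 16) and then 2 (error 100), 116 in total, while the search accepts a worse first band (10, error 36) to reach 12 exactly, 36 in total.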
[...]
> +AVCodec nellymoser_encoder = {
> + .name = "nellymoser",
> + .type = CODEC_TYPE_AUDIO,
> + .id = CODEC_ID_NELLYMOSER,
> + .priv_data_size = sizeof(NellyMoserEncodeContext),
> + .init = encode_init,
> + .encode = encode_frame,
> + .close = encode_end,
> + .capabilities = CODEC_CAP_SMALL_LAST_FRAME | CODEC_CAP_DELAY,
> + .long_name = NULL_IF_CONFIG_SMALL("Nellymoser Asao Codec"),
> +};
ok
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Opposition brings concord. Out of discord comes the fairest harmony.
-- Heraclitus