[FFmpeg-devel] [PATCH] wmapro decoder
Michael Niedermayer
michaelni
Mon Aug 3 02:45:46 CEST 2009
On Sun, Aug 02, 2009 at 03:11:50PM +0200, Sascha Sommer wrote:
[...]
> >
> > [...]
> >
> > > + /** decode transform type */
> > > + if (chgroup->num_channels == 2) {
> > > + if (get_bits1(&s->gb)) {
> > > + if (get_bits1(&s->gb)) {
> > > + av_log_ask_for_sample(s->avctx,
> > > + "unsupported channel transform type\n");
> > > + }
> > > + } else {
> > >
> > > + if (s->num_channels == 2) {
> > > + chgroup->transform = 1;
> > > + } else {
> > > + chgroup->transform = 2;
> > > + /** cos(pi/4) */
> > > + chgroup->decorrelation_matrix[0] = 0.70703125;
> > > + chgroup->decorrelation_matrix[1] = -0.70703125;
> > > + chgroup->decorrelation_matrix[2] = 0.70703125;
> > > + chgroup->decorrelation_matrix[3] = 0.70703125;
> > > + }
> >
> > why the special handling of 2 vs. >2 channels here?
> >
>
> When the stream has only 2 channels, the channels are M/S stereo coded
> (transform 1)
> When the stream has more than 2 channels, the matrix multiplication is used.
> for the 2 channels that contain data for the current subframe length/offset.
> (num_channels in the channel group != num_channels in the stream)
I am not sure if we are talking about the same thing or if I misunderstand you, but
the 2-channels-in-a-subframe case and the M/S case look pretty much the same to me.
If so, I wonder why they are not handled by the same code ...
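
To make the similarity concrete, here is a minimal sketch, not taken from the patch. It assumes the 2x2 matrix is stored row-major and applied as out = M * in; the helper names are hypothetical. Under that assumption, the matrix quoted above yields the difference and the sum of the two inputs, each scaled by 0.70703125, which is a sum/difference butterfly up to scale and output ordering.

```c
/* Constant used in the quoted patch as an approximation of cos(pi/4). */
#define DEMO_C45 0.70703125f

/* Hypothetical helper: multiply one sample pair by a row-major 2x2 matrix
 * (the storage order is an assumption, the quoted hunk does not show it). */
static void demo_apply_matrix2(const float m[4], float a, float b, float out[2])
{
    out[0] = m[0] * a + m[1] * b;
    out[1] = m[2] * a + m[3] * b;
}

/* Hypothetical helper: plain difference/sum butterfly, the shape of an
 * M/S style reconstruction before any scaling is applied. */
static void demo_butterfly(float a, float b, float out[2])
{
    out[0] = a - b;
    out[1] = a + b;
}
```

With the matrix { c, -c, c, c }, demo_apply_matrix2() returns c times whatever demo_butterfly() returns, so a single code path with an optional scale factor might serve both cases.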
[...]
> +/**
> + * @brief frame specific decoder context for a single channel
> + */
> +typedef struct {
> + int16_t prev_block_len; ///< length of the previous block
> + uint8_t transmit_coefs;
> + uint8_t num_subframes;
> + uint16_t subframe_len[MAX_SUBFRAMES]; ///< subframe length in samples
> + uint16_t subframe_offset[MAX_SUBFRAMES]; ///< subframe positions in the current frame
> + uint8_t cur_subframe; ///< current subframe number
> + uint16_t channel_len; ///< channel frame length in samples
Do we need that in the context, or can it be a local variable?
Also, if I understand the code correctly, the variable name is not too good.
> + uint16_t decoded_samples; ///< already processed samples
the _number_of_ already processed samples ?
> + uint8_t grouped; ///< channel is part of a group
> + int quant_step; ///< quantization step for the current subframe
> + int8_t transmit_sf; ///< transmit scale factors for the current subframe
flag indicating that ... ?
> + int8_t reuse_sf; ///< share scale factors between subframes
> + int8_t scale_factor_step; ///< scaling step for the current subframe
> + int max_scale_factor; ///< maximum scale factor for the current subframe
> + int scale_factors[MAX_BANDS]; ///< scale factor values for the current subframe
> + int saved_scale_factors[MAX_BANDS]; ///< scale factors from a previous subframe
> + int16_t scale_factor_block_len; ///< scale factor reference block length
> + float* coeffs; ///< pointer to the subframe decode buffer
> + DECLARE_ALIGNED_16(float, out[2*WMAPRO_BLOCK_MAX_SIZE]); ///< output buffer
> +} WMA3ChannelCtx;
> +
> +/**
> + * @brief channel group for channel transformations
> + */
> +typedef struct {
> + uint8_t num_channels; ///< number of channels in the group
> + int8_t transform; ///< controls the type of the transform
> + int8_t transform_band[MAX_BANDS]; ///< controls if the transform is enabled for a certain band
> + float decorrelation_matrix[WMAPRO_MAX_CHANNELS*WMAPRO_MAX_CHANNELS];
> + float* channel_data[WMAPRO_MAX_CHANNELS]; ///< transformation coefficients
> +} WMA3ChannelGroup;
> +
> +/**
> + * @brief main decoder context
> + */
> +typedef struct WMA3DecodeContext {
> + /* generic decoder variables */
> + AVCodecContext* avctx; ///< codec context for av_log
> + DSPContext dsp; ///< accelerated dsp functions
> + uint8_t frame_data[MAX_FRAMESIZE +
> + FF_INPUT_BUFFER_PADDING_SIZE];///< compressed frame data
> + MDCTContext mdct_ctx[WMAPRO_BLOCK_SIZES]; ///< MDCT context per block size
> + DECLARE_ALIGNED_16(float, tmp[WMAPRO_BLOCK_MAX_SIZE]); ///< imdct output buffer
> + float* windows[WMAPRO_BLOCK_SIZES]; ///< windows for the different block sizes
> +
> + /* frame size dependent frame information (set during initialization) */
> + uint8_t lossless; ///< lossless mode
> + uint32_t decode_flags; ///< used compression features
> + uint8_t len_prefix; ///< frame is prefixed with its length
> + uint8_t dynamic_range_compression; ///< frame contains DRC data
> + uint8_t bits_per_sample;
I think this should be explained more completely, as similarly named variables
exist in AVCodecContext; that is, how does this one differ from them ...
> + uint16_t samples_per_frame; ///< number of samples to output
> + uint16_t log2_frame_size;
> + int8_t num_channels;
same issue
> + int8_t lfe_channel; ///< lfe channel index
> + uint8_t max_num_subframes; ///< maximum number of subframes
that doxy is redundant
> + int8_t num_possible_block_sizes; ///< number of distinct block sizes that can be found in the file
> + uint16_t min_samples_per_subframe; ///< minimum samples per subframe
same
> + int8_t num_sfb[WMAPRO_BLOCK_SIZES]; ///< scale factor bands per block size
> + int16_t sfb_offsets[WMAPRO_BLOCK_SIZES][MAX_BANDS]; ///< scale factor band offsets (multiples of 4)
> + int16_t sf_offsets[WMAPRO_BLOCK_SIZES][WMAPRO_BLOCK_SIZES][MAX_BANDS]; ///< scale factor resample matrix
Does this really need to be 16-bit?
[...]
> @@ -8,11 +249,6 @@
> WMA3DecodeContext *s = avctx->priv_data;
> int i;
>
> - av_freep(&s->num_sfb);
> - av_freep(&s->sfb_offsets);
> - av_freep(&s->subwoofer_cutoffs);
> - av_freep(&s->sf_offsets);
> -
> for (i=0 ; i<WMAPRO_BLOCK_SIZES ; i++)
> ff_mdct_end(&s->mdct_ctx[i]);
>
hunk ok of course
> @@ -20,6 +256,374 @@
> }
>
> /**
> + *@brief Initialize the decoder.
> + *@param avctx codec context
> + *@return 0 on success, -1 otherwise
> + */
> +static av_cold int decode_init(AVCodecContext *avctx)
> +{
> + WMA3DecodeContext *s = avctx->priv_data;
> + uint8_t *edata_ptr = avctx->extradata;
> + unsigned int channel_mask;
> + int i;
> + int log2_num_subframes;
> +
> + s->avctx = avctx;
> + dsputil_init(&s->dsp, avctx);
> +
> + avctx->sample_fmt = SAMPLE_FMT_FLT;
> +
> + if (avctx->extradata_size >= 18) {
> + s->decode_flags = AV_RL16(edata_ptr+14);
> + channel_mask = AV_RL32(edata_ptr+2);
> + s->bits_per_sample = AV_RL16(edata_ptr);
> +#ifdef DEBUG
> + /** dump the extradata */
> + for (i=0 ; i<avctx->extradata_size ; i++)
> + av_log(avctx, AV_LOG_DEBUG, "[%x] ",avctx->extradata[i]);
> + av_log(avctx, AV_LOG_DEBUG, "\n");
> +#endif
dprintf()
> +
> + } else {
> + av_log_ask_for_sample(avctx, "Unknown extradata size\n");
> + return AVERROR_INVALIDDATA;
> + }
> +
> + /** generic init */
> + s->log2_frame_size = av_log2(avctx->block_align) + 4;
> +
> + /** frame info */
> + s->skip_frame = 1; /** skip first frame */
> + s->packet_loss = 1;
> + s->len_prefix = (s->decode_flags & 0x40);
> +
> + if (!s->len_prefix) {
> + av_log_ask_for_sample(avctx, "no length prefix\n");
> + return AVERROR_INVALIDDATA;
> + }
odd indentation depth
> +
> + /** get frame len */
> + s->samples_per_frame = 1 << ff_wma_get_frame_len_bits(avctx->sample_rate,
> + 3, s->decode_flags);
> +
> + /** init previous block len */
> + for (i=0;i<avctx->channels;i++)
> + s->channel[i].prev_block_len = s->samples_per_frame;
> +
> + /** subframe info */
> + log2_num_subframes = ((s->decode_flags & 0x38) >> 3);
log2_max_num_subframes ?
[...]
> +/**
> + *@brief Decode how the data in the frame is split into subframes.
> + * Every WMA frame contains the encoded data for a fixed number of
> + * samples per channel. The data for every channel might be split
> + * into several subframes. This function will reconstruct the list of
> + * subframes for every channel.
> + *
> + * If the subframes are not evenly split, the algorithm estimates the
> + * channels with the lowest number of total samples.
> + * Afterwards, for each of these channels a bit is read from the
> + * bitstream that indicates if the channel contains a subframe with the
> + * next subframe size that is going to be read from the bitstream or not.
> + * If a channel contains such a subframe, the subframe size gets added to
> + * the channel's subframe list.
> + * The algorithm repeats these steps until the frame is properly divided
> + * between the individual channels.
> + *
> + *@param s context
> + *@return 0 on success, < 0 in case of an error
> + */
> +static int decode_tilehdr(WMA3DecodeContext *s)
> +{
> + int c;
> +
> + /* should never consume more than 3073 bits (256 iterations for the
> + * while loop when always the minimum amount of 128 samples is substracted
> + * from missing samples in the 8 channel case)
> + * 1 + BLOCK_MAX_SIZE * MAX_CHANNELS / BLOCK_MIN_SIZE * (MAX_CHANNELS + 4)
> + */
> +
> + /** reset tiling information */
> + for (c=0;c<s->num_channels;c++) {
> + s->channel[c].num_subframes = 0;
> + s->channel[c].channel_len = 0;
> + }
> +
> + /** handle the easy case with one constant-sized subframe per channel */
> + if (s->max_num_subframes == 1) {
> + for (c=0;c<s->num_channels;c++) {
> + s->channel[c].num_subframes = 1;
> + s->channel[c].subframe_len[0] = s->samples_per_frame;
> + }
> + } else { /** subframe length and number of subframes is not constant */
> + int missing_samples = s->num_channels * s->samples_per_frame;
> + int subframe_len_bits = 0; /** bits needed for the subframe length */
> + int subframe_len_zero_bit = 0; /** first bit indicates if length is zero */
> + int fixed_channel_layout; /** all channels have the same subframe layout */
> +
> + fixed_channel_layout = get_bits1(&s->gb);
> +
> + /** calculate subframe len bits */
> + if (s->lossless) {
> + subframe_len_bits = av_log2(s->max_num_subframes - 1) + 1;
> + } else {
> + if (s->max_num_subframes == 16)
> + subframe_len_zero_bit = 1;
> + subframe_len_bits = av_log2(av_log2(s->max_num_subframes)) + 1;
> + }
> +
> + /** loop until the frame data is split between the subframes */
> + while (missing_samples > 0) {
Isn't that the same as a simple check on min_channel_len, which at the end
should equal the frame length?
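
A minimal sketch of why the two loop conditions agree. It assumes that no channel_len ever exceeds samples_per_frame (the check for that is not visible in the quoted hunk) and that missing_samples always equals the sum of the remaining samples over all channels. The helper names are hypothetical.

```c
/* Hypothetical helper: samples still unassigned, summed over all channels. */
static int demo_missing_samples(const int *channel_len, int num_channels,
                                int samples_per_frame)
{
    int missing = 0;
    for (int c = 0; c < num_channels; c++)
        missing += samples_per_frame - channel_len[c];
    return missing;
}

/* Hypothetical helper: smallest number of samples assigned to any channel. */
static int demo_min_channel_len(const int *channel_len, int num_channels)
{
    int min_len = channel_len[0];
    for (int c = 1; c < num_channels; c++)
        if (channel_len[c] < min_len)
            min_len = channel_len[c];
    return min_len;
}
```

Under those assumptions, demo_missing_samples() > 0 holds exactly when demo_min_channel_len() < samples_per_frame, so the running counter could be dropped in favour of the minimum that the loop computes anyway.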
> + unsigned int channel_mask = 0;
> + int min_channel_len;
> + int read_channel_mask = 1;
> + int channels_for_cur_subframe = 0;
> + int subframe_len;
> + /** minimum number of samples that need to be read */
> + int min_samples = s->min_samples_per_subframe;
> +
> + if (fixed_channel_layout) {
> + read_channel_mask = 0;
> + channels_for_cur_subframe = s->num_channels;
> + min_samples *= channels_for_cur_subframe;
> + min_channel_len = s->channel[0].channel_len;
> + } else {
> + min_channel_len = s->samples_per_frame;
> + /** find channels with the smallest overall length */
> + for (c=0;c<s->num_channels;c++) {
> + if (s->channel[c].channel_len <= min_channel_len) {
> + if (s->channel[c].channel_len < min_channel_len) {
> + channels_for_cur_subframe = 0;
> + min_channel_len = s->channel[c].channel_len;
> + }
> + ++channels_for_cur_subframe;
> + }
> + }
> + min_samples *= channels_for_cur_subframe;
> +
> + if (channels_for_cur_subframe == 1 ||
> + min_samples == missing_samples)
These two look redundant.
Also, the condition for reading the mask could just be used directly instead of
the temporary variable read_channel_mask.
> + read_channel_mask = 0;
> + }
> +
> + /** For every channel with the minimum length, 1 bit
> + might be transmitted that informs us if the channel
> + contains a subframe with the next subframe_len. */
> + if (read_channel_mask) {
> + channel_mask = get_bits(&s->gb,channels_for_cur_subframe);
> + if (!channel_mask) {
> + av_log(s->avctx, AV_LOG_ERROR,
> + "broken frame: zero frames for subframe_len\n");
> + return AVERROR_INVALIDDATA;
> + }
> + } else
> + channel_mask = -1;
> +
> + /** if we have the choice get next subframe length from the
> + bitstream */
> + if (min_samples != missing_samples) {
> + int log2_subframe_len = 0;
> + /* 1 bit indicates if the subframe length is zero */
no, it's never zero; that would also make no sense
> + if (subframe_len_zero_bit) {
> + if (get_bits1(&s->gb)) {
> + log2_subframe_len = 1 +
> + get_bits(&s->gb,subframe_len_bits-1);
> + }
> + } else
> + log2_subframe_len = get_bits(&s->gb,subframe_len_bits);
> +
> + if (s->lossless) {
> + subframe_len =
> + s->samples_per_frame / s->max_num_subframes;
> + subframe_len *= log2_subframe_len + 1;
> + } else {
> + subframe_len =
> + s->samples_per_frame / (1 << log2_subframe_len);
> + }
> +
> + /** sanity check the length */
> + if (subframe_len < s->min_samples_per_subframe
> + || subframe_len > s->samples_per_frame) {
> + av_log(s->avctx, AV_LOG_ERROR,
> + "broken frame: subframe_len %i\n", subframe_len);
> + return AVERROR_INVALIDDATA;
> + }
> + } else
> + subframe_len = s->min_samples_per_subframe;
> +
> + for (c=0; c<s->num_channels;c++) {
> + WMA3ChannelCtx* chan = &s->channel[c];
> +
> + /** add subframes to the individual channels */
> + if (min_channel_len == chan->channel_len) {
> + --channels_for_cur_subframe;
> + if (channel_mask & (1<<channels_for_cur_subframe)) {
I'd do a get_bits1() here instead of loading it into a mask and then extracting
it
(btw you can just do GetBitContext mask_gb = s->gb)
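
A minimal sketch of the two variants. DemoBitReader is a hypothetical MSB-first stand-in, not FFmpeg's GetBitContext, and the sketch treats all n channels as candidates, whereas the patch only considers channels with the minimum length. Both functions consume the same n bits and select the same channels.

```c
#include <stdint.h>

/* Hypothetical MSB-first bit reader, standing in for GetBitContext. */
typedef struct {
    const uint8_t *buf;
    int pos;            /* position in bits from the start of buf */
} DemoBitReader;

static unsigned demo_read_bit(DemoBitReader *br)
{
    unsigned bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

static unsigned demo_read_bits(DemoBitReader *br, int n)
{
    unsigned v = 0;
    while (n-- > 0)
        v = (v << 1) | demo_read_bit(br);
    return v;
}

/* Variant as in the patch: load n bits into a mask, then test them one by one. */
static void demo_select_with_mask(DemoBitReader *br, int n, int *selected)
{
    unsigned mask = demo_read_bits(br, n);
    int remaining = n;
    for (int c = 0; c < n; c++) {
        --remaining;
        selected[c] = (mask >> remaining) & 1;
    }
}

/* Suggested variant: read one bit per channel, no temporary mask. */
static void demo_select_with_bits(DemoBitReader *br, int n, int *selected)
{
    for (int c = 0; c < n; c++)
        selected[c] = (int)demo_read_bit(br);
}
```

Since the patch reads the subframe length between the mask and the per-channel loop, the per-bit variant would need a second reader positioned at the mask, which is presumably what the struct copy is for. The zero-mask error check would also have to be expressed differently, for example by counting the selected channels.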
[...]
> +/**
> + *@brief Extract the coefficients from the bitstream.
> + *@param s codec context
> + *@param c current channel number
> + *@return 0 on success, < 0 in case of bitstream errors
> + */
> +static int decode_coeffs(WMA3DecodeContext *s, int c)
> +{
> + int vlctable;
> + VLC* vlc;
> + WMA3ChannelCtx* ci = &s->channel[c];
> + int rl_mode = 0;
> + int cur_coeff = 0;
> + int num_zeros = 0;
> + const uint16_t* run;
> + const uint16_t* level;
> +
> + dprintf(s->avctx, "decode coefficients for channel %i\n",c);
> +
> + vlctable = get_bits1(&s->gb);
> + vlc = &coef_vlc[vlctable];
> +
> + if (vlctable) {
> + run = coef1_run;
> + level = coef1_level;
> + } else {
> + run = coef0_run;
> + level = coef0_level;
> + }
Have you tried run = coef_run[vlctable] ... or so?
I mean, it might be faster as it doesn't do a conditional branch ...
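
A minimal sketch of the table-indexed selection. The demo_* tables and their contents are placeholders, not the real coef0/coef1 tables, and whether this is actually faster than the branch is untested here.

```c
#include <stdint.h>

/* Placeholder tables standing in for coef0_run/coef1_run and
 * coef0_level/coef1_level; the values are made up for illustration. */
static const uint16_t demo_run0[3]   = { 0, 1, 2 };
static const uint16_t demo_run1[3]   = { 0, 2, 4 };
static const uint16_t demo_level0[3] = { 1, 1, 1 };
static const uint16_t demo_level1[3] = { 1, 2, 3 };

/* One pointer per VLC table, indexed directly by the 1-bit vlctable value. */
static const uint16_t *const demo_run[2]   = { demo_run0,   demo_run1   };
static const uint16_t *const demo_level[2] = { demo_level0, demo_level1 };

/* Hypothetical helper: pick both tables with an array lookup instead of if/else. */
static void demo_select_tables(int vlctable,
                               const uint16_t **run, const uint16_t **level)
{
    *run   = demo_run[vlctable];
    *level = demo_level[vlctable];
}
```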
> +
> + /** decode vector coefficients (consumes up to 167 bits per iteration for
> + 4 vector coded large values) */
> + while (!rl_mode && cur_coeff + 3 < s->subframe_len) {
> + int vals[4];
> + int i;
> + unsigned int idx;
> +
> + idx = get_vlc2(&s->gb, vec4_vlc.table, VLCBITS, VEC4MAXDEPTH);
> +
> + if ( idx == HUFF_VEC4_SIZE - 1 ) {
> + for (i=0 ; i < 4 ; i+= 2) {
> + idx = get_vlc2(&s->gb, vec2_vlc.table, VLCBITS, VEC2MAXDEPTH);
> + if ( idx == HUFF_VEC2_SIZE - 1 ) {
> + vals[i] = get_vlc2(&s->gb, vec1_vlc.table, VLCBITS, VEC1MAXDEPTH);
> + if (vals[i] == HUFF_VEC1_SIZE - 1)
> + vals[i] += ff_wma_get_large_val(&s->gb);
> + vals[i+1] = get_vlc2(&s->gb, vec1_vlc.table, VLCBITS, VEC1MAXDEPTH);
> + if (vals[i+1] == HUFF_VEC1_SIZE - 1)
> + vals[i+1] += ff_wma_get_large_val(&s->gb);
> + } else {
> + vals[i] = symbol_to_vec2[idx] >> 4;
> + vals[i+1] = symbol_to_vec2[idx] & 0xF;
> + }
> + }
> + } else {
> + vals[0] = symbol_to_vec4[idx] >> 12;
> + vals[1] = (symbol_to_vec4[idx] >> 8) & 0xF;
> + vals[2] = (symbol_to_vec4[idx] >> 4) & 0xF;
> + vals[3] = symbol_to_vec4[idx] & 0xF;
the & 0xF; can be vertically aligned
[...]
> +/**
> + *@brief Extract scale factors from the bitstream.
> + *@param s codec context
> + *@return 0 on success, < 0 in case of bitstream errors
> + */
> +static int decode_scale_factors(WMA3DecodeContext* s)
> +{
> + int i;
> +
> + /** should never consume more than 5344 bits
> + * MAX_CHANNELS * (1 + MAX_BANDS * 23)
> + */
> +
> + for (i=0;i<s->channels_for_cur_subframe;i++) {
> + int c = s->channel_indexes_for_cur_subframe[i];
> + int* sf;
> + int* sf_end = s->channel[c].scale_factors + s->num_bands;
> +
> + /** resample scale factors for the new block size */
> + if (s->channel[c].reuse_sf) {
> + const int blocks_per_frame = s->samples_per_frame/s->subframe_len;
> + const int res_blocks_per_frame = s->samples_per_frame /
> + s->channel[c].scale_factor_block_len;
> + const int idx0 = av_log2(blocks_per_frame);
> + const int idx1 = av_log2(res_blocks_per_frame);
> + const int16_t* sf_offsets = s->sf_offsets[idx0][idx1];
> + int b;
> + for (b=0;b<s->num_bands;b++)
> + s->channel[c].scale_factors[b] =
> + s->channel[c].saved_scale_factors[*sf_offsets++];
> + }
> +
> + if (s->channel[c].cur_subframe > 0) {
> + s->channel[c].transmit_sf = get_bits1(&s->gb);
> + } else
> + s->channel[c].transmit_sf = 1;
> +
> + if (s->channel[c].transmit_sf) {
> +
> + if (!s->channel[c].reuse_sf) {
> + int val;
> + /** decode DPCM coded scale factors */
> + s->channel[c].scale_factor_step = get_bits(&s->gb,2) + 1;
> + val = 45 / s->channel[c].scale_factor_step;
> + for (sf = s->channel[c].scale_factors; sf < sf_end; sf++) {
> + val += get_vlc2(&s->gb, sf_vlc.table, SCALEVLCBITS, SCALEMAXDEPTH) - 60;
> + *sf = val;
> + }
> + } else {
> + int i;
> + /** run level decode differences to the resampled factors */
> + for (i=0;i<s->num_bands;i++) {
> + int idx;
> + int skip;
> + int val;
> + int sign;
> +
> + idx = get_vlc2(&s->gb, sf_rl_vlc.table, VLCBITS, SCALERLMAXDEPTH);
> +
> + if ( !idx ) {
> + uint32_t code = get_bits(&s->gb,14);
> + val = code >> 6;
> + sign = (code & 1) - 1;
> + skip = (code & 0x3f)>>1;
> + } else if (idx == 1) {
> + break;
> + } else {
> + skip = scale_rl_run[idx];
> + val = scale_rl_level[idx];
vertical align
> + sign = get_bits1(&s->gb)-1;
> + }
> +
> + i += skip;
> + if (i >= s->num_bands) {
> + av_log(s->avctx,AV_LOG_ERROR,
> + "invalid scale factor coding\n");
> + return AVERROR_INVALIDDATA;
> + } else
> + s->channel[c].scale_factors[i] += (val ^ sign) - sign;
the else is superfluous
> + }
> + }
> +
> + /** save transmitted scale factors so that they can be reused for
> + the next subframe */
> + memcpy(s->channel[c].saved_scale_factors,
> + s->channel[c].scale_factors,
> + sizeof(int) * s->num_bands);
Exchanging two pointers should avoid that, but maybe it's not worth it.
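
A minimal sketch of the pointer exchange, assuming the two arrays become pointers into a shared backing store. DemoChannel and DEMO_BANDS are hypothetical and not part of the patch.

```c
#define DEMO_BANDS 4

/* Hypothetical channel context: two buffers, with pointers deciding
 * which one is "current" and which one is "saved". */
typedef struct {
    int  storage[2][DEMO_BANDS];
    int *scale_factors;
    int *saved_scale_factors;
} DemoChannel;

static void demo_channel_init(DemoChannel *ch)
{
    ch->scale_factors       = ch->storage[0];
    ch->saved_scale_factors = ch->storage[1];
}

/* Replaces the memcpy(): the current factors become the saved ones by
 * swapping the two pointers, without copying any data. */
static void demo_save_scale_factors(DemoChannel *ch)
{
    int *tmp                = ch->saved_scale_factors;
    ch->saved_scale_factors = ch->scale_factors;
    ch->scale_factors       = tmp;
}
```

One caveat: after the swap, scale_factors points at the stale buffer, so any code that still reads the current factors later in the same subframe would have to read saved_scale_factors instead. That may be part of why it is not worth it.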
[...]
> +/**
> + *@brief Decode a single subframe (block).
> + *@param s codec context
> + *@return 0 on success, < 0 when decoding failed
> + */
> +static int decode_subframe(WMA3DecodeContext *s)
> +{
> + int offset = s->samples_per_frame;
> + int subframe_len = s->samples_per_frame;
> + int i;
> + int total_samples = s->samples_per_frame * s->num_channels;
> + int transmit_coeffs = 0;
> + int frame_offset;
> +
> + s->subframe_offset = get_bits_count(&s->gb);
> +
> + /** reset channel context and find the next block offset and size
> + == the next block of the channel with the smallest number of
> + decoded samples
> + */
> + for (i=0;i<s->num_channels;i++) {
> + s->channel[i].grouped = 0;
> + if (offset > s->channel[i].decoded_samples) {
> + offset = s->channel[i].decoded_samples;
> + subframe_len =
> + s->channel[i].subframe_len[s->channel[i].cur_subframe];
> + }
> + }
> +
> + dprintf(s->avctx,
> + "processing subframe with offset %i len %i\n",offset,subframe_len);
> +
> + /** get a list of all channels that contain the estimated block */
> + s->channels_for_cur_subframe = 0;
> + for (i=0;i<s->num_channels;i++) {
> + const int cur_subframe = s->channel[i].cur_subframe;
> + /** substract already processed samples */
> + total_samples -= s->channel[i].decoded_samples;
> +
> + /** and count if there are multiple subframes that match our profile */
> + if (offset == s->channel[i].decoded_samples &&
> + subframe_len == s->channel[i].subframe_len[cur_subframe]) {
> + total_samples -= s->channel[i].subframe_len[cur_subframe];
> + s->channel[i].decoded_samples +=
> + s->channel[i].subframe_len[cur_subframe];
> + s->channel_indexes_for_cur_subframe[s->channels_for_cur_subframe] = i;
> + ++s->channels_for_cur_subframe;
> + }
> + }
> +
> + /** check if the frame will be complete after processing the
> + estimated block */
> + if (!total_samples)
> + s->parsed_all_subframes = 1;
> +
> +
> + dprintf(s->avctx, "subframe is part of %i channels\n",
> + s->channels_for_cur_subframe);
> +
> + /** calculate number of scale factor bands and their offsets */
> + frame_offset = av_log2(s->samples_per_frame/subframe_len);
> + s->num_bands = s->num_sfb[frame_offset];
> + s->cur_sfb_offsets = s->sfb_offsets[frame_offset];
> + s->cur_subwoofer_cutoff = s->subwoofer_cutoffs[frame_offset];
> +
> + /** configure the decoder for the current subframe */
> + for (i=0;i<s->channels_for_cur_subframe;i++) {
> + int c = s->channel_indexes_for_cur_subframe[i];
> +
> + s->channel[c].coeffs = &s->channel[c].out[(s->samples_per_frame>>1)
> + + offset];
> + memset(s->channel[c].coeffs,0,sizeof(float) * subframe_len);
Can't that be avoided?
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Concerning the gods, I have no means of knowing whether they exist or not
or of what sort they may be, because of the obscurity of the subject, and
the brevity of human life -- Protagoras