[FFmpeg-devel] [PATCH] wmapro decoder
Sascha Sommer
saschasommer
Sat Jul 4 17:10:52 CEST 2009
Hi Michael,
On Dienstag, 23. Juni 2009, Michael Niedermayer wrote:
> sorry for the late review, but the patch was kinda big ...
>
No problem. I also did not have much time to work on it during the last days.
New patch with float output attached.
> On Sat, Jun 20, 2009 at 02:50:24PM +0200, Sascha Sommer wrote:
> [...]
>
> > > > + *@param s codec context
> > > > + */
> > > > +static void wma_inverse_channel_transform(WMA3DecodeContext *s)
> > > > +{
> > > > + int i;
> > > > +
> > > > + for (i=0;i<s->num_chgroups;i++) {
> > > > +
> > > > + if (s->chgroup[i].transform == 1) {
> > > > + /** M/S stereo decoding */
> > > > + int16_t* sfb_offsets = s->cur_sfb_offsets;
> > > > + float* ch0 = *sfb_offsets + s->channel[0].coeffs;
> > > > + float* ch1 = *sfb_offsets++ + s->channel[1].coeffs;
> > > > + const char* tb = s->chgroup[i].transform_band;
> > > > + const char* tb_end = tb + s->num_bands;
> > > > +
> > > > + while (tb < tb_end) {
> > > > + const float* ch0_end = s->channel[0].coeffs +
> > > > +
> > > > FFMIN(*sfb_offsets,s->subframe_len); + if (*tb++ == 1)
> > > > { + while (ch0 < ch0_end) {
> > > > + const float v1 = *ch0;
> > > > + const float v2 = *ch1;
> > > > + *ch0++ = v1 - v2;
> > > > + *ch1++ = v1 + v2;
> > > > + }
> > > > + }else{
> > > > + while (ch0 < ch0_end) {
> > > > + *ch0++ *= 181.0 / 128;
> > > > + *ch1++ *= 181.0 / 128;
> > > > + }
> > > > + }
> > > > + ++sfb_offsets;
> > > > + }
> > > > + }else if (s->chgroup[i].transform) {
> > > > + float data[MAX_CHANNELS];
> > > > + const int num_channels = s->chgroup[i].num_channels;
> > > > + float** ch_data = s->chgroup[i].channel_data;
> > > > + float** ch_end = ch_data + num_channels;
> > > > + const int8_t* tb = s->chgroup[i].transform_band;
> > > > + int16_t* sfb;
> > > > +
> > > > + /** multichannel decorrelation */
> > > > + for (sfb = s->cur_sfb_offsets ;
> > > > + sfb < s->cur_sfb_offsets + s->num_bands;sfb++) {
> > > > + if (*tb++ == 1) {
> > > > + int y;
> > > > + /** multiply values with the
> > > > decorrelation_matrix */ + for
> > > > (y=sfb[0];y<FFMIN(sfb[1], s->subframe_len);y++) { +
> > > > const float* mat =
> > > > s->chgroup[i].decorrelation_matrix; + const
> > > > float* data_end= data + num_channels; + float*
> > > > data_ptr= data;
> > > > + float** ch;
> > > > +
> > > > + for (ch = ch_data;ch < ch_end; ch++)
> > > > + *data_ptr++ = (*ch)[y];
> > > > +
> > > > + for (ch = ch_data; ch < ch_end; ch++) {
> > > > + float sum = 0;
> > > > + data_ptr = data;
> > > > + while (data_ptr < data_end)
> > > > + sum += *data_ptr++ * *mat++;
> > > > +
> > > > + (*ch)[y] = sum;
> > > > + }
> > > > + }
> > > > + }
> > > > + }
> > > > + }
> > > > + }
> > > > +}
> > >
> > > isnt the if() a special case of te else ? if so this can be simplified
> > > and if its faster as is the special case could be limited to the inner
> > > loop
> >
> > I don't understand what you mean here. Transform may either be 0, 1, 2.
> > Of course both 1 and 2 are larger than 0 but they stand for the transform
> > type so keeping all of them on the same level does not sound wrong to me
> > even if the 0 case does nothing. Maybe it should be changed to
> > else if(s->chgroup[i].transform == 2)?
>
> I meant that "isnt M/S a special case of multichannel and cant the
> multicannel code be used for it" ?
>
I don't think so. One would need more conditions in the loop.
> >
> > Btw. should the code be renamed to wmaprodec.c and wmaprodata.h?
> > wma3 includes the voice and lossless codecs so wma3dec.c may not be the
> > best name for the pro decoder.
>
> whichever way you prefer ...
>
Renamed then.
> > +/**
> > + * @file libavcodec/wma3dec.c
> > + * @brief wmapro decoder implementation
> > + * Wmapro is an MDCT based codec comparable to wma standard or AAC.
> > + * The decoding therefore consist of the following steps:
> > + * - bitstream decoding
> > + * - reconstruction of per channel data
> >
> > + * - rescaling and requantization
>
> dequantization ?
>
inverse quantization seems to be the term that is used in the AAC specs.
> > +/** current decoder limitations */
> > +#define WMAPRO_MAX_CHANNELS 8 ///< max
> > number of handled channels +#define MAX_SUBFRAMES 32
> > ///< max number of subframes per channel +#define
> > MAX_BANDS 29 ///< max number of
> > scale factor bands +#define MAX_FRAMESIZE 16384
> > ///< maximum compressed frame size +
> > +#define WMAPRO_BLOCK_MAX_BITS 12
> > ///< log2 of max block size +#define WMAPRO_BLOCK_MAX_SIZE (1 <<
> > WMAPRO_BLOCK_MAX_BITS) ///< maximum block size +#define
> > WMAPRO_BLOCK_SIZES (WMAPRO_BLOCK_MAX_BITS - BLOCK_MIN_BITS + 1) ///<
> > possible block sizes
>
> some of these have WMAPRO prefix some not, why?
>
The ones with WMAPRO are wmapro specific. wma.h defines confliciting values
for wma standard.
> > +/**
> > + * @brief decoder context for a single channel
> > + */
> > +typedef struct {
> > + int16_t prev_block_len; ///< length of the
> > previous block
> >
> > + uint8_t transmit_coefs; ///< transmit
> > coefficients
>
> useless comment
>
removed
> > + uint8_t num_subframes; ///< number of
> > subframes
>
> same
>
removed
>
> [...]
>
> > +/**
> > + * @brief main decoder context
> > + */
> > +typedef struct WMA3DecodeContext {
> > + /** generic decoder variables */
> > + AVCodecContext* avctx; ///< codec context
> > for av_log + DSPContext dsp; ///<
> > accelerated dsp functions + uint8_t frame_data[MAX_FRAMESIZE
> > +
> > + FF_INPUT_BUFFER_PADDING_SIZE];///< compressed
> > frame data + MDCTContext mdct_ctx[WMAPRO_BLOCK_SIZES]; ///< MDCT
> > context per block size + DECLARE_ALIGNED_16(float,
> > tmp[WMAPRO_BLOCK_MAX_SIZE]); ///< imdct output buffer
> >
> > + float* windows[WMAPRO_BLOCK_SIZES]; ///< window per
> > block size
>
> size?
>
fixed
> > + int coef_max[2]; ///< max length of
> > vlc codes
>
> unused
>
Thanks, removed.
> > +
> > + /** frame size dependent frame information (set during
> > initialization) */ + uint8_t lossless;
> > ///< lossless mode + uint32_t decode_flags;
> > ///< used compression features + uint8_t len_prefix;
> > ///< frame is prefixed with its length + uint8_t
> > dynamic_range_compression; ///< frame contains DRC data
> >
> > + uint8_t sample_bit_depth; ///< bits per sample
>
> i think the comment makes a better name ...
>
changed
> its also more consistent with the next:
> > + uint16_t samples_per_frame; ///< number of
> > samples to output
> >
> >
> > + uint16_t log2_frame_size; ///< frame size
>
> that comment is not strictly correct but at least its redundant
>
removed
> > + int8_t num_channels; ///< number of
> > channels
>
> also redundant
>
removed
> > + int8_t lfe_channel; ///< lfe channel
> > index + uint8_t max_num_subframes; ///< maximum
> > number of subframes
> >
> > + int8_t num_possible_block_sizes; ///< nb of supported
> > block sizes
>
> num in the variable explained as nb in the comment is really not that clear
> also is it possible or supported, theres a subtile difference
>
fixed
> > + uint16_t min_samples_per_subframe; ///< minimum samples
> > per subframe + int8_t* num_sfb; ///<
> > scale factor bands per block size
> >
> > + int16_t* sfb_offsets; ///< scale factor
> > band offsets
>
> if i understood the code correctly all sfb_offsets are multiplies of 4,
> maybe this should be documented of true
>
comment added
> > + /** generic init */
> > + s->log2_frame_size = av_log2(avctx->block_align*8)+1;
>
> the *8 and +1 can be merged
>
changed
> >
> > + s->dynamic_range_compression = (s->decode_flags & 0x80) >> 7;
>
> the >>7 is unneeded
>
changed
> > + for (x=0;x < MAX_BANDS-1 && sfb_offsets[band-1] <
> > subframe_len;x++) { + int offset = (subframe_len * 2 *
> > critical_freq[x])
> > + / s->avctx->sample_rate + 2;
> > + offset -= offset % 4;
>
> & ~3
>
changed
> > + if ( offset > sfb_offsets[band - 1] ) {
> > + sfb_offsets[band] = offset;
> > + ++band;
>
> sfb_offsets[band++] = offset;
>
changed
> > + /** calculate subwoofer cutoff values */
> > +
> > + for (i=0;i< s->num_possible_block_sizes;i++) {
>
> the empty line after /** looks like a typo, the others dont have it ...
>
changed
> > + int block_size = s->samples_per_frame / (1 << i);
>
> / (1<<i) -> >>i
>
> > + int cutoff = ceil(block_size * 440.0
> > + / (double)s->avctx->sample_rate + 0.5);
>
> this might be identical to: (avoiding floats ...)
> (440*block_size + 3*s->avctx->sample_rate/2 - 1)/s->avctx->sample_rate
>
changed
> > + /** handle the easy case with one constant-sized subframe per
> > channel */ + if (s->max_num_subframes == 1) {
> > + for (c=0;c<s->num_channels;c++) {
> > + s->channel[c].num_subframes = 1;
> > + s->channel[c].subframe_len[0] = s->samples_per_frame;
> >
> > + s->channel[c].channel_len = 0;
>
> redundant
>
removed
> > + }
> > + } else { /** subframe length and number of subframes is not constant
> > */ + /** bits needed for the subframe length */
> > + int subframe_len_bits = 0;
> > + /** first bit indicates if length is zero */
> > + int subframe_len_zero_bit = 0;
> > + /** all channels have the same subframe layout */
> > + int fixed_channel_layout;
>
> i think the comments would be more readable at the right of the variables
> the whole looks like a undifferentiated block ...
>
fixed
> > +
> > + fixed_channel_layout = get_bits1(&s->gb);
> > +
> > + /** calculate subframe len bits */
> > + if (s->lossless) {
> > + subframe_len_bits = av_log2(s->max_num_subframes - 1) + 1;
> > + } else if (s->max_num_subframes == 16) {
> > + subframe_len_zero_bit = 1;
> > + subframe_len_bits = 3;
> > + } else
> > + subframe_len_bits = av_log2(av_log2(s->max_num_subframes)) +
> > 1; +
> > + /** loop until the frame data is split between the subframes */
> >
> > + while (missing_samples > 0) {
>
> the missing_samples variable can be put in the else clause, instead of just
> in the function
>
changed
> > + unsigned int channel_mask = 0;
> >
> > + int min_channel_len = s->samples_per_frame;
>
> the init can be moved in the else below, as its not used in the if
>
changed
> > + log2_subframe_len =
> > + get_bits(&s->gb,subframe_len_bits-1);
> > + ++log2_subframe_len;
>
> +1 can be merged in the previous statement
>
changed
> >
> > + if (!read_channel_mask ||
> > + channel_mask & (1<<channels_for_cur_subframe)) {
>
> if read_channel_mask == 0 then you could set channel_mask= -1 to simpliy
> this
>
changed
> >
> > + char rotation_offset[WMAPRO_MAX_CHANNELS * WMAPRO_MAX_CHANNELS];
>
> int8_t ? uint8_t ?
>
changed
> > + memset(chgroup->decorrelation_matrix,0,
> > + sizeof(float) *s->num_channels * s->num_channels);
> > +
> > + for (i=0;i<chgroup->num_channels * (chgroup->num_channels - 1) >>
> > 1;i++) + rotation_offset[i] = get_bits(&s->gb,6);
> > +
> > + for (i=0;i<chgroup->num_channels;i++) {
> > + if (get_bits1(&s->gb)) {
> > + chgroup->decorrelation_matrix[chgroup->num_channels * i +
> > i]= 1.0; + } else
> > + chgroup->decorrelation_matrix[chgroup->num_channels * i +
> > i]= -1.0; + }
> > +
> > + for (i=1;i<chgroup->num_channels;i++) {
> > + int x;
> > + for (x=0;x<i;x++) {
> > + int y;
> > + float tmp1[WMAPRO_MAX_CHANNELS];
> > + float tmp2[WMAPRO_MAX_CHANNELS];
> > + memcpy(tmp1,
> > + &chgroup->decorrelation_matrix[x *
> > chgroup->num_channels], + sizeof(float) * (i + 1));
> > + memcpy(tmp2,
> > + &chgroup->decorrelation_matrix[i *
> > chgroup->num_channels], + sizeof(float) * (i + 1));
> > + for (y=0;y < i + 1 ; y++) {
> > + float v1 = tmp1[y];
> > + float v2 = tmp2[y];
> > + int n = rotation_offset[offset + x];
> > + float sinv;
> > + float cosv;
> > +
> > + if (n<32) {
> > + sinv = sin64[n];
> > + cosv = sin64[32-n];
> > + } else {
> > + sinv = sin64[64-n];
> > + cosv = -sin64[n-32];
> > + }
> > +
> > + chgroup->decorrelation_matrix[y + x *
> > chgroup->num_channels] = +
> > (v1 * sinv) - (v2 * cosv); +
> > chgroup->decorrelation_matrix[y + i * chgroup->num_channels] = +
> > (v1 * cosv) + (v2 * sinv); +
> > }
>
> the memcpy and tmp1/tmp2 appear unneeded
>
fixed
> >
> > + vals[0] = (symbol_to_vec4[idx] >> 8) >> 4;
> > + vals[1] = (symbol_to_vec4[idx] >> 8) & 0xF;
> > + vals[2] = (symbol_to_vec4[idx] >> 4) & 0xF;
> > + vals[3] = symbol_to_vec4[idx] & 0xF;
>
> vertical align
>
changed
> > + }
> > +
> > + /** decode sign */
> > + for (i=0;i<4;i++) {
> > + if (vals[i]) {
> > + int sign = get_bits1(&s->gb) - 1;
> > + ci->coeffs[cur_coeff] = (vals[i]^sign) - sign;
> > + num_zeros = zero_init;
> > + } else {
> > + /** switch to run level mode when subframe_len / 128
> > zeros + were found in a row */
> > + rl_mode |= (num_zeros & rl_switchmask);
> > + ++num_zeros;
> > + }
> > + ++cur_coeff;
> > + }
>
> i guess you tried
> num_zeros=0
> and rl_mode |= (++num_zeros > X)
> ?
>
Your version is faster. Changed.
> > +static void window(WMA3DecodeContext *s)
> > +{
> > + int i;
> > + for (i=0;i<s->channels_for_cur_subframe;i++) {
> > + int c = s->channel_indexes_for_cur_subframe[i];
> > + float* start;
> > + float* window;
> > + int winlen = s->channel[c].prev_block_len;
> >
> > + start = s->channel[c].coeffs - (winlen >> 1);
>
> declaration and init can be merged
>
changed
>
> [...]
>
> > +/**
> > + *@brief Decode a single subframe (block).
> > + *@param s codec context
> >
> > + *@return 0 if decoding failed, 1 on success
> > + */
>
> in ffmpeg <0 is error and the value is the AVERROR code
>
reworked
> > + /** skip extended header if any */
> > + if (get_bits1(&s->gb)) {
> > + int num_fill_bits;
> > + if (!(num_fill_bits = get_bits(&s->gb,2))) {
> > + num_fill_bits = get_bits(&s->gb,4);
> > + num_fill_bits = get_bits(&s->gb,num_fill_bits) + 1;
> > + }
> > +
> > + if (num_fill_bits >= 0) {
> > + if (get_bits_count(&s->gb) + num_fill_bits >
> > s->num_saved_bits) { +
> > av_log(s->avctx,AV_LOG_ERROR,"invalid number of fill bits\n"); +
> > return 0;
> > + }
> > +
> > + skip_bits_long(&s->gb,num_fill_bits);
> > + }
>
> i think the reuse of num_fill_bits is a little obfuscating the code
>
fixed
Regards
Sascha
-------------- next part --------------
A non-text attachment was scrubbed...
Name: wmapro_try3.patch
Type: text/x-diff
Size: 97531 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090704/f06d6e02/attachment.patch>
More information about the ffmpeg-devel
mailing list