[FFmpeg-devel] [PATCH] wmapro decoder

Sat Jul 4 17:10:52 CEST 2009

Hi Michael,

On Dienstag, 23. Juni 2009, Michael Niedermayer wrote:
> sorry for the late review, but the patch was kinda big ...
>

No problem. I also did not have much time to work on it during the last days.
New patch with float output attached.

> On Sat, Jun 20, 2009 at 02:50:24PM +0200, Sascha Sommer wrote:
> [...]
>
> > > > + *@param s codec context
> > > > + */
> > > > +static void wma_inverse_channel_transform(WMA3DecodeContext *s)
> > > > +{
> > > > +    int i;
> > > > +
> > > > +    for (i=0;i<s->num_chgroups;i++) {
> > > > +
> > > > +        if (s->chgroup[i].transform == 1) {
> > > > +            /** M/S stereo decoding */
> > > > +            int16_t* sfb_offsets = s->cur_sfb_offsets;
> > > > +            float* ch0 = *sfb_offsets + s->channel[0].coeffs;
> > > > +            float* ch1 = *sfb_offsets++ + s->channel[1].coeffs;
> > > > +            const char* tb = s->chgroup[i].transform_band;
> > > > +            const char* tb_end = tb + s->num_bands;
> > > > +
> > > > +            while (tb < tb_end) {
> > > > +                const float* ch0_end = s->channel[0].coeffs +
> > > > +
> > > > FFMIN(*sfb_offsets,s->subframe_len); +                if (*tb++ == 1)
> > > > { +                    while (ch0 < ch0_end) {
> > > > +                        const float v1 = *ch0;
> > > > +                        const float v2 = *ch1;
> > > > +                        *ch0++ = v1 - v2;
> > > > +                        *ch1++ = v1 + v2;
> > > > +                    }
> > > > +                }else{
> > > > +                    while (ch0 < ch0_end) {
> > > > +                        *ch0++ *= 181.0 / 128;
> > > > +                        *ch1++ *= 181.0 / 128;
> > > > +                    }
> > > > +                }
> > > > +                ++sfb_offsets;
> > > > +            }
> > > > +        }else if (s->chgroup[i].transform) {
> > > > +            float data[MAX_CHANNELS];
> > > > +            const int num_channels = s->chgroup[i].num_channels;
> > > > +            float** ch_data = s->chgroup[i].channel_data;
> > > > +            float** ch_end = ch_data + num_channels;
> > > > +            const int8_t* tb = s->chgroup[i].transform_band;
> > > > +            int16_t* sfb;
> > > > +
> > > > +            /** multichannel decorrelation */
> > > > +            for (sfb = s->cur_sfb_offsets ;
> > > > +                sfb < s->cur_sfb_offsets + s->num_bands;sfb++) {
> > > > +                if (*tb++ == 1) {
> > > > +                    int y;
> > > > +                    /** multiply values with the
> > > > decorrelation_matrix */ +                    for
> > > > (y=sfb[0];y<FFMIN(sfb[1], s->subframe_len);y++) { +                  
> > > >      const float* mat =
> > > > s->chgroup[i].decorrelation_matrix; +                        const
> > > > float* data_end= data + num_channels; +                        float*
> > > > data_ptr= data;
> > > > +                        float** ch;
> > > > +
> > > > +                        for (ch = ch_data;ch < ch_end; ch++)
> > > > +                           *data_ptr++ = (*ch)[y];
> > > > +
> > > > +                        for (ch = ch_data; ch < ch_end; ch++) {
> > > > +                            float sum = 0;
> > > > +                            data_ptr = data;
> > > > +                            while (data_ptr < data_end)
> > > > +                                sum += *data_ptr++ * *mat++;
> > > > +
> > > > +                            (*ch)[y] = sum;
> > > > +                        }
> > > > +                    }
> > > > +                }
> > > > +            }
> > > > +        }
> > > > +    }
> > > > +}
> > >
> > > isnt the if() a special case of te else ? if so this can be simplified
> > > and if its faster as is the special case could be limited to the inner
> > > loop
> >
> > I don't understand what you mean here. Transform may either be 0, 1, 2.
> > Of course both 1 and 2 are larger than 0 but they stand for the transform
> > type so keeping all of them on the same level does not sound wrong to me
> > even if the 0 case does nothing. Maybe it should be changed to
> > else if(s->chgroup[i].transform == 2)?
>
> I meant that "isnt M/S a special case of multichannel and cant the
> multicannel code be used for it" ?
>

I don't think so. One would need more conditions in the loop.

> >
> > Btw. should the code be renamed to wmaprodec.c and wmaprodata.h?
> > wma3 includes the voice and lossless codecs so wma3dec.c may not be the
> > best name for the pro decoder.
>
> whichever way you prefer ...
>

Renamed then.

> > +/**
> > + * @file  libavcodec/wma3dec.c
> > + * @brief wmapro decoder implementation
> > + * Wmapro is an MDCT based codec comparable to wma standard or AAC.
> > + * The decoding therefore consist of the following steps:
> > + * - bitstream decoding
> > + * - reconstruction of per channel data
> >
> > + * - rescaling and requantization
>
> dequantization ?
>

inverse quantization seems to be the term that is used in the AAC specs.

> > +/** current decoder limitations */
> > +#define WMAPRO_MAX_CHANNELS    8                             ///< max
> > number of handled channels +#define MAX_SUBFRAMES  32                    
> >                ///< max number of subframes per channel +#define
> > MAX_BANDS      29                                    ///< max number of
> > scale factor bands +#define MAX_FRAMESIZE  16384                         
> >        ///< maximum compressed frame size +
> > +#define WMAPRO_BLOCK_MAX_BITS 12                                        
> >   ///< log2 of max block size +#define WMAPRO_BLOCK_MAX_SIZE (1 <<
> > WMAPRO_BLOCK_MAX_BITS)                 ///< maximum block size +#define
> > WMAPRO_BLOCK_SIZES    (WMAPRO_BLOCK_MAX_BITS - BLOCK_MIN_BITS + 1) ///<
> > possible block sizes
>
> some of these have WMAPRO prefix some not, why?
>

The ones with WMAPRO are wmapro specific. wma.h defines confliciting values 
for wma standard.

> > +/**
> > + * @brief decoder context for a single channel
> > + */
> > +typedef struct {
> > +    int16_t  prev_block_len;                          ///< length of the
> > previous block
> >
> > +    uint8_t  transmit_coefs;                          ///< transmit
> > coefficients
>
> useless comment
>

removed

> > +    uint8_t  num_subframes;                           ///< number of
> > subframes
>
> same
>

removed

>
> [...]
>
> > +/**
> > + * @brief main decoder context
> > + */
> > +typedef struct WMA3DecodeContext {
> > +    /** generic decoder variables */
> > +    AVCodecContext*  avctx;                         ///< codec context
> > for av_log +    DSPContext       dsp;                           ///<
> > accelerated dsp functions +    uint8_t          frame_data[MAX_FRAMESIZE
> > +
> > +                      FF_INPUT_BUFFER_PADDING_SIZE];///< compressed
> > frame data +    MDCTContext      mdct_ctx[WMAPRO_BLOCK_SIZES];  ///< MDCT
> > context per block size +    DECLARE_ALIGNED_16(float,
> > tmp[WMAPRO_BLOCK_MAX_SIZE]); ///< imdct output buffer
> >
> > +    float*           windows[WMAPRO_BLOCK_SIZES];   ///< window per
> > block size
>
> size?
>

fixed

> > +    int              coef_max[2];                   ///< max length of
> > vlc codes
>
> unused
>

Thanks, removed.

> > +
> > +    /** frame size dependent frame information (set during
> > initialization) */ +    uint8_t          lossless;                     
> > ///< lossless mode +    uint32_t         decode_flags;                 
> > ///< used compression features +    uint8_t          len_prefix;         
> >           ///< frame is prefixed with its length +    uint8_t         
> > dynamic_range_compression;     ///< frame contains DRC data
> >
> > +    uint8_t          sample_bit_depth;              ///< bits per sample
>
> i think the comment makes a better name ...
>

changed

> its also more consistent with the next:
> > +    uint16_t         samples_per_frame;             ///< number of
> > samples to output
> >
> >
> > +    uint16_t         log2_frame_size;               ///< frame size
>
> that comment is not strictly correct but at least its redundant
>

removed

> > +    int8_t           num_channels;                  ///< number of
> > channels
>
> also redundant
>

removed

> > +    int8_t           lfe_channel;                   ///< lfe channel
> > index +    uint8_t          max_num_subframes;             ///< maximum
> > number of subframes
> >
> > +    int8_t           num_possible_block_sizes;      ///< nb of supported
> > block sizes
>
> num in the variable explained as nb in the comment is really not that clear
> also is it possible or supported, theres a subtile difference
>

fixed

> > +    uint16_t         min_samples_per_subframe;      ///< minimum samples
> > per subframe +    int8_t*          num_sfb;                       ///<
> > scale factor bands per block size
> >
> > +    int16_t*         sfb_offsets;                   ///< scale factor
> > band offsets
>
> if i understood the code correctly all sfb_offsets are multiplies of 4,
> maybe this should be documented of true
>

comment added

> > +    /** generic init */
> > +    s->log2_frame_size = av_log2(avctx->block_align*8)+1;
>
> the *8 and +1 can be merged
>

changed

> >
> > +    s->dynamic_range_compression = (s->decode_flags & 0x80) >> 7;
>
> the >>7 is unneeded
>

changed

> > +        for (x=0;x < MAX_BANDS-1 && sfb_offsets[band-1] <
> > subframe_len;x++) { +            int offset = (subframe_len * 2 *
> > critical_freq[x])
> > +                          / s->avctx->sample_rate + 2;
> > +            offset -= offset % 4;
>
> & ~3
>

changed

> > +            if ( offset > sfb_offsets[band - 1] ) {
> > +                sfb_offsets[band] = offset;
> > +                ++band;
>
> sfb_offsets[band++] = offset;
>

changed

> > +    /** calculate subwoofer cutoff values */
> > +
> > +    for (i=0;i< s->num_possible_block_sizes;i++) {
>
> the empty line after /** looks like a typo, the others dont have it ...
>

changed

> > +        int block_size = s->samples_per_frame / (1 << i);
>
> / (1<<i) -> >>i
>
> > +        int cutoff = ceil(block_size * 440.0
> > +                          / (double)s->avctx->sample_rate + 0.5);
>
> this might be identical to: (avoiding floats ...)
> (440*block_size + 3*s->avctx->sample_rate/2 - 1)/s->avctx->sample_rate
>

changed

> > +    /** handle the easy case with one constant-sized subframe per
> > channel */ +    if (s->max_num_subframes == 1) {
> > +        for (c=0;c<s->num_channels;c++) {
> > +            s->channel[c].num_subframes = 1;
> > +            s->channel[c].subframe_len[0] = s->samples_per_frame;
> >
> > +            s->channel[c].channel_len = 0;
>
> redundant
>

removed

> > +        }
> > +    } else { /** subframe length and number of subframes is not constant
> > */ +        /** bits needed for the subframe length */
> > +        int subframe_len_bits = 0;
> > +        /** first bit indicates if length is zero */
> > +        int subframe_len_zero_bit = 0;
> > +        /** all channels have the same subframe layout */
> > +        int fixed_channel_layout;
>
> i think the comments would be more readable at the right of the variables
> the whole looks like a undifferentiated block ...
>

fixed

> > +
> > +        fixed_channel_layout = get_bits1(&s->gb);
> > +
> > +        /** calculate subframe len bits */
> > +        if (s->lossless) {
> > +            subframe_len_bits = av_log2(s->max_num_subframes - 1) + 1;
> > +        } else if (s->max_num_subframes == 16) {
> > +            subframe_len_zero_bit = 1;
> > +            subframe_len_bits = 3;
> > +        } else
> > +            subframe_len_bits = av_log2(av_log2(s->max_num_subframes)) +
> > 1; +
> > +        /** loop until the frame data is split between the subframes */
> >
> > +        while (missing_samples > 0) {
>
> the missing_samples variable can be put in the else clause, instead of just
> in the function
>

changed

> > +            unsigned int channel_mask = 0;
> >
> > +            int min_channel_len = s->samples_per_frame;
>
> the init can be moved in the else below, as its not used in the if
>

changed

> > +                        log2_subframe_len =
> > +                            get_bits(&s->gb,subframe_len_bits-1);
> > +                        ++log2_subframe_len;
>
> +1 can be merged in the previous statement
>

changed

> >
> > +                    if (!read_channel_mask ||
> > +                       channel_mask & (1<<channels_for_cur_subframe)) {
>
> if read_channel_mask == 0 then you could set channel_mask= -1 to simpliy
> this
>

changed

> >
> > +    char rotation_offset[WMAPRO_MAX_CHANNELS * WMAPRO_MAX_CHANNELS];
>
> int8_t ? uint8_t ?
>

changed

> > +    memset(chgroup->decorrelation_matrix,0,
> > +           sizeof(float) *s->num_channels * s->num_channels);
> > +
> > +    for (i=0;i<chgroup->num_channels  * (chgroup->num_channels - 1) >>
> > 1;i++) +        rotation_offset[i] = get_bits(&s->gb,6);
> > +
> > +    for (i=0;i<chgroup->num_channels;i++) {
> > +        if (get_bits1(&s->gb)) {
> > +            chgroup->decorrelation_matrix[chgroup->num_channels * i +
> > i]=  1.0; +        } else
> > +            chgroup->decorrelation_matrix[chgroup->num_channels * i +
> > i]= -1.0; +    }
> > +
> > +    for (i=1;i<chgroup->num_channels;i++) {
> > +        int x;
> > +        for (x=0;x<i;x++) {
> > +            int y;
> > +            float tmp1[WMAPRO_MAX_CHANNELS];
> > +            float tmp2[WMAPRO_MAX_CHANNELS];
> > +            memcpy(tmp1,
> > +                   &chgroup->decorrelation_matrix[x *
> > chgroup->num_channels], +                   sizeof(float) * (i + 1));
> > +            memcpy(tmp2,
> > +                   &chgroup->decorrelation_matrix[i *
> > chgroup->num_channels], +                   sizeof(float) * (i + 1));
> > +            for (y=0;y < i + 1 ; y++) {
> > +                float v1 = tmp1[y];
> > +                float v2 = tmp2[y];
> > +                int n = rotation_offset[offset + x];
> > +                float sinv;
> > +                float cosv;
> > +
> > +                if (n<32) {
> > +                    sinv = sin64[n];
> > +                    cosv = sin64[32-n];
> > +                } else {
> > +                    sinv = sin64[64-n];
> > +                    cosv = -sin64[n-32];
> > +                }
> > +
> > +                chgroup->decorrelation_matrix[y + x *
> > chgroup->num_channels] = +                                              
> > (v1 * sinv) - (v2 * cosv); +               
> > chgroup->decorrelation_matrix[y + i * chgroup->num_channels] = +         
> >                                      (v1 * cosv) + (v2 * sinv); +        
> >    }
>
> the memcpy and tmp1/tmp2 appear unneeded
>

fixed

> >
> > +             vals[0] = (symbol_to_vec4[idx] >> 8) >> 4;
> > +             vals[1] = (symbol_to_vec4[idx] >> 8) & 0xF;
> > +             vals[2] = (symbol_to_vec4[idx] >> 4) & 0xF;
> > +             vals[3] = symbol_to_vec4[idx] & 0xF;
>
> vertical align
>

changed

> > +        }
> > +
> > +        /** decode sign */
> > +        for (i=0;i<4;i++) {
> > +            if (vals[i]) {
> > +                int sign = get_bits1(&s->gb) - 1;
> > +                ci->coeffs[cur_coeff] = (vals[i]^sign) - sign;
> > +                num_zeros = zero_init;
> > +            } else {
> > +                /** switch to run level mode when subframe_len / 128
> > zeros +                   were found in a row */
> > +                rl_mode |= (num_zeros & rl_switchmask);
> > +                ++num_zeros;
> > +            }
> > +            ++cur_coeff;
> > +        }
>
> i guess you tried
> num_zeros=0
> and rl_mode |= (++num_zeros > X)
> ?
>

Your version is faster. Changed.

> > +static void window(WMA3DecodeContext *s)
> > +{
> > +    int i;
> > +    for (i=0;i<s->channels_for_cur_subframe;i++) {
> > +        int c = s->channel_indexes_for_cur_subframe[i];
> > +        float* start;
> > +        float* window;
> > +        int winlen = s->channel[c].prev_block_len;
> >
> > +        start = s->channel[c].coeffs - (winlen >> 1);
>
> declaration and init can be merged
>

changed

>
> [...]
>
> > +/**
> > + *@brief Decode a single subframe (block).
> > + *@param s codec context
> >
> > + *@return 0 if decoding failed, 1 on success
> > + */
>
> in ffmpeg <0 is error and the value is the AVERROR code
>

reworked

> > +    /** skip extended header if any */
> > +    if (get_bits1(&s->gb)) {
> > +        int num_fill_bits;
> > +        if (!(num_fill_bits = get_bits(&s->gb,2))) {
> > +            num_fill_bits = get_bits(&s->gb,4);
> > +            num_fill_bits = get_bits(&s->gb,num_fill_bits) + 1;
> > +        }
> > +
> > +        if (num_fill_bits >= 0) {
> > +            if (get_bits_count(&s->gb) + num_fill_bits >
> > s->num_saved_bits) { +               
> > av_log(s->avctx,AV_LOG_ERROR,"invalid number of fill bits\n"); +         
> >       return 0;
> > +            }
> > +
> > +            skip_bits_long(&s->gb,num_fill_bits);
> > +        }
>
> i think the reuse of num_fill_bits is a little obfuscating the code
>

fixed

Regards

Sascha

-------------- next part --------------
A non-text attachment was scrubbed...
Name: wmapro_try3.patch
Type: text/x-diff
Size: 97531 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090704/f06d6e02/attachment.patch>