[FFmpeg-devel] [PATCH] WMA Voice decoder

Mon Feb 1 03:46:43 CET 2010

Hi,

all comments taken care of unless said otherwise below:

On Sat, Jan 30, 2010 at 6:48 PM, Reimar D?ffinger
<Reimar.Doeffinger at gmx.de> wrote:
> On Sat, Jan 30, 2010 at 05:06:45PM -0500, Ronald S. Bultje wrote:
>> +static const struct frame_type_desc {
>> + ? ?short ? n_blocks, ? ? ///< amount of blocks per frame
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?///< (each block contains 160/#n_blocks samples)
>> + ? ? ? ? ? ?log_n_blocks, ///< log2(n_blocks)
>> + ? ? ? ? ? ?acb_type, ? ? ///< Adaptive codebook type in frame/block:
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?///< - 0: no adaptive codebook (only hardcoded fixed)
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?///< - 1: adaptive codebook with per-frame pitch,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?///< - 2: adaptive codebook with per-block pitch
>> + ? ? ? ? ? ?fcb_type, ? ? ///< Fixed codebook type in frame/block:
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?///< - ? 0: hardcoded codebook, per-frame gain,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?///< ? ? ? ?is used as comfort noise during silence,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?///< - ? 1: hardcoded codebook, per-block gain,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?///< - ? 2: pitch-adaptive window (AW) pulse signal,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?///< - 3-5: innovation (fixed) codebook pulses with
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?///< ? ? ? ?different combinations of single/double
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?///< ? ? ? ?pulses (see #dbl_pulses)
>> + ? ? ? ? ? ?dbl_pulses, ? ///< how many pulse vectors have pulse pairs
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?///< (rather than just one single pulse)
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?///< only if #fcb_type >= 3 && <= 5
>> + ? ? ? ? ? ?frame_size; ? ///< the amount of bits that make up the block
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?///< data (per frame)
>
> "short"? Also specifying the type only once is ugly.
> Also if you use frame_size with a shift they all can be uint8_t
> (though probably it's better to just "waste" those 34 bytes and
> leave frame_size 16 bits, but the others don't need to be.

OK, so I changed this to be holdable in 32bits per frame type, maybe
that's a bit too much just let me know how far I should go.

>> + ? ?memset(cntr, ? ? ?0, sizeof(cntr));
>> + ? ?for (n = 0; n < 17; n++) {
>> + ? ? ? ?res = get_bits(gb, 3);
>> + ? ? ? ?if (cntr[res] >= 3 + (res == 7))
>> + ? ? ? ? ? ?return -1;
>
> What is the reason this can't just be cntr[res] > 3?
> (assuming performance matters, documenting can't hurt
> either way).

It might invalidly mark invalid bitstreams as valid. I guess we don't
care, so changed.

>> + ? ?if (flags & 0x1000) {
>
> A name for that wouldn't hurt.

I wasn't sure what you meant here. I added a comment indicating what
the flag means, but I might have misunderstood you.

>> +static int get_vbm_bits(GetBitContext *gb)
>> +{
>> + ? ?int n, res;
>> +
>> + ? ?for (n = 0; ; n++) {
>> + ? ? ? ?res = get_bits(gb, 2);
>> + ? ? ? ?if (res < 3 || n == 6 /** don't increase n to 7 */)
>> + ? ? ? ? ? ?break;
>> + ? ?}
>
> So
> for (n = -1; n < 7 && res < 3; n++)
> or maybe better
> for (n = 0; n < 8 && res < 3; n++) {
> ...
> }
> return 3 * (n - 1) + res;
> or whatever
> Also don't use doxy-style comments for things that doxygen can not
> make any sense of.

I removed the whole function and use get_vlc2() now.

>> + ? ? ? ?for (m = 0; m < num; m++)
>> + ? ? ? ? ? ?lsps[m] += base_q[n] + mul_q[n] * table[m + values[n] * num];
>
> I consider it better to extract constant stuff like values[n]*num outside the loop.
> Particularly since due to aliasing rules the compiler might not be able to
> to pull e.g. the base_q[n] out without that (hm, I admit I don't know for sure
> if the const might be enough for that).

I did this for the values[n] * num. For the base_q[n], it seems
overkill, I mean, it's marked const so the compiler should know
better. I think expanding base_q[n] outside the loop is too little
honour for gcc (maybe deserved, but still).

>> + ? ?for (n = 0; n < 10; n++) {
>> + ? ? ? ?a1[n] ? ? ?= ipol_tab[interpol][0][n] * (old[n] - i_lsps[n]) +
>> + ? ? ? ? ? ? ? ? ? ? i_lsps[n];
>> + ? ? ? ?a1[10 + n] = ipol_tab[interpol][1][n] * (old[n] - i_lsps[n]) +
>> + ? ? ? ? ? ? ? ? ? ? i_lsps[n];
>
> The subtraction is duplicated.
> Also if speed matters it might make sense to check if gcc handles a flattened
> ipol_tab better (i.e. ipol_tab[interpol][10 + n] instead of ipol_tab[interpol][1][n]).

(I didn't test flattening.)

>> + ? ?dequant_lsps(&lsps[5], ?5, &v[2], &vec_sizes[2], 2,
>> + ? ? ? ? ? ? ? ? ff_wmavoice_dq_lsp16i2, &mul_lsf[2], &base_lsf[2]);
>
> I think lsps + 5, v + 2 etc. is the more common style in FFmpeg.

Uhm, so that will be a huge change, I use &..[off] inistead of ..+off
everywhere in my code, not just here but also in rdt.c etc. Is that a
big deal?

>> + ? ?static const uint8_t start_offset[94] = {
>> + ? ? ? ? ?0, ? 2, ? 4, ? 6, ? 8, ?10, ?12, ?14, ?16, ?18, ?20, ?22,
>> + ? ? ? ? 24, ?26, ?29, ?28, ?30, ?31, ?32, ?33, ?34, ?35, ?36, ?37,
>> + ? ? ? ? 38, ?39, ?40, ?41, ?42, ?43, ?44, ?46, ?48, ?50, ?52, ?54,
>> + ? ? ? ? 56, ?58, ?60, ?62, ?64, ?66, ?68, ?70, ?72, ?74, ?76, ?78,
>> + ? ? ? ? 80, ?82, ?84, ?86, ?88, ?90, ?92, ?94, ?96, ?98, 100, 102,
>> + ? ? ? ?104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126,
>> + ? ? ? ?128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150,
>> + ? ? ? ?152, 154, 156, 158, 160, 162, 164, 166, 168, 170
[..]
> And is that one non-monotonous value really correct?

Yes, it's really correct. I was surprised by that also.

>> + ? ? ? ? ? ?for (idx = 0; idx < MAX_FRAMESIZE / 2; idx++)
>> + ? ? ? ? ? ? ? ?if (BIT_IS_SET(idx)) break;
>
> But that looks about like the most inefficient way to scan 80 bits for
> a non-zero one ever conceived.

OK, so what do I use then? (I really don't know. :-).)

>> + ? ? ? ?if (s->aw_pitch_range == 24) { ///< 3 pulses, 1:sign + 3:index each
>> + ? ? ? ?} else { ///< 4 pulses, 1:sign + 2:index each
>
> I guess you can find all of those on your own.

???

>> + ? ? ? ? ? ?while (fcb->x[fcb->n] < 0)
>> + ? ? ? ? ? ? ? ?fcb->x[fcb->n] += fcb->pitch_lag;
>
> What are the values here? This might be a particularly slow way to calculate
> if (fcb->x[fcb->n] < 0)
> ? ?fcb->x[fcb->n] = (fcb->x[fcb->n] % fcb->pitch_lag) + fcb->pitch_lag;

They could be 1-2 pitch values below zero, so -10, -20 or so.

>> + ? ?for (n = 0; n < size; n++)
>> + ? ? ? ?excitation[n] = ff_wmavoice_std_codebook[r_idx + n] * gain;
>
> No dsputil function for that I guess?

Maybe, but optimization for dsputil will come after commit, not
before. I haven't looked at dsputil at all so far.

>> + ? ? ? ? ? ?float pulse = get_bits1(gb) ? 1.0 : -1.0;
>
> I've seen that a lot. Maybe worth optimizing?

Same as above (also goes for AMR, SIPR).

>> + ? ? ? ? ? ?ff_acelp_interpolatef(&excitation[n], &excitation[n - pitch],
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?ff_wmavoice_ipol1_coeffs, 17,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?(idx >= 0) | (abs(idx) << 1), 9, 1);
>
> Maybe FFABS? Also that kind of thing is quite common, there may be a function for it.

it's hard to test the speed implifcation of that because it's so
insignificant. I think at some point I said we should test abs() vs.
FFABS(), there's no clear guideline for that right now.

I couldn't find a function, but I may just be inexperienced in finding
them. :-).

All other comments fixed as suggested.

Ronald