[FFmpeg-devel] [PATCH] G.729 Fixed-codebook vector decoding

Sun Jun 28 21:09:12 CEST 2009

2009/6/28 Vladimir Voroshilov <voroshil at gmail.com>:
> 2009/6/28 Vladimir Voroshilov <voroshil at gmail.com>:
>> 2009/6/28 Vladimir Voroshilov <voroshil at gmail.com>:
>>> 2009/6/28 Michael Niedermayer <michaelni at gmx.at>:
>>>> On Sat, Jun 27, 2009 at 09:27:23AM +0700, Vladimir Voroshilov wrote:
>>>>> 2009/6/27 Michael Niedermayer <michaelni at gmx.at>:
>>>>> > On Sat, Jun 27, 2009 at 02:05:27AM +0700, Vladimir Voroshilov wrote:
>>>>> >> Memset is required, since vector should contain only several non-zero pulses
>>>>> >
>>>>> > if the thing is sparse, memset could maybe be replaced by just setting
>>>>> > the non zero values to zero, in that sense many operationson on sparse
>>>>> > vectors can be done more efficiently than working on an array of zeros
>>>>> > with one non zero value, which if it applies here means you have to
>>>>> > rewrite the related code to be more efficient
>>>>>
>>>>> Hmm..
>>>>>
>>>>> Does you mean "invert" vector (replace zero with non-zero and vice versa) ?
>>>>> What values whould contain non-zero parts in your suggestion?
>>>>>
>>>>> Here is how i uderstand fixed-codebook vector generation.
>>>>>
>>>>> Fixed-codebook vector defines where four (or three, depends on mode)
>>>>> pulses of sound are placed in
>>>>> current frame (which is zero at this stage). Afaik, fixed-codebook
>>>>> vector defines signal which in "new"
>>>>> in current frame and is not continuation oprevious frame's sound.
>>>>> Routine sets 1 at this positions, setting all other to 0.
>>>>>
>>>>> "1" related to pulse of sound, "0" means silence.
>>>>>
>>>>> Later this vector will be weightly summed with excitation signal.
>>>>>
>>>>> Thus, either your suggestion is deep math or heavy optimization which
>>>>> i can't understand.
>>>>
>>>> adding a sparse vector with 4 non zero pulses to another vector is a O(1)
>>>> operation, your code needs O(n) that is not acceptable. (that is based on
>>>> your description above)
>>>>
>>>> Its simply that you add n-4 values with n-4 zeros and 4 values with
>>>> 4 non zero values, thats not deep math
>>>
>>> Ok. Now i see what is the battle for. O(1) is obviously better even for me.
>>> But, i'm afraid O(1) implementation is much complex task.
>>>
>>> Well. Fixed-vector (called fc and fc_v in severral places) decribes contribution
>>> of non-periodic signal introduced in current frame.
>>>
>>> Initially it contains few (fixed number of, depending on mode)
>>> non-zero pulses and is a sparse vector.
>>> This pulses generated ?fixed-codebook indexes and signs. The latter
>>> either are decoded from frame or
>>> initialized with random values in case of frame erasure.
>>>
>>> Working with sparse vector at this stage is clean for me (small array
>>> of non-zero values' indexes/signs only, as i see).
>>>
>>> But already applying simple harmonic filter:
>>>
>>> for(i=pitch_delay; i<subframe_size; i++)
>>> fc[i]+=gain_pitch * fc[i-pitch_delay]
>>>
>>> becames much complex task for such sparse vector.
>>>
>>> Later fixed-codebook vector is used for decoding gain_code.
>>> There is scalarproduct_int16, which is called with fc_v. Of course i
>>> can compute scalarproduct
>>> on sparse vector too, but mmx optiomization can not be used.
>>>
>>> According to above, i see only two troubles (for me):
>>> 1. Complex code with harmonic filter. Sparse case will require either
>>> "if"s or some sorting of indexes.
>>> This is because accumulation is proceeded in-place and each step of
>>> loop affects next elements of array.
>>> 2. Inpossible reusage of existing MMX optiomizations. Perhaps
>>> calculation of scalarproduct can be merged with harmonic filter.
>>>
>>> How to implement "1" i didn't yet know and will be thankful for advice.
>>
>> I tried to make some complexity aproximations of sparse vector usage
>> in harmonic filter.
>> Assuming there are four pulses and pitch delay can be between 20 and 149.
>> subframe size is 80.
>>
>> 0. Simplest case. pitch_delay>=80. filter is not applied.
>> 0.1 non-sparse vector.
>> Requires zero-initialized array of length 80.
>> 80-pitch_delay (60 in worst case) number of "+" and "*" and one memset
>> of size 80.
>> Adding check for non-zero will lead to 60 if's and 12 "+" and "*" in worst case.
>>
>> 1. sparse vector. pitch_delay<80
>> Requires array of size 16 with indexes and values (total 32)
>>
>> All pulses are inside first one chunk of size 80/pitch_delay.
>> Filter will do int(80/pitch_delay)-1 iterations (three in worst case)
>> with four "*" and "+" inside.
>> In worst case this will introduce 12 additional non-zero pulses.
>>
>> Pulses are not intersects in this case.
>>
>> 2. Pulses can be in different chunks of size 80/pitch_delay.
>> In this code filter have to check whether non zero values are
>> intersects in different chunks or not.
>> If not, complexity is the same as for previous case.
>> if they are, early pulses should change values of the next which they
>> intersects with.
>> This will require either sorting of index/value array at each
>> iteration with next item's index check or
>> searching this arrays to find intersects.
>> Both cases will require (in worst case) 12-1 iterations of inner loop.
>> which leads to 11*12=132 "if"s and assignments.
>>
>> According to above i don't see any advantage of introducing sparse vector.
>> If i'm wrong or missing something, please fix me.
>>
>> P.S. Adding scalar product calculation to harmonic filter will remove
>> one extra loop, but will prevent using mmx optimization.
>
> Please disregard this mail, since calculations are wrong (subframe
> size os 40, not 80).
>
> Any ideas about sparse vector are still welcome.
>

Further investigation about sparse fixed-codebook vector.

Where this vector is used:
1. is constructed (creating several non-pulses in zero array)

2. harminoc filter:
for(i=pitch_delay; i<subframe_size;i++)
fc[i] += fc[i-pitch_delay] * gp;

3. gain_code decoding:
sum = scalarproduct(fc);

4. attenuating excitation signal
for(i=0;i<subframe_size;i++)
exc[i] = exc[i] * gp + fc[i] *gc;

(1),(3) can be easyli implemented with sparse vector.
(2) implementation is harder, but  looks possible even for me.

But compare code:

    for (i=0; i<fc.count; i++) {
        int new_index = fc.index[i] + pitch_delay;
        if (new_index < SUBFRAME_SIZE) {
            for(j=1; j<fc.count; j++)
                if (new_index == fc.index[j])
                    break;
            if (j == fc.count) {
                fc.index[out->count] = new_index;
                fc.value[out->count] = 0;
                fc.count++;
            }
            fc.value[j] = av_clip_int16(((fc.value[j] << shift) +
fc.value[i] * gain_pitch + rounder)>>shift);
        }
    }

fc.count is less or equal 8 in above code.

and:

for(i=pitch_delay; i<40;i++)
    fc[i] += fc[i-pitch_delay] * gp;

(4) i can't see effective implementation with sparse vector.
because elements corresponding sparse vector elemens are calculated as
the only way i see is:

for(i=0; i<40; i++)
exc[i] *=gp;
for(i=0;i<fc.count; i++)
exc[fc.index[i]] += fc.value[i] * gc

vs.

for(i=0;i<40;i++)
exc[i] = exc[i] * gp + fc[i] *gc;

I prefer second (non-sparse) form. If you think that first is better,
even if harder to understand (at least for me),
i'll change code.
I'm not also sure if this sparse-vector code can be reused by other
celp-based codecs.
The non-sparse form uses more usual primitives like "scalarproduct of
vectors", "weighted sum of vectors" and so on.

I hope i've given you enough arguments to protect  my point of view.

-- 
Regards,
Vladimir Voroshilov     mailto:voroshil at gmail.com
JID: voroshil at gmail.com, voroshil at jabber.ru
ICQ: 95587719