[FFmpeg-devel] [PATCH] G.729 Fixed-codebook vector decoding

Mon Jun 29 00:07:01 CEST 2009

2009/6/29 Michael Niedermayer <michaelni at gmx.at>:
> On Mon, Jun 29, 2009 at 02:09:12AM +0700, Vladimir Voroshilov wrote:
>> 2009/6/28 Vladimir Voroshilov <voroshil at gmail.com>:
>> > 2009/6/28 Vladimir Voroshilov <voroshil at gmail.com>:
>> >> 2009/6/28 Vladimir Voroshilov <voroshil at gmail.com>:
> [...]
>> > Please disregard this mail, since calculations are wrong (subframe
>> > size os 40, not 80).
>> >
>> > Any ideas about sparse vector are still welcome.
>> >
>>
>> Further investigation about sparse fixed-codebook vector.
>>
>> Where this vector is used:
>> 1. is constructed (creating several non-pulses in zero array)
>>
>> 2. harminoc filter:
>> for(i=pitch_delay; i<subframe_size;i++)
>> fc[i] += fc[i-pitch_delay] * gp;
>>
>> 3. gain_code decoding:
>> sum = scalarproduct(fc);
>>
>> 4. attenuating excitation signal
>> for(i=0;i<subframe_size;i++)
>> exc[i] = exc[i] * gp + fc[i] *gc;
>>
>> (1),(3) can be easyli implemented with sparse vector.
>> (2) implementation is harder, but ?looks possible even for me.
>>
>> But compare code:
>>
>> ? ? for (i=0; i<fc.count; i++) {
>> ? ? ? ? int new_index = fc.index[i] + pitch_delay;
>> ? ? ? ? if (new_index < SUBFRAME_SIZE) {
>> ? ? ? ? ? ? for(j=1; j<fc.count; j++)
>> ? ? ? ? ? ? ? ? if (new_index == fc.index[j])
>> ? ? ? ? ? ? ? ? ? ? break;
>> ? ? ? ? ? ? if (j == fc.count) {
>> ? ? ? ? ? ? ? ? fc.index[out->count] = new_index;
>> ? ? ? ? ? ? ? ? fc.value[out->count] = 0;
>> ? ? ? ? ? ? ? ? fc.count++;
>> ? ? ? ? ? ? }
>> ? ? ? ? ? ? fc.value[j] = av_clip_int16(((fc.value[j] << shift) +
>> fc.value[i] * gain_pitch + rounder)>>shift);
>> ? ? ? ? }
>> ? ? }
>>
>> fc.count is less or equal 8 in above code.
>
> have you tested this code?

Just for case when index[i] are already sorted (see attached patch)
and with knowledge that
minimum value of pitch_delay is equal to subframe_size/2. Without this
restrictions code is not
equal to original. compress_sparse routine in the patch implicitly
sorts pulses in ascending order.

I'm afraid  i can't generate vector from pulse_indexes/pulse_signs
with ordered indexes without explicit sorting.

>> and:
>>
>> for(i=pitch_delay; i<40;i++)
>> ? ? fc[i] += fc[i-pitch_delay] * gp;
>>
>>
>> (4) i can't see effective implementation with sparse vector.
>> because elements corresponding sparse vector elemens are calculated as
>> the only way i see is:
>>
>> for(i=0; i<40; i++)
>> exc[i] *=gp;
>> for(i=0;i<fc.count; i++)
>> exc[fc.index[i]] += fc.value[i] * gc
>>
>> vs.
>>
>> for(i=0;i<40;i++)
>> exc[i] = exc[i] * gp + fc[i] *gc;
>>
>>
>> I prefer second (non-sparse) form. If you think that first is better,
>> even if harder to understand (at least for me),
>> i'll change code.
>
> what is the speed difference of the 2 ?

5920 dezicycles in harmonic_filter, 1 runs, 0 skips
3520 dezicycles in harmonic_filter, 2 runs, 0 skips
2920 dezicycles in harmonic_filter, 4 runs, 0 skips
2120 dezicycles in harmonic_filter, 8 runs, 0 skips
1900 dezicycles in harmonic_filter, 16 runs, 0 skips
1795 dezicycles in harmonic_filter, 32 runs, 0 skips
1702 dezicycles in harmonic_filter, 64 runs, 0 skips
1932 dezicycles in harmonic_filter, 128 runs, 0 skips
1778 dezicycles in harmonic_filter, 256 runs, 0 skips
1628 dezicycles in harmonic_filter, 512 runs, 0 skips
1605 dezicycles in harmonic_filter, 1024 runs, 0 skips
1561 dezicycles in harmonic_filter, 2048 runs, 0 skips
1583 dezicycles in harmonic_filter, 4096 runs, 0 skips

vs.

1920 dezicycles in harmonic filter(sparse), 1 runs, 0 skips
1600 dezicycles in harmonic filter(sparse), 2 runs, 0 skips
2320 dezicycles in harmonic filter(sparse), 4 runs, 0 skips
1900 dezicycles in harmonic filter(sparse), 8 runs, 0 skips
1740 dezicycles in harmonic filter(sparse), 16 runs, 0 skips
1655 dezicycles in harmonic filter(sparse), 32 runs, 0 skips
1695 dezicycles in harmonic filter(sparse), 64 runs, 0 skips
1810 dezicycles in harmonic filter(sparse), 128 runs, 0 skips
1675 dezicycles in harmonic filter(sparse), 256 runs, 0 skips
1531 dezicycles in harmonic filter(sparse), 512 runs, 0 skips
1490 dezicycles in harmonic filter(sparse), 1024 runs, 0 skips
1447 dezicycles in harmonic filter(sparse), 2048 runs, 0 skips
1460 dezicycles in harmonic filter(sparse), 4096 runs, 0 skips

> also, are you sure the *gp can not be merged into a prior step?

If i it is possible to generate pulses in proper order, it can
(generation and harmonic filter wil be merged
in this case). otherwise - not. I'm afraid i cant get ordering without
explicit sorting.

>> I'm not also sure if this sparse-vector code can be reused by other
>> celp-based codecs.
>> The non-sparse form uses more usual primitives like "scalarproduct of
>> vectors", "weighted sum of vectors" and so on.
>>
>> I hope i've given you enough arguments to protect ?my point of view.
>
> no, your argument of 2 extra lines of code being bad is a little weak,
> the rest i could not really follow and wont bother with before you confirm
> that the code has been tested and thus doesnt follow under your prior
> comment of "Please disregard this mail, since calculations are wrong"

My sentence about disregarding should be applied only
to complexity estimation when subframe_size=80 was mentioned. The  later mail
are not covered by it.

Anyway, I've tested it via attached dirty proof-of-concept code.
The result is the same.

I didn't test (1),(3) and (4) yet.

-- 
Regards,
Vladimir Voroshilov     mailto:voroshil at gmail.com
JID: voroshil at gmail.com, voroshil at jabber.ru
ICQ: 95587719
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sparse.diff
Type: text/x-diff
Size: 4840 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090629/3ba2b19e/attachment.diff>