[FFmpeg-devel] [PATCH] G.729A (floating-point) decoder and ACT demuxer

Vladimir Voroshilov voroshil
Wed Feb 27 20:10:46 CET 2008


Here is first answer to first review .

On Sun, Feb 24, 2008 at 8:06 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Sat, Feb 23, 2008 at 11:08:25PM +0600, Vladimir Voroshilov wrote:
>

[...]

>  First review below, this will need quite a few more patch resend-review passes
>  though.

Sure.

>  > +Validation results on ITU test vectors for Fixed-point G729.A:
>  > +
>  > +   All vectors are not bit-exactly equal (but reference C code for
>  > +   floating point implementation fails to provide bit-exact equal files
>  > +   while fixed-point code passes all tests). Thus test was based
>  > +   on presence of hearable artefacts/differences
>
>  PSNR / standard deviation would be nice instead of just
>  "hearable artefacts/differences"

tiny_psnr over results of decoding ITU's test vectors:

My code with float tables (from AnnexC) vs AnnexA (fixed-point):

ALGTHM.BIT stddev:156.93 PSNR:52.40 bytes:4096
ERASURE.BIT stddev:772.37 PSNR:38.56 bytes:47104
FIXED.BIT stddev:  3.71 PSNR:84.91 bytes:18432
LSP.BIT stddev: 34.91 PSNR:65.46 bytes:356352
OVERFLOW.BIT stddev:17843.69 PSNR:11.29 bytes:61440
PARITY.BIT stddev: 40.32 PSNR:64.21 bytes:47104
PITCH.BIT stddev: 79.25 PSNR:58.34 bytes:292864
SPEECH.BIT stddev: 35.97 PSNR:65.20 bytes:598016
TAME.BIT stddev:207.70 PSNR:49.97 bytes:20480
TEST.BIT stddev: 38.46 PSNR:64.62 bytes:26624

My code with float tables from (from AnnexC)  vs AnnexC (floating-point):

ALGTHM.BIT stddev: 33.88 PSNR:65.72 bytes:4096
ERASURE.BIT stddev: 44.68 PSNR:63.32 bytes:47104
FIXED.BIT stddev:  2.63 PSNR:87.90 bytes:18432
LSP.BIT stddev:  9.97 PSNR:76.34 bytes:356352
OVERFLOW.BIT stddev:101.51 PSNR:56.19 bytes:61440
PARITY.BIT stddev: 17.76 PSNR:71.33 bytes:47104
PITCH.BIT stddev: 70.31 PSNR:59.38 bytes:292864
SPEECH.BIT stddev: 16.58 PSNR:71.92 bytes:598016
TAME.BIT stddev: 52.17 PSNR:61.97 bytes:20480
TEST.BIT stddev: 20.62 PSNR:70.03 bytes:26624

My code with fixed-point tables vs vs AnnexA (fixed-point):

ALGTHM.BIT stddev:2287.52 PSNR:29.13 bytes:4096
ERASURE.BIT stddev:823.40 PSNR:38.01 bytes:47104
FIXED.BIT stddev: 66.35 PSNR:59.88 bytes:18432
LSP.BIT stddev: 79.64 PSNR:58.30 bytes:356352
OVERFLOW.BIT stddev:18501.70 PSNR:10.98 bytes:61440
PARITY.BIT stddev:420.79 PSNR:43.84 bytes:47104
PITCH.BIT stddev:1011.28 PSNR:36.22 bytes:292864
SPEECH.BIT stddev:434.21 PSNR:43.57 bytes:598016
TAME.BIT stddev:9310.47 PSNR:16.94 bytes:20480
TEST.BIT stddev:198.69 PSNR:50.36 bytes:26624

My code with fixed-point tables vs AnnexC (floating-point):

> > +/**
>  > + * L1 codebook (10-dimensional, with 128 entries (3.2.4)
>  > + */
>  > +static const float cb_L1[128][10] = {
> >
> > [...]
> >
>
>  *10000 and it fits in uint16_t

This and tables below replaced with fixed-point tables from AnnexA
reference code but this decreased PSNR significantly.
Multiplying by 10000 did not decrease  PSNR, though

>  > +/**
>  > + * \brief Check parity bit (3.7.2)
>  > + * \param P1 Pitch delay first subframe
>  > + * \param P0 Parity bit for Pitch delay
>  > + *
>  > + * \return 1 if parity check is ok, 0 - otherwise
>  > + */
>  > +int g729_parity_check(int P1, int P0)
>  > +{
>  > +    int P=P1>>2;
>  > +    int S=P0&1;
>  > +    int i;
>  > +
>  > +    for(i=0; i<6; i++)
>  > +    {
>  > +        S ^= P&1;
>  > +        P >>= 1;
>  > +    }
>  > +    S ^= 1;
>  > +    return (!S);
>  > +}
>
>  ehh?
>  you have 6 bits, to find the parity of 6bit you do:
>
>  P1>>=2;
>  return ((0xMAGIC ULL>>P1)^P0)&1;

Nice :)
Done as   return ((0x6996966996696996ULL >> (P1>>2)) ^ P0) & 1;
P1 parameter also converted to uint8_t from int

>  > +
>  > +/**
>  > + * \brief Decoding of the adaptive-codebook vector delay for first subframe (4.1.3)
>  > + * \param ctx private data structure
>  > + * \param P1 Pitch delay first subframe
>  > + * \param intT [out] integer part of delay
>  > + * \param frac [out] fractional part of delay [-1, 0, 1]
>  > + */
>  > +static void g729_decode_ac_delay_subframe1(G729A_Context* ctx, int P1, int* intT, int* frac)
>  > +{
>  > +    /* if no parity error */
>  > +    if(!ctx->bad_pitch)
>  > +    {
>  > +        if(P1<197)
>  > +        {
>  > +            *intT=1.0*(P1+2)/3+19;
>  > +            *frac=P1-3*(*intT)+58;
>
>  remove the float (1.0) use proper / and %

1.0 is unnecessary here.
Reference code uses the same expression but without 1.0..
My fault. Fixed.

>  Use the return value of your functions, avoid returning values through
>  pointers. Its harder to read
>

How can i return two values via return ?
this routine should return integer and fractional parts of pitch
delay. this code also stores integer part in context for using in next
subframe.

Cleaner solution for me will be like this:
==CUT==
g729_decode_ac_delay_subframe1(ctx, P1, &intT, &frac);
//save integer part for using in next subframe
ctx->intT1=intT;
===CUT==
With removed
"ctx->intT1=*intT" from inside routine.
Assigning can be done even after second subfrme decoding.

But i don't see how return these two values via "return"
(packing them into one "int" will be small overhead. imho)

>  Also i must say that the reference implemetation is cleaner than
>  your code
>
>  if intT and frac are integer and frac of something they should have
>  matching names, the ref impl uses T0 T0_frac, now we shouldnt copy
>  that but still.
>  P1 which you document as "Pitch delay first subframe"
>  is called index and documented as "input : received pitch index"
>  in the ref impl.

P1 comment fixed.
k and t will be renamed to pitch_delay_int and pitch_delay_frac

>  > +    int tmin=FFMIN(FFMAX(intT1-5, PITCH_MIN)+9, PITCH_MAX)-9;
>
>  int tmin=FFMIN(FFMAX(intT1-5, PITCH_MIN), PITCH_MAX-9);
>
>
>  > +
>  > +    if(ctx->data_error)
>  > +    {
>  > +        *intT=intT1;
>  > +        *frac=0;
>  > +        ctx->intT2_prev=FFMIN(intT1+1, PITCH_MAX);
>  > +        return;
>  > +    }
>  > +
>
>  > +    *intT=(P2+2)/3-1;
>  > +    *frac=P2-2-3*(*intT);
>
>  *T_int = (P2+2)/3 - 1;
>  *T_frac= (P2+2)%3 - 1;
>
>
>  > +
>  > +    *intT+=tmin;
>  > +
>  > +    ctx->intT2_prev=*intT;
>  > +}
>
>  as already said use the return value!

See comment above

>  > +            /*  R(x):=ac_v[-k+x] */
>  > +            v+=ac_v[n-k-i]*b30[t+3*i];     //R(n-i)*b30(t+3i)
>  > +            v+=ac_v[n-k+i+1]*b30[3-t+3*i]; //R(n+i+1)*b30(3-t+3i)
>
>  vertical align please, and rename b30 to something remotely sane.
>  hammsinc would be a random idea, interpol_filter would be another

renamed to interpol_filter

>  > + * \brief Decoding fo the fixed-codebook vector (3.8)
>
> > + * \param ctx private data structure
>
>  > + * \param C Fixed codebook
>  > + * \param S Signs of fixed-codebook pulses (0 bit value means negative sign)
>
>  use meaningfull variable names and upper case should only be used for #defines

C and S renamed to fc_indexes and pulse_signs
>
>
>  > + * \param fc_v [out] decoded fixed codebook vector
>  > + *
>  > + * bit allocations:
>  > + *   8k mode: 3+3+3+1+3
>  > + * 4.4k mode: 4+4+4+1+4 (non-standard)
>  > + *
>  > + * FIXME: error handling required
>  > + */
>  > +static void g729_decode_fc_vector(G729A_Context* ctx, int C, int S, float* fc_v)
>  > +{
>  > +    int accC=C;
>  > +    int accS=S;
>  > +    int i;
>  > +    int index;
>  > +    int bits=formats[ctx->format].fc_index_bits;
>
>  all values which are the same in all formats should be replaced by named
>  constants (#define FC_INDEX_BITS or a better name) theres no sense in
>  having them in these structs

Entire vector_bits structure disappeared

fc_index_bits and bits_per_frame  are only things that differs between
formats :)
All others are replaced with #define

>  > +    for(i=T; i<ctx->subframe_size;i++)
>  > +        fc_v[i]+=fc_v[i-T]*ctx->gain_pitch;
>  > +}
>
>  please use more spaces in your expressions

Will fix while fixing another issues.

>  > +    if(!gain_after)
>  > +        return;
>
>  testing floats for equallity is risky ...

How can i avoid division by zero in this case?

>  [...]
>  > +    /*
>  > +       First pass: searching the best T0 (pitch delay)
>  > +       Second pass is not used in G.729A: fractional part is always zero
>  > +    */
>  > +    k=minT0;
>  > +    corellation=0;
>  > +    /* 4.2.1, Equation 80 */
>
> > +    for(n=0; n<ctx->subframe_size; n++)
>  > +        corellation+=ctx->residual[n+PITCH_MAX]*ctx->residual[n+PITCH_MAX-k];
>  > +
>  > +    corr_max=corellation;
>  > +    intT0=k;
>  > +
>  > +    for(; k<=maxT0; k++)
>  > +    {
>  > +        corellation=0;
>  > +        /* 4.2.1, Equation 80 */
>
> > +        for(n=0; n<ctx->subframe_size; n++)
>  > +            corellation+=ctx->residual[n+PITCH_MAX]*ctx->residual[n+PITCH_MAX-k];
>  > +        if(corellation>corr_max)
>  > +        {
>  > +            corr_max=corellation;
>  > +            intT0=k;
>  > +        }
>  > +    }
>
>  the first loop does nothing, remove it

Hm. Not sure.
This is not sum of squeres, thus can be negative.
replacing "intT0=k" with "intT0=k++" before loop will avoid calculating
the same value twice.

>  > +    for(i=0; i<ctx->subframe_size; i++)
>  > +    {
>
>  > +        tmp_speech[i] = FFMIN(tmp_speech[i],32767.0);
>  > +        tmp_speech[i] = FFMAX(tmp_speech[i],-32768.0);
>  > +        speech[i]=g729_round(tmp_speech[i]);
>
>  lrintf() or float_to_int16()

even simple assignment will be enough.
+/-1 on resulted PCM sample will not hurt signal, imho.

>  > +    /* 3.2.6, Equation 25*/
>  > +    for(i=0;i<5;i++)
>  > +    {
>  > +        ff1[i]=f1[i+1]+f1[i];
>  > +        ff2[i]=f2[i+1]-f2[i];
>  > +    }
>  > +
>  > +    /* 3.2.6, Equation 26*/
>  > +    for(i=0;i<5;i++)
>  > +    {
>  > +        a[i]=(ff1[i]+ff2[i])/2;
>  > +        a[i+5]=(ff1[4-i]-ff2[4-i])/2;
>  > +    }
>
>  /* 3.2.6, Equation 25 and 26 */
>  for(i=0;i<5;i++){
>     a[  i]=
>     a[9-i]= (f1[i+1] + f1[i] - f2[i+1] + f2[i])*0.5;
>  }

Wrong. You mislooked +/- in Equation 26

>  [...]
>  > +/**
>  > + * \brief decode one G.729 frame into PCM samples
>  > + * \param serial array if bits (0x81 - 1, 0x7F -0)
>  > + * \param serial_size number of items in array
>  > + * \param out_frame array for output PCM samples
>  > + * \param out_frame_size maximum number of elements in output array
>  > + */
>  > +static int  g729a_decode_frame_internal(void* context, short* out_frame, int out_frame_size, int *parm)
>  > +{
>
>  there is no need for g729a_ prefixes

Is this related to *_decode_* *_init *close (i.e. non-g729 specific)
routines only ?

All issues not mentioned above hopefully fixed in local tree.
Waiting for comments about PSNR regression and value returning issues.


-- 
Regards,
Vladimir Voroshilov     mailto:voroshil at gmail.com
JID: voroshil at gmail.com, voroshil at jabber.ru
ICQ: 95587719




More information about the ffmpeg-devel mailing list