[FFmpeg-devel] [PATCH] Common code for Indeo interactive

Tue Mar 24 17:17:39 CET 2009

On Tue, Mar 24, 2009 at 04:27:56PM +0100, Maxim wrote:
> Michael Niedermayer schrieb:
> > On Thu, Mar 19, 2009 at 11:12:40PM +0100, Maxim wrote:
> >   
> >> Hello guys,
> >>
> >> here is the 1st part of my patch making FFmpeg decode Indeo interactive
> >> (indeo4/indeo5) streams. Both codecs are very similar but use abit
> >> different bitstream formats, so I decided myself to put all common
> >> functions and tables together and submit them as a separate patch.
> >> The 2nd part will be the long-awaited indeo5 decoder...
> >> Please review!
> >>     
> >
> >
> >   
> 
> [...]
> 
> The most things you pointed to are already changed/fixed. I'll send a
> patch to the thread "Indeo5 decoder" soon...
> Here only some questions...
> >
> >   
> >> {
> >>     int         result, pos, i, j, codes_per_row, prefix, last_row;
> >>     uint16_t    codewords[256]; /* FIXME: move this temporal storage out here? */
> >>     uint8_t     bits[256];
> >>
> >>     pos = 0; /* current position = 0 */
> >>
> >>     for (i = 0; i < cb->num_rows; i++) {
> >>         codes_per_row = 1 << cb->xbits[i];
> >>         last_row = (i - cb->num_rows + 1) != 0; /* = 0 for the last row */
> >>         prefix = ((1 << i) - 1) << (cb->xbits[i] + last_row);
> >>
> >>         for (j = 0; j < codes_per_row; j++) {
> >>             if(pos >= 256)  /* some indeo5 codebooks can have more as 256 elements */
> >>                 break;      /* but only 256 codes are allowed! */
> >>
> >>             bits[pos] = i + cb->xbits[i] + last_row;
> >>             if (bits[pos] > IVI_VLC_BITS)
> >>                 return -1; /* invalid descriptor */
> >>
> >>             codewords[pos] = ivi_inv_bits((prefix | j), bits[pos]);
> >>             if (bits[pos] == 0)
> >>                 bits[pos] = 1;
> >>
> >>             pos++;
> >>         }//for j
> >>     }//for i
> >>
> >>     /* number of codewords = pos */
> >>     result = init_vlc(pOut, IVI_VLC_BITS, pos, &bits[0], 1, 1, &codewords[0], 2, 2, (flag & INIT_VLC_USE_STATIC) | INIT_VLC_LE);
> >>     
> >
> > #define INIT_VLC_USE_STATIC 1 ///< VERY strongly deprecated and forbidden
> >   
> 
> What should be used instead? Changing INIT_VLC_USE_STATIC to
> INIT_VLC_USE_NEW_STATIC causes a crash. Am I doing smth wrong?

That flag requires setting table pointer in VLC struct to static array before init -
look at macro INIT_VLC_STATIC in libavcodec/bitstream.h

> >>     if (planes[0].buf1)
> >>         av_free(planes[0].buf1);
> >>     
> > /* allocate luma buffers (aligned on 16x16 for simplicity) */
> >
> > teh if is redundant
> >   
> 
> Hm, this function used in both initial allocation as well as
> reallocation of the internal buffers. When it's called first time from
> "ivi5_decode_init" all pointers in the context are zero. Freeing memory
> without "if (ptr)"-guards is fatal in this case, isn't?

no, it has those guards inside. Also using av_freep(&planes[0].buf1) will set that
pointer to NULL.

> An alternative to it could be to code this function with an additional
> parameter telling if it's an allocation or reallocation...
> 
> >>  *  this is a speed-up version of the inverse 2D slant transforms
> >>  *  for the case if there is a non-zero DC coeff and all AC coeffs are zero.
> >>  *  Performing the inverse slant transform in this case is equivalent to
> >>  *  spreading (DC_coeff + 1)/2 over the whole block.
> >>  *  It works much faster than performing the slant transform on a vector of zeroes.
> >>  *
> >>  *  @param  in          [in] pointer to the dc coefficient
> >>  *  @param  out         [out] pointer to the output buffer (frame)
> >>  *  @param  pitch       [in] pitch to move to the next y line
> >>  *  @param  blk_size    [in] transform block size
> >>  */
> >> void ivi_dc_slant_2d (int32_t *in, int16_t *out, uint32_t pitch, int blk_size)
> >> {
> >>     int     x, y;
> >>     int16_t dc_coeff;
> >>
> >>     dc_coeff = (*in + 1) >> 1;
> >>
> >>     for (y = 0; y < blk_size; out += pitch, y++) {
> >>         for (x = 0; x < blk_size; x++)
> >>             out[x] = dc_coeff;
> >>     }
> >> }
> >>     
> > /**
> >
> > each row/col should be likely checked, not just all or none 0
> >   
> 
> I'm not sure I understand your comment right, sorry... Should I add some
> checks? It's completely needless at this point because this function
> will be always used for the case "DC coef != 0, all AC coeffs == 0"...

I think he meant speedup version for the case when you have zero coeffs in some
column or row of matrix and set it to zero instead of performing transform
(that's for the general case transform)

> >> /**
> >>  *  8x8 block motion compensation with adding delta
> >>  *
> >>  *  @param  buf     [in,out] pointer to the block in the current frame buffer containing delta
> >>  *  @param  ref_buf [in] pointer to the corresponding block in the reference frame
> >>  *  @param  pitch   [in] pitch for moving to the next y line
> >>  *  @param  mc_type [in] interpolation type
> >>  */
> >> static void ivi_mc_8x8_delta (int16_t *buf, int16_t *ref_buf, uint32_t pitch, int mc_type)
> >> {
> >>     int     i, r1, r2, r3, r4;
> >>     int16_t *wptr;
> >>
> >>     switch(mc_type) {
> >>         case 0: /* fullpel (no interpolation) */
> >>             for (i = 0; i < 8; i++) {
> >>                 buf[0] += ref_buf[0];
> >>                 buf[1] += ref_buf[1];
> >>                 buf[2] += ref_buf[2];
> >>                 buf[3] += ref_buf[3];
> >>                 buf[4] += ref_buf[4];
> >>                 buf[5] += ref_buf[5];
> >>                 buf[6] += ref_buf[6];
> >>                 buf[7] += ref_buf[7];
> >>                 buf += pitch;
> >>                 ref_buf += pitch;
> >>             }
> >>             break;
> >>         case 1: /* horizontal halfpel interpolation */
> >>             for (i = 0; i < 8; i++) {
> >>                 r2 = ref_buf[1];
> >>                 r3 = ref_buf[2];
> >>                 r4 = ref_buf[3];
> >>                 buf[0] += (ref_buf[0] + r2) >> 1;
> >>                 buf[1] += (r2 + r3) >> 1;
> >>                 buf[2] += (r3 + r4) >> 1;
> >>                 r1 = ref_buf[4];
> >>                 buf[3] += (r4 + r1) >> 1;
> >>                 r2 = ref_buf[5];
> >>                 buf[4] += (r1 + r2) >> 1;
> >>                 r3 = ref_buf[6];
> >>                 buf[5] += (r2 + r3) >> 1;
> >>                 r4 = ref_buf[7];
> >>                 buf[6] += (r3 + r4) >> 1;
> >>                 r1 = ref_buf[8];
> >>                 buf[7] += (r4 + r1) >> 1;
> >>                 buf += pitch;
> >>                 ref_buf += pitch;
> >>             }
> >>             break;
> >>         case 2: /* vertical halfpel interpolation */
> >>             wptr = ref_buf + pitch;
> >>             for (i = 0; i < 8; i++) {
> >>                 buf[0] += (ref_buf[0] + wptr[0]) >> 1;
> >>                 buf[1] += (ref_buf[1] + wptr[1]) >> 1;
> >>                 buf[2] += (ref_buf[2] + wptr[2]) >> 1;
> >>                 buf[3] += (ref_buf[3] + wptr[3]) >> 1;
> >>                 buf[4] += (ref_buf[4] + wptr[4]) >> 1;
> >>                 buf[5] += (ref_buf[5] + wptr[5]) >> 1;
> >>                 buf[6] += (ref_buf[6] + wptr[6]) >> 1;
> >>                 buf[7] += (ref_buf[7] + wptr[7]) >> 1;
> >>                 buf += pitch;
> >>                 wptr += pitch;
> >>                 ref_buf += pitch;
> >>             }
> >>             break;
> >>         case 3: /* vertical and horizontal halfpel interpolation */
> >>             wptr = ref_buf + pitch;
> >>             for (i = 0; i < 8; i++) {
> >>                 r1 = ref_buf[1];
> >>                 r2 = wptr[1];
> >>                 buf[0] += (ref_buf[0] + r1 + wptr[0] + r2) >> 2;
> >>                 r3 = ref_buf[2];
> >>                 r4 = wptr[2];
> >>                 buf[1] += (r1 + r3 + r2 + r4) >> 2;
> >>                 r1 = ref_buf[3];
> >>                 r2 = wptr[3];
> >>                 buf[2] += (r3 + r1 + r4 + r2) >> 2;
> >>                 r3 = ref_buf[4];
> >>                 r4 = wptr[4];
> >>                 buf[3] += (r1 + r3 + r2 + r4) >> 2;
> >>                 r1 = ref_buf[5];
> >>                 r2 = wptr[5];
> >>                 buf[4] += (r3 + r1 + r4 + r2) >> 2;
> >>                 r3 = ref_buf[6];
> >>                 r4 = wptr[6];
> >>                 buf[5] += (r1 + r3 + r2 + r4) >> 2;
> >>                 r1 = ref_buf[7];
> >>                 r2 = wptr[7];
> >>                 buf[6] += (r3 + r1 + r4 + r2) >> 2;
> >>                 r3 = ref_buf[8];
> >>                 r4 = wptr[8];
> >>                 buf[7] += (r1 + r3 + r2 + r4) >> 2;
> >>                 buf += pitch;
> >>                 wptr += pitch;
> >>                 ref_buf += pitch;
> >>             }
> >>             break;
> >>     }
> >> }
> >>     
> >
> > this can be simplified alot
> >   
> 
> Could you please give me an example of how it could be done...

Something like this (correctness is not guaranteed):

for(i = 0; i < 8; i++){
 r[0] = ref_buf[0];
 r[1] = ref_buf[1];
 for(j = 0; j < 8; j++){
  r[(j & 3) + 2] = ref_buf[j];
  buf[i] = (r[0] + r[1] + r[2] + r[3]) >> 2;
 }
 buf += pitch
}

[...] 
> 
> >>                     q = dequant_tab[pos]; /* get scalefactor */
> >>                     if (q != 1 && val != 0) {
> >>                         if (val > 0)
> >>                             val = (val * q) + (q >> 1) - (q & 1);
> >>                         else
> >>                             val = (val * q) - (q >> 1) + (q & 1);
> >>                     }
> >>     
> >
> > duplicate <0 check
> >   
> Sorry, add a check or remove one?

Remove one

> Regards
> Maxim

P.S. I have one disk with files encoded with Indeo 5 beta. Your decoder decodes
them with motion and other artefacts but picture is mostly OK. Binary decoder
used to hand my computer completely.

/me waits for scalability support and Indeo 4