[FFmpeg-devel] Indeo3 replacement, take 3

Maxim max_pole
Thu Nov 5 22:50:46 CET 2009


Hi Michael,

many thanks for your review! Below some question/comments...

> [...]
>   
>
>> //! static vq delta tables (initialized at startup)
>> static int32_t  delta_tabs   [24][256];     ///< vq delta tables for all 4x4 block modes
>> static uint8_t  selector_tabs[24][256];     ///< each entry says which delta type each code is of
>>
>> //! vq delta tables for the mode 10 (8x8 blocks)
>> static DECLARE_ALIGNED_8(uint64_t, delta_m10_tab[24][256]);
>>
>>
>> /**
>>  *  Build decoder delta tables dynamically.
>>  */
>> static av_cold void build_delta_tables(void)
>> {
>>     int             n, i, j, quads, swap;
>>     int32_t         a, b, corr;
>>     int32_t         *delta_tab, *quad_tab;
>>     const vqtEntry  *rom_tab;
>>     const int8_t    *delta_rom;
>>     int8_t          *sel_tab;
>>     int64_t         *delta_64b;
>>
>>     for (n = 0; n < 24; n++) {
>>         rom_tab   = &vq_delta_rom[n];
>>         delta_rom = rom_tab->delta_tab;
>>
>>         delta_tab = &delta_tabs   [n][0];
>>         sel_tab   = &selector_tabs[n][0];
>>         delta_64b = &delta_m10_tab[n][0];
>>
>>         /* set all code selectors = forbidden */
>>         memset(sel_tab, RLE_FORBIDDEN, 249);
>>
>>         /* initialize selectors for RLE escape codes (0xF8...0xFF) */
>>         sel_tab[249] = RLE_ESC_F9;
>>         sel_tab[250] = RLE_ESC_FA;
>>         sel_tab[251] = RLE_ESC_FB;
>>         sel_tab[252] = RLE_ESC_FC;
>>         sel_tab[253] = RLE_ESC_FD;
>>         sel_tab[254] = RLE_ESC_FE;
>>         sel_tab[255] = RLE_ESC_FF;
>>
>>         for (i = 0; i < rom_tab->num_dyads; i++) {
>>             /* get two vq deltas from the ROM table */
>>             a = delta_rom[i * 2];
>>             b = delta_rom[i * 2 + 1];
>>             if (!HAVE_BIGENDIAN)
>>                 FFSWAP(int32_t, a, b);
>>
>>             /* concatenate two vq deltas to form a two-pixel corrector (dyad) */
>>             delta_tab[i] = (a << 8) + b;
>>     
>
> iam not sure if i would call this concatenation, considering these are
> signed variables and can be negative
>
> also it seems to me that the delta_rom tables are never used directly
> but rather just the softSIMD twistet form of them, if so this is a
> waste of memory, the softSIMD twiseted form could be stored directly
>   

Those "softSIMD twisted" tables are build depends on endianess of the
particular machine. We need to store two different tables, one for each
endianess type. This would indeed require less memory but would end up
in the huge file like "indeo3data.h" from the current svn...
Moreover the twisted tables are less readable IMHO...

Another approach (with the least memory requirements) would be to
prepare the twisted delta value "on the fly" before using it. This would
be surely slower...

>
> [...]
>   
>>     /* those two values are different in the binary decoder, no idea why */
>>     /* patch them in order to make the output bitexact */
>>     requant_tab[1][7] = 10;
>>     requant_tab[4][8] = 10;
>>     
>
> are these the only differences in the tables?
> or just the only used by the files you tested?
> also from the comment i assume your decoder is bitexact, is it?
>   
Those tables are hardcoded in Xanim, Windows and Mac libraries therefore
I could compare them easily...
The equation above was found from the table values using the "try and
error" approach...
And yes, the decoder is bitexact at this moment compared to Win32 output!

> and the code above is not thread safe (a simple solution is to
> never write to any spot in a global array a value that differs
> from what you previously wrote there)
>   

What would be a solution for that? The use of hardcoded tables?

>
> [...]
>   
>> /**
>>  *  Requant a 8 pixel line by duplicating each even pixel as follows:
>>  *  ABCDEFGH -> AACCEEGG
>>  */
>> static inline uint64_t requant(uint64_t a) {
>> #if HAVE_BIGENDIAN
>>     a &= 0xFF00FF00FF00FF00;
>>     a |= a >> 8;
>> #else
>>     a &= 0x00FF00FF00FF00FF;
>>     a |= a << 8;
>> #endif
>>     return a;
>> }
>>     
>
> i would not call this operation re-quantization
>   

Ok, what is it then?

>
> [...]
>   
>>     /* requantize the prediction if VQ index of this cell differs from VQ index */
>>     /* of the predicted cell in order to avoid overflows. */
>>     if (vq_index >= 8) {
>>         for (x = 0; x < cell->width << 2; x++)
>>             ref_block[x] = requant_tab[vq_index & 7][ref_block[x]];
>>     }
>>     
>
> this maybe can segfault (i assume that ref_block can drift out of 0..127
> for damaged files or did i miss something that prevents this?)
>   

Ok, I'll added a check preventing the overflow:

ref_block[x] = requant_tab[vq_index & 7][ref_block[x] & 0x7F];



>
> [...]
>   
>
>>     skip_bits   = 0;
>>     need_resync = 0;
>>
>>     /* init binary tree decoding */
>>     curr_cell = ctx->cell_stack;
>>
>>     /* initialize the 1st cell and set its dimensions to whole plane */
>>     curr_cell->xpos   = curr_cell->ypos = 0;
>>     curr_cell->width  = plane->width  >> 2;
>>     curr_cell->height = plane->height >> 2;
>>     curr_cell->tree   = 0; // we are in the MC tree now
>>     curr_cell->mv_ptr = 0; // no motion vector = INTRA cell
>>
>>     /* process the plane's binary tree until it gets exhausted */
>>     while (curr_cell >= ctx->cell_stack) {
>>         if (need_resync && !(get_bits_count(&gb) & 7)) {
>>             skip_bits_long(&gb, skip_bits);
>>             skip_bits   = 0;
>>             need_resync = 0;
>>         }
>>         switch (get_bits(&gb, 2)) {
>>         case H_SPLIT:
>>             /* split current cell into two vertical subcells */
>>             prev_cell = curr_cell;
>>             if (curr_cell >= &ctx->cell_stack[CELL_STACK_MAX]) {
>>                 av_log(avctx, AV_LOG_ERROR, "Cell stack overflow (corrupted binary tree)!\n");
>>                 return -1;
>>             }
>>             DUPLICATE_CELL(curr_cell);
>>             SPLIT_CELL(prev_cell->height, curr_cell->height);
>>             prev_cell->ypos   += curr_cell->height;
>>             prev_cell->height -= curr_cell->height;
>>             break;
>>     
>
> this function likely can be implemented much cleaner recursively instead of
> this synthetic stack thing.
>   

Ok, how to prevent the stack in a recursive function from possible
corruption in the case of damaged binary tree? Check a global counter
locked by a mutex?

>
> [...]
>   
>>         /* convert four pixels at once using softSIMD */
>>         for (x = 0; x < plane->width >> 2; x++)
>>             *dst32++ = (*src32++ & 0x7F7F7F7F) << 1;
>>     
>
> is the & needed, arent these bits 0 anyway?
>   

See the question above with the "ref_block"...





More information about the ffmpeg-devel mailing list