[FFmpeg-devel] Indeo3 replacement, take 3

Vitor Sessak vitor1001
Fri Nov 6 00:39:55 CET 2009


Maxim wrote:
> Hi Michael,
> 
> many thanks for your review! Below some question/comments...
> 
>> [...]
>>   
>>
>>> //! static vq delta tables (initialized at startup)
>>> static int32_t  delta_tabs   [24][256];     ///< vq delta tables for all 4x4 block modes
>>> static uint8_t  selector_tabs[24][256];     ///< each entry says which delta type each code is of
>>>
>>> //! vq delta tables for the mode 10 (8x8 blocks)
>>> static DECLARE_ALIGNED_8(uint64_t, delta_m10_tab[24][256]);
>>>
>>>
>>> /**
>>>  *  Build decoder delta tables dynamically.
>>>  */
>>> static av_cold void build_delta_tables(void)
>>> {
>>>     int             n, i, j, quads, swap;
>>>     int32_t         a, b, corr;
>>>     int32_t         *delta_tab, *quad_tab;
>>>     const vqtEntry  *rom_tab;
>>>     const int8_t    *delta_rom;
>>>     int8_t          *sel_tab;
>>>     int64_t         *delta_64b;
>>>
>>>     for (n = 0; n < 24; n++) {
>>>         rom_tab   = &vq_delta_rom[n];
>>>         delta_rom = rom_tab->delta_tab;
>>>
>>>         delta_tab = &delta_tabs   [n][0];
>>>         sel_tab   = &selector_tabs[n][0];
>>>         delta_64b = &delta_m10_tab[n][0];
>>>
>>>         /* set all code selectors = forbidden */
>>>         memset(sel_tab, RLE_FORBIDDEN, 249);
>>>
>>>         /* initialize selectors for RLE escape codes (0xF8...0xFF) */
>>>         sel_tab[249] = RLE_ESC_F9;
>>>         sel_tab[250] = RLE_ESC_FA;
>>>         sel_tab[251] = RLE_ESC_FB;
>>>         sel_tab[252] = RLE_ESC_FC;
>>>         sel_tab[253] = RLE_ESC_FD;
>>>         sel_tab[254] = RLE_ESC_FE;
>>>         sel_tab[255] = RLE_ESC_FF;
>>>
>>>         for (i = 0; i < rom_tab->num_dyads; i++) {
>>>             /* get two vq deltas from the ROM table */
>>>             a = delta_rom[i * 2];
>>>             b = delta_rom[i * 2 + 1];
>>>             if (!HAVE_BIGENDIAN)
>>>                 FFSWAP(int32_t, a, b);
>>>
>>>             /* concatenate two vq deltas to form a two-pixel corrector (dyad) */
>>>             delta_tab[i] = (a << 8) + b;
>>>     
>> iam not sure if i would call this concatenation, considering these are
>> signed variables and can be negative
>>
>> also it seems to me that the delta_rom tables are never used directly
>> but rather just the softSIMD twistet form of them, if so this is a
>> waste of memory, the softSIMD twiseted form could be stored directly
>>   
> 
> Those "softSIMD twisted" tables are build depends on endianess of the
> particular machine. We need to store two different tables, one for each
> endianess type. This would indeed require less memory but would end up
> in the huge file like "indeo3data.h" from the current svn...
> Moreover the twisted tables are less readable IMHO...

There is also the possibility of doing the following (untested):

#if HAVE_BIGENDIAN
#define SS(a,b) ((a << 8) + (b))
#else
#define SS(a,b) ((b << 8) + (a))
#endif

static const int32_t delta_tabs [24][256] = {
     {
         SS(0,0), SS(2,2), ....
     }
>>   
>>>     /* those two values are different in the binary decoder, no idea why */
>>>     /* patch them in order to make the output bitexact */
>>>     requant_tab[1][7] = 10;
>>>     requant_tab[4][8] = 10;
>>>     
>> are these the only differences in the tables?
>> or just the only used by the files you tested?
>> also from the comment i assume your decoder is bitexact, is it?
>>   
> Those tables are hardcoded in Xanim, Windows and Mac libraries therefore
> I could compare them easily...
> The equation above was found from the table values using the "try and
> error" approach...
> And yes, the decoder is bitexact at this moment compared to Win32 output!
> 
>> and the code above is not thread safe (a simple solution is to
>> never write to any spot in a global array a value that differs
>> from what you previously wrote there)
>>   
> 
> What would be a solution for that? The use of hardcoded tables?

I think that static tables are always initialized by the OS with zeros, 
so you can do something like (also untested)

>     static uint8_t coef1[] = {1, 1, 2, -3, -3, 3, 4, 4};
>     static uint8_t coef2[] = {0, 1, 0,  4,  4, 1, 0, 1};
> 
>     /* some last elements calculated above will have values >= 128 */
>     /* pixel values shall never exceed 127 so set them to non-overflowing values */
>     /* according with the quantization step of the respective section */
>     requant_tab[0][127] = 126;
>     requant_tab[1][119] = 118;
>     requant_tab[1][120] = 118;
>     requant_tab[2][126] = 124;
>     requant_tab[2][127] = 124;
>     requant_tab[6][124] = 120;
>     requant_tab[6][125] = 120;
>     requant_tab[6][126] = 120;
>     requant_tab[6][127] = 120;
> 
>     /* those two values are different in the binary decoder, no idea why */
>     /* patch them in order to make the output bitexact */
>     requant_tab[1][7] = 10;
>     requant_tab[4][8] = 10;
> 
>     for (i = 0; i < 128; i++)
>          for(j = 0; j < 8; j++)
>              if (!requant_tab[j][i])
>                   requant_tab[j][i] = (i + coef1[j]) / j * j + coef2[j];

And IMO it should be thread-safe.

>>
>>>     skip_bits   = 0;
>>>     need_resync = 0;
>>>
>>>     /* init binary tree decoding */
>>>     curr_cell = ctx->cell_stack;
>>>
>>>     /* initialize the 1st cell and set its dimensions to whole plane */
>>>     curr_cell->xpos   = curr_cell->ypos = 0;
>>>     curr_cell->width  = plane->width  >> 2;
>>>     curr_cell->height = plane->height >> 2;
>>>     curr_cell->tree   = 0; // we are in the MC tree now
>>>     curr_cell->mv_ptr = 0; // no motion vector = INTRA cell
>>>
>>>     /* process the plane's binary tree until it gets exhausted */
>>>     while (curr_cell >= ctx->cell_stack) {
>>>         if (need_resync && !(get_bits_count(&gb) & 7)) {
>>>             skip_bits_long(&gb, skip_bits);
>>>             skip_bits   = 0;
>>>             need_resync = 0;
>>>         }
>>>         switch (get_bits(&gb, 2)) {
>>>         case H_SPLIT:
>>>             /* split current cell into two vertical subcells */
>>>             prev_cell = curr_cell;
>>>             if (curr_cell >= &ctx->cell_stack[CELL_STACK_MAX]) {
>>>                 av_log(avctx, AV_LOG_ERROR, "Cell stack overflow (corrupted binary tree)!\n");
>>>                 return -1;
>>>             }
>>>             DUPLICATE_CELL(curr_cell);
>>>             SPLIT_CELL(prev_cell->height, curr_cell->height);
>>>             prev_cell->ypos   += curr_cell->height;
>>>             prev_cell->height -= curr_cell->height;
>>>             break;
>>>     
>> this function likely can be implemented much cleaner recursively instead of
>> this synthetic stack thing.
>>   
> 
> Ok, how to prevent the stack in a recursive function from possible
> corruption in the case of damaged binary tree? Check a global counter
> locked by a mutex?

I think it might be easier to pass to the recursive func a parameter 
with the maximum recursion depth and decrement it before calling itself 
recursively.

-Vitor



More information about the ffmpeg-devel mailing list