[Ffmpeg-devel] [PATCH] Chinese AVS video decoder

Thu Jun 29 19:05:11 CEST 2006

On Thu, Jun 29, 2006 at 02:56:22AM +0200, Michael Niedermayer wrote:
> [...]
> > +static const int cbp_tab[64][2] = {
> > +  {63, 0},{15,15},{31,63},{47,31},{ 0,16},{14,32},{13,47},{11,13},
> > +  { 7,14},{ 5,11},{10,12},{ 8, 5},{12,10},{61, 7},{ 4,48},{55, 3},
> > +  { 1, 2},{ 2, 8},{59, 4},{ 3, 1},{62,61},{ 9,55},{ 6,59},{29,62},
> > +  {45,29},{51,27},{23,23},{39,19},{27,30},{46,28},{53, 9},{30, 6},
> > +  {43,60},{37,21},{60,44},{16,26},{21,51},{28,35},{19,18},{35,20},
> > +  {42,24},{26,53},{44,17},{32,37},{58,39},{24,45},{20,58},{17,43},
> > +  {18,42},{48,46},{22,36},{33,33},{25,34},{49,40},{40,52},{36,49},
> > +  {34,50},{50,56},{52,25},{54,22},{41,54},{56,57},{38,41},{57,38}
> > +};
> 
> this fits in a uint8_t, which means we could save 75% of the memory; same
> issue with many other tables

and for those who don't already see why this matters: memory == performance,
because bigger tables pollute the cache.
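
A minimal sketch of the size argument, assuming a platform where int is 4 bytes (that is where the 75% figure comes from). Only the first row of the quoted table is reproduced; the names cbp_tab_u8, cbp_bytes_int, cbp_bytes_u8, cbp_percent_saved and cbp_lookup are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First row of the quoted cbp_tab only, stored as uint8_t
 * (every value in the quoted table is below 256). */
static const uint8_t cbp_tab_u8[8][2] = {
    {63, 0},{15,15},{31,63},{47,31},{ 0,16},{14,32},{13,47},{11,13},
};

/* Size in bytes of the full 64x2 table for each element type. */
static size_t cbp_bytes_int(void) { return sizeof(int[64][2]); }
static size_t cbp_bytes_u8(void)  { return sizeof(uint8_t[64][2]); }

/* Percentage saved by switching element types; 75 when int is 4 bytes. */
static unsigned cbp_percent_saved(void) {
    return (unsigned)(100 - 100 * cbp_bytes_u8() / cbp_bytes_int());
}

/* Lookups need no other change: uint8_t promotes to int when used. */
static int cbp_lookup(unsigned idx, unsigned col) {
    assert(idx < 8 && col < 2);
    return cbp_tab_u8[idx][col];
}
```

With 4-byte ints the full table shrinks from 512 to 128 bytes, i.e. from eight 64-byte cache lines to two (the 64-byte line size is also an assumption).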

> > +static inline int get_bs_p(vector_t *mvP, vector_t *mvQ) {
> > +    if((mvP->ref == REF_INTRA) || (mvQ->ref == REF_INTRA))
> > +        return 2;
> > +    if(mvP->ref != mvQ->ref)
> > +        return 1;
> > +    if( (abs(mvP->x - mvQ->x) >= 4) ||  (abs(mvP->y - mvQ->y) >= 4) )
> > +        return 1;
> > +    return 0;
> > +}
> 
> you really have a lot of abs(...) </> CONSTANT in your code, maybe a
> macro like
> #define ABSCMP(x,c) (((x)+(c)) > (unsigned)(2*(c)))
> 
> would make the code faster, that of course would need to be benchmarked
> if you want to try it

nice. :))
surely it's faster.
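
For anyone who wants to check the trick before benchmarking it, here is a small sketch. It uses a variant of the ABSCMP idea quoted above with an explicit unsigned cast on the left-hand side, and assumes c is non-negative and that x + c and 2*c do not overflow. The helper abscmp_mismatches is hypothetical, for testing only. Note that the macro tests abs(x) > c, so the `abs(...) >= 4` checks in get_bs_p would correspond to c = 3.

```c
#include <assert.h>
#include <stdlib.h>

/* True when abs(x) > c. Adding c shifts the "inside" range [-c, c]
 * onto [0, 2c]; anything below -c wraps to a huge unsigned value,
 * so one unsigned compare covers both sides. */
#define ABSCMP(x, c) ((unsigned)((x) + (c)) > (unsigned)(2 * (c)))

/* Counts the x in [-range, range] where the macro and abs() disagree. */
static int abscmp_mismatches(int range, int c) {
    int x, bad = 0;
    assert(range >= 0 && c >= 0);
    for (x = -range; x <= range; x++)
        if (ABSCMP(x, c) != (abs(x) > c))
            bad++;
    return bad;
}
```

Whether this is actually faster than abs() on a given compiler still has to be measured, as Michael says.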

> [...]
> > +static int cavs_decode_frame(AVCodecContext * avctx,void *data, int *data_size,
> > +                             uint8_t * buf, int buf_size) {
> > +    AVSContext *h = avctx->priv_data;
> > +    MpegEncContext *s = &h->s;
> > +    int input_size;
> > +    const uint8_t *buf_end;
> > +    const uint8_t *buf_ptr;
> > +    AVFrame *picture = data;
> > +    int stc = -1;
> > +
> > +    s->avctx = avctx;
> > +
> > +    if (buf_size == 0) {
> > +        return 0;
> > +    }
> 
> this is wrong; you must output the delayed frames that have not been output yet here

and should also support decode-order output, right?
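
To make the flushing point concrete, here is a toy sketch of a decoder with a one-frame output delay. It is not libavcodec's API: Frame, Decoder and decode_packet are hypothetical, and a real decoder's reordering is more involved. The only point is that an empty input packet means "end of stream, hand over whatever is still held back" instead of returning nothing.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int id; } Frame;      /* stand-in for a decoded picture */

typedef struct {
    Frame delayed;                     /* frame decoded but not yet output */
    int   has_delayed;
} Decoder;

/* Returns 1 if a frame was written to *out, 0 otherwise. */
static int decode_packet(Decoder *d, const unsigned char *buf, int buf_size,
                         Frame *out) {
    Frame cur;
    int got;

    assert(d != NULL && out != NULL);
    if (buf_size == 0) {
        /* End of stream: flush the held-back frame instead of returning 0. */
        if (!d->has_delayed)
            return 0;
        *out = d->delayed;
        d->has_delayed = 0;
        return 1;
    }
    cur.id = buf[0];                   /* pretend the first byte is the frame */
    got = d->has_delayed;
    if (got)
        *out = d->delayed;             /* output the previous frame */
    d->delayed = cur;                  /* hold back the new one */
    d->has_delayed = 1;
    return got;
}
```

Feeding two packets and then an empty one yields nothing, frame 1, frame 2; with the early `return 0` from the patch the last frame would be lost.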

rich