[FFmpeg-devel] [PATCH] Electronic Arts MAD Decoder

Sun Jun 7 11:53:38 CEST 2009

On Sun, Jun 07, 2009 at 04:24:59PM +1000, Peter Ross wrote:
> +static inline void mad_comp(unsigned char *dst, int dst_stride,
> +                            unsigned char *src, int src_stride,
> +                            int add){
> +    int j,i;
> +    for(j=0;j<8;j++)
> +    for(i=0;i<8;i++)
> +        dst[j*dst_stride + i] = av_clip_uint8(src[j*src_stride + i] + add);

Probably the compiler can figure it out by itself, but you sure don't
want the multiplies in the inner loop but instead something like
> +    for(i=0;i<8;i++)
> +        dst[i] = av_clip_uint8(src[i] + add);
> +    src += src_stride;
> +    dst += dst_stride;

> +    uint8_t *dest_y  = t->frame.data[0] + (mb_y * 16* linesize            ) + mb_x * 16;
> +    uint8_t *dest_cb = t->frame.data[1] + (mb_y * 8 * t->frame.linesize[1]) + mb_x * 8;
> +    uint8_t *dest_cr = t->frame.data[2] + (mb_y * 8 * t->frame.linesize[2]) + mb_x * 8;

Not sure if it makes things uglier, but you could factor out the * 8 / * 16 here.

> +    int ref_linesize = t->last_frame.linesize[0];
> +    uint8_t *ref_y  = t->last_frame.data[0] + ((mb_y * 16+ mv_y)    * ref_linesize             ) + mb_x * 16 + mv_x;
> +    uint8_t *ref_cb = t->last_frame.data[1] + ((mb_y * 8 + (mv_y/2))* t->last_frame.linesize[1]) + mb_x * 8  + (mv_x/2);
> +    uint8_t *ref_cr = t->last_frame.data[2] + ((mb_y * 8 + (mv_y/2))* t->last_frame.linesize[2]) + mb_x * 8  + (mv_x/2);

Here it probably isn't such a good idea though.

> +    case 0: mad_comp(dest_y                 , linesize,ref_y                 , ref_linesize, add); break;
> +    case 1: mad_comp(dest_y              + 8, linesize,ref_y              + 8, ref_linesize, add); break;
> +    case 2: mad_comp(dest_y + 8*linesize    , linesize,ref_y + 8*linesize    , ref_linesize, add); break;
> +    case 3: mad_comp(dest_y + 8*linesize + 8, linesize,ref_y + 8*linesize + 8, ref_linesize, add); break;

If it doesn't affect performance too much this could be
case 0:
case 1:
case 2:
case 3:
offset = ((j & 2) << 2) * linesize + ((j & 1) << 3);
mad_comp(dest_y + offset, linesize, ref_y + offset, ref_linesize, add); break;

except that I think your code is wrong and you should be using
ref_linesize for the ref_y offset.
Still, I think that any code that has more than one mad_comp for this is
bloaty.

> +static inline void mad_idct_put(MadContext *t, DCTELEM *block, int mb_x, int mb_y, int i){
> +    MpegEncContext *s = &t->s;
> +    int linesize= t->frame.linesize[0];
> +    uint8_t *dest_y  = t->frame.data[0] + (mb_y * 16* linesize            ) + mb_x * 16;
> +    uint8_t *dest_cb = t->frame.data[1] + (mb_y * 8 * t->frame.linesize[1]) + mb_x * 8;
> +    uint8_t *dest_cr = t->frame.data[2] + (mb_y * 8 * t->frame.linesize[2]) + mb_x * 8;
> +
> +    switch(i) {
> +    case 0: s->dsp.idct_put(dest_y                 , linesize, block); break;
> +    case 1: s->dsp.idct_put(dest_y              + 8, linesize, block); break;
> +    case 2: s->dsp.idct_put(dest_y + 8*linesize    , linesize, block); break;
> +    case 3: s->dsp.idct_put(dest_y + 8*linesize + 8, linesize, block); break;
> +    case 4: if(!(s->avctx->flags&CODEC_FLAG_GRAY))
> +                s->dsp.idct_put(dest_cb, t->frame.linesize[1], block);
> +            break;
> +    case 5: if(!(s->avctx->flags&CODEC_FLAG_GRAY))
> +                s->dsp.idct_put(dest_cr, t->frame.linesize[2], block);
> +            break;
> +    }

On thinking again about it, my feeling is that this is just the wrong
way round.
Calculating dst and linesize should be in the switch, not the function
calls. A stupid compiler might e.g. calculate a dest_cb that is never used.
The only issue is with the CODEC_FLAG_GRAY, though you could still do
case 4:
  if (s->avctx->flags&CODEC_FLAG_GRAY) return;
  dest = t->frame.data[1] + (mb_y * 8 * t->frame.linesize[1]) + mb_x * 8;
  linesize = t->frame.linesize[1];

Though I really don't like much having that check in an inner loop...

> +        OPEN_READER(re, &s->gb);
> +        /* now quantify & encode AC coefficients */
> +        for(;;) {
> +            UPDATE_CACHE(re, &s->gb);

I know a lot of codecs use it, but is using this extremely lowlevel API
really the best way to do? Seems to me like it will result in a lot of
buggy an badly maintainable code.

> +                /* escape */
> +                level = SHOW_SBITS(re, &s->gb, 10); SKIP_BITS(re, &s->gb, 10);

E.g. those are 20 bits in a row, I think 17 is the allowed maximum, so
you'd need an UPDATE_CACHE in-between?

> +    avctx->time_base.num = AV_RL16(&buf[6]);
> +    avctx->time_base.den = 1000;

Hm, shouldn't you use av_reduce on those?

> +    if (s->avctx->width!=s->width || s->avctx->height!=s->height) {
> +        avcodec_set_dimensions(s->avctx, s->width, s->height);

avcodec_check_dimensions?

> +    t->frame.buffer_hints = FF_BUFFER_HINTS_VALID;

I see other codecs have it, too, but is there a point in only
specifying FF_BUFFER_HINTS_VALID?

> +    init_get_bits(&s->gb, t->bitstream_buf, 8*(buf_end-buf));

Hm... are you actually checking for the buffer size anywhere where you
read from it?

> +    if (chunk_type!=MADe_TAG)
> +        FFSWAP(AVFrame, t->frame, t->last_frame);

Hm? You are using last_frame but set neither FF_BUFFER_HINTS_PRESERVE
nor FF_BUFFER_HINTS_READABLE?