[FFmpeg-devel] [PATCH] mpeg2: fix block_last_index when mismatch control modifies last coeff

Tue Jun 22 21:47:20 CEST 2010

On Mon, Jun 21, 2010 at 09:08:24PM -0700, Jason Garrett-Glaser wrote:
> On Mon, Jun 21, 2010 at 8:17 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Mon, Jun 21, 2010 at 03:30:32PM -0700, Jason Garrett-Glaser wrote:
> > [...]
> >> I'm trying to merge parts of a local changeset I have that makes the
> >> FLV decoder 30-40% faster overall (many parts may apply to mpegvideo
> >> overall).  Some of these parts are unmergeable, but others are quite
> >> mergeable.
> >
> > please merge what is mergeable, this is greatly appreciated work!
> > also I am interested in the unmergeable changes, why are they unmergeable?
> > are they public somewhere? This thread shows nicely how talking about
> > code that only some have seen can lead to confusion and flames when
> > discussing it ....
> 
> Here's the changeset.  The purpose of this was to get realtime,
> full-screen video playback on the iPad at 30fps.  The iPad has an
> extraordinarily slow display driver, which is synchronous (you can't
> decode video while calling display on a texture) and takes up to
> ~8-9ms per frame.  Combined with CELT audio, this leaves about ~22ms
> per 1024x768 frame for video decoding -- a massive challenge on a 1 GHz
> Cortex chip, especially in very high motion at bitrates like ~6 Mbps.
> 
> I intentionally tore up the entire MPEG decoder with the intention
> that only FLV work.  Accordingly, keep in mind that this is a GIGANTIC
> UGLY HACK and you'd be insane to hold me accountable for how utterly
> insanely ugly any of this is.  But some portion of the changes may be
> mergeable.
> 
>  A short summary (not complete, I don't remember everything):
> 
> 1.  Use idct_dc whenever possible, for both inter and intra.  Add a
> NEON idct_dc function written by Mans.
> 2.  Move dequantization into entropy decoding, to avoid costly calls
> to the SIMD dequant function.
> 3.  Inline the flv escape code decoder -- gave a huge benefit.
> 4.  Inline the h263 block decode.
> 5.  Remove every single last case everywhere that isn't relevant to
> FLV decoding (a huge amount of this gain could be gotten via
> templatization, IMO).  Yes, some of the removals are utterly pointless
> and were just me deleting code to make my editing space smaller.
> 6.  Eliminate the mv caching code: do it all in bitstream decoding,
> even for 16x16 blocks.

> 7.  Change the mv cache to 8-bit (would break everything with unrestricted mvs).

You could just introduce an mvtype typedef, and if no codecs needing 16-bit
motion vectors were enabled, it could be switched to 8-bit by an ifdef.
In the future this could be combined with templating.
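A minimal sketch of the typedef switch described above, assuming a hypothetical ONLY_8BIT_MV_CODECS macro (a real build would derive it from the CONFIG_* decoder flags). The 8-bit variant is only correct when no enabled codec can produce motion vector components outside [-128, 127].

```c
#include <stdint.h>

/* Hypothetical build switch: define ONLY_8BIT_MV_CODECS only when every
 * enabled decoder keeps motion vector components within int8_t range
 * (no unrestricted/long motion vectors). */
#ifdef ONLY_8BIT_MV_CODECS
typedef int8_t  mvtype;
#else
typedef int16_t mvtype;   /* default: wide enough for every codec */
#endif

/* One cached motion vector: x and y components. */
typedef struct MVCacheEntry {
    mvtype mv[2];
} MVCacheEntry;

/* Store a decoded vector; callers never need to know which width
 * was selected at build time. */
static void mv_cache_store(MVCacheEntry *e, int mx, int my)
{
    e->mv[0] = (mvtype)mx;
    e->mv[1] = (mvtype)my;
}
```

Code that only touches mvtype stays the same in both configurations; only the storage width (and thus cache footprint) changes.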

> 8.  Use write-combining in some places.
> 9.  Inline the values from the h263 table directly into the code --
> e.g. 102 instead of rl->n, and make the tables that aren't
> runtime-generated static const in the file to avoid pointer
> dereferences.  Generally optimize and clean up the decode_block
> function (breaking everything non-flv obviously).
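To illustrate item 9, a minimal sketch of why a baked-in constant beats rl->n: a generic decoder has to load the escape index through a table pointer for every coefficient, while an H.263/FLV-only decoder can compare against the literal 102. RLTableSketch and H263_INTER_RL_N are hypothetical names, not FFmpeg's RLTable.

```c
/* Hypothetical, minimal stand-in for a run-length VLC table descriptor. */
typedef struct RLTableSketch {
    int n;      /* number of regular codes; index n is the escape code */
} RLTableSketch;

/* Generic decoder: the table is only known at run time, so the escape
 * index is loaded through the pointer for every coefficient. */
static int is_escape_generic(const RLTableSketch *rl, int code)
{
    return code == rl->n;
}

/* H.263/FLV-only decoder: the inter table has 102 regular codes, so the
 * comparison is against an immediate and needs no memory load. */
#define H263_INTER_RL_N 102
static int is_escape_h263(int code)
{
    return code == H263_INTER_RL_N;
}
```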
> 
> Overall, this basically eliminated all overhead outside of residual
> decoding, idct, and mc.  MC was a tiny part of overall decode, and
> residual decoding was probably 30-40%+ faster.  IDCT got a ton faster
> with the addition of the idct_dc, which triggers in a shockingly large
> percentage of total cases.
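A sketch of the per-block dispatch that lets the DC-only IDCT fire, assuming a last-coefficient index with the same meaning as mpegvideo's block_last_index[] (scan position of the last nonzero coefficient, -1 for an empty block) and a scan whose position 0 is the DC coefficient. The enum and function names are hypothetical.

```c
/* Which inverse-transform path a block needs. */
enum IdctPath { IDCT_SKIP, IDCT_DC_ONLY, IDCT_FULL };

static enum IdctPath choose_idct_path(int block_last_index)
{
    if (block_last_index < 0)
        return IDCT_SKIP;      /* empty block: nothing to add */
    if (block_last_index == 0)
        return IDCT_DC_ONLY;   /* only DC: a constant offset for all 64 pixels */
    return IDCT_FULL;          /* any AC coefficient needs the full IDCT */
}
```

The test is a single integer compare per block, so it costs almost nothing even when the full IDCT ends up being needed.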

the shocking effectiveness of the DC-only IDCT is likely due to the shockingly
low quality of FLVs ;)
8x8 DC-only frames would probably gain FLV some RD ...
(transform blocks bigger than 8x8 would of course do better)

anyway, this is hugely nice work and I think most of it is actually easy
to merge.
If needed I would be willing to help with that: just send me a mail with
any hunk/diff that you don't want to merge yourself and I'll try to look into
it, except for the templatization, which I don't volunteer for at the moment
due to lack of sufficient time/motivation.

further comments below ...

> 
> It's, amazingly, still bitexact, as far as I measured.  But of course
> it breaks everything else.
> 
> Dark Shikari

>  arm/mpegvideo_arm.c    |    9 
>  arm/simple_idct_neon.S |   60 +++
>  avcodec.h              |    2 
>  error_resilience.c     |    8 
>  flvdec.c               |   11 
>  h263.c                 |   69 ---
>  h263dec.c              |  111 -----
>  h264.h                 |   16 
>  ituh263dec.c           |  959 +++++++------------------------------------------
>  mpegvideo.c            |  329 ++++------------
>  mpegvideo.h            |   38 +
>  mpegvideo_common.h     |  723 +++---------------------------------
>  12 files changed, 400 insertions(+), 1935 deletions(-)
> a9dd7b4667634ec25f6add24b8d16ea214ca0230  destroy_mpeg_decoder5.diff
> Index: libavcodec/mpegvideo_common.h
> ===================================================================
> --- libavcodec/mpegvideo_common.h	(revision 23459)
> +++ libavcodec/mpegvideo_common.h	(working copy)
[...]
>  static inline int hpel_motion(MpegEncContext *s,
>                                    uint8_t *dest, uint8_t *src,
>                                    int field_based, int field_select,
> @@ -245,65 +103,34 @@
>                   int motion_x, int motion_y, int h, int is_mpeg12, int mb_y)
>  {
>      uint8_t *ptr_y, *ptr_cb, *ptr_cr;
> -    int dxy, uvdxy, mx, my, src_x, src_y,
> -        uvsrc_x, uvsrc_y, v_edge_pos, uvlinesize, linesize;
> +    int dxy, uvdxy, src_x, src_y, uvsrc_x, uvsrc_y, v_edge_pos, uvlinesize, linesize;
>  
> -#if 0
> -if(s->quarter_sample)
> -{
> -    motion_x>>=1;
> -    motion_y>>=1;
> -}
> -#endif

> +    linesize   = s->current_picture.linesize[0];
> +    uvlinesize = s->current_picture.linesize[1];
> +    
> +    if (s->mb_skipped) {
> +        src_x = s->mb_x<<4;
> +        src_y = mb_y<<4;
> +        uvsrc_x = src_x>>1;
> +        uvsrc_y = src_y>>1;
> +        ptr_y  = ref_picture[0] + src_y * linesize + src_x;
> +        ptr_cb = ref_picture[1] + uvsrc_y * uvlinesize + uvsrc_x;
> +        ptr_cr = ref_picture[2] + uvsrc_y * uvlinesize + uvsrc_x;
> +        pix_op[0][0](dest_y, ptr_y, linesize, h);
> +        pix_op[1][0](dest_cb, ptr_cb, uvlinesize, h >> 1);
> +        pix_op[1][0](dest_cr, ptr_cr, uvlinesize, h >> 1);
> +        return;
> +    }

nice idea,
and ok with && !field_based && (!CONFIG_GRAY || !(s->flags&CODEC_FLAG_GRAY))
added to the condition, if it passes the regression tests
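A minimal sketch of the skipped-MB fast path with the extra guard, assuming plain int parameters standing in for s->mb_skipped, field_based, CONFIG_GRAY and the CODEC_FLAG_GRAY bit of s->flags. The copy helper is a hypothetical stand-in for the put_pixels call through pix_op[0][0] in the patch: a skipped MB has a zero motion vector, so prediction is a straight copy of the co-located block.

```c
#include <stdint.h>
#include <string.h>

/* Mirrors: mb_skipped && !field_based
 *          && (!CONFIG_GRAY || !(s->flags & CODEC_FLAG_GRAY)) */
static int skip_copy_allowed(int mb_skipped, int field_based,
                             int config_gray, int gray_flag)
{
    return mb_skipped && !field_based && (!config_gray || !gray_flag);
}

/* Copy the co-located w x h block at (x, y) of the reference plane to dst,
 * which already points at the destination macroblock. */
static void copy_colocated(uint8_t *dst, const uint8_t *ref,
                           int linesize, int x, int y, int w, int h)
{
    const uint8_t *src = ref + y * linesize + x;
    for (int row = 0; row < h; row++, src += linesize, dst += linesize)
        memcpy(dst, src, w);
}
```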

[...]
> @@ -580,11 +200,11 @@
>  
>      src_x = s->mb_x * 8 + mx;
>      src_y = s->mb_y * 8 + my;
> -    src_x = av_clip(src_x, -8, s->width/2);
> -    if (src_x == s->width/2)
> +    src_x = av_clip(src_x, -8, (s->width >> 1));
> +    if (src_x == (s->width >> 1))
>          dxy &= ~1;
> -    src_y = av_clip(src_y, -8, s->height/2);
> -    if (src_y == s->height/2)
> +    src_y = av_clip(src_y, -8, (s->height >> 1));
> +    if (src_y == (s->height >> 1))
>          dxy &= ~2;
>  
>      offset = (src_y * (s->uvlinesize)) + src_x;

ok
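The /2 to >>1 rewrite only preserves behavior because s->width and s->height are never negative. For a signed int the compiler has to add a rounding fix-up to x / 2 (division truncates toward zero), which the plain shift avoids; for negative odd values the two give different results. A small check, with hypothetical helper names:

```c
/* Signed division: truncates toward zero, needs a sign fix-up in asm. */
static int half_div(int x)   { return x / 2; }

/* Shift: identical to half_div for x >= 0, which is all that is assumed. */
static int half_shift(int x) { return x >> 1; }
```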

> @@ -639,246 +259,41 @@
>                                uint8_t *dest_y, uint8_t *dest_cb,
>                                uint8_t *dest_cr, int dir,
>                                uint8_t **ref_picture,
> -                              op_pixels_func (*pix_op)[4],
> -                              qpel_mc_func (*qpix_op)[16], int is_mpeg12)
> +                              op_pixels_func (*pix_op)[4])
>  {
> -    int dxy, mx, my, src_x, src_y, motion_x, motion_y;
> +    int mx, my;
>      int mb_x, mb_y, i;
> -    uint8_t *ptr, *dest;
>  
>      mb_x = s->mb_x;
>      mb_y = s->mb_y;
>  
>      prefetch_motion(s, ref_picture, dir);
>  
> -    if(!is_mpeg12 && s->obmc && s->pict_type != FF_B_TYPE){
> -        int16_t mv_cache[4][4][2];
> -        const int xy= s->mb_x + s->mb_y*s->mb_stride;
> -        const int mot_stride= s->b8_stride;
> -        const int mot_xy= mb_x*2 + mb_y*2*mot_stride;
> -
> -        assert(!s->mb_skipped);
> -
> -        memcpy(mv_cache[1][1], s->current_picture.motion_val[0][mot_xy           ], sizeof(int16_t)*4);
> -        memcpy(mv_cache[2][1], s->current_picture.motion_val[0][mot_xy+mot_stride], sizeof(int16_t)*4);
> -        memcpy(mv_cache[3][1], s->current_picture.motion_val[0][mot_xy+mot_stride], sizeof(int16_t)*4);
> -
> -        if(mb_y==0 || IS_INTRA(s->current_picture.mb_type[xy-s->mb_stride])){
> -            memcpy(mv_cache[0][1], mv_cache[1][1], sizeof(int16_t)*4);
> -        }else{
> -            memcpy(mv_cache[0][1], s->current_picture.motion_val[0][mot_xy-mot_stride], sizeof(int16_t)*4);
> -        }
> -
> -        if(mb_x==0 || IS_INTRA(s->current_picture.mb_type[xy-1])){
> -            *(int32_t*)mv_cache[1][0]= *(int32_t*)mv_cache[1][1];
> -            *(int32_t*)mv_cache[2][0]= *(int32_t*)mv_cache[2][1];
> -        }else{
> -            *(int32_t*)mv_cache[1][0]= *(int32_t*)s->current_picture.motion_val[0][mot_xy-1];
> -            *(int32_t*)mv_cache[2][0]= *(int32_t*)s->current_picture.motion_val[0][mot_xy-1+mot_stride];
> -        }
> -
> -        if(mb_x+1>=s->mb_width || IS_INTRA(s->current_picture.mb_type[xy+1])){
> -            *(int32_t*)mv_cache[1][3]= *(int32_t*)mv_cache[1][2];
> -            *(int32_t*)mv_cache[2][3]= *(int32_t*)mv_cache[2][2];
> -        }else{
> -            *(int32_t*)mv_cache[1][3]= *(int32_t*)s->current_picture.motion_val[0][mot_xy+2];
> -            *(int32_t*)mv_cache[2][3]= *(int32_t*)s->current_picture.motion_val[0][mot_xy+2+mot_stride];
> -        }
> -
> -        mx = 0;
> -        my = 0;
> -        for(i=0;i<4;i++) {
> -            const int x= (i&1)+1;
> -            const int y= (i>>1)+1;
> -            int16_t mv[5][2]= {
> -                {mv_cache[y][x  ][0], mv_cache[y][x  ][1]},
> -                {mv_cache[y-1][x][0], mv_cache[y-1][x][1]},
> -                {mv_cache[y][x-1][0], mv_cache[y][x-1][1]},
> -                {mv_cache[y][x+1][0], mv_cache[y][x+1][1]},
> -                {mv_cache[y+1][x][0], mv_cache[y+1][x][1]}};
> -            //FIXME cleanup
> -            obmc_motion(s, dest_y + ((i & 1) * 8) + (i >> 1) * 8 * s->linesize,
> -                        ref_picture[0],
> -                        mb_x * 16 + (i & 1) * 8, mb_y * 16 + (i >>1) * 8,
> -                        pix_op[1],
> -                        mv);
> -
> -            mx += mv[0][0];
> -            my += mv[0][1];
> -        }
> -        if(!CONFIG_GRAY || !(s->flags&CODEC_FLAG_GRAY))
> -            chroma_4mv_motion(s, dest_cb, dest_cr, ref_picture, pix_op[1], mx, my);
> -
> -        return;
> -    }
> -
>      switch(s->mv_type) {
>      case MV_TYPE_16X16:
> -        if(s->mcsel){
> -            if(s->real_sprite_warping_points==1){
> -                gmc1_motion(s, dest_y, dest_cb, dest_cr,
> -                            ref_picture);
> -            }else{
> -                gmc_motion(s, dest_y, dest_cb, dest_cr,
> -                            ref_picture);
> -            }
> -        }else if(!is_mpeg12 && s->quarter_sample){
> -            qpel_motion(s, dest_y, dest_cb, dest_cr,
> -                        0, 0, 0,
> -                        ref_picture, pix_op, qpix_op,
> -                        s->mv[dir][0][0], s->mv[dir][0][1], 16);
> -        }else if(!is_mpeg12 && (CONFIG_WMV2_DECODER || CONFIG_WMV2_ENCODER) && s->mspel){
> -            ff_mspel_motion(s, dest_y, dest_cb, dest_cr,
> -                        ref_picture, pix_op,
> -                        s->mv[dir][0][0], s->mv[dir][0][1], 16);
> -        }else
> -        {
> -            mpeg_motion(s, dest_y, dest_cb, dest_cr,
> -                        0, 0, 0,
> -                        ref_picture, pix_op,
> -                        s->mv[dir][0][0], s->mv[dir][0][1], 16, mb_y);
> -        }
> +        mpeg_motion(s, dest_y, dest_cb, dest_cr,
> +                    0, 0, 0,
> +                    ref_picture, pix_op,
> +                    s->mv[dir][0][0], s->mv[dir][0][1], 16, mb_y);
>          break;
>      case MV_TYPE_8X8:
> -    if (!is_mpeg12) {
>          mx = 0;
>          my = 0;
> -        if(s->quarter_sample){
> -            for(i=0;i<4;i++) {
> -                motion_x = s->mv[dir][i][0];
> -                motion_y = s->mv[dir][i][1];
> +        for(i=0;i<4;i++) {
> +            hpel_motion(s, dest_y + ((i & 1) * 8) + (i >> 1) * 8 * s->linesize,
> +                        ref_picture[0], 0, 0,
> +                        mb_x * 16 + (i & 1) * 8, mb_y * 16 + (i >>1) * 8,
> +                        s->width, s->height, s->linesize,
> +                        s->h_edge_pos, s->v_edge_pos,
> +                        8, 8, pix_op[1],
> +                        s->mv[dir][i][0], s->mv[dir][i][1]);
>  
> -                dxy = ((motion_y & 3) << 2) | (motion_x & 3);
> -                src_x = mb_x * 16 + (motion_x >> 2) + (i & 1) * 8;
> -                src_y = mb_y * 16 + (motion_y >> 2) + (i >>1) * 8;
> -
> -                /* WARNING: do no forget half pels */
> -                src_x = av_clip(src_x, -16, s->width);
> -                if (src_x == s->width)
> -                    dxy &= ~3;
> -                src_y = av_clip(src_y, -16, s->height);
> -                if (src_y == s->height)
> -                    dxy &= ~12;
> -
> -                ptr = ref_picture[0] + (src_y * s->linesize) + (src_x);
> -                if(s->flags&CODEC_FLAG_EMU_EDGE){
> -                    if(   (unsigned)src_x > s->h_edge_pos - (motion_x&3) - 8
> -                       || (unsigned)src_y > s->v_edge_pos - (motion_y&3) - 8 ){
> -                        ff_emulated_edge_mc(s->edge_emu_buffer, ptr,
> -                                            s->linesize, 9, 9,
> -                                            src_x, src_y,
> -                                            s->h_edge_pos, s->v_edge_pos);
> -                        ptr= s->edge_emu_buffer;
> -                    }
> -                }
> -                dest = dest_y + ((i & 1) * 8) + (i >> 1) * 8 * s->linesize;
> -                qpix_op[1][dxy](dest, ptr, s->linesize);
> -
> -                mx += s->mv[dir][i][0]/2;
> -                my += s->mv[dir][i][1]/2;
> -            }
> -        }else{
> -            for(i=0;i<4;i++) {
> -                hpel_motion(s, dest_y + ((i & 1) * 8) + (i >> 1) * 8 * s->linesize,
> -                            ref_picture[0], 0, 0,
> -                            mb_x * 16 + (i & 1) * 8, mb_y * 16 + (i >>1) * 8,
> -                            s->width, s->height, s->linesize,
> -                            s->h_edge_pos, s->v_edge_pos,
> -                            8, 8, pix_op[1],
> -                            s->mv[dir][i][0], s->mv[dir][i][1]);
> -
> -                mx += s->mv[dir][i][0];
> -                my += s->mv[dir][i][1];
> -            }
> +            mx += s->mv[dir][i][0];
> +            my += s->mv[dir][i][1];
>          }
>  
> -        if(!CONFIG_GRAY || !(s->flags&CODEC_FLAG_GRAY))
> -            chroma_4mv_motion(s, dest_cb, dest_cr, ref_picture, pix_op[1], mx, my);
> -    }
> +        chroma_4mv_motion(s, dest_cb, dest_cr, ref_picture, pix_op[1], mx, my);
>          break;

a patch without the reindentations would have been more readable

[...]

> @@ -1752,12 +1752,51 @@
>      }
>  }
>  
> +static inline void idct_dc_add(uint8_t *dst, int line_size, int dc)
> +{
> +    int x, y;
> +    uint8_t *cm = ff_cropTbl + MAX_NEG_CROP;

> +    dc = (16383 * dc + 1024) >> 11;
> +    dc = (16383 * (dc + 32)) >> 20;

dc= (dc + (dc<0) + 3)>>3;
or
dc= (dc*2047 + 8192)>>14;
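A brute-force check that the two shorter formulas match the patch's two-step scaling, reading the sign test in the first one as (dc<0), since dc is the only variable in scope. The tested range -1023..1023 is chosen because for |dc| below 1024 the patch's first step is exactly 8*dc; no agreement is claimed outside it. Like FFmpeg, the sketch relies on >> of a negative int being an arithmetic shift.

```c
/* The patch's two-step DC scaling for the 8x8 DC-only IDCT. */
static int dc_scale_patch(int dc)
{
    dc = (16383 * dc + 1024) >> 11;
    return (16383 * (dc + 32)) >> 20;
}

/* The two shorter alternatives: round to nearest, ties toward zero. */
static int dc_scale_round(int dc) { return (dc + (dc < 0) + 3) >> 3; }
static int dc_scale_mul(int dc)   { return (dc * 2047 + 8192) >> 14; }

/* Returns 1 if all three formulas agree for every dc in [lo, hi]. */
static int dc_scalers_agree(int lo, int hi)
{
    for (int dc = lo; dc <= hi; dc++) {
        int ref = dc_scale_patch(dc);
        if (dc_scale_round(dc) != ref || dc_scale_mul(dc) != ref)
            return 0;
    }
    return 1;
}
```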

> +    for (y = 0; y < 8; y++, dst += line_size) {
> +        for (x = 0; x < 8; x++) {
> +            dst[x] = cm[dst[x] + dc];
> +        }
> +    }

cm += dc;
can be done outside the loop
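A minimal sketch of idct_dc_add with the offset folded into the table pointer, assuming a locally built crop table that mirrors ff_cropTbl (entry MAX_NEG_CROP + v holds v clipped to [0, 255]); crop_tbl and idct_dc_add_sketch are hypothetical names. The dc argument is the already-scaled value.

```c
#include <stdint.h>

/* Hypothetical stand-in for FFmpeg's ff_cropTbl. */
#define MAX_NEG_CROP 1024
static uint8_t crop_tbl[256 + 2 * MAX_NEG_CROP];

static void init_crop_tbl(void)
{
    for (int i = 0; i < 256 + 2 * MAX_NEG_CROP; i++) {
        int v = i - MAX_NEG_CROP;
        crop_tbl[i] = v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
    }
}

/* Add a constant DC offset to an 8x8 block with clipping. The offset is
 * folded into the table pointer once, so the inner loop is one table load
 * per pixel. |dc| <= MAX_NEG_CROP keeps every index inside the table. */
static void idct_dc_add_sketch(uint8_t *dst, int line_size, int dc)
{
    const uint8_t *cm = crop_tbl + MAX_NEG_CROP + dc;   /* cm += dc, hoisted */
    for (int y = 0; y < 8; y++, dst += line_size)
        for (int x = 0; x < 8; x++)
            dst[x] = cm[dst[x]];
}
```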

[...]
> Index: libavcodec/h264.h
> ===================================================================
> --- libavcodec/h264.h	(revision 23459)
> +++ libavcodec/h264.h	(working copy)
> @@ -735,22 +735,6 @@
>   1+5*8, 2+5*8,
>  };
>  
> -static av_always_inline uint32_t pack16to32(int a, int b){
> -#if HAVE_BIGENDIAN
> -   return (b&0xFFFF) + (a<<16);
> -#else
> -   return (a&0xFFFF) + (b<<16);
> -#endif
> -}
> -
> -static av_always_inline uint16_t pack8to16(int a, int b){
> -#if HAVE_BIGENDIAN
> -   return (b&0xFF) + (a<<8);
> -#else
> -   return (a&0xFF) + (b<<8);
> -#endif
> -}
> -
>  /**
>   * gets the chroma qp.
>   */

moving these or anything else to a common place is ok of course
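For reference, a sketch of how pack8to16() combines with the 32-bit store in the hunk further down (mv *= 0x00010001U followed by M32()). It assumes the 8-bit MV storage from item 7, since with the stock int16_t motion_val pairs one 32-bit store covers a single vector, not two. Byte order is detected with GCC/Clang predefined macros instead of FFmpeg's HAVE_BIGENDIAN, memcpy stands in for M32(), and both operands are masked before shifting to avoid left-shifting a negative int.

```c
#include <stdint.h>
#include <string.h>

#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define SKETCH_BIGENDIAN 1
#else
#define SKETCH_BIGENDIAN 0
#endif

/* Pack two int8 MV components so that, once stored, a comes first in memory. */
static inline uint16_t pack8to16(int a, int b)
{
#if SKETCH_BIGENDIAN
    return (uint16_t)(((a & 0xFF) << 8) | (b & 0xFF));
#else
    return (uint16_t)((a & 0xFF) | ((b & 0xFF) << 8));
#endif
}

/* Write the same (mx, my) pair into two adjacent 8-bit MV slots with
 * one 32-bit store. */
static void store_mv_twice(int8_t dst[4], int mx, int my)
{
    uint32_t mv = pack8to16(mx, my);
    mv *= 0x00010001U;               /* copy the 16-bit pair into both halves */
    memcpy(dst, &mv, sizeof(mv));    /* portable stand-in for M32() */
}
```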

[...]
>  int h263_decode_motion(MpegEncContext * s, int pred, int f_code)
>  {
> -    int code, val, sign, shift, l;
> -    code = get_vlc2(&s->gb, mv_vlc.table, MV_VLC_BITS, 2);
> +    return 0;
> +}
>  
> -    if (code == 0)
> +static int h263_decode_motion2(MpegEncContext * s, int pred)
> +{
> +    int l;
> +    int val = get_vlc2(&s->gb, mv_vlc.table, MV_VLC_BITS, 2);
> +
> +    if (val == 0)
>          return pred;
> -    if (code < 0)
> +    if (val < 0)
>          return 0xffff;
>  
> -    sign = get_bits1(&s->gb);
> -    shift = f_code - 1;
> -    val = code;
> -    if (shift) {
> -        val = (val - 1) << shift;
> -        val |= get_bits(&s->gb, shift);
> -        val++;
> -    }
> -    if (sign)
> +    if (get_bits1(&s->gb))
>          val = -val;
>      val += pred;
>  
>      /* modulo decoding */
> -    if (!s->h263_long_vectors) {
> -        l = INT_BIT - 5 - f_code;
> -        val = (val<<l)>>l;
> -    } else {
> -        /* horrible h263 long vector mode */
> -        if (pred < -31 && val < -63)
> -            val += 64;
> -        if (pred > 32 && val > 63)
> -            val -= 64;
> -
> -    }
> +    l = INT_BIT - 6;
> +    val = (val<<l)>>l;
>      return val;
>  }

ok
possible function name: h263_decode_fcode1_motion()
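The modulo step kept in the simplified function sign-extends the low 6 bits of val (l = INT_BIT - 6, i.e. INT_BIT - 5 - f_code with f_code = 1), wrapping it into [-32, 31]. A sketch comparing that with a mask-based formulation; INT_BIT is redefined locally and both helper names are hypothetical.

```c
#include <limits.h>

#define INT_BIT (CHAR_BIT * (int)sizeof(int))

/* As in the patch: keep the low 6 bits and sign-extend them. The left
 * shift goes through unsigned to avoid shifting a negative int; the right
 * shift relies on arithmetic >> for negative values, as FFmpeg does. */
static int wrap_mv_shift(int val)
{
    const int l = INT_BIT - 6;
    return (int)((unsigned)val << l) >> l;
}

/* Same result without shifts: reduce modulo 64 into [-32, 31]. */
static int wrap_mv_mask(int val)
{
    return ((val + 32) & 63) - 32;
}
```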

[...]

>                  s->mv_dir = MV_DIR_FORWARD;
>                  s->mv_type = MV_TYPE_16X16;
>                  s->current_picture.mb_type[xy]= MB_TYPE_SKIP | MB_TYPE_16x16 | MB_TYPE_L0;
> -                s->mv[0][0][0] = 0;
> -                s->mv[0][0][1] = 0;
[...]
> +                M16( s->mv[0][0] ) = 0;
[...]
>                  goto end;
>              }
>              cbpc = get_vlc2(&s->gb, ff_h263_inter_MCBPC_vlc.table, INTER_MCBPC_VLC_BITS, 2);

ok

> @@ -630,18 +347,13 @@
>              }
>          }while(cbpc == 20);
>  
> -        s->dsp.clear_blocks(s->block[0]);
> -
>          dquant = cbpc & 8;
>          s->mb_intra = ((cbpc & 4) != 0);
>          if (s->mb_intra) goto intra;
>  
> -        if(s->pb_frame && get_bits1(&s->gb))
> -            pb_mv_count = h263_get_modb(&s->gb, s->pb_frame, &cbpb);
>          cbpy = get_vlc2(&s->gb, ff_h263_cbpy_vlc.table, CBPY_VLC_BITS, 1);
>  
> -        if(s->alt_inter_vlc==0 || (cbpc & 3)!=3)
> -            cbpy ^= 0xF;
> +        cbpy ^= 0xF;
>  
>          cbp = (cbpc & 3) | (cbpy << 2);
>          if (dquant) {
> @@ -650,144 +362,42 @@
>  
>          s->mv_dir = MV_DIR_FORWARD;
>          if ((cbpc & 16) == 0) {
> +            uint32_t mv;
>              s->current_picture.mb_type[xy]= MB_TYPE_16x16 | MB_TYPE_L0;
>              /* 16x16 motion prediction */
>              s->mv_type = MV_TYPE_16X16;
> -            h263_pred_motion(s, 0, 0, &pred_x, &pred_y);
> -            if (s->umvplus)
> -               mx = h263p_decode_umotion(s, pred_x);
> -            else
> -               mx = h263_decode_motion(s, pred_x, 1);
> -
> +            h263_pred_motion2(s, 0, 0, &pred_x, &pred_y);
> +            mx = h263_decode_motion2(s, pred_x);
>              if (mx >= 0xffff)
>                  return -1;
>  
> -            if (s->umvplus)
> -               my = h263p_decode_umotion(s, pred_y);
> -            else
> -               my = h263_decode_motion(s, pred_y, 1);
> +            my = h263_decode_motion2(s, pred_y);
>  
>              if (my >= 0xffff)
>                  return -1;
> -            s->mv[0][0][0] = mx;
> -            s->mv[0][0][1] = my;
> -
> -            if (s->umvplus && (mx - pred_x) == 1 && (my - pred_y) == 1)
> -               skip_bits1(&s->gb); /* Bit stuffing to prevent PSC */
> +            mv = pack8to16(mx, my);
> +            M16( s->mv[0][0] ) = mv;
> +            mv *= 0x00010001U;
> +            M32( s->current_picture.motion_val[0][mv_xy] ) = mv;
> +            M32( s->current_picture.motion_val[0][mv_xy + wrap] ) = mv;
>          } else {
>              s->current_picture.mb_type[xy]= MB_TYPE_8x8 | MB_TYPE_L0;
>              s->mv_type = MV_TYPE_8X8;
>              for(i=0;i<4;i++) {
> -                mot_val = h263_pred_motion(s, i, 0, &pred_x, &pred_y);
> -                if (s->umvplus)
> -                  mx = h263p_decode_umotion(s, pred_x);
> -                else
> -                  mx = h263_decode_motion(s, pred_x, 1);
> +                mot_val = h263_pred_motion2(s, i, 0, &pred_x, &pred_y);
> +                mx = h263_decode_motion2(s, pred_x);
>                  if (mx >= 0xffff)
>                      return -1;
>  
> -                if (s->umvplus)
> -                  my = h263p_decode_umotion(s, pred_y);
> -                else
> -                  my = h263_decode_motion(s, pred_y, 1);
> +                my = h263_decode_motion2(s, pred_y);
>                  if (my >= 0xffff)
>                      return -1;
>                  s->mv[0][i][0] = mx;
>                  s->mv[0][i][1] = my;
> -                if (s->umvplus && (mx - pred_x) == 1 && (my - pred_y) == 1)
> -                  skip_bits1(&s->gb); /* Bit stuffing to prevent PSC */
>                  mot_val[0] = mx;
>                  mot_val[1] = my;
>              }
>          }
> -    } else if(s->pict_type==FF_B_TYPE) {
> -        int mb_type;
> -        const int stride= s->b8_stride;
> -        int16_t *mot_val0 = s->current_picture.motion_val[0][ 2*(s->mb_x + s->mb_y*stride) ];
> -        int16_t *mot_val1 = s->current_picture.motion_val[1][ 2*(s->mb_x + s->mb_y*stride) ];
> -//        const int mv_xy= s->mb_x + 1 + s->mb_y * s->mb_stride;
> -
> -        //FIXME ugly
> -        mot_val0[0       ]= mot_val0[2       ]= mot_val0[0+2*stride]= mot_val0[2+2*stride]=
> -        mot_val0[1       ]= mot_val0[3       ]= mot_val0[1+2*stride]= mot_val0[3+2*stride]=
> -        mot_val1[0       ]= mot_val1[2       ]= mot_val1[0+2*stride]= mot_val1[2+2*stride]=
> -        mot_val1[1       ]= mot_val1[3       ]= mot_val1[1+2*stride]= mot_val1[3+2*stride]= 0;
> -
> -        do{
> -            mb_type= get_vlc2(&s->gb, h263_mbtype_b_vlc.table, H263_MBTYPE_B_VLC_BITS, 2);
> -            if (mb_type < 0){
> -                av_log(s->avctx, AV_LOG_ERROR, "b mb_type damaged at %d %d\n", s->mb_x, s->mb_y);
> -                return -1;
> -            }
> -
> -            mb_type= h263_mb_type_b_map[ mb_type ];
> -        }while(!mb_type);
> -
> -        s->mb_intra = IS_INTRA(mb_type);
> -        if(HAS_CBP(mb_type)){
> -            s->dsp.clear_blocks(s->block[0]);
> -            cbpc = get_vlc2(&s->gb, cbpc_b_vlc.table, CBPC_B_VLC_BITS, 1);
> -            if(s->mb_intra){
> -                dquant = IS_QUANT(mb_type);
> -                goto intra;
> -            }
> -
> -            cbpy = get_vlc2(&s->gb, ff_h263_cbpy_vlc.table, CBPY_VLC_BITS, 1);
> -
> -            if (cbpy < 0){
> -                av_log(s->avctx, AV_LOG_ERROR, "b cbpy damaged at %d %d\n", s->mb_x, s->mb_y);
> -                return -1;
> -            }
> -
> -            if(s->alt_inter_vlc==0 || (cbpc & 3)!=3)
> -                cbpy ^= 0xF;
> -
> -            cbp = (cbpc & 3) | (cbpy << 2);
> -        }else
> -            cbp=0;
> -
> -        assert(!s->mb_intra);
> -
> -        if(IS_QUANT(mb_type)){
> -            h263_decode_dquant(s);
> -        }
> -
> -        if(IS_DIRECT(mb_type)){
> -            s->mv_dir = MV_DIR_FORWARD | MV_DIR_BACKWARD | MV_DIRECT;
> -            mb_type |= ff_mpeg4_set_direct_mv(s, 0, 0);
> -        }else{
> -            s->mv_dir = 0;
> -            s->mv_type= MV_TYPE_16X16;
> -//FIXME UMV
> -
> -            if(USES_LIST(mb_type, 0)){
> -                int16_t *mot_val= h263_pred_motion(s, 0, 0, &mx, &my);
> -                s->mv_dir = MV_DIR_FORWARD;
> -
> -                mx = h263_decode_motion(s, mx, 1);
> -                my = h263_decode_motion(s, my, 1);
> -
> -                s->mv[0][0][0] = mx;
> -                s->mv[0][0][1] = my;
> -                mot_val[0       ]= mot_val[2       ]= mot_val[0+2*stride]= mot_val[2+2*stride]= mx;
> -                mot_val[1       ]= mot_val[3       ]= mot_val[1+2*stride]= mot_val[3+2*stride]= my;
> -            }
> -
> -            if(USES_LIST(mb_type, 1)){
> -                int16_t *mot_val= h263_pred_motion(s, 0, 1, &mx, &my);
> -                s->mv_dir |= MV_DIR_BACKWARD;
> -
> -                mx = h263_decode_motion(s, mx, 1);
> -                my = h263_decode_motion(s, my, 1);
> -
> -                s->mv[1][0][0] = mx;
> -                s->mv[1][0][1] = my;
> -                mot_val[0       ]= mot_val[2       ]= mot_val[0+2*stride]= mot_val[2+2*stride]= mx;
> -                mot_val[1       ]= mot_val[3       ]= mot_val[1+2*stride]= mot_val[3+2*stride]= my;
> -            }
> -        }
> -
> -        s->current_picture.mb_type[xy]= mb_type;
>      } else { /* I-Frame */
>          do{
>              cbpc = get_vlc2(&s->gb, ff_h263_intra_MCBPC_vlc.table, INTRA_MCBPC_VLC_BITS, 2);
> @@ -797,24 +407,11 @@
>              }
>          }while(cbpc == 8);
>  
> -        s->dsp.clear_blocks(s->block[0]);
> -
>          dquant = cbpc & 4;
>          s->mb_intra = 1;
>  intra:
>          s->current_picture.mb_type[xy]= MB_TYPE_INTRA;
> -        if (s->h263_aic) {
> -            s->ac_pred = get_bits1(&s->gb);
> -            if(s->ac_pred){
> -                s->current_picture.mb_type[xy]= MB_TYPE_INTRA | MB_TYPE_ACPRED;
>  
> -                s->h263_aic_dir = get_bits1(&s->gb);
> -            }
> -        }else
> -            s->ac_pred = 0;
> -
> -        if(s->pb_frame && get_bits1(&s->gb))
> -            pb_mv_count = h263_get_modb(&s->gb, s->pb_frame, &cbpb);
>          cbpy = get_vlc2(&s->gb, ff_h263_cbpy_vlc.table, CBPY_VLC_BITS, 1);
>          if(cbpy<0){
>              av_log(s->avctx, AV_LOG_ERROR, "I cbpy damaged at %d %d\n", s->mb_x, s->mb_y);
> @@ -824,30 +421,21 @@
>          if (dquant) {
>              h263_decode_dquant(s);
>          }
> +        M32( s->current_picture.motion_val[0][mv_xy] ) = 0;
> +        M32( s->current_picture.motion_val[0][mv_xy + wrap] ) = 0;
>  
> -        pb_mv_count += !!s->pb_frame;
>      }
>  
> -    while(pb_mv_count--){
> -        h263_decode_motion(s, 0, 1);
> -        h263_decode_motion(s, 0, 1);
> -    }
> -
> +    qscale = s->qscale;
> +    chroma_qscale = s->chroma_qscale;
>      /* decode each block */
>      for (i = 0; i < 6; i++) {
> -        if (h263_decode_block(s, block[i], i, cbp&32) < 0)
> +        if (h263_decode_block(s, block[i], i, cbp&32, i < 4 ? qscale : chroma_qscale) < 0)
>              return -1;
>          cbp+=cbp;
>      }
>  
> -    if(s->pb_frame && h263_skip_b_part(s, cbpb) < 0)
> -        return -1;
> -    if(s->obmc && !s->mb_intra){
> -        if(s->pict_type == FF_P_TYPE && s->mb_x+1<s->mb_width && s->mb_num_left != 1)
> -            preview_obmc(s);
> -    }
>  end:
> -
>          /* per-MB end of slice check */
>      {
>          int v= show_bits(&s->gb, 16);

we could duplicate, optimize and simplify decode_mb() in flvdec.c
if you think this makes a measurable speed difference.
though I have my doubts that it makes that much difference, I just wanted to
say I am not against it if you want to do it
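One way to get most of the benefit of a duplicated decode_mb() without a second copy is the pattern the removed is_mpeg12 parameter of MPV_motion already uses: a force-inlined body taking a constant codec switch, plus thin wrappers that pass literals. A minimal sketch built around the cbpy ^= 0xF test that the patch simplifies; the function names are hypothetical, ALWAYS_INLINE approximates av_always_inline, and it assumes, as the patch does, that FLV never sets alt_inter_vlc.

```c
#if defined(__GNUC__)
#define ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE inline
#endif

/* Shared body: is_flv is a literal at every call site, so after inlining
 * the compiler can drop the branches that do not apply. */
static ALWAYS_INLINE int adjust_cbpy(int cbpy, int cbpc, int alt_inter_vlc,
                                     const int is_flv)
{
    if (is_flv || alt_inter_vlc == 0 || (cbpc & 3) != 3)
        cbpy ^= 0xF;
    return cbpy;
}

/* FLV instance: the condition folds to "always invert". */
static int adjust_cbpy_flv(int cbpy, int cbpc)
{
    return adjust_cbpy(cbpy, cbpc, 0, 1);
}

/* Generic H.263 instance keeps the full check. */
static int adjust_cbpy_h263(int cbpy, int cbpc, int alt_inter_vlc)
{
    return adjust_cbpy(cbpy, cbpc, alt_inter_vlc, 0);
}
```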

the same applies to the block/coeff decode, though that one will make a speed
difference of course; we also have duplicated block/coeff decodes in mpeg1/2,
so that's quite in line with existing optimizations
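Item 2 of the summary and the new qscale argument of h263_decode_block() amount to dequantizing each coefficient as it is stored. A minimal sketch for inter blocks using the H.263 rule from the C dct_unquantize_h263 code (qmul = 2*qscale, qadd = (qscale - 1) | 1); the run/level arrays stand in for the real VLC reader, the intra DC special case is left out, and the function names are hypothetical.

```c
#include <stdint.h>

/* H.263 inter dequantization of one nonzero level. */
static inline int dequant_h263(int level, int qscale)
{
    const int qmul = qscale << 1;
    const int qadd = (qscale - 1) | 1;
    return level < 0 ? level * qmul - qadd : level * qmul + qadd;
}

/* Store already-parsed (run, level) pairs, dequantizing on the fly instead
 * of in a separate pass over the block. Levels are never 0 in H.263. */
static int store_coeffs_dequant(int16_t block[64], const uint8_t *scan,
                                const int *runs, const int *levels,
                                int n, int qscale)
{
    int i = -1;                          /* scan position of last coeff */
    for (int k = 0; k < n; k++) {
        i += runs[k] + 1;                /* skip run zeros, then the level */
        if (i > 63)
            return -1;                   /* corrupt run: past end of block */
        block[scan[i]] = (int16_t)dequant_h263(levels[k], qscale);
    }
    return i;                            /* usable as block_last_index */
}
```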

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Breaking DRM is a little like attempting to break through a door even
though the window is wide open and the only thing in the house is a bunch
of things you dont want and which you would get tomorrow for free anyway