[FFmpeg-devel] [PATCH] mpeg2: fix block_last_index when mismatch control modifies last coeff
Michael Niedermayer
michaelni
Tue Jun 22 21:47:20 CEST 2010
On Mon, Jun 21, 2010 at 09:08:24PM -0700, Jason Garrett-Glaser wrote:
> On Mon, Jun 21, 2010 at 8:17 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Mon, Jun 21, 2010 at 03:30:32PM -0700, Jason Garrett-Glaser wrote:
> > [...]
> >> I'm trying to merge parts of a local changeset I have that makes the
> >> FLV decoder 30-40% faster overall (many parts may apply to mpegvideo
> >> overall). Some of these parts are unmergable, but others are quite
> >> mergable.
> >
> > please merge what is mergeable, this is greatly appreciated work!
> > also I am interested in the unmergeable changes, why are they unmergeable?
> > are they public somewhere? This thread shows nicely how talking about
> > code that only some have seen can lead to confusion and flames when
> > discussing it ....
>
> Here's the changeset. The purpose of this was to get realtime,
> full-screen video playback on the iPad at 30fps. The iPad has an
> extraordinarily slow display driver, which is synchronous (you can't
> decode video while calling display on a texture) and takes up to
> ~8-9ms per frame. Combined with CELT audio, this leaves about 22ms
> per 1024x768 frame for video decoding -- a massive challenge on a 1GHz
> Cortex chip, especially in very high motion at bitrates like ~6Mbps.
>
> I intentionally tore up the entire MPEG decoder with the intention
> that only FLV work. Accordingly, keep in mind that this is a GIGANTIC
> UGLY HACK and you'd be insane to hold me accountable for how utterly
> insanely ugly any of this is. But some portion of the changes may be
> mergeable.
>
> A short summary (not complete, I don't remember everything):
>
> 1. Use idct_dc whenever possible, for both inter and intra. Add a
> NEON idct_dc function written by Mans.
> 2. Move dequantization into entropy decoding, to avoid costly calls
> to the SIMD dequant function.
> 3. Inline the flv escape code decoder -- gave a huge benefit.
> 4. Inline the h263 block decode.
> 5. Remove every single last case everywhere that isn't relevant to
> FLV decoding (a huge amount of this gain could be gotten via
> templatization, IMO). Yes, some of the removals are utterly pointless
> and were just me deleting code to make my editing space smaller.
> 6. Eliminate the mv caching code: do it all in bitstream decoding,
> even for 16x16 blocks.
> 7. Change the mv cache to 8-bit (would break everything with unrestricted mvs).
You could just introduce an mvtype typedef, and if no codecs needing 16-bit
were enabled, it could be switched to 8-bit by an ifdef.
In the future this could be used with templating.
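A minimal sketch of the typedef idea above, to make it concrete. All names here (mv_component_t, DEMO_NEED_16BIT_MV, demo_mv and the helpers) are hypothetical and not existing libavcodec identifiers; the real switch would presumably be derived from the set of enabled codecs.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical build-time switch: set to 1 when any enabled codec needs
 * motion vectors outside the 8-bit range (e.g. unrestricted MVs). */
#ifndef DEMO_NEED_16BIT_MV
#define DEMO_NEED_16BIT_MV 0
#endif

#if DEMO_NEED_16BIT_MV
typedef int16_t mv_component_t;
#define DEMO_MV_MIN INT16_MIN
#define DEMO_MV_MAX INT16_MAX
#else
typedef int8_t mv_component_t;
#define DEMO_MV_MIN INT8_MIN
#define DEMO_MV_MAX INT8_MAX
#endif

/* One motion vector: x and y components of the selected width. */
typedef struct {
    mv_component_t x, y;
} demo_mv;

/* Bytes needed for a cache holding n motion vectors. */
static size_t demo_mv_cache_bytes(size_t n)
{
    return n * sizeof(demo_mv);
}

/* Returns 1 if v is representable in the selected component type. */
static int demo_mv_fits(int v)
{
    return v >= DEMO_MV_MIN && v <= DEMO_MV_MAX;
}
```

With the switch left at 0 the cache shrinks to half the size of the 16-bit layout, and code that only goes through the typedef would not need to change when the switch flips.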
> 8. Use write-combining in some places.
> 9. Inline the values from the h263 table directly into the code --
> e.g. 102 instead of rl->n, and make the tables that aren't
> runtime-generated static const in the file to avoid pointer
> dereferences. Generally optimize and clean up the decode_block
> function (breaking everything non-flv obviously).
>
> Overall, this basically eliminated all overhead outside of residual
> decoding, idct, and mc. MC was a tiny part of overall decode, and
> residual decoding was probably 30-40%+ faster. IDCT got a ton faster
> with the addition of the idct_dc, which triggers in a shockingly large
> percentage of total cases.
the shocking effectiveness of the dc idct is likely due to the shockingly low
quality of flvs ;)
8x8 dc-only frames would probably gain flv some rd ...
(transform blocks bigger than 8x8 would of course do better)
anyway, this is hugely nice work and i think most of it is actually easy
to merge.
If needed i would be willing to help with that, just send me a mail with
a hunk/diff that you don't want to merge yourself, and i'll try to look into
it. That excludes the templatization, which i don't volunteer for atm due
to lack of sufficient time/motivation.
further comments below ...
>
> It's, amazingly, still bitexact, as far as I measured. But of course
> it breaks everything else.
>
> Dark Shikari
> arm/mpegvideo_arm.c | 9
> arm/simple_idct_neon.S | 60 +++
> avcodec.h | 2
> error_resilience.c | 8
> flvdec.c | 11
> h263.c | 69 ---
> h263dec.c | 111 -----
> h264.h | 16
> ituh263dec.c | 959 +++++++------------------------------------------
> mpegvideo.c | 329 ++++------------
> mpegvideo.h | 38 +
> mpegvideo_common.h | 723 +++---------------------------------
> 12 files changed, 400 insertions(+), 1935 deletions(-)
> a9dd7b4667634ec25f6add24b8d16ea214ca0230 destroy_mpeg_decoder5.diff
> Index: libavcodec/mpegvideo_common.h
> ===================================================================
> --- libavcodec/mpegvideo_common.h (revision 23459)
> +++ libavcodec/mpegvideo_common.h (working copy)
[...]
> static inline int hpel_motion(MpegEncContext *s,
> uint8_t *dest, uint8_t *src,
> int field_based, int field_select,
> @@ -245,65 +103,34 @@
> int motion_x, int motion_y, int h, int is_mpeg12, int mb_y)
> {
> uint8_t *ptr_y, *ptr_cb, *ptr_cr;
> - int dxy, uvdxy, mx, my, src_x, src_y,
> - uvsrc_x, uvsrc_y, v_edge_pos, uvlinesize, linesize;
> + int dxy, uvdxy, src_x, src_y, uvsrc_x, uvsrc_y, v_edge_pos, uvlinesize, linesize;
>
> -#if 0
> -if(s->quarter_sample)
> -{
> - motion_x>>=1;
> - motion_y>>=1;
> -}
> -#endif
> + linesize = s->current_picture.linesize[0];
> + uvlinesize = s->current_picture.linesize[1];
> +
> + if (s->mb_skipped) {
> + src_x = s->mb_x<<4;
> + src_y = mb_y<<4;
> + uvsrc_x = src_x>>1;
> + uvsrc_y = src_y>>1;
> + ptr_y = ref_picture[0] + src_y * linesize + src_x;
> + ptr_cb = ref_picture[1] + uvsrc_y * uvlinesize + uvsrc_x;
> + ptr_cr = ref_picture[2] + uvsrc_y * uvlinesize + uvsrc_x;
> + pix_op[0][0](dest_y, ptr_y, linesize, h);
> + pix_op[1][0](dest_cb, ptr_cb, uvlinesize, h >> 1);
> + pix_op[1][0](dest_cr, ptr_cr, uvlinesize, h >> 1);
> + return;
> + }
nice idea
and ok, with an added && !field_based && (!CONFIG_GRAY || !(s->flags&CODEC_FLAG_GRAY))
in the condition, and if it passes the regression tests
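A rough sketch of how the guarded condition could look, outside of the real mpeg_motion(). The names below are hypothetical stand-ins: DEMO_FLAG_GRAY takes the place of CODEC_FLAG_GRAY (its value here is arbitrary) and gray_compiled_in takes the place of CONFIG_GRAY. The reasoning, as far as I can tell from the hunk, is that the fast path copies chroma unconditionally and uses frame addressing, so it should be skipped for field-based prediction and when gray-only decoding is requested.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the real codec flag; the value is arbitrary. */
#define DEMO_FLAG_GRAY 0x2000

/* Decide whether the straight-copy path for skipped macroblocks applies. */
static int demo_use_skip_copy(int mb_skipped, int field_based,
                              int gray_compiled_in, int flags)
{
    return mb_skipped && !field_based &&
           (!gray_compiled_in || !(flags & DEMO_FLAG_GRAY));
}

/* Copy a w x h block from the co-located position in the reference. */
static void demo_copy_block(uint8_t *dst, const uint8_t *src,
                            int stride, int w, int h)
{
    for (int y = 0; y < h; y++)
        memcpy(dst + y * stride, src + y * stride, (size_t)w);
}
```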
[...]
> @@ -580,11 +200,11 @@
>
> src_x = s->mb_x * 8 + mx;
> src_y = s->mb_y * 8 + my;
> - src_x = av_clip(src_x, -8, s->width/2);
> - if (src_x == s->width/2)
> + src_x = av_clip(src_x, -8, (s->width >> 1));
> + if (src_x == (s->width >> 1))
> dxy &= ~1;
> - src_y = av_clip(src_y, -8, s->height/2);
> - if (src_y == s->height/2)
> + src_y = av_clip(src_y, -8, (s->height >> 1));
> + if (src_y == (s->height >> 1))
> dxy &= ~2;
>
> offset = (src_y * (s->uvlinesize)) + src_x;
ok
> @@ -639,246 +259,41 @@
> uint8_t *dest_y, uint8_t *dest_cb,
> uint8_t *dest_cr, int dir,
> uint8_t **ref_picture,
> - op_pixels_func (*pix_op)[4],
> - qpel_mc_func (*qpix_op)[16], int is_mpeg12)
> + op_pixels_func (*pix_op)[4])
> {
> - int dxy, mx, my, src_x, src_y, motion_x, motion_y;
> + int mx, my;
> int mb_x, mb_y, i;
> - uint8_t *ptr, *dest;
>
> mb_x = s->mb_x;
> mb_y = s->mb_y;
>
> prefetch_motion(s, ref_picture, dir);
>
> - if(!is_mpeg12 && s->obmc && s->pict_type != FF_B_TYPE){
> - int16_t mv_cache[4][4][2];
> - const int xy= s->mb_x + s->mb_y*s->mb_stride;
> - const int mot_stride= s->b8_stride;
> - const int mot_xy= mb_x*2 + mb_y*2*mot_stride;
> -
> - assert(!s->mb_skipped);
> -
> - memcpy(mv_cache[1][1], s->current_picture.motion_val[0][mot_xy ], sizeof(int16_t)*4);
> - memcpy(mv_cache[2][1], s->current_picture.motion_val[0][mot_xy+mot_stride], sizeof(int16_t)*4);
> - memcpy(mv_cache[3][1], s->current_picture.motion_val[0][mot_xy+mot_stride], sizeof(int16_t)*4);
> -
> - if(mb_y==0 || IS_INTRA(s->current_picture.mb_type[xy-s->mb_stride])){
> - memcpy(mv_cache[0][1], mv_cache[1][1], sizeof(int16_t)*4);
> - }else{
> - memcpy(mv_cache[0][1], s->current_picture.motion_val[0][mot_xy-mot_stride], sizeof(int16_t)*4);
> - }
> -
> - if(mb_x==0 || IS_INTRA(s->current_picture.mb_type[xy-1])){
> - *(int32_t*)mv_cache[1][0]= *(int32_t*)mv_cache[1][1];
> - *(int32_t*)mv_cache[2][0]= *(int32_t*)mv_cache[2][1];
> - }else{
> - *(int32_t*)mv_cache[1][0]= *(int32_t*)s->current_picture.motion_val[0][mot_xy-1];
> - *(int32_t*)mv_cache[2][0]= *(int32_t*)s->current_picture.motion_val[0][mot_xy-1+mot_stride];
> - }
> -
> - if(mb_x+1>=s->mb_width || IS_INTRA(s->current_picture.mb_type[xy+1])){
> - *(int32_t*)mv_cache[1][3]= *(int32_t*)mv_cache[1][2];
> - *(int32_t*)mv_cache[2][3]= *(int32_t*)mv_cache[2][2];
> - }else{
> - *(int32_t*)mv_cache[1][3]= *(int32_t*)s->current_picture.motion_val[0][mot_xy+2];
> - *(int32_t*)mv_cache[2][3]= *(int32_t*)s->current_picture.motion_val[0][mot_xy+2+mot_stride];
> - }
> -
> - mx = 0;
> - my = 0;
> - for(i=0;i<4;i++) {
> - const int x= (i&1)+1;
> - const int y= (i>>1)+1;
> - int16_t mv[5][2]= {
> - {mv_cache[y][x ][0], mv_cache[y][x ][1]},
> - {mv_cache[y-1][x][0], mv_cache[y-1][x][1]},
> - {mv_cache[y][x-1][0], mv_cache[y][x-1][1]},
> - {mv_cache[y][x+1][0], mv_cache[y][x+1][1]},
> - {mv_cache[y+1][x][0], mv_cache[y+1][x][1]}};
> - //FIXME cleanup
> - obmc_motion(s, dest_y + ((i & 1) * 8) + (i >> 1) * 8 * s->linesize,
> - ref_picture[0],
> - mb_x * 16 + (i & 1) * 8, mb_y * 16 + (i >>1) * 8,
> - pix_op[1],
> - mv);
> -
> - mx += mv[0][0];
> - my += mv[0][1];
> - }
> - if(!CONFIG_GRAY || !(s->flags&CODEC_FLAG_GRAY))
> - chroma_4mv_motion(s, dest_cb, dest_cr, ref_picture, pix_op[1], mx, my);
> -
> - return;
> - }
> -
> switch(s->mv_type) {
> case MV_TYPE_16X16:
> - if(s->mcsel){
> - if(s->real_sprite_warping_points==1){
> - gmc1_motion(s, dest_y, dest_cb, dest_cr,
> - ref_picture);
> - }else{
> - gmc_motion(s, dest_y, dest_cb, dest_cr,
> - ref_picture);
> - }
> - }else if(!is_mpeg12 && s->quarter_sample){
> - qpel_motion(s, dest_y, dest_cb, dest_cr,
> - 0, 0, 0,
> - ref_picture, pix_op, qpix_op,
> - s->mv[dir][0][0], s->mv[dir][0][1], 16);
> - }else if(!is_mpeg12 && (CONFIG_WMV2_DECODER || CONFIG_WMV2_ENCODER) && s->mspel){
> - ff_mspel_motion(s, dest_y, dest_cb, dest_cr,
> - ref_picture, pix_op,
> - s->mv[dir][0][0], s->mv[dir][0][1], 16);
> - }else
> - {
> - mpeg_motion(s, dest_y, dest_cb, dest_cr,
> - 0, 0, 0,
> - ref_picture, pix_op,
> - s->mv[dir][0][0], s->mv[dir][0][1], 16, mb_y);
> - }
> + mpeg_motion(s, dest_y, dest_cb, dest_cr,
> + 0, 0, 0,
> + ref_picture, pix_op,
> + s->mv[dir][0][0], s->mv[dir][0][1], 16, mb_y);
> break;
> case MV_TYPE_8X8:
> - if (!is_mpeg12) {
> mx = 0;
> my = 0;
> - if(s->quarter_sample){
> - for(i=0;i<4;i++) {
> - motion_x = s->mv[dir][i][0];
> - motion_y = s->mv[dir][i][1];
> + for(i=0;i<4;i++) {
> + hpel_motion(s, dest_y + ((i & 1) * 8) + (i >> 1) * 8 * s->linesize,
> + ref_picture[0], 0, 0,
> + mb_x * 16 + (i & 1) * 8, mb_y * 16 + (i >>1) * 8,
> + s->width, s->height, s->linesize,
> + s->h_edge_pos, s->v_edge_pos,
> + 8, 8, pix_op[1],
> + s->mv[dir][i][0], s->mv[dir][i][1]);
>
> - dxy = ((motion_y & 3) << 2) | (motion_x & 3);
> - src_x = mb_x * 16 + (motion_x >> 2) + (i & 1) * 8;
> - src_y = mb_y * 16 + (motion_y >> 2) + (i >>1) * 8;
> -
> - /* WARNING: do no forget half pels */
> - src_x = av_clip(src_x, -16, s->width);
> - if (src_x == s->width)
> - dxy &= ~3;
> - src_y = av_clip(src_y, -16, s->height);
> - if (src_y == s->height)
> - dxy &= ~12;
> -
> - ptr = ref_picture[0] + (src_y * s->linesize) + (src_x);
> - if(s->flags&CODEC_FLAG_EMU_EDGE){
> - if( (unsigned)src_x > s->h_edge_pos - (motion_x&3) - 8
> - || (unsigned)src_y > s->v_edge_pos - (motion_y&3) - 8 ){
> - ff_emulated_edge_mc(s->edge_emu_buffer, ptr,
> - s->linesize, 9, 9,
> - src_x, src_y,
> - s->h_edge_pos, s->v_edge_pos);
> - ptr= s->edge_emu_buffer;
> - }
> - }
> - dest = dest_y + ((i & 1) * 8) + (i >> 1) * 8 * s->linesize;
> - qpix_op[1][dxy](dest, ptr, s->linesize);
> -
> - mx += s->mv[dir][i][0]/2;
> - my += s->mv[dir][i][1]/2;
> - }
> - }else{
> - for(i=0;i<4;i++) {
> - hpel_motion(s, dest_y + ((i & 1) * 8) + (i >> 1) * 8 * s->linesize,
> - ref_picture[0], 0, 0,
> - mb_x * 16 + (i & 1) * 8, mb_y * 16 + (i >>1) * 8,
> - s->width, s->height, s->linesize,
> - s->h_edge_pos, s->v_edge_pos,
> - 8, 8, pix_op[1],
> - s->mv[dir][i][0], s->mv[dir][i][1]);
> -
> - mx += s->mv[dir][i][0];
> - my += s->mv[dir][i][1];
> - }
> + mx += s->mv[dir][i][0];
> + my += s->mv[dir][i][1];
> }
>
> - if(!CONFIG_GRAY || !(s->flags&CODEC_FLAG_GRAY))
> - chroma_4mv_motion(s, dest_cb, dest_cr, ref_picture, pix_op[1], mx, my);
> - }
> + chroma_4mv_motion(s, dest_cb, dest_cr, ref_picture, pix_op[1], mx, my);
> break;
a patch without the reindentation would have been more readable
[...]
> @@ -1752,12 +1752,51 @@
> }
> }
>
> +static inline void idct_dc_add(uint8_t *dst, int line_size, int dc)
> +{
> + int x, y;
> + uint8_t *cm = ff_cropTbl + MAX_NEG_CROP;
> + dc = (16383 * dc + 1024) >> 11;
> + dc = (16383 * (dc + 32)) >> 20;
dc= (dc + (i<0) + 3)>>3;
or
dc= (dc*2047 + 8192)>>14;
> + for (y = 0; y < 8; y++, dst += line_size) {
> + for (x = 0; x < 8; x++) {
> + dst[x] = cm[dst[x] + dc];
> + }
> + }
cm += dc;
can be done outside the loop
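To make the two remarks on idct_dc_add() concrete, here is a self-contained sketch. It is not the patch's code: demo_crop is a local stand-in for ff_cropTbl + MAX_NEG_CROP, and the "(i<0)" term in the first suggested formula is read as "(dc<0)", which is an assumption since no i is in scope at that point. dc_count_mismatches() is only a helper for comparing a candidate against the two-step form over a chosen range; whether either candidate matches over the full input range is something to measure, not something this sketch claims.

```c
#include <stdint.h>

/* Two-step scaling as written in the patch (16383 is roughly 2^14).
 * Note: >> on negative ints is implementation-defined in C; this sketch
 * assumes the usual arithmetic shift. */
static int dc_two_step(int dc)
{
    dc = (16383 * dc + 1024) >> 11;
    return (16383 * (dc + 32)) >> 20;
}

/* First suggested form, reading "(i<0)" as "(dc<0)" (assumption). */
static int dc_shift3(int dc)
{
    return (dc + (dc < 0) + 3) >> 3;
}

/* Second suggested form. */
static int dc_mul2047(int dc)
{
    return (dc * 2047 + 8192) >> 14;
}

/* Count inputs in [lo, hi] where a candidate disagrees with the two-step form. */
static int dc_count_mismatches(int (*candidate)(int), int lo, int hi)
{
    int n = 0;
    for (int dc = lo; dc <= hi; dc++)
        n += candidate(dc) != dc_two_step(dc);
    return n;
}

/* Local clip table: demo_crop[DEMO_NEG_CROP + v] == clip(v, 0, 255). */
#define DEMO_NEG_CROP 1024
static uint8_t demo_crop[256 + 2 * DEMO_NEG_CROP];

static void demo_crop_init(void)
{
    for (int i = 0; i < 256 + 2 * DEMO_NEG_CROP; i++) {
        int v = i - DEMO_NEG_CROP;
        demo_crop[i] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
    }
}

/* Add a constant DC to an 8x8 block. The table pointer is offset once,
 * outside the loops. The scaled DC must stay within +/-DEMO_NEG_CROP. */
static void demo_idct_dc_add(uint8_t *dst, int line_size, int dc)
{
    const uint8_t *cm = demo_crop + DEMO_NEG_CROP + dc_two_step(dc);
    for (int y = 0; y < 8; y++, dst += line_size)
        for (int x = 0; x < 8; x++)
            dst[x] = cm[dst[x]];
}
```

Offsetting cm once turns the inner loop into a plain table lookup per pixel, with no addition left inside it.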
[...]
> Index: libavcodec/h264.h
> ===================================================================
> --- libavcodec/h264.h (revision 23459)
> +++ libavcodec/h264.h (working copy)
> @@ -735,22 +735,6 @@
> 1+5*8, 2+5*8,
> };
>
> -static av_always_inline uint32_t pack16to32(int a, int b){
> -#if HAVE_BIGENDIAN
> - return (b&0xFFFF) + (a<<16);
> -#else
> - return (a&0xFFFF) + (b<<16);
> -#endif
> -}
> -
> -static av_always_inline uint16_t pack8to16(int a, int b){
> -#if HAVE_BIGENDIAN
> - return (b&0xFF) + (a<<8);
> -#else
> - return (a&0xFF) + (b<<8);
> -#endif
> -}
> -
> /**
> * gets the chroma qp.
> */
moving these or anything else to a common place is ok of course
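For readers who have not seen these helpers, a small sketch of what the patch later does with them (mv = pack8to16(mx, my); mv *= 0x00010001U;). The demo_ names are hypothetical. Only the little-endian layout is shown, and the masking goes through unsigned so that negative components are not shifted directly.

```c
#include <stdint.h>

/* Little-endian layout of pack8to16: a in the low byte, b in the high byte. */
static uint16_t demo_pack8to16_le(int a, int b)
{
    return (uint16_t)(((unsigned)a & 0xFF) | (((unsigned)b & 0xFF) << 8));
}

/* Duplicate a packed 16-bit pair into both halves of a 32-bit word, so a
 * single 32-bit store fills two neighbouring motion vector slots. */
static uint32_t demo_dup16(uint16_t v)
{
    return (uint32_t)v * 0x00010001U;
}
```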
[...]
> int h263_decode_motion(MpegEncContext * s, int pred, int f_code)
> {
> - int code, val, sign, shift, l;
> - code = get_vlc2(&s->gb, mv_vlc.table, MV_VLC_BITS, 2);
> + return 0;
> +}
>
> - if (code == 0)
> +static int h263_decode_motion2(MpegEncContext * s, int pred)
> +{
> + int l;
> + int val = get_vlc2(&s->gb, mv_vlc.table, MV_VLC_BITS, 2);
> +
> + if (val == 0)
> return pred;
> - if (code < 0)
> + if (val < 0)
> return 0xffff;
>
> - sign = get_bits1(&s->gb);
> - shift = f_code - 1;
> - val = code;
> - if (shift) {
> - val = (val - 1) << shift;
> - val |= get_bits(&s->gb, shift);
> - val++;
> - }
> - if (sign)
> + if (get_bits1(&s->gb))
> val = -val;
> val += pred;
>
> /* modulo decoding */
> - if (!s->h263_long_vectors) {
> - l = INT_BIT - 5 - f_code;
> - val = (val<<l)>>l;
> - } else {
> - /* horrible h263 long vector mode */
> - if (pred < -31 && val < -63)
> - val += 64;
> - if (pred > 32 && val > 63)
> - val -= 64;
> -
> - }
> + l = INT_BIT - 6;
> + val = (val<<l)>>l;
> return val;
> }
ok
possible function name h263_decode_fcode1_motion()
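With f_code fixed at 1, the old l = INT_BIT - 5 - f_code becomes INT_BIT - 6, so the shift pair sign-extends from 6 bits, i.e. wraps the value into [-32, 31]. A sketch of the same wrap without shifting negative values (demo_wrap6 is a hypothetical name, and two's complement is assumed):

```c
/* Wrap a motion vector component into the 6-bit signed range [-32, 31].
 * Equivalent to (val << l) >> l with l = INT_BIT - 6 on two's complement
 * targets with arithmetic right shift. */
static int demo_wrap6(int val)
{
    return ((val & 63) ^ 32) - 32;
}
```

For example, a predictor of 30 plus a decoded difference of 2 gives 32, which wraps to -32.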
[...]
> s->mv_dir = MV_DIR_FORWARD;
> s->mv_type = MV_TYPE_16X16;
> s->current_picture.mb_type[xy]= MB_TYPE_SKIP | MB_TYPE_16x16 | MB_TYPE_L0;
> - s->mv[0][0][0] = 0;
> - s->mv[0][0][1] = 0;
[...]
> + M16( s->mv[0][0] ) = 0;
[...]
> goto end;
> }
> cbpc = get_vlc2(&s->gb, ff_h263_inter_MCBPC_vlc.table, INTER_MCBPC_VLC_BITS, 2);
ok
> @@ -630,18 +347,13 @@
> }
> }while(cbpc == 20);
>
> - s->dsp.clear_blocks(s->block[0]);
> -
> dquant = cbpc & 8;
> s->mb_intra = ((cbpc & 4) != 0);
> if (s->mb_intra) goto intra;
>
> - if(s->pb_frame && get_bits1(&s->gb))
> - pb_mv_count = h263_get_modb(&s->gb, s->pb_frame, &cbpb);
> cbpy = get_vlc2(&s->gb, ff_h263_cbpy_vlc.table, CBPY_VLC_BITS, 1);
>
> - if(s->alt_inter_vlc==0 || (cbpc & 3)!=3)
> - cbpy ^= 0xF;
> + cbpy ^= 0xF;
>
> cbp = (cbpc & 3) | (cbpy << 2);
> if (dquant) {
> @@ -650,144 +362,42 @@
>
> s->mv_dir = MV_DIR_FORWARD;
> if ((cbpc & 16) == 0) {
> + uint32_t mv;
> s->current_picture.mb_type[xy]= MB_TYPE_16x16 | MB_TYPE_L0;
> /* 16x16 motion prediction */
> s->mv_type = MV_TYPE_16X16;
> - h263_pred_motion(s, 0, 0, &pred_x, &pred_y);
> - if (s->umvplus)
> - mx = h263p_decode_umotion(s, pred_x);
> - else
> - mx = h263_decode_motion(s, pred_x, 1);
> -
> + h263_pred_motion2(s, 0, 0, &pred_x, &pred_y);
> + mx = h263_decode_motion2(s, pred_x);
> if (mx >= 0xffff)
> return -1;
>
> - if (s->umvplus)
> - my = h263p_decode_umotion(s, pred_y);
> - else
> - my = h263_decode_motion(s, pred_y, 1);
> + my = h263_decode_motion2(s, pred_y);
>
> if (my >= 0xffff)
> return -1;
> - s->mv[0][0][0] = mx;
> - s->mv[0][0][1] = my;
> -
> - if (s->umvplus && (mx - pred_x) == 1 && (my - pred_y) == 1)
> - skip_bits1(&s->gb); /* Bit stuffing to prevent PSC */
> + mv = pack8to16(mx, my);
> + M16( s->mv[0][0] ) = mv;
> + mv *= 0x00010001U;
> + M32( s->current_picture.motion_val[0][mv_xy] ) = mv;
> + M32( s->current_picture.motion_val[0][mv_xy + wrap] ) = mv;
> } else {
> s->current_picture.mb_type[xy]= MB_TYPE_8x8 | MB_TYPE_L0;
> s->mv_type = MV_TYPE_8X8;
> for(i=0;i<4;i++) {
> - mot_val = h263_pred_motion(s, i, 0, &pred_x, &pred_y);
> - if (s->umvplus)
> - mx = h263p_decode_umotion(s, pred_x);
> - else
> - mx = h263_decode_motion(s, pred_x, 1);
> + mot_val = h263_pred_motion2(s, i, 0, &pred_x, &pred_y);
> + mx = h263_decode_motion2(s, pred_x);
> if (mx >= 0xffff)
> return -1;
>
> - if (s->umvplus)
> - my = h263p_decode_umotion(s, pred_y);
> - else
> - my = h263_decode_motion(s, pred_y, 1);
> + my = h263_decode_motion2(s, pred_y);
> if (my >= 0xffff)
> return -1;
> s->mv[0][i][0] = mx;
> s->mv[0][i][1] = my;
> - if (s->umvplus && (mx - pred_x) == 1 && (my - pred_y) == 1)
> - skip_bits1(&s->gb); /* Bit stuffing to prevent PSC */
> mot_val[0] = mx;
> mot_val[1] = my;
> }
> }
> - } else if(s->pict_type==FF_B_TYPE) {
> - int mb_type;
> - const int stride= s->b8_stride;
> - int16_t *mot_val0 = s->current_picture.motion_val[0][ 2*(s->mb_x + s->mb_y*stride) ];
> - int16_t *mot_val1 = s->current_picture.motion_val[1][ 2*(s->mb_x + s->mb_y*stride) ];
> -// const int mv_xy= s->mb_x + 1 + s->mb_y * s->mb_stride;
> -
> - //FIXME ugly
> - mot_val0[0 ]= mot_val0[2 ]= mot_val0[0+2*stride]= mot_val0[2+2*stride]=
> - mot_val0[1 ]= mot_val0[3 ]= mot_val0[1+2*stride]= mot_val0[3+2*stride]=
> - mot_val1[0 ]= mot_val1[2 ]= mot_val1[0+2*stride]= mot_val1[2+2*stride]=
> - mot_val1[1 ]= mot_val1[3 ]= mot_val1[1+2*stride]= mot_val1[3+2*stride]= 0;
> -
> - do{
> - mb_type= get_vlc2(&s->gb, h263_mbtype_b_vlc.table, H263_MBTYPE_B_VLC_BITS, 2);
> - if (mb_type < 0){
> - av_log(s->avctx, AV_LOG_ERROR, "b mb_type damaged at %d %d\n", s->mb_x, s->mb_y);
> - return -1;
> - }
> -
> - mb_type= h263_mb_type_b_map[ mb_type ];
> - }while(!mb_type);
> -
> - s->mb_intra = IS_INTRA(mb_type);
> - if(HAS_CBP(mb_type)){
> - s->dsp.clear_blocks(s->block[0]);
> - cbpc = get_vlc2(&s->gb, cbpc_b_vlc.table, CBPC_B_VLC_BITS, 1);
> - if(s->mb_intra){
> - dquant = IS_QUANT(mb_type);
> - goto intra;
> - }
> -
> - cbpy = get_vlc2(&s->gb, ff_h263_cbpy_vlc.table, CBPY_VLC_BITS, 1);
> -
> - if (cbpy < 0){
> - av_log(s->avctx, AV_LOG_ERROR, "b cbpy damaged at %d %d\n", s->mb_x, s->mb_y);
> - return -1;
> - }
> -
> - if(s->alt_inter_vlc==0 || (cbpc & 3)!=3)
> - cbpy ^= 0xF;
> -
> - cbp = (cbpc & 3) | (cbpy << 2);
> - }else
> - cbp=0;
> -
> - assert(!s->mb_intra);
> -
> - if(IS_QUANT(mb_type)){
> - h263_decode_dquant(s);
> - }
> -
> - if(IS_DIRECT(mb_type)){
> - s->mv_dir = MV_DIR_FORWARD | MV_DIR_BACKWARD | MV_DIRECT;
> - mb_type |= ff_mpeg4_set_direct_mv(s, 0, 0);
> - }else{
> - s->mv_dir = 0;
> - s->mv_type= MV_TYPE_16X16;
> -//FIXME UMV
> -
> - if(USES_LIST(mb_type, 0)){
> - int16_t *mot_val= h263_pred_motion(s, 0, 0, &mx, &my);
> - s->mv_dir = MV_DIR_FORWARD;
> -
> - mx = h263_decode_motion(s, mx, 1);
> - my = h263_decode_motion(s, my, 1);
> -
> - s->mv[0][0][0] = mx;
> - s->mv[0][0][1] = my;
> - mot_val[0 ]= mot_val[2 ]= mot_val[0+2*stride]= mot_val[2+2*stride]= mx;
> - mot_val[1 ]= mot_val[3 ]= mot_val[1+2*stride]= mot_val[3+2*stride]= my;
> - }
> -
> - if(USES_LIST(mb_type, 1)){
> - int16_t *mot_val= h263_pred_motion(s, 0, 1, &mx, &my);
> - s->mv_dir |= MV_DIR_BACKWARD;
> -
> - mx = h263_decode_motion(s, mx, 1);
> - my = h263_decode_motion(s, my, 1);
> -
> - s->mv[1][0][0] = mx;
> - s->mv[1][0][1] = my;
> - mot_val[0 ]= mot_val[2 ]= mot_val[0+2*stride]= mot_val[2+2*stride]= mx;
> - mot_val[1 ]= mot_val[3 ]= mot_val[1+2*stride]= mot_val[3+2*stride]= my;
> - }
> - }
> -
> - s->current_picture.mb_type[xy]= mb_type;
> } else { /* I-Frame */
> do{
> cbpc = get_vlc2(&s->gb, ff_h263_intra_MCBPC_vlc.table, INTRA_MCBPC_VLC_BITS, 2);
> @@ -797,24 +407,11 @@
> }
> }while(cbpc == 8);
>
> - s->dsp.clear_blocks(s->block[0]);
> -
> dquant = cbpc & 4;
> s->mb_intra = 1;
> intra:
> s->current_picture.mb_type[xy]= MB_TYPE_INTRA;
> - if (s->h263_aic) {
> - s->ac_pred = get_bits1(&s->gb);
> - if(s->ac_pred){
> - s->current_picture.mb_type[xy]= MB_TYPE_INTRA | MB_TYPE_ACPRED;
>
> - s->h263_aic_dir = get_bits1(&s->gb);
> - }
> - }else
> - s->ac_pred = 0;
> -
> - if(s->pb_frame && get_bits1(&s->gb))
> - pb_mv_count = h263_get_modb(&s->gb, s->pb_frame, &cbpb);
> cbpy = get_vlc2(&s->gb, ff_h263_cbpy_vlc.table, CBPY_VLC_BITS, 1);
> if(cbpy<0){
> av_log(s->avctx, AV_LOG_ERROR, "I cbpy damaged at %d %d\n", s->mb_x, s->mb_y);
> @@ -824,30 +421,21 @@
> if (dquant) {
> h263_decode_dquant(s);
> }
> + M32( s->current_picture.motion_val[0][mv_xy] ) = 0;
> + M32( s->current_picture.motion_val[0][mv_xy + wrap] ) = 0;
>
> - pb_mv_count += !!s->pb_frame;
> }
>
> - while(pb_mv_count--){
> - h263_decode_motion(s, 0, 1);
> - h263_decode_motion(s, 0, 1);
> - }
> -
> + qscale = s->qscale;
> + chroma_qscale = s->chroma_qscale;
> /* decode each block */
> for (i = 0; i < 6; i++) {
> - if (h263_decode_block(s, block[i], i, cbp&32) < 0)
> + if (h263_decode_block(s, block[i], i, cbp&32, i < 4 ? qscale : chroma_qscale) < 0)
> return -1;
> cbp+=cbp;
> }
>
> - if(s->pb_frame && h263_skip_b_part(s, cbpb) < 0)
> - return -1;
> - if(s->obmc && !s->mb_intra){
> - if(s->pict_type == FF_P_TYPE && s->mb_x+1<s->mb_width && s->mb_num_left != 1)
> - preview_obmc(s);
> - }
> end:
> -
> /* per-MB end of slice check */
> {
> int v= show_bits(&s->gb, 16);
we could duplicate + optimize + simplify decode_mb() in flvdec.c
if you think this makes a measurable speed difference.
though i have my doubts that it makes that much difference, i just wanted to
say i am not against it if you want to do it.
the same applies to the block/coeff decode, though that will make a speed
difference of course. we also have duplicated block/coeff decodes in mpeg1/2,
so that's quite in line with existing optimizations
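As an alternative to literally duplicating the function, the templatization mentioned earlier could be sketched like this: one inlined body taking a constant selector, plus thin per-codec wrappers. Everything below is hypothetical and only illustrates the pattern; the is_mpeg12 argument in the quoted diff looks like an existing use of the same idea, though I have not checked how it is instantiated.

```c
/* Hypothetical codec selector; not libavcodec identifiers. */
enum { DEMO_GENERIC = 0, DEMO_FLV = 1 };

/* Shared body. When flv_only is a compile-time constant at the call site
 * and the function is inlined, the compiler can drop the untaken branch. */
static inline int demo_decode_level(int raw, int long_escape, const int flv_only)
{
    if (!flv_only && long_escape)
        return raw * 2;   /* placeholder for a path FLV never takes */
    return raw;           /* path every variant needs */
}

/* Thin specialised entry points. */
static int demo_decode_level_flv(int raw)
{
    return demo_decode_level(raw, 0, DEMO_FLV);
}

static int demo_decode_level_generic(int raw, int long_escape)
{
    return demo_decode_level(raw, long_escape, DEMO_GENERIC);
}
```

The FLV wrapper then carries none of the generic branches, while the source still has a single body to maintain.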
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Breaking DRM is a little like attempting to break through a door even
though the window is wide open and the only thing in the house is a bunch
of things you dont want and which you would get tomorrow for free anyway