[FFmpeg-devel] PAFF support h264 - preliminary patch as notes

Tue Jul 17 01:24:19 CEST 2007

Don't get too excited....

As I have a camera (Sony HDR-SR1) that produces AVCHD files with PAFF
interlacing, I thought I'd see what is involved in getting ffmpeg to
work with them.  I learnt a lot in the process.

There is already quite a bit of code in h264.c supporting parts of
PAFF.
I'm not sure I have all the terminology right, but:
 A stream contains a collection of slices, which are contiguous
 regions of macro-blocks (16x16 pixels).
 One or more slices make up a picture, but usually there is just
 one slice per picture.
 A picture is either a field or a frame, which is two fields of
 opposite parity encoded together.  Fields are either TOP
 (even-numbered lines) or BOTTOM (odd-numbered lines).
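To make the parity convention concrete, here is a minimal C sketch of how a line of a field maps to a line of the full frame, and back. The enum and function names are made up for illustration and are not taken from h264.c. It follows that a 16-line macro-block row of a field spans 32 lines of the frame, interleaved with the other field.

```c
#include <assert.h>

/* Hypothetical parity values for this sketch (not FFmpeg's PICT_* values). */
enum field_parity { FIELD_TOP = 0, FIELD_BOTTOM = 1 };

/* Line `field_line` of a field lands on this line of the full frame:
 * top field -> even frame lines, bottom field -> odd frame lines. */
static int field_line_to_frame_line(int field_line, enum field_parity parity)
{
    assert(field_line >= 0);
    return 2 * field_line + (int)parity;
}

/* Inverse mapping: which field a frame line belongs to, and the
 * index of that line within its field. */
static void frame_line_to_field_line(int frame_line,
                                     enum field_parity *parity,
                                     int *field_line)
{
    assert(frame_line >= 0);
    *parity     = (frame_line & 1) ? FIELD_BOTTOM : FIELD_TOP;
    *field_line = frame_line >> 1;
}
```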

So to handle PAFF we need to handle the TOP and BOTTOM FIELDS
correctly; the FRAMES are largely handled properly already.

I have a patch, included below, which fixes a number of errors in
mapping macro-block numbers to picture line numbers, and in related
code.  With it, the first field gets decoded properly.
However, that is only one field, half the scan-lines of a full frame,
so it looks a bit washed out.  But it is progress.
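A minimal sketch of the macro-block bookkeeping the patch uses for field pictures, written as standalone C. The names (pic_struct, slice_start_mb, next_mb_row, top_mb_index) are hypothetical, not FFmpeg identifiers. It assumes, as the slice-header and fill_caches hunks do, that macro-block row r of a field is stored at row 2r (top) or 2r+1 (bottom) of the frame's macro-block grid, so stepping to the next row, or looking up the macro-block above, moves two rows at a time. MBAFF is not handled.

```c
#include <assert.h>

/* Hypothetical picture-structure values for this sketch. */
enum pic_struct { PIC_FRAME, PIC_TOP_FIELD, PIC_BOTTOM_FIELD };

struct mb_pos { int x; int y; };  /* y is a row in the frame's MB grid */

/* Map first_mb_in_slice (an MB address within the picture being decoded)
 * to a position in the frame MB grid: field MB row r is stored at frame
 * MB row 2*r (top field) or 2*r + 1 (bottom field). */
static struct mb_pos slice_start_mb(int first_mb_in_slice, int mb_width,
                                    enum pic_struct ps)
{
    struct mb_pos p;
    assert(mb_width > 0 && first_mb_in_slice >= 0);
    p.x = first_mb_in_slice % mb_width;
    p.y = first_mb_in_slice / mb_width;
    if (ps == PIC_TOP_FIELD)
        p.y = 2 * p.y;
    else if (ps == PIC_BOTTOM_FIELD)
        p.y = 2 * p.y + 1;
    return p;
}

/* Next MB row of the same picture: a field skips the other field's rows. */
static int next_mb_row(int mb_y, enum pic_struct ps)
{
    return mb_y + (ps == PIC_FRAME ? 1 : 2);
}

/* Index of the MB directly above (mb_x, mb_y) in the same picture, in a
 * table with stride mb_stride; -1 if it would fall outside the picture. */
static int top_mb_index(int mb_x, int mb_y, int mb_stride, enum pic_struct ps)
{
    int step = (ps == PIC_FRAME) ? 1 : 2;
    if (mb_y - step < 0)
        return -1;
    return mb_x + (mb_y - step) * mb_stride;
}
```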

The next steps, as I see them, are:

 - Get the second field to decode properly.
    In my files, it is a P field rather than an I field,
     i.e. while the first is coded with 'intra-coding', without
     reference to any other frame/field, the second is coded
     with predictive 'inter-coding', with reference to the
     preceding field.  This raises some issues.
    Presumably it is encoded based on the first field, which
     is of the opposite parity.  That means that every pixel is
     offset by one frame line.  I don't know how to handle that.
    There is some code missing from fill_default_ref_list that
     I think is important.  It reads:
       if(s->picture_structure == PICT_FRAME){
           .... do useful stuff
       }else{ //FIELD
           if(h->slice_type==B_TYPE){
           }else{
               //FIXME second field balh
           }
       }

     I think this needs to be fleshed out so that decoding
     the P field has the right reference fields available.

    I also suspect that fill_caches needs some extra handling to
    set up left_block correctly in the PICT_FIELD case, but as yet I
    have no idea what 'left_block' is for, so I cannot be sure.


 - Combine fields into frames.  I really don't know what the desired
   result is here.  The decoding process will produce a series of
   interlaced fields: first a top field (lines 0,2,4,6,...), then a
   bottom field (lines 1,3,5,7,...).  How should these be presented
   to the application?
   One option is to combine them into a single frame.  However, this
   loses information, as the fields should be separated by 1/50th of
   a second (for the PAL case).
   The other option would be to pass them back as individual fields
   with half the expected number of lines and tell the application
   that they are fields to be interlaced together.  However, I have no
   idea how to do that or if it is even possible.

   If the first option is best, then we need to hold on to the first
   field until the second field is ready, then merge them together.

   If the second option is best (which I suspect to be the case), then
   we need to decode fields densely (no blank lines between content
   lines, using only the top half of the buffer), which means that my
   patch below is completely wrong, as it makes the decoder use the
   full height but only half the lines.
   It would also mean that if we find a FRAME picture while expecting
   interlaced fields, we need to split it into its two component fields
   to return it to the application.  I think this would be a very
   substantial code change, though it might make the code cleaner.
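For reference, the two data movements involved (merging fields for the first option, splitting a FRAME picture for the second) are simple line copies. This is a sketch assuming 8-bit planes with independent line strides and an even frame height; weave_fields and split_frame are illustrative names, not existing FFmpeg functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Weave: interleave a top field (frame lines 0,2,4,...) and a bottom
 * field (frame lines 1,3,5,...) into one frame plane. `height` is the
 * frame height; each field has height / 2 lines of `width` bytes. */
static void weave_fields(uint8_t *frame, int frame_stride,
                         const uint8_t *top, int top_stride,
                         const uint8_t *bottom, int bottom_stride,
                         int width, int height)
{
    assert(height % 2 == 0);
    for (int y = 0; y < height / 2; y++) {
        memcpy(frame + (2 * y) * frame_stride,     top    + y * top_stride,    width);
        memcpy(frame + (2 * y + 1) * frame_stride, bottom + y * bottom_stride, width);
    }
}

/* The reverse: split a frame plane into its two component fields. */
static void split_frame(const uint8_t *frame, int frame_stride,
                        uint8_t *top, int top_stride,
                        uint8_t *bottom, int bottom_stride,
                        int width, int height)
{
    assert(height % 2 == 0);
    for (int y = 0; y < height / 2; y++) {
        memcpy(top    + y * top_stride,    frame + (2 * y) * frame_stride,     width);
        memcpy(bottom + y * bottom_stride, frame + (2 * y + 1) * frame_stride, width);
    }
}
```

A copy-free alternative is to have the decoder write each field straight into the frame buffer, addressing a field as the frame plane with its stride doubled and, for the bottom field, its start moved down one line. The first option then needs no merge step, and the same memory can be described as two field views for the second.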

I don't know if/when I might find time to work on this again so I am
doing this brain dump now in case it might help someone else.  I would
really like input on the question of how to return interlaced video
fields before I even consider hacking on the code any more.


This patch contains a hack to mpegvideo.c so that decoding h264 with
PAFF doesn't crash copying data from a NULL last picture, which
probably happens because the reference frames aren't being set up
properly.
With this patch, I can 'ffplay' a .mts file and the first frame is
recognisable, though a bit washed out.  Following frames degrade quite
quickly.  It still crashes on exit with a bad 'free'.
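As far as I can tell from the H.264 spec (subclause 8.2.4.2.5, initialisation of reference picture lists for fields), a P field's list starts from the short-term reference frames ordered by FrameNumWrap, highest first. Fields are taken alternately, starting with the same parity as the current field; a frame whose field of the wanted parity is not a reference is skipped, and once one parity runs out the remaining fields of the other parity are appended in order. When decoding the second field of a frame, the first field also counts as a reference, which is what a P field following an I field needs. Below is a standalone C sketch of that ordering under those assumptions. The struct and function names are hypothetical, and long-term references and list reordering are left out.

```c
#include <assert.h>
#include <stdlib.h>

enum { PARITY_TOP = 0, PARITY_BOTTOM = 1 };

/* Hypothetical reference-frame record: a frame is listed if at least
 * one of its fields is a short-term reference. */
struct ref_frame {
    int frame_num_wrap;
    int is_ref[2];            /* indexed by PARITY_TOP / PARITY_BOTTOM */
};

/* One entry of the resulting field reference list. */
struct ref_field {
    int frame_num_wrap;
    int parity;
};

/* qsort comparator: FrameNumWrap descending (P-field ordering). */
static int cmp_frame_num_wrap_desc(const void *a, const void *b)
{
    const struct ref_frame *fa = a, *fb = b;
    return (fb->frame_num_wrap > fa->frame_num_wrap) -
           (fb->frame_num_wrap < fa->frame_num_wrap);
}

/* First frame at or after `from` whose field of `parity` is a
 * reference, or -1 if there is none. */
static int next_ref_field(const struct ref_frame *frames, int nframes,
                          int from, int parity)
{
    for (int i = from; i < nframes; i++)
        if (frames[i].is_ref[parity])
            return i;
    return -1;
}

/* Build the initial list for a P field of parity `cur_parity`.
 * Sorts `frames` in place; returns the number of entries written. */
static int build_p_field_ref_list(struct ref_frame *frames, int nframes,
                                  int cur_parity,
                                  struct ref_field *out, int max_out)
{
    int next[2] = { 0, 0 };   /* next frame to examine, per parity */
    int parity = cur_parity;  /* start with the same parity */
    int n = 0;

    assert(cur_parity == PARITY_TOP || cur_parity == PARITY_BOTTOM);
    qsort(frames, (size_t)nframes, sizeof(*frames), cmp_frame_num_wrap_desc);

    while (n < max_out) {
        int i = next_ref_field(frames, nframes, next[parity], parity);
        if (i < 0) {
            /* This parity has run out: continue with the other one,
             * which appends its remaining fields in order. */
            parity ^= 1;
            i = next_ref_field(frames, nframes, next[parity], parity);
            if (i < 0)
                break;        /* both parities exhausted */
        }
        out[n].frame_num_wrap = frames[i].frame_num_wrap;
        out[n].parity = parity;
        n++;
        next[parity] = i + 1;
        parity ^= 1;          /* alternate parity */
    }
    return n;
}
```

For example, a bottom P field whose only available reference is the top I field of the same frame gets a one-entry list containing that top field.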

As I suggest above, it is entirely possible that this patch is
completely wrong, as it tries to make a field look like a frame, and
we possibly shouldn't be doing that.  It was a useful learning
experience though.

Thanks for your time,

NeilBrown




Index: libavcodec/h264.c
===================================================================

--- libavcodec/h264.c	(revision 9692)
+++ libavcodec/h264.c	(working copy)
@@ -172,6 +172,7 @@
     //wow what a mess, why didn't they simplify the interlacing&intra stuff, i can't imagine that these complex rules are worth it
 
     top_xy     = mb_xy  - s->mb_stride;
+    if (PICT_FIELD) top_xy -= s->mb_stride;
     topleft_xy = top_xy - 1;
     topright_xy= top_xy + 1;
     left_xy[1] = left_xy[0] = mb_xy-1;
@@ -247,6 +248,9 @@
                 left_block[7]= 10;
             }
         }
+    } 
+    else if (PICT_FIELD) {
+        /* MBAFF-FIXME Do we need different values for 'left_block' here? */
     }
 
     h->top_mb_xy = top_xy;
@@ -4348,6 +4352,11 @@
     }
     s->resync_mb_x = s->mb_x = first_mb_in_slice % s->mb_width;
     s->resync_mb_y = s->mb_y = (first_mb_in_slice / s->mb_width) << h->mb_aff_frame;
+    if (s->picture_structure == PICT_BOTTOM_FIELD)
+        s->resync_mb_y = s->mb_y = s->mb_y *2 + 1;
+    else if (s->picture_structure == PICT_TOP_FIELD)
+        s->resync_mb_y = s->mb_y = s->mb_y *2;
+
     assert(s->mb_y < s->mb_height);
 
     if(s->picture_structure==PICT_FRAME){
@@ -4355,6 +4364,8 @@
         h->max_pic_num= 1<< h->sps.log2_max_frame_num;
     }else{
         h->curr_pic_num= 2*h->frame_num;
+        if (s->picture_structure == PICT_BOTTOM_FIELD)
+            h->curr_pic_num++;
         h->max_pic_num= 1<<(h->sps.log2_max_frame_num + 1);
     }
 
@@ -4390,7 +4401,7 @@
     if(h->slice_type == P_TYPE || h->slice_type == SP_TYPE || h->slice_type == B_TYPE){
         if(h->slice_type == B_TYPE){
             h->direct_spatial_mv_pred= get_bits1(&s->gb);
-            if(h->sps.mb_aff && h->direct_spatial_mv_pred)
+            if(h->mb_aff_frame && h->direct_spatial_mv_pred)
                 av_log(h->s.avctx, AV_LOG_ERROR, "MBAFF + spatial direct mode is not implemented\n");
         }
         num_ref_idx_active_override_flag= get_bits1(&s->gb);
@@ -5390,6 +5401,8 @@
         int mb_xy = mb_x + mb_y*s->mb_stride;
         mba_xy = mb_xy - 1;
         mbb_xy = mb_xy - s->mb_stride;
+        if (PICT_FIELD)
+            mbb_xy -= s->mb_stride;
     }
 
     if( h->slice_table[mba_xy] == h->slice_num && !IS_SKIP( s->current_picture.mb_type[mba_xy] ))
@@ -5522,16 +5535,9 @@
     return 1 + get_cabac_noinline( &h->cabac, &h->cabac_state[77 + ctx] );
 }
 static int decode_cabac_mb_dqp( H264Context *h) {
-    MpegEncContext * const s = &h->s;
-    int mbn_xy;
     int   ctx = 0;
     int   val = 0;
 
-    if( s->mb_x > 0 )
-        mbn_xy = s->mb_x + s->mb_y*s->mb_stride - 1;
-    else
-        mbn_xy = s->mb_width - 1 + (s->mb_y-1)*s->mb_stride;
-
     if( h->last_qscale_diff != 0 )
         ctx++;
 
@@ -5885,6 +5891,8 @@
         if (left_mb_frame_flag != curr_mb_frame_flag) {
             h->left_mb_xy[0] = pair_xy - 1;
         }
+    } else if (s->picture_structure == PICT_BOTTOM_FIELD) {
+        h->top_mb_xy -= s->mb_stride;
     }
     return;
 }
@@ -7117,7 +7125,7 @@
                 s->mb_x = 0;
                 ff_draw_horiz_band(s, 16*s->mb_y, 16);
                 ++s->mb_y;
-                if(FRAME_MBAFF) {
+                if(FRAME_MBAFF || PICT_FIELD) {
                     ++s->mb_y;
                 }
             }
@@ -7154,7 +7162,7 @@
                 s->mb_x=0;
                 ff_draw_horiz_band(s, 16*s->mb_y, 16);
                 ++s->mb_y;
-                if(FRAME_MBAFF) {
+                if(FRAME_MBAFF || PICT_FIELD) {
                     ++s->mb_y;
                 }
                 if(s->mb_y >= s->mb_height){
Index: libavcodec/h264.h
===================================================================
--- libavcodec/h264.h	(revision 9692)
+++ libavcodec/h264.h	(working copy)
@@ -58,10 +58,12 @@
 #define MB_MBAFF h->mb_mbaff
 #define MB_FIELD h->mb_field_decoding_flag
 #define FRAME_MBAFF h->mb_aff_frame
+#define PICT_FIELD (h->s.picture_structure != PICT_FRAME)
 #else
 #define MB_MBAFF 0
 #define MB_FIELD 0
 #define FRAME_MBAFF 0
+#define PICT_FIELD 0
 #undef  IS_INTERLACED
 #define IS_INTERLACED(mb_type) 0
 #endif
Index: libavcodec/mpegvideo.c
===================================================================
--- libavcodec/mpegvideo.c	(revision 9692)
+++ libavcodec/mpegvideo.c	(working copy)
@@ -1925,6 +1925,7 @@
         if (!s->mb_intra) {
             /* motion handling */
             /* decoding or more than one mb_type (MC was already done otherwise) */
+		if (s->last_picture.data[0]){ /* Hack to stop h264/PAFF from crashing */
             if(!s->encoding){
                 if(lowres_flag){
                     h264_chroma_mc_func *op_pix = s->dsp.put_h264_chroma_pixels_tab;
@@ -1953,7 +1954,7 @@
                     }
                 }
             }
-
+}
             /* skip dequant / idct if we are really late ;) */
             if(s->hurry_up>1) goto skip_idct;
             if(s->avctx->skip_idct){