[FFmpeg-devel] [PATCH] avcodec/h264_metadata: add change pic_order_cnt_type option

Sat Mar 20 20:43:27 EET 2021

sharpbai:
> Some encoder set pic_order_cnt_type=0 when not using bframe,
> such as h264_videotoolbox. It may cause that some hardware decoder
> delays output frames (buffer up to 18frames), such as MediaCodec
> decoder before Android 11.
> 
> Setting pic_order_cnt_type=2 indicates that the picture order could not
> be reversed, it will minimize the decoding delay on some decoder
> implementations.
> 
> Signed-off-by: sharpbai <sharpbai at gmail.com>
> ---
>  libavcodec/cbs_h264.h                 |  1 +
>  libavcodec/cbs_h264_syntax_template.c | 16 ++++++++++++++++
>  libavcodec/h264_metadata_bsf.c        |  9 +++++++++
>  3 files changed, 26 insertions(+)
> 
> diff --git a/libavcodec/cbs_h264.h b/libavcodec/cbs_h264.h
> index 9f7c2a0d30..00fe5178ea 100644
> --- a/libavcodec/cbs_h264.h
> +++ b/libavcodec/cbs_h264.h
> @@ -136,6 +136,7 @@ typedef struct H264RawSPS {
>  
>      uint8_t log2_max_frame_num_minus4;
>      uint8_t pic_order_cnt_type;
> +    uint8_t pic_order_cnt_type_write;
>      uint8_t log2_max_pic_order_cnt_lsb_minus4;
>      uint8_t delta_pic_order_always_zero_flag;
>      int32_t offset_for_non_ref_pic;
> diff --git a/libavcodec/cbs_h264_syntax_template.c b/libavcodec/cbs_h264_syntax_template.c
> index b65460996b..4a8a8442dd 100644
> --- a/libavcodec/cbs_h264_syntax_template.c
> +++ b/libavcodec/cbs_h264_syntax_template.c
> @@ -324,9 +324,17 @@ static int FUNC(sps)(CodedBitstreamContext *ctx, RWContext *rw,
>      }
>  
>      ue(log2_max_frame_num_minus4, 0, 12);
> +    #ifdef READ
>      ue(pic_order_cnt_type, 0, 2);
> +    #else
> +    ue(pic_order_cnt_type_write, 0, 2);
> +    #endif

This is wrong. Instead you should make the SPS that you are modifying
writable and update it. (The h264_metadata bsf uses separate
CodedBitstreamContexts for input and output and the SPS is normally
shared between them, so if someone wants to modify a field one needs to
make it writable which makes a copy of the NAL unit (the reason it is
currently not done is because the fields that are changed are not used
for parsing at all, but pic_order_cnt_type is). See h264_redundant_pps
for an example of this.)

(Btw: What you are doing here will break writing H.264 SPS if
pic_order_cnt_type_write isn't set (unless pic_order_cnt_type is zero).
This can e.g. happen for the aforementioned h264_redundant_pps BSF.)

>  
> +    #ifdef READ
>      if (current->pic_order_cnt_type == 0) {
> +    #else
> +    if (current->pic_order_cnt_type_write == 0) {
> +    #endif
>          ue(log2_max_pic_order_cnt_lsb_minus4, 0, 12);
>      } else if (current->pic_order_cnt_type == 1) {
>          flag(delta_pic_order_always_zero_flag);
> @@ -1265,13 +1273,21 @@ static int FUNC(slice_header)(CodedBitstreamContext *ctx, RWContext *rw,
>      if (idr_pic_flag)
>          ue(idr_pic_id, 0, 65535);
>  
> +    #ifdef READ
>      if (sps->pic_order_cnt_type == 0) {
> +    #else
> +    if (sps->pic_order_cnt_type_write == 0) {
> +    #endif
>          ub(sps->log2_max_pic_order_cnt_lsb_minus4 + 4, pic_order_cnt_lsb);
>          if (pps->bottom_field_pic_order_in_frame_present_flag &&
>              !current->field_pic_flag)
>              se(delta_pic_order_cnt_bottom, INT32_MIN + 1, INT32_MAX);
>  
> +    #ifdef READ
>      } else if (sps->pic_order_cnt_type == 1) {
> +    #else
> +    } else if (sps->pic_order_cnt_type_write == 1) {
> +    #endif
>          if (!sps->delta_pic_order_always_zero_flag) {
>              se(delta_pic_order_cnt[0], INT32_MIN + 1, INT32_MAX);
>              if (pps->bottom_field_pic_order_in_frame_present_flag &&
> diff --git a/libavcodec/h264_metadata_bsf.c b/libavcodec/h264_metadata_bsf.c
> index cef054bd65..c21e477841 100644
> --- a/libavcodec/h264_metadata_bsf.c
> +++ b/libavcodec/h264_metadata_bsf.c
> @@ -59,6 +59,7 @@ typedef struct H264MetadataContext {
>      AVRational sample_aspect_ratio;
>  
>      int overscan_appropriate_flag;
> +    int pic_order_cnt_type;
>  
>      int video_format;
>      int video_full_range_flag;
> @@ -95,6 +96,11 @@ static int h264_metadata_update_sps(AVBSFContext *bsf,
>      int need_vui = 0;
>      int crop_unit_x, crop_unit_y;
>  
> +    if (ctx->pic_order_cnt_type != -1) {
> +        sps->pic_order_cnt_type_write = ctx->pic_order_cnt_type;
> +    } else {
> +        sps->pic_order_cnt_type_write = sps->pic_order_cnt_type;
> +    }

This may break valid files and is therefore completely unsafe and should
IMO not be in this bsf.
Have you tested whether setting the number of reorder frames (which is
the typical way to set the delay in H.264) would be enough for your
usecase? I have already made a patch for this here:
https://github.com/mkver/FFmpeg/commit/6f6de261b30ef493f65af293c66798fe625ea743,
but it is not based upon current git master. Just remove the check
(which relies on a function not added in git master (introduced here:
https://github.com/mkver/FFmpeg/commit/29ff08b47ca63faff250802cdf743e2ac89d34f5)).
My usecase for this patch is slightly different from yours though: When
the input doesn't have reorder frames set, one has to guess it if the
container does not provide dts (think of Matroska). And said guess can
be wrong.

>      if (ctx->sample_aspect_ratio.num && ctx->sample_aspect_ratio.den) {
>          // Table E-1.
>          static const AVRational sar_idc[] = {
> @@ -689,6 +695,9 @@ static const AVOption h264_metadata_options[] = {
>          OFFSET(overscan_appropriate_flag), AV_OPT_TYPE_INT,
>          { .i64 = -1 }, -1, 1, FLAGS },
>  
> +    { "pic_order_cnt_type", "Set pic_order_cnt_type",
> +        OFFSET(pic_order_cnt_type), AV_OPT_TYPE_INT,
> +        { .i64 = -1 }, -1, 2, FLAGS },
>      { "video_format", "Set video format (table E-2)",
>          OFFSET(video_format), AV_OPT_TYPE_INT,
>          { .i64 = -1 }, -1, 7, FLAGS},
>