[FFmpeg-devel] [PATCH v3 3/6] lavc/qsvdec: Replace current parser with MFXVideoDECODE_DecodeHeader()
Rogozhkin, Dmitry V
dmitry.v.rogozhkin at intel.com
Tue Mar 12 02:04:07 EET 2019
pix_fmts[1] is misleading. And I think it can be quite easily
avoided.
If I understand the code correctly, qsv_decode_preinit is the place
(and the only place) where you need an array of pix_fmts. You actually
request ffmpeg to select one of the formats and program it into avctx.
Is that right?
If so, could we try to define the array exactly in this function and
avoid passing the whole array around? See comments below for
suggestions...
On Mon, 2019-03-11 at 17:23 +0800, Li, Zhong wrote:
> > -----Original Message-----
> > From: Rogozhkin, Dmitry V
> > Sent: Saturday, March 9, 2019 8:48 AM
> > To: ffmpeg-devel at ffmpeg.org
> > Cc: Li, Zhong <zhong.li at intel.com>
> > Subject: Re: [FFmpeg-devel] [PATCH v3 3/6] lavc/qsvdec: Replace
> > current
> > parser with MFXVideoDECODE_DecodeHeader()
> >
> > On Fri, 2019-03-08 at 15:40 +0800, Zhong Li wrote:
> > > Using the MSDK parser can improve the qsv decoder pass rate in some
> > > cases (e.g. when the sps declares a wrong level_idc, smaller than it
> > > should be).
> > > And it is necessary for adding new qsv decoders such as MJPEG and
> > > VP9, since the current parser can't provide enough information.
> > > Actually using MFXVideoDECODE_DecodeHeader() was discussed at
> > > https://ffmpeg.org/pipermail/ffmpeg-devel/2015-July/175734.html
> > > and merged as commit 1acb19d, but was overwritten when merging the
> > > libav patches (commit 1f26a23) without any explanation.
> > >
> > > v2: split decode header from decode_init, and call it for every
> > > frame to detect format/resolution changes. It can fix some
> > > regression issues such as hevc 10bit decoding.
> > >
> > > Signed-off-by: Zhong Li <zhong.li at intel.com>
> > > ---
> > > libavcodec/qsvdec.c | 172 ++++++++++++++++++++------------------
> > > libavcodec/qsvdec.h | 2 +
> > > libavcodec/qsvdec_h2645.c | 1 +
> > > libavcodec/qsvdec_other.c | 1 +
> > > 4 files changed, 93 insertions(+), 83 deletions(-)
> > >
> > > diff --git a/libavcodec/qsvdec.c b/libavcodec/qsvdec.c index
> > > 4a0be811fb..b78026e14d 100644
> > > --- a/libavcodec/qsvdec.c
> > > +++ b/libavcodec/qsvdec.c
> > > @@ -120,19 +120,17 @@ static inline unsigned int qsv_fifo_size(const AVFifoBuffer* fifo)
> > > return av_fifo_size(fifo) / qsv_fifo_item_size();
> > > }
> > >
> > > -static int qsv_decode_init(AVCodecContext *avctx, QSVContext *q)
> > > +static int qsv_decode_preinit(AVCodecContext *avctx, QSVContext *q, enum AVPixelFormat *pix_fmts, mfxVideoParam *param)
Just: enum AVPixelFormat pix_fmt. Single one we need.
> > > {
> > > - const AVPixFmtDescriptor *desc;
> > > mfxSession session = NULL;
> > > int iopattern = 0;
> > > - mfxVideoParam param = { 0 };
> > > - int frame_width = avctx->coded_width;
> > > - int frame_height = avctx->coded_height;
> > > int ret;
> > >
> > > - desc = av_pix_fmt_desc_get(avctx->sw_pix_fmt);
> > > - if (!desc)
> > > - return AVERROR_BUG;
enum AVPixelFormat pix_fmts[3] = { AV_PIX_FMT_QSV,
                                   pix_fmt, // <<< use the one we got in the function call
                                   AV_PIX_FMT_NONE };
> > > + ret = ff_get_format(avctx, pix_fmts);
> > > + if (ret < 0) {
> > > + q->orig_pix_fmt = avctx->pix_fmt = AV_PIX_FMT_NONE;
> > > + return ret;
> > > + }
> > >
> > > if (!q->async_fifo) {
> > > q->async_fifo = av_fifo_alloc(q->async_depth * qsv_fifo_item_size());
> > > @@ -170,48 +168,72 @@ static int qsv_decode_init(AVCodecContext *avctx, QSVContext *q)
> > > return ret;
> > > }
> > >
> > > - ret = ff_qsv_codec_id_to_mfx(avctx->codec_id);
> > > - if (ret < 0)
> > > - return ret;
> > > + param->IOPattern = q->iopattern;
> > > + param->AsyncDepth = q->async_depth;
> > > + param->ExtParam = q->ext_buffers;
> > > + param->NumExtParam = q->nb_ext_buffers;
> > >
> > > - param.mfx.CodecId = ret;
> > > - param.mfx.CodecProfile = ff_qsv_profile_to_mfx(avctx->codec_id, avctx->profile);
> > > - param.mfx.CodecLevel = avctx->level == FF_LEVEL_UNKNOWN ? MFX_LEVEL_UNKNOWN : avctx->level;
> > > -
> > > - param.mfx.FrameInfo.BitDepthLuma = desc->comp[0].depth;
> > > - param.mfx.FrameInfo.BitDepthChroma = desc->comp[0].depth;
> > > - param.mfx.FrameInfo.Shift = desc->comp[0].depth > 8;
> > > - param.mfx.FrameInfo.FourCC = q->fourcc;
> > > - param.mfx.FrameInfo.Width = frame_width;
> > > - param.mfx.FrameInfo.Height = frame_height;
> > > - param.mfx.FrameInfo.ChromaFormat = MFX_CHROMAFORMAT_YUV420;
> > > -
> > > - switch (avctx->field_order) {
> > > - case AV_FIELD_PROGRESSIVE:
> > > - param.mfx.FrameInfo.PicStruct = MFX_PICSTRUCT_PROGRESSIVE;
> > > - break;
> > > - case AV_FIELD_TT:
> > > - param.mfx.FrameInfo.PicStruct = MFX_PICSTRUCT_FIELD_TFF;
> > > - break;
> > > - case AV_FIELD_BB:
> > > - param.mfx.FrameInfo.PicStruct = MFX_PICSTRUCT_FIELD_BFF;
> > > - break;
> > > - default:
> > > - param.mfx.FrameInfo.PicStruct = MFX_PICSTRUCT_UNKNOWN;
> > > - break;
> > > - }
> > > + return 0;
> > > + }
> > >
> > > - param.IOPattern = q->iopattern;
> > > - param.AsyncDepth = q->async_depth;
> > > - param.ExtParam = q->ext_buffers;
> > > - param.NumExtParam = q->nb_ext_buffers;
> > > +static int qsv_decode_init(AVCodecContext *avctx, QSVContext *q, mfxVideoParam *param)
> > > +{
> > > + int ret;
> > >
> > > - ret = MFXVideoDECODE_Init(q->session, &param);
> > > + avctx->width = param->mfx.FrameInfo.CropW;
> > > + avctx->height = param->mfx.FrameInfo.CropH;
> > > + avctx->coded_width = param->mfx.FrameInfo.Width;
> > > + avctx->coded_height = param->mfx.FrameInfo.Height;
> > > + avctx->level = param->mfx.CodecLevel;
> > > + avctx->profile = param->mfx.CodecProfile;
> > > + avctx->field_order = ff_qsv_map_picstruct(param->mfx.FrameInfo.PicStruct);
> > > + avctx->pix_fmt = ff_qsv_map_fourcc(param->mfx.FrameInfo.FourCC);
> > > +
> > > + ret = MFXVideoDECODE_Init(q->session, param);
> > > if (ret < 0)
> > > return ff_qsv_print_error(avctx, ret,
> > > "Error initializing the MFX video decoder");
> > >
> > > - q->frame_info = param.mfx.FrameInfo;
> > > + q->frame_info = param->mfx.FrameInfo;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int qsv_decode_header(AVCodecContext *avctx, QSVContext *q, AVPacket *avpkt, enum AVPixelFormat *pix_fmts, mfxVideoParam *param)
Just: enum AVPixelFormat pix_fmt. Single one we need.
> > > +{
> > > + int ret;
> > > +
> > > + mfxBitstream bs = { { { 0 } } };
> > > +
> > > + if (avpkt->size) {
> > > + bs.Data = avpkt->data;
> > > + bs.DataLength = avpkt->size;
> > > + bs.MaxLength = bs.DataLength;
> > > + bs.TimeStamp = avpkt->pts;
> > > + if (avctx->field_order == AV_FIELD_PROGRESSIVE)
> > > + bs.DataFlag |= MFX_BITSTREAM_COMPLETE_FRAME;
> > > + } else
> > > + return AVERROR_INVALIDDATA;
> > > +
> > > +
> > > + if(!q->session) {
> > > + ret = qsv_decode_preinit(avctx, q, pix_fmts, param);
> > > + if (ret < 0)
> > > + return ret;
> > > + }
> > > +
> > > + ret = ff_qsv_codec_id_to_mfx(avctx->codec_id);
> > > + if (ret < 0)
> > > + return ret;
> > > +
> > > + param->mfx.CodecId = ret;
> > > + ret = MFXVideoDECODE_DecodeHeader(q->session, &bs, param);
> > > + if (MFX_ERR_MORE_DATA == ret) {
> > > + return AVERROR(EAGAIN);
> > > + }
> > > + if (ret < 0)
> > > + return ff_qsv_print_error(avctx, ret,
> > > + "Error decoding stream header");
> > >
> > > return 0;
> > > }
> > > @@ -494,7 +516,10 @@ int ff_qsv_process_data(AVCodecContext *avctx, QSVContext *q,
> > > uint8_t *dummy_data;
> > > int dummy_size;
> > > int ret;
> > > - const AVPixFmtDescriptor *desc;
> > > + mfxVideoParam param = { 0 };
> > > + enum AVPixelFormat pix_fmts[3] = { AV_PIX_FMT_QSV,
> > > + AV_PIX_FMT_NV12,
> > > + AV_PIX_FMT_NONE };
Just enum AVPixelFormat pix_fmt = AV_PIX_FMT_NV12.
> > >
> > > if (!q->avctx_internal) {
> > > q->avctx_internal = avcodec_alloc_context3(NULL);
> > > @@ -508,7 +533,6 @@ int ff_qsv_process_data(AVCodecContext *avctx, QSVContext *q,
> > > return AVERROR(ENOMEM);
> > >
> > > q->parser->flags |= PARSER_FLAG_COMPLETE_FRAMES;
> > > - q->orig_pix_fmt = AV_PIX_FMT_NONE;
> > > }
> > >
> > > if (!pkt->size)
> > > @@ -521,15 +545,17 @@ int ff_qsv_process_data(AVCodecContext *avctx, QSVContext *q,
> > > pkt->data, pkt->size, pkt->pts, pkt->dts, pkt->pos);
> > >
> > > - avctx->field_order = q->parser->field_order;
> > > /* TODO: flush delayed frames on reinit */
> > > - if (q->parser->format != q->orig_pix_fmt ||
> > > - FFALIGN(q->parser->coded_width, 16) != FFALIGN(avctx->coded_width, 16) ||
> > > - FFALIGN(q->parser->coded_height, 16) != FFALIGN(avctx->coded_height, 16)) {
> > > - enum AVPixelFormat pix_fmts[3] = { AV_PIX_FMT_QSV,
> > > - AV_PIX_FMT_NONE,
> > > - AV_PIX_FMT_NONE };
> > > - enum AVPixelFormat qsv_format;
> > > +
> > > + // sw_pix_fmt was initialized as NV12, will be updated after header decoded if not true.
> > > + if (q->orig_pix_fmt != AV_PIX_FMT_NONE)
> > > + pix_fmts[1] = q->orig_pix_fmt;
> >
> > Well, it still isn't self-explanatory to me why we have
> > pix_fmts[1] here.
>
> I think it has been explained and described in my comment: it means
> sw_pix_fmt.
> For more detail, you may want to check the implementation of
> ff_get_format().
>
> > > +
> > > + ret = qsv_decode_header(avctx, q, pkt, pix_fmts, &param);
> >
> > Parsing can be expensive to perform on each frame. See more below.
>
> 1. Parsing just reads the header bits; I don't think it is a CPU-
> heavy task compared with slice-level decoding.
> 2. Mostly only key frames have a complete header. For non-key frames,
> it just returns MFX_ERR_MORE_DATA instead of doing real parsing work.
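The cheap-probe behaviour described here could be sketched as follows (a sketch only, based on the patch above; variable names are assumptions and it is not compile-tested):

```c
/* Sketch: per-packet header probe.  MediaSDK returns
 * MFX_ERR_MORE_DATA right away when the packet carries no complete
 * header, so non-key frames cost almost nothing. */
mfxBitstream bs = { { { 0 } } };
bs.Data       = avpkt->data;
bs.DataLength = avpkt->size;
bs.MaxLength  = avpkt->size;
bs.TimeStamp  = avpkt->pts;

ret = MFXVideoDECODE_DecodeHeader(q->session, &bs, param);
if (ret == MFX_ERR_MORE_DATA)
    return AVERROR(EAGAIN);   /* no header in this packet: nothing to do */
if (ret < 0)
    return ff_qsv_print_error(avctx, ret, "Error decoding stream header");
```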
>
> > > +
> > > + if (ret >= 0 && (q->orig_pix_fmt != ff_qsv_map_fourcc(param.mfx.FrameInfo.FourCC) ||
> > > + avctx->coded_width != param.mfx.FrameInfo.Width ||
> > > + avctx->coded_height != param.mfx.FrameInfo.Height)) {
> > > AVPacket zero_pkt = {0};
> >
> > You are actually trying to perform a task which is done by mediasdk
> > itself. Moreover, you don't detect all the cases where the mediasdk
> > decoder requires a reset. For example, you may have 2 concatenated
> > streams with exactly the same resolution where a decoder reset is
> > still required because of profile/level/gop structure changes; memory
> > reallocation, or rather additional allocation, could also be required
> > because mediasdk may need a different (bigger or smaller) number of
> > surfaces to properly handle the new bitstream. See more below.
>
> Which cases of gop structure change need a reset? Could you please
> give an example?
>
> > >
> > > if (q->buffered_count) {
> > > @@ -538,45 +564,24 @@ int ff_qsv_process_data(AVCodecContext *avctx, QSVContext *q,
> > > q->buffered_count--;
> > > return qsv_decode(avctx, q, frame, got_frame, &zero_pkt);
> > > }
> > > -
> > > q->reinit_flag = 0;
> > >
> > > - qsv_format = ff_qsv_map_pixfmt(q->parser->format, &q->fourcc);
> > > - if (qsv_format < 0) {
> > > - av_log(avctx, AV_LOG_ERROR,
> > > - "Decoding pixel format '%s' is not supported\n",
> > > - av_get_pix_fmt_name(q->parser->format));
> > > - ret = AVERROR(ENOSYS);
> > > - goto reinit_fail;
> > > - }
> > > + q->orig_pix_fmt = avctx->pix_fmt = pix_fmts[1] = ff_qsv_map_fourcc(param.mfx.FrameInfo.FourCC);
> > >
> > > - q->orig_pix_fmt = q->parser->format;
> > > - avctx->pix_fmt = pix_fmts[1] = qsv_format;
> > > - avctx->width = q->parser->width;
> > > - avctx->height = q->parser->height;
> > > - avctx->coded_width = FFALIGN(q->parser->coded_width, 16);
> > > - avctx->coded_height = FFALIGN(q->parser->coded_height, 16);
> > > - avctx->level = q->avctx_internal->level;
> > > - avctx->profile = q->avctx_internal->profile;
> > > + avctx->coded_width = param.mfx.FrameInfo.Width;
> > > + avctx->coded_height = param.mfx.FrameInfo.Height;
> > >
> > > - ret = ff_get_format(avctx, pix_fmts);
> > > + ret = qsv_decode_preinit(avctx, q, pix_fmts, &param);
> > > if (ret < 0)
> > > goto reinit_fail;
> > > + q->initialized = 0;
> > > + }
> > >
> > > - avctx->pix_fmt = ret;
> > > -
> > > - desc = av_pix_fmt_desc_get(avctx->pix_fmt);
> > > - if (!desc)
> > > - goto reinit_fail;
> > > -
> > > - if (desc->comp[0].depth > 8) {
> > > - avctx->coded_width = FFALIGN(q->parser->coded_width, 32);
> > > - avctx->coded_height = FFALIGN(q->parser->coded_height, 32);
> > > - }
> > > -
> > > - ret = qsv_decode_init(avctx, q);
> > > + if (!q->initialized) {
> > > + ret = qsv_decode_init(avctx, q, &param);
> > > if (ret < 0)
> > > goto reinit_fail;
> > > + q->initialized = 1;
> > > }
> > >
> > > return qsv_decode(avctx, q, frame, got_frame, pkt);
> >
> > So, from my perspective you should not attempt to call decode
> > header on each incoming data portion and try to interpret the
> > results. Instead you should respect what mediasdk's
> > DecodeFrameAsync is returning. Basically it will return
> > MFX_ERR_INCOMPATIBLE_VIDEO_PARAM if a decoder reset is required.
> > This gives you 2 situations when you should call DecodeHeader:
> > 1. You call it if the decoder is not initialized at all, or
> > 2. You call it upon receiving MFX_ERR_INCOMPATIBLE_VIDEO_PARAM.
> >
> > Once you receive MFX_ERR_INCOMPATIBLE_VIDEO_PARAM, the steps to
> > perform are:
> > 1. Flush the cached frames from the mediasdk decoder by passing a
> > NULL bitstream to DecodeFrameAsync
> > 2. Run decode header and get the updated parameters
> > 3. Check (via QueryIOSurf) and potentially reallocate surfaces to
> > satisfy the new bitstream requirements
> > 4. Reset the decoder with the new parameters
>
> 1. This patch just replaces the original parser with the MSDK parser.
> I don't want to mix other things in here (that would probably
> introduce new bugs; full round testing has been done for this patch).
> If it is really necessary, it should be done in an additional patch.
> 2. Unfortunately, if we don't reset and set the correct
> resolution/pix_fmt, FFmpeg-qsv already fails during MSDK
> initialization (e.g. passing NV12 format when it is actually P010, see
> : https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/qsv.c#L424
> )
> There is no opportunity to get the status of DecodeFrameAsync in
> these cases. I haven't compared the MSDK sample decoder with current
> FFmpeg (will take a look later). But IIRC, the MSDK sample decoder
> isn't doing a better job than ffmpeg-qsv for the reinit case.
> (One example: it hangs when running sample_decode h264 -i /fate-
> suite/h264/reinit-large_420_8-to-small_420_8.h264 -r , and this clip
> changes from a larger resolution to a smaller one. I wonder what
> happens when the resolution changes from smaller to larger,
> e.g. 720x480 to 1920x1080.)
> 3. IMHO, reinit work is better done before decoding, instead of
> decoding, failing, and recovering afterwards.
> 4. This is a common function for all decoders; I would like to see it
> under the control of ffmpeg. MSDK may have no unified implementation
> across all codecs (e.g. https://github.com/Intel-Media-SDK/MediaSDK/issues/
> 333 ).
> Your suggestion may be addressed when I see any new issues or
> bugs, but currently I can't see the necessity.
>
> > Application may attempt to optimize the procedure on step 3 above
> > and avoid surface reallocation if possible. For example, if you have
> > allocated 1920x1080 surfaces and the resolution changed to 720x480,
> > you may actually stay on 1920x1080 if you are OK with using more
> > memory, hoping to receive a 1920x1080 stream again in the future.
> > Eventually this would work if you have a sufficient number of
> > 1920x1080 surfaces.
>
> Hm, this is a simpler topic; https://patchwork.ffmpeg.org/patch/12257
> / is meant to prepare for this, though the decoder should be updated
> too. But as discussed previously, that should be a separate patch and
> needs pre-testing.