[FFmpeg-devel] [PATCH v3 3/6] lavc/qsvdec: Replace current parser with MFXVideoDECODE_DecodeHeader()

Li, Zhong zhong.li at intel.com
Mon Mar 18 05:22:52 EET 2019


> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On Behalf
> Of Mark Thompson
> Sent: Sunday, March 17, 2019 10:20 PM
> To: ffmpeg-devel at ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH v3 3/6] lavc/qsvdec: Replace current
> parser with MFXVideoDECODE_DecodeHeader()
> 
> On 08/03/2019 07:40, Zhong Li wrote:
> > Using the MSDK parser can improve the qsv decoder pass rate in some cases
> > (e.g. when the SPS declares a wrong level_idc, smaller than it should be).
> > It is also necessary for adding new qsv decoders such as MJPEG and VP9,
> > since the current parser can't provide enough information.
> > Using MFXVideoDECODE_DecodeHeader() was actually discussed at
> > https://ffmpeg.org/pipermail/ffmpeg-devel/2015-July/175734.html and
> > merged as commit 1acb19d, but it was overwritten when the libav patches
> > were merged (commit: 1f26a23) without any explanation.
> >
> > v2: split decode header from decode_init, and call it for every frame
> > to detect format/resolution changes. It can fix some regression issues
> > such as hevc 10bit decoding.
> >
> > Signed-off-by: Zhong Li <zhong.li at intel.com>
> > ---
> >  libavcodec/qsvdec.c       | 172 ++++++++++++++++++++------------------
> >  libavcodec/qsvdec.h       |   2 +
> >  libavcodec/qsvdec_h2645.c |   1 +
> >  libavcodec/qsvdec_other.c |   1 +
> >  4 files changed, 93 insertions(+), 83 deletions(-)
> >
> > diff --git a/libavcodec/qsvdec.c b/libavcodec/qsvdec.c
> > index 4a0be811fb..b78026e14d 100644
> > --- a/libavcodec/qsvdec.c
> > +++ b/libavcodec/qsvdec.c
> > @@ -120,19 +120,17 @@ static inline unsigned int qsv_fifo_size(const AVFifoBuffer* fifo)
> >      return av_fifo_size(fifo) / qsv_fifo_item_size();
> >  }
> >
> > -static int qsv_decode_init(AVCodecContext *avctx, QSVContext *q)
> > +static int qsv_decode_preinit(AVCodecContext *avctx, QSVContext *q,
> > +                              enum AVPixelFormat *pix_fmts, mfxVideoParam *param)
> >  {
> > -    const AVPixFmtDescriptor *desc;
> >      mfxSession session = NULL;
> >      int iopattern = 0;
> > -    mfxVideoParam param = { 0 };
> > -    int frame_width  = avctx->coded_width;
> > -    int frame_height = avctx->coded_height;
> >      int ret;
> >
> > -    desc = av_pix_fmt_desc_get(avctx->sw_pix_fmt);
> > -    if (!desc)
> > -        return AVERROR_BUG;
> > +    ret = ff_get_format(avctx, pix_fmts);
> > +    if (ret < 0) {
> > +        q->orig_pix_fmt = avctx->pix_fmt = AV_PIX_FMT_NONE;
> > +        return ret;
> > +    }
> >
> >      if (!q->async_fifo) {
> >          q->async_fifo = av_fifo_alloc(q->async_depth * qsv_fifo_item_size());
> > @@ -170,48 +168,72 @@ static int qsv_decode_init(AVCodecContext *avctx, QSVContext *q)
> >          return ret;
> >      }
> >
> > -    ret = ff_qsv_codec_id_to_mfx(avctx->codec_id);
> > -    if (ret < 0)
> > -        return ret;
> > +    param->IOPattern   = q->iopattern;
> > +    param->AsyncDepth  = q->async_depth;
> > +    param->ExtParam    = q->ext_buffers;
> > +    param->NumExtParam = q->nb_ext_buffers;
> >
> > -    param.mfx.CodecId      = ret;
> > -    param.mfx.CodecProfile = ff_qsv_profile_to_mfx(avctx->codec_id, avctx->profile);
> > -    param.mfx.CodecLevel   = avctx->level == FF_LEVEL_UNKNOWN ? MFX_LEVEL_UNKNOWN : avctx->level;
> > -
> > -    param.mfx.FrameInfo.BitDepthLuma   = desc->comp[0].depth;
> > -    param.mfx.FrameInfo.BitDepthChroma = desc->comp[0].depth;
> > -    param.mfx.FrameInfo.Shift          = desc->comp[0].depth > 8;
> > -    param.mfx.FrameInfo.FourCC         = q->fourcc;
> > -    param.mfx.FrameInfo.Width          = frame_width;
> > -    param.mfx.FrameInfo.Height         = frame_height;
> > -    param.mfx.FrameInfo.ChromaFormat   = MFX_CHROMAFORMAT_YUV420;
> > -
> > -    switch (avctx->field_order) {
> > -    case AV_FIELD_PROGRESSIVE:
> > -        param.mfx.FrameInfo.PicStruct = MFX_PICSTRUCT_PROGRESSIVE;
> > -        break;
> > -    case AV_FIELD_TT:
> > -        param.mfx.FrameInfo.PicStruct = MFX_PICSTRUCT_FIELD_TFF;
> > -        break;
> > -    case AV_FIELD_BB:
> > -        param.mfx.FrameInfo.PicStruct = MFX_PICSTRUCT_FIELD_BFF;
> > -        break;
> > -    default:
> > -        param.mfx.FrameInfo.PicStruct = MFX_PICSTRUCT_UNKNOWN;
> > -        break;
> > -    }
> > +    return 0;
> > +}
> >
> > -    param.IOPattern   = q->iopattern;
> > -    param.AsyncDepth  = q->async_depth;
> > -    param.ExtParam    = q->ext_buffers;
> > -    param.NumExtParam = q->nb_ext_buffers;
> > +static int qsv_decode_init(AVCodecContext *avctx, QSVContext *q,
> > +                           mfxVideoParam *param) {
> > +    int ret;
> >
> > -    ret = MFXVideoDECODE_Init(q->session, &param);
> > +    avctx->width        = param->mfx.FrameInfo.CropW;
> > +    avctx->height       = param->mfx.FrameInfo.CropH;
> > +    avctx->coded_width  = param->mfx.FrameInfo.Width;
> > +    avctx->coded_height = param->mfx.FrameInfo.Height;
> > +    avctx->level        = param->mfx.CodecLevel;
> > +    avctx->profile      = param->mfx.CodecProfile;
> 
> This feels insufficient for the UNKNOWN cases?  (VP8 in particular has no
> profiles.)
> 
> > +    avctx->field_order  = ff_qsv_map_picstruct(param->mfx.FrameInfo.PicStruct);
> > +    avctx->pix_fmt      = ff_qsv_map_fourcc(param->mfx.FrameInfo.FourCC);
> > +
> > +    ret = MFXVideoDECODE_Init(q->session, param);
> >      if (ret < 0)
> >          return ff_qsv_print_error(avctx, ret,
> >                                    "Error initializing the MFX video decoder");
> >
> > -    q->frame_info = param.mfx.FrameInfo;
> > +    q->frame_info = param->mfx.FrameInfo;
> > +
> > +    return 0;
> > +}
> > +
> > +static int qsv_decode_header(AVCodecContext *avctx, QSVContext *q,
> > +                             AVPacket *avpkt, enum AVPixelFormat *pix_fmts, mfxVideoParam *param) {
> > +    int ret;
> > +
> > +    mfxBitstream bs = { { { 0 } } };
> 
> { 0 }
> 
> > +
> > +    if (avpkt->size) {
> > +        bs.Data       = avpkt->data;
> > +        bs.DataLength = avpkt->size;
> > +        bs.MaxLength  = bs.DataLength;
> > +        bs.TimeStamp  = avpkt->pts;
> > +        if (avctx->field_order == AV_FIELD_PROGRESSIVE)
> > +            bs.DataFlag   |= MFX_BITSTREAM_COMPLETE_FRAME;
> > +    } else
> > +        return AVERROR_INVALIDDATA;
> > +
> > +
> > +    if(!q->session) {
> > +        ret = qsv_decode_preinit(avctx, q, pix_fmts, param);
> 
> I think you've misunderstood some of the mechanism of the get_format
> callback.  It can't be called before you have parsed out the stream
> information, because you don't know the necessary size and format
> information for the user to allocate the frames.  (Note that some of this is
> hinted if you use libavformat and copy the codec parameters from there,
> leading to cases where it mostly works in the ffmpeg utility but fails
> completely with libavcodec-only users.)

Thanks for your comment. But I think this has already been discussed on patch v2 (https://patchwork.ffmpeg.org/patch/12112/)?
[Mark] The reason for using the internal parsers is that you need the information before libmfx is initialized at all in the hw_frames_ctx case (i.e. before the
get_format callback which will supply the hardware context information), and once you require that anyway there isn't much point in parsing things twice for the same information.

[Zhong] As I see it, very little information is needed before initializing a libmfx session (we must, and only need to, init a libmfx session before calling MFXVideoDECODE_DecodeHeader()): just the resolution and the pix_fmt.
As you can see from my current implementation, I don't call the internal parser before initializing the session; we assume a resolution (it may be provided by libavformat, but that is not guaranteed, as Hendrik commented) and a pix_fmt (we assume NV12), and correct them after MFXVideoDECODE_DecodeHeader(). That may mean we need to init the session twice for the first decoding call (e.g. hevc/vp9 10bit clips, or when the resolution is not provided by something like libavformat), but that only happens on the first call (assuming the header information doesn't change afterwards) and only when the assumed resolution/pix_fmt turns out to be wrong.
I think this is more efficient than parsing twice on every decode call, and it is not a workaround, since we still need to handle resolution/pix_fmt change cases anyway.

For libavcodec-only users it won't fail; the actual result is that we need to call qsv_decode_preinit() twice: the first call allocates frames with the (possibly incorrect) assumed parameters, then after the header is parsed we check the format and resolution, and if they are wrong we call qsv_decode_preinit() again to ensure a correct frame allocation. A rough sketch of this flow is given below.
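For clarity, here is a minimal, self-contained sketch of that header-first flow. Only MFXVideoDECODE_DecodeHeader() and MFXVideoDECODE_Init() are real libmfx calls; the function name decode_first_packet(), the fixed IOPattern and the reduced error handling are illustrative assumptions, not the actual patch code:

/* Hypothetical sketch of the "assume NV12, then correct after
 * MFXVideoDECODE_DecodeHeader()" flow.  Only the MFXVideoDECODE_* calls
 * are real libmfx API; everything else is placeholder glue. */
#include <mfxvideo.h>   /* include path may differ between MSDK versions */

static int decode_first_packet(mfxSession session, mfxU32 codec_id,
                               mfxU8 *data, mfxU32 size)
{
    mfxBitstream  bs    = { 0 };
    mfxVideoParam param = { 0 };
    mfxStatus     sts;

    bs.Data       = data;
    bs.DataLength = size;
    bs.MaxLength  = size;

    param.mfx.CodecId = codec_id;                 /* e.g. MFX_CODEC_HEVC */
    param.IOPattern   = MFX_IOPATTERN_OUT_SYSTEM_MEMORY;

    /* Parse the stream header: fills param.mfx.FrameInfo with the real
     * width/height/FourCC/bit depth, replacing whatever was assumed. */
    sts = MFXVideoDECODE_DecodeHeader(session, &bs, &param);
    if (sts == MFX_ERR_MORE_DATA)
        return 1;                                 /* feed more input */
    if (sts < MFX_ERR_NONE)
        return -1;

    /* If FrameInfo differs from the assumed NV12/guessed size, frames
     * would be reallocated here before (re)initializing the decoder. */

    sts = MFXVideoDECODE_Init(session, &param);
    return sts < MFX_ERR_NONE ? -1 : 0;
}

In the real patch the assumed values come from qsv_decode_preinit()/ff_get_format(), and reinitialization is triggered by comparing param.mfx.FrameInfo against the values currently configured on avctx.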

> Given the chicken-egg problem here for library initialisation, maybe what you
> want is to switch to using
> AV_CODEC_HW_CONFIG_METHOD_HW_DEVICE_CTX?
> 
> > +        if (ret < 0)
> > +            return ret;
> > +    }
> > +
> > +    ret = ff_qsv_codec_id_to_mfx(avctx->codec_id);
> > +    if (ret < 0)
> > +        return ret;
> > +
> > +    param->mfx.CodecId = ret;
> > +    ret = MFXVideoDECODE_DecodeHeader(q->session, &bs, param);
> > +    if (MFX_ERR_MORE_DATA == ret) {
> > +       return AVERROR(EAGAIN);
> > +    }
> > +    if (ret < 0)
> > +        return ff_qsv_print_error(avctx, ret,
> > +                "Error decoding stream header");
> >
> >      return 0;
> >  }
> > @@ -494,7 +516,10 @@ int ff_qsv_process_data(AVCodecContext *avctx, QSVContext *q,
> >      uint8_t *dummy_data;
> >      int dummy_size;
> >      int ret;
> > -    const AVPixFmtDescriptor *desc;
> > +    mfxVideoParam param = { 0 };
> > +    enum AVPixelFormat pix_fmts[3] = { AV_PIX_FMT_QSV,
> > +                                       AV_PIX_FMT_NV12,
> > +                                       AV_PIX_FMT_NONE };
> >
> >      if (!q->avctx_internal) {
> >          q->avctx_internal = avcodec_alloc_context3(NULL);
> > @@ -508,7 +533,6 @@ int ff_qsv_process_data(AVCodecContext *avctx, QSVContext *q,
> >              return AVERROR(ENOMEM);
> >
> >          q->parser->flags |= PARSER_FLAG_COMPLETE_FRAMES;
> > -        q->orig_pix_fmt   = AV_PIX_FMT_NONE;
> >      }
> >
> >      if (!pkt->size)
> > @@ -521,15 +545,17 @@ int ff_qsv_process_data(AVCodecContext *avctx, QSVContext *q,
> >                       pkt->data, pkt->size, pkt->pts, pkt->dts,
> >                       pkt->pos);
> >
> > -    avctx->field_order  = q->parser->field_order;
> >      /* TODO: flush delayed frames on reinit */
> > -    if (q->parser->format       != q->orig_pix_fmt    ||
> > -        FFALIGN(q->parser->coded_width, 16)  != FFALIGN(avctx->coded_width, 16) ||
> > -        FFALIGN(q->parser->coded_height, 16) != FFALIGN(avctx->coded_height, 16)) {
> > -        enum AVPixelFormat pix_fmts[3] = { AV_PIX_FMT_QSV,
> > -                                           AV_PIX_FMT_NONE,
> > -                                           AV_PIX_FMT_NONE };
> > -        enum AVPixelFormat qsv_format;
> > +
> > +    // sw_pix_fmt was initialized as NV12, will be updated after header decoded if not true.
> > +    if (q->orig_pix_fmt != AV_PIX_FMT_NONE)
> > +        pix_fmts[1] = q->orig_pix_fmt;
> > +
> > +    ret = qsv_decode_header(avctx, q, pkt, pix_fmts, &param);
> > +
> > +    if (ret >= 0 && (q->orig_pix_fmt != ff_qsv_map_fourcc(param.mfx.FrameInfo.FourCC) ||
> > +        avctx->coded_width  != param.mfx.FrameInfo.Width ||
> > +        avctx->coded_height != param.mfx.FrameInfo.Height)) {
> >          AVPacket zero_pkt = {0};
> >
> >          if (q->buffered_count) {
> > @@ -538,45 +564,24 @@ int ff_qsv_process_data(AVCodecContext *avctx, QSVContext *q,
> >              q->buffered_count--;
> >              return qsv_decode(avctx, q, frame, got_frame, &zero_pkt);
> >          }
> > -
> >          q->reinit_flag = 0;
> >
> > -        qsv_format = ff_qsv_map_pixfmt(q->parser->format, &q->fourcc);
> > -        if (qsv_format < 0) {
> > -            av_log(avctx, AV_LOG_ERROR,
> > -                   "Decoding pixel format '%s' is not supported\n",
> > -                   av_get_pix_fmt_name(q->parser->format));
> > -            ret = AVERROR(ENOSYS);
> > -            goto reinit_fail;
> > -        }
> > +        q->orig_pix_fmt = avctx->pix_fmt = pix_fmts[1] = ff_qsv_map_fourcc(param.mfx.FrameInfo.FourCC);
> >
> > -        q->orig_pix_fmt     = q->parser->format;
> > -        avctx->pix_fmt      = pix_fmts[1] = qsv_format;
> > -        avctx->width        = q->parser->width;
> > -        avctx->height       = q->parser->height;
> > -        avctx->coded_width  = FFALIGN(q->parser->coded_width, 16);
> > -        avctx->coded_height = FFALIGN(q->parser->coded_height, 16);
> > -        avctx->level        = q->avctx_internal->level;
> > -        avctx->profile      = q->avctx_internal->profile;
> > +        avctx->coded_width  = param.mfx.FrameInfo.Width;
> > +        avctx->coded_height = param.mfx.FrameInfo.Height;
> >
> > -        ret = ff_get_format(avctx, pix_fmts);
> > +        ret = qsv_decode_preinit(avctx, q, pix_fmts, &param);
> >          if (ret < 0)
> >              goto reinit_fail;
> > +        q->initialized = 0;
> > +    }
> >
> > -        avctx->pix_fmt = ret;
> > -
> > -        desc = av_pix_fmt_desc_get(avctx->pix_fmt);
> > -        if (!desc)
> > -            goto reinit_fail;
> > -
> > -         if (desc->comp[0].depth > 8) {
> > -            avctx->coded_width =  FFALIGN(q->parser->coded_width, 32);
> > -            avctx->coded_height = FFALIGN(q->parser->coded_height, 32);
> > -        }
> > -
> > -        ret = qsv_decode_init(avctx, q);
> > +    if (!q->initialized) {
> > +        ret = qsv_decode_init(avctx, q, &param);
> >          if (ret < 0)
> >              goto reinit_fail;
> > +        q->initialized = 1;
> >      }
> >
> >      return qsv_decode(avctx, q, frame, got_frame, pkt);
> > @@ -589,4 +594,5 @@ reinit_fail:
> >  void ff_qsv_decode_flush(AVCodecContext *avctx, QSVContext *q)  {
> >      q->orig_pix_fmt = AV_PIX_FMT_NONE;
> > +    q->initialized = 0;
> >  }
> > diff --git a/libavcodec/qsvdec.h b/libavcodec/qsvdec.h
> > index 111536caba..4812fb2a6b 100644
> > --- a/libavcodec/qsvdec.h
> > +++ b/libavcodec/qsvdec.h
> > @@ -63,6 +63,8 @@ typedef struct QSVContext {
> >      uint32_t fourcc;
> >      mfxFrameInfo frame_info;
> >
> > +    int initialized;
> > +
> >      // options set by the caller
> >      int async_depth;
> >      int iopattern;
> > diff --git a/libavcodec/qsvdec_h2645.c b/libavcodec/qsvdec_h2645.c
> > index 9b49f5506e..eb1dc336a4 100644
> > --- a/libavcodec/qsvdec_h2645.c
> > +++ b/libavcodec/qsvdec_h2645.c
> > @@ -103,6 +103,7 @@ static av_cold int qsv_decode_init(AVCodecContext *avctx)
> >          }
> >      }
> >
> > +    s->qsv.orig_pix_fmt = AV_PIX_FMT_NV12;
> 
> This setting looks very suspicious?  The real value is in the stream header,
> and you don't want to do anything with pixfmts until you've read it.
> 
> >      s->packet_fifo = av_fifo_alloc(sizeof(AVPacket));
> >      if (!s->packet_fifo) {
> >          ret = AVERROR(ENOMEM);
> > diff --git a/libavcodec/qsvdec_other.c b/libavcodec/qsvdec_other.c
> > index 03251d2c85..a6f1b88ca0 100644
> > --- a/libavcodec/qsvdec_other.c
> > +++ b/libavcodec/qsvdec_other.c
> > @@ -90,6 +90,7 @@ static av_cold int qsv_decode_init(AVCodecContext *avctx)
> >      }
> >  #endif
> >
> > +    s->qsv.orig_pix_fmt = AV_PIX_FMT_NV12;
> >      s->packet_fifo = av_fifo_alloc(sizeof(AVPacket));
> >      if (!s->packet_fifo) {
> >          ret = AVERROR(ENOMEM);
> >
> 
> I have to admit I'm still sceptical that this change is useful.  It seems like it
> would be better to fix up your cases where libmfx barfs on streams with bad
> profile or level information, and then to add the parser support for your
> additional codecs.

Personally, I don't think this can be supported well in the parser, at least for VP9.
As in our discussion on patch v2 (https://patchwork.ffmpeg.org/patch/12116/):

[Mark] Similarly, you will need to extend the VP9 parser to return the relevant
information for the following patch so that it works in general rather than only in cases where the user can supply it externally.  It should be quite
straightforward; see 182cf170a544bce069c8690c90b49381150a1f10.

[Zhong] VP9 differs from VP8 here: the VP9 frame resolution may not be coded in the bitstream at all and can instead be inherited from a reference frame, see: https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/vp9.c#L565
If header parsing is separated from the decoding process, it becomes a problem how to get the reference list information.

If you think that case can be resolved in the VP9 parser, could you please provide more details? (A rough sketch of the relevant VP9 header syntax follows below, to illustrate the dependency.)
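For reference, here is a rough, self-contained sketch of the VP9 frame_size_with_refs() header syntax that causes the problem. The bit reader and the reference-state arrays are stand-ins (not FFmpeg or libmfx API); it is only meant to illustrate why the frame size can depend on previously decoded frames rather than on the header bits alone:

/* Rough sketch of VP9's frame_size_with_refs() header syntax, written
 * only to show that the frame size can be inherited from previously
 * decoded reference frames instead of being coded in the header. */
#include <stdint.h>

typedef struct BitReader { const uint8_t *buf; int pos; } BitReader;

static int get_bit(BitReader *br)
{
    int bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

static int get_bits(BitReader *br, int n)
{
    int v = 0;
    while (n--)
        v = (v << 1) | get_bit(br);
    return v;
}

/* Decoder-side state that must be carried across frames. */
static int ref_frame_width[8], ref_frame_height[8];

static void frame_size_with_refs(BitReader *br, const int ref_frame_idx[3],
                                 int *frame_width, int *frame_height)
{
    int found_ref = 0, i;

    for (i = 0; i < 3; i++) {                     /* LAST, GOLDEN, ALTREF */
        found_ref = get_bit(br);
        if (found_ref) {
            /* The size is taken from a reference frame: it is simply not
             * present in this frame's header at all. */
            *frame_width  = ref_frame_width[ref_frame_idx[i]];
            *frame_height = ref_frame_height[ref_frame_idx[i]];
            break;
        }
    }

    if (!found_ref) {
        *frame_width  = get_bits(br, 16) + 1;     /* frame_width_minus_1 */
        *frame_height = get_bits(br, 16) + 1;     /* frame_height_minus_1 */
    }
    /* render_size() and the tile/image size computation follow in the
     * real syntax; omitted here. */
}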

