[FFmpeg-devel] [PATCH, v2 2/2] lavc/vaapi_decode: recreate hw_frames_ctx for vp9 without destroy va_context

Wed Nov 20 09:06:23 EET 2019

On Fri, Sep 20, 2019 at 5:12 AM Fu, Linjie <linjie.fu at intel.com> wrote:

> > -----Original Message-----
> > From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On Behalf
> > Of Fu, Linjie
> > Sent: Wednesday, September 11, 2019 00:02
> > To: ffmpeg-devel at ffmpeg.org
> > Subject: Re: [FFmpeg-devel] [PATCH, v2 2/2] lavc/vaapi_decode: recreate
> > hw_frames_ctx for vp9 without destroy va_context
> >
> > > -----Original Message-----
> > > From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On
> > Behalf
> > > Of Mark Thompson
> > > Sent: Tuesday, September 10, 2019 08:02
> > > To: ffmpeg-devel at ffmpeg.org
> > > Subject: Re: [FFmpeg-devel] [PATCH, v2 2/2] lavc/vaapi_decode: recreate
> > > hw_frames_ctx for vp9 without destroy va_context
> > >
> > > On 09/09/2019 16:40, Fu, Linjie wrote:
> > > >> -----Original Message-----
> > > >> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On
> > > Behalf
> > > >> Of Fu, Linjie
> > > >> Sent: Friday, August 30, 2019 16:05
> > > >> To: FFmpeg development discussions and patches <ffmpeg-
> > > >> devel at ffmpeg.org>
> > > >> Subject: Re: [FFmpeg-devel] [PATCH, v2 2/2] lavc/vaapi_decode:
> > recreate
> > > >> hw_frames_ctx for vp9 without destroy va_context
> > > >>
> > > >>> -----Original Message-----
> > > >>> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On
> > > >> Behalf
> > > >>> Of Fu, Linjie
> > > >>> Sent: Friday, August 9, 2019 19:47
> > > >>> To: FFmpeg development discussions and patches <ffmpeg-
> > > >>> devel at ffmpeg.org>
> > > >>> Subject: Re: [FFmpeg-devel] [PATCH, v2 2/2] lavc/vaapi_decode:
> > > recreate
> > > >>> hw_frames_ctx for vp9 without destroy va_context
> > > >>>
> > > >>>> -----Original Message-----
> > > >>>> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On
> > > >>> Behalf
> > > >>>> Of Hendrik Leppkes
> > > >>>> Sent: Friday, August 9, 2019 17:40
> > > >>>> To: FFmpeg development discussions and patches <ffmpeg-
> > > >>>> devel at ffmpeg.org>
> > > >>>> Subject: Re: [FFmpeg-devel] [PATCH, v2 2/2] lavc/vaapi_decode:
> > > >> recreate
> > > >>>> hw_frames_ctx for vp9 without destroy va_context
> > > >>>>
> > > >>>> On Tue, Aug 6, 2019 at 10:55 AM Fu, Linjie <linjie.fu at intel.com>
> > wrote:
> > > >>>>>
> > > >>>>>> -----Original Message-----
> > > >>>>>> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org]
> > > On
> > > >>>> Behalf
> > > >>>>>> Of Hendrik Leppkes
> > > >>>>>> Sent: Tuesday, August 6, 2019 16:27
> > > >>>>>> To: FFmpeg development discussions and patches <ffmpeg-
> > > >>>>>> devel at ffmpeg.org>
> > > >>>>>> Subject: Re: [FFmpeg-devel] [PATCH, v2 2/2] lavc/vaapi_decode:
> > > >>>> recreate
> > > >>>>>> hw_frames_ctx for vp9 without destroy va_context
> > > >>>>>>
> > > >>>>>> On Thu, Jul 11, 2019 at 10:20 AM Linjie Fu <linjie.fu at intel.com
> >
> > > wrote:
> > > >>>>>>>
> > > >>>>>>> VP9 allows resolution changes per frame. Currently in VAAPI,
> > > >>> resolution
> > > >>>>>>> changes leads to va context destroy and reinit. This will cause
> > > >>>>>>> reference frame surface lost and produce garbage.
> > > >>>>>>>
> > > >>>>>>> Though refs surface id could be passed to media driver and
> found
> > > in
> > > >>>>>>> RTtbl, vp9RefList[] in hal layer has already been destroyed.
> Thus
> > > the
> > > >>>>>>> new created VaContext could only got an empty RefList.
> > > >>>>>>>
> > > >>>>>>> As libva allows re-create surface separately without changing
> the
> > > >>>>>>> context, this issue could be handled by only recreating
> > > >>> hw_frames_ctx.
> > > >>>>>>>
> > > >>>>>>> Set hwaccel_priv_data_keeping flag for vp9 to only recreating
> > > >>>>>>> hw_frame_ctx when dynamic resolution changing happens.
> > > >>>>>>>
> > > >>>>>>> Could be verified by:
> > > >>>>>>> ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -i
> > > >>>>>>>   ./resolutions.ivf -pix_fmt p010le -f rawvideo -vframes 20 -y
> > > >> vaapi.yuv
> > > >>>>>>>
> > > >>>>>>> Signed-off-by: Linjie Fu <linjie.fu at intel.com>
> > > >>>>>>> ---
> > > >>>>>>>  libavcodec/decode.c        | 10 +++++-----
> > > >>>>>>>  libavcodec/internal.h      |  1 +
> > > >>>>>>>  libavcodec/pthread_frame.c |  2 ++
> > > >>>>>>>  libavcodec/vaapi_decode.c  | 40 ++++++++++++++++++++++----
> > --
> > > -
> > > >> --
> > > >>> --
> > > >>>> -----
> > > >>>>>> --
> > > >>>>>>>  libavcodec/vaapi_vp9.c     |  4 ++++
> > > >>>>>>>  5 files changed, 34 insertions(+), 23 deletions(-)
> > > >>>>>>>
> > > >>>>>>> diff --git a/libavcodec/decode.c b/libavcodec/decode.c
> > > >>>>>>> index 0863b82..7b15fa5 100644
> > > >>>>>>> --- a/libavcodec/decode.c
> > > >>>>>>> +++ b/libavcodec/decode.c
> > > >>>>>>> @@ -1254,7 +1254,6 @@ int
> > > >>>>>> ff_decode_get_hw_frames_ctx(AVCodecContext *avctx,
> > > >>>>>>>
> > > >>>>>>>      frames_ctx = (AVHWFramesContext*)avctx->hw_frames_ctx-
> > > >>>> data;
> > > >>>>>>>
> > > >>>>>>> -
> > > >>>>>>>      if (frames_ctx->initial_pool_size) {
> > > >>>>>>>          // We guarantee 4 base work surfaces. The function
> above
> > > >>>> guarantees
> > > >>>>>> 1
> > > >>>>>>>          // (the absolute minimum), so add the missing count.
> > > >>>>>>
> > > >>>>>> Unrelated whitespace change
> > > >>>>>
> > > >>>>> There is  a redundant whitespace here, so I removed it within
> this
> > > patch.
> > > >>>>>
> > > >>>>>>> @@ -1333,7 +1332,7 @@ static int hwaccel_init(AVCodecContext
> > > >>>> *avctx,
> > > >>>>>>>          return AVERROR_PATCHWELCOME;
> > > >>>>>>>      }
> > > >>>>>>>
> > > >>>>>>> -    if (hwaccel->priv_data_size) {
> > > >>>>>>> +    if (hwaccel->priv_data_size && !avctx->internal-
> > > >>>>> hwaccel_priv_data) {
> > > >>>>>>>          avctx->internal->hwaccel_priv_data =
> > > >>>>>>>              av_mallocz(hwaccel->priv_data_size);
> > > >>>>>>>          if (!avctx->internal->hwaccel_priv_data)
> > > >>>>>>> @@ -1396,9 +1395,10 @@ int ff_get_format(AVCodecContext
> > > >> *avctx,
> > > >>>>>> const enum AVPixelFormat *fmt)
> > > >>>>>>>      memcpy(choices, fmt, (n + 1) * sizeof(*choices));
> > > >>>>>>>
> > > >>>>>>>      for (;;) {
> > > >>>>>>> -        // Remove the previous hwaccel, if there was one.
> > > >>>>>>> -        hwaccel_uninit(avctx);
> > > >>>>>>> -
> > > >>>>>>> +        // Remove the previous hwaccel, if there was one,
> > > >>>>>>> +        // and no need for keeping.
> > > >>>>>>> +        if (!avctx->internal->hwaccel_priv_data_keeping)
> > > >>>>>>> +            hwaccel_uninit(avctx);
> > > >>>>>>>          user_choice = avctx->get_format(avctx, choices);
> > > >>>>>>>          if (user_choice == AV_PIX_FMT_NONE) {
> > > >>>>>>>              // Explicitly chose nothing, give up.
> > > >>>>>>
> > > >>>>>> There could be a dozen special cases how stuff can go wrong
> here.
> > > >>> What
> > > >>>>>> if get_format actually returns a different format then the one
> > > >>>>>> currently in use? Or a software format?
> > > >>>>>> Just removing this alone is not safe.
> > > >>>>>
> > > >>>>> Didn't quite get your point.
> > > >>>>> IMHO,  avctx->internal->hwaccel_priv_data_keeping won't be set in
> > > >>> other
> > > >>>> cases
> > > >>>>> other than vaapi_vp9, so this patch won't break the default
> pipeline,
> > > >> and
> > > >>>>> hwaccel_uninit(avctx) will always be called.
> > > >>>>>
> > > >>>>
> > > >>>> The point is that you cannot rely on get_format to return the same
> > > >>>> format that it previously did. It could return a software format,
> or
> > > >>>> in some cases possibly even a different hardware format. And you
> > just
> > > >>>> don't handle that.
> > > >>>
> > > >>> Got it. Thanks for the explanation, it should be reconsidered in
> case it
> > > >>> happens.
> > > >>>
> > > >>>> The entire approach here smells a bit of hack. Lets try to think
> this
> > > >>>> through and do it properly. One idea that comes to mind is a new
> > > >>>> hwaccel callback that checks if a in-place re-init is possible
> without
> > > >>>> destroying everything. This could be used for a multitude of
> different
> > > >>>> situations, and not just this one special case.
> > > >>>
> > > >>> Sounds great, and just FYI, this similar issue is reproduced with
> > > >> nvdec/dxva2
> > > >>> as well. Clips and some details are provided on trac #8068 in case
> you
> > and
> > > >>> other developers may be interested in or need to verify your
> solution.
> > > >>> http://trac.ffmpeg.org/ticket/8068
> > > >>
> > > >> Any step-further progress for the hwaccel callback methods or
> > something
> > > I
> > > >> can
> > > >> help to fix this gap?
> > > >>
> > > >
> > > > Ping?
> > > > A general solution works for multitude situation is great to me, and
> how
> > > about having
> > > > one solution specific for vp9 which introduces no regression as the
> first
> > > step,
> > > > since there are lots of cases(1400 +) failed/blocked and could be
> fixed by
> > > this patch.
> > > >
> > > > This blocked quite a lot, please comment what I can do to get this
> step
> > > further.
> > >
> > > I still don't understand how the error here can be in FFmpeg - it
> looks more
> > > like a driver problem to me.
> > >
> > > The sequence with VAAPI using HW_CONFIG_METHOD_HW_DEVICE_CTX
> > on
> > > your lena_resolution_change_on_inter_frame.ivf should be as follows:
> > >
> > > 1.  The header of frame 0 is read, it's a key frame with resolution is
> 352x288.
> > > 2.  The user-supplied get_format() is called, they pick
> AV_PIX_FMT_VAAPI
> > > and supply a VAAPI device.
> > > 3.  Output/reference surfaces are created in a new frames context at
> > > 352x288.
> > > 4.  A decoder context is created for VP9 at 352x288, rendering to the
> > surfaces
> > > created in the previous step.
> > > 5.  Frame 0 is decoded and placed in all reference slots.
> > > 6.  Frames 1-49 are decoded normally, they overwrite slots 0 and 1
> only.
> > > 7.  The header of frame 50 is read, it's an inter frame but with a new
> > > resolution of 240x196.
> > > 8.  The old decoder context is discarded, since it has the wrong
> resolution
> > and
> > > is bound to the wrong render targets.
> > > 9.  The old frames context is unreferenced, but references remain to
> its
> > > frames in slots 2-7 so the actual frames themselves stay around.
> > > 10. The user-supplied get_format() is called, they pick
> AV_PIX_FMT_VAAPI
> > > again.
> > > 11. Output/reference surfaces are created in a new frames context at
> > > 240x196.
> > > 12. A new decoder context is created for VP9 at 240x196, rendering to
> the
> > > new surfaces.
> > > 13. Frame 50 is decoded with reference to frame slots 0, 1 and 2
> (those are
> > all
> > > in the old frames context and have the old resolution); the result is
> placed
> > in
> > > slots 0 and 1.
> > > 14. Frames 51-100 are all decoded with reference to slots 0, 1 and 2,
> > > overwriting slots 0 and/or 1 only (in every case slot 2 still contains
> the
> > original
> > > key frame).
> > >
> > > (Using HW_CONFIG_METHOD_HW_FRAMES_CTX the main difference is
> > that
> > > steps 3 and 11 would be removed, replaced by user action in the
> > get_format()
> > > callback in the steps immediately preceding them.)
> > >
> > > The iHD driver indeed returns "internal error" immediately on step 13.
> > > However, looking at traces made with libva everything about that render
> > call
> > > looks correct - the decoder context is the new one which matches the
> > > resolution and render target, and the right surfaces are provided as
> > > reference frames (in the old frames context, but definitely haven't
> been
> > > destroyed).
> > >
> > > So, can you explain more about what is going wrong?
> > >
> >
> > Hi Mark,
> >
> > Thanks for the detailed response. If I got it correctly, the concern is
> mainly
> > about
> > "since the reference surface is definitely not destroyed,
> destroy/recreate
> > context
> > shouldn't have blocked the decode for vp9"
> >
> > Actually, there are some intermedia dependencies in driver beside
> reference
> > frame itself.
> > For example,
> > 1. Motion vector dependency.
> >
> > As chapter 5.13 (motion vector prediction )in VP9 spec has said:
> > "When the spatial neighbors do not provide sufficient information, it
> can fall
> > back to using the
> > Motion vectors from the previous decoded frames".
> >
> > MV buffer is the dependency across DPB, driver have to allocate the
> internal
> > buffer to store the
> > MV information associated to each frame on DPB.
> > We need previous frame’s mv, and we hold mv  in decoder context.
> > Destroying the context would
> > cause the loss of mv information.
> >
> > 2.Besides, segmentation map buffer is the another dependency.
> >
> > It is allocated in driver as well, and could be updated by every frame
> and
> > impact next frame.
> > The segmentation map is coded differentially across frames in order to
> > minimize the bit overheads.
> >
> > As a consequence, context destroying would block the decode for clips
> with
> > resolution changing on
> > inter frames.
> >
> A friendly ping in case this message is missed.
>
> - linjie
> _______________________________________________
>

Generally : LGTM ,
any other comments ?

regards
Max