[FFmpeg-devel] [PATCH] libavcodec/qsv.c: Re-design session control and internal allocation

Mon Oct 26 19:50:22 CET 2015

On Mon, 26 Oct 2015 15:01:50 +0100
Gwenole Beauchesne <gb.devel at gmail.com> wrote:

> 2015-10-26 12:44 GMT+01:00 wm4 <nfxjfg at googlemail.com>:
> > On Mon, 26 Oct 2015 11:22:38 +0100
> > Gwenole Beauchesne <gb.devel at gmail.com> wrote:
> >
> >  
> >> >> /**
> >> >>  * Hardware Accelerator identifier.
> >> >>  *
> >> >>  * @note
> >> >>  * A hardware accelerator can be device-less. This means that only the
> >> >>  * underlying hardware resource, e.g. a Linux dma_buf handle, is being
> >> >>  * transported in the AVFrame. That hardware resource could be mapped
> >> >>  * through standard OS-dependent calls, e.g. mmap() on Linux.
> >> >>  */
> >> >> enum AVHWAccelId {
> >> >>     AV_HWACCEL_ID_NONE = -1,
> >> >>     AV_HWACCEL_ID_VAAPI,
> >> >>     AV_HWACCEL_ID_NB,           ///< Not part of ABI
> >> >> };
> >> >>
> >> >> Name to be improved if people have better suggestions, as this really
> >> >> is to be seen as HW resource, not necessarily attached to a particular
> >> >> HW device. i.e. this could be a dma_buf handle from a V4L2 buffer or
> >> >> VA surface.  
> >> >
> >> > OK. (Minor nit: if ID_NONE is valid and means HW API without context,
> >> > maybe it should be 0, not -1. Also, if it was meant this way, maybe
> >> > these should still have their own ID for other purposes.)  
> >>
> >> In my current model, ID_NONE is not meant to be valid because the
> >> hwaccel side data shall only exist for hwaccel purposes. Besides,
> >> having ID_NONE set to -1 is consistent with other liavu enums and
> >> convenient to have ID_NB express directly the exact number of
> >> hwaccels.  
> >
> > OK, this makes sense to me.
> >  
> >> >> I am reworking the patch series as I changed my mind again: current
> >> >> map strategy was overly complex (and required to be). There were at
> >> >> least 4 map flags: AV_FRAME_MAP_{READ,WRITE,READ_FIRST,USWC_MEMORY}. I
> >> >> am now preferring a unique av_hwaccel_frame_get_pixels() defined as
> >> >> follow:
> >> >>
> >> >> /**
> >> >>  * Returns AVFrame pixels into linear memory
> >> >>  *
> >> >>  * This function takes a snapshot of the underlying HW surface and
> >> >>  * exposes it to SW backed memory. This may involve a copy from GPU
> >> >>  * memory to CPU memory.
> >> >>  *
> >> >>  * @note
> >> >>  * There is no effort underway to commit the modified pixels back to
> >> >>  * GPU memory when the \ref dst AVFrame is released.
> >> >>  *
> >> >>  * @param[in] src       the source frame to read
> >> >>  * @param[inout] dst    the target frame to update, or create if NULL
> >> >>  * @param[in] flags     an optional combination of AV_FRAME_FLAG_xxx flags
> >> >>  * @return 0 on success, an AVERROR code on failure.
> >> >>  */
> >> >> int
> >> >> av_hwaccel_frame_get_pixels(AVFrame *src, AVFrame **dst, unsigned flags);
> >> >>
> >> >> i.e. the cost of allocating and copying AVFrame metadata should be
> >> >> less than the actual work needed behind the scene. So, it looks like a
> >> >> better interim solution for starters.  
> >> >
> >> > So this is for read-access only, right? If it can map data, there
> >> > also needs to be an unmap function, and the user would have to know
> >> > about when to use it.  
> >>
> >> Well, put can be implementing by reversing src/dst in this function. :)
> >> Actually, this can be av_hwaccel_frame_copy(), but the benefit of
> >> having get_pixels() is to leave out the allocation business to lavu
> >> and just having the user to bother about _unref().  
> >
> > Also makes sense to me.
> >
> > What is a problem is that mapped frames and CPU frames (let's call pure
> > CPU-allocated surfaces that) are not exactly the same thing. If the API
> > user assumes the frame is a CPU frame, it might reference it for a long
> > time, which would cause various problems. On the other hand, you don't
> > want the user to force copying a frame if it's really a CPU frame.
> >
> > Maybe this is not really a problem. I'm just mentioning it as another
> > detail.
> >  
> >> >> For compatibility, that's also the idea behind another generic
> >> >> AV_PIX_FMT_HWACCEL that would enforce data[i] to be clear of any
> >> >> user-supplied pointers, and buf[i] shall be filled in by appropriate
> >> >> accessors, or while creating the side-data, e.g.
> >> >> av_vaapi_frame_create_side_data(). i.e. when lavc swallows up an
> >> >> AVFrame with new-style hwaccel, this is where the AV_PIX_FMT_VAAPI
> >> >> would be replaced with AV_PIX_FMT_HWACCEL. Replace "swallows up" with
> >> >> e.g. av_vaapi_frame_convert_in_place() if you prefer. Otherwise, IMHO,
> >> >> the old-style fields should live untouched, hence the need to keep the
> >> >> hwaccel side-data around.  
> >> >
> >> > Isn't the problem more about output?
> >> >
> >> > Again, there's the problem with the current hwaccel API selecting the
> >> > hwaccel with get_format(), just using the hwaccel-specific pixfmt.  
> >>
> >> I also envision a need for AVCodecContext.hwaccel_id field + possibly
> >> .get_hwaccel(). Just so that to depart from that pixfmt tie.  
> >
> > There are some of us who would like this. Of course it makes the API
> > change larger.  
> 
> I don't think it would be too larger. I rather think the move would be
> less intrusive, thus smoother.

I'm just thinking in terms of API changes, and your proposed changes
are major. Basically, everything changes. I'm still fine with that, but
the more is changed, the more work it is, obviously.

> My vision is:
> - You want hwaccel: you report those you are interested in/support
> through get_hwaccel() ;
>   + should you need hwaccel-specific things, you notify lavc through
> appropriate functions like e.g. av_vaapi_set_pipeline_params().
> - You don't want hwaccel, or happen to fallback to SW decoding,
> get_pixfmt() is all fine and you can also further use your
> get_buffer2() supplied buffers. i.e. not have to think in a
> .get_buffer2() implementation: am I hwaccel (and which one...), or sw
> decoding?

Still sounds fine to me.

> Besides, this will simplify hwaccel work terribly IMHO. Imagine when
> we come to support 10, 12, 14bpp ; would we grow the pixfmt by all
> that additional information? No, this is not needed. My advice and
> intent is: we should stop polluting the pixfmt namespace with hwaccel
> specific things.

Not sure how this is related. It seems to be completely. Orthogonal.
The vdpau hwaccel code already shows how to handle 4:4:4 formats with
the current hwaccel API.

> > Also, I do find it useful to have pixfmt distinguish
> > between underlying surface types (i.e. the hwaccel API). For example,
> > if we add support for hw filters to libavfilter, how would you prevent
> > that a vdpau filter takes vaapi surfaces as input?  
> 
> An AVFrame would have that AVHWAccelFrame attached to it. The
> hwaccel_id is in there, and can be matched. This could also be made to
> work, at the expense of some penalty: i.e. call into
> av_hwaccel_frame_get_pixels() / copy / whatever to CPU backed memory,
> and then re-upload to a VdpSurface. In this precise situation, this is
> where multiple hwaccels into the meta hwaccel metadata could have been
> interesting to have, instead of the one
> hwaccel-specific-metadata-forever approach.

libavfilter does format negotiation based on pixel formats, and adding
something new there is not trivial. We haven't even gotten rid of the J
pixfmts (duplicated pixfmts that signal what color_range normally
does). I'm just saying that picking a single format might inconvenience
us, while also being not overly useful.

(Makes me wonder if the fabled "pixformaton" wasn't the right
approach.)

> > So I'm not sure if a single AV_PIX_FMT_HWACCEL is the way to go, even
> > if we make access to hwaccel AVFrames somewhat more uniform.  
> 
> It's just a way to signal the user that this is an hwaccel frame
> without going through pixfmt -> pixdesc + check for hwaccel flags.
> i.e. a hint to: "hey, don't try to read in pixels through
> AVFrame.data[]!" If you want the pixels, we can probably consider that
> av_frame_get_pixels() an dest would just return a ref to src in case
> of CPU buffers. So no copies in there either.

So let's add an av_frame_is_hwaccel().

It's not possible to access ffmpeg AVFrames in a generic way that works
for all pixfmts anyway. (It depends what you do, but usually code which
tries to be generic has to put a lot of effort into excluding formats
which don't. Checking for hwaccel is just another minor check.)

> >> > Also, AVFrame.buf[] should cover the memory referenced by
> >> > AVFrame.data[]. It's merely a refcount helper for AVFrame.data[], and
> >> > should not do additional things.
> >> >
> >> > I think using AVFrame side data for this would be a bit awkward.
> >> > Possibly it _could_ be used to store things like VADisplay if we don't
> >> > find a better way, but I think having a AVHWAccelFrame would be better.  
> >>
> >> Side data is quite simple to use, and ref-counted easily. I didn't
> >> want to touch to AVFrame fields. Though, it's of course possible to
> >> extend it with either public or external fields.  
> >
> > Side data is really just for things by the "side", not information that
> > is critical and central.  
> 
> IMHO, it's not that critical or central. At least, definitely not for
> users. It's internal information, with some ways to expose enough
> information in a consistent manner. Besides, metadata is convenient as
> it can be attached to an existing AVFrame. Use-case: you have a SW
> AVFrame, you then import/wrap into a VA frame for further processing
> for instance. That VA frame is not fundamentally needed to have the
> AVFrame work in non-vaapi case. It's side-data :)

Uh, so we could wrap planes 1-3 in side data, because RGB proves that
only plane 0 is fundamentally needed for AVFrames?

Yes, sure, the most generic way to represent things is as some sort of
generic dictionary object. But We don't do this because it's
inconvenient and confusing, and we prefer structs with fields instead.
So some information gets a very prominent place (like the plane
pointers), while other things just need to be somehow passed through,
and which we want to have out of our way normally, and which are
fundamentally optional (like A53 caption data).

And in my opinion, the pointer to the struct which contains the
hardware surface ID is pretty important and fulfills the same role as
the plane pointers with SW surfaces. And they're not optional either;
not even e.g. the VADisplay handle is.

(It's also a great topic to bikeshed about.)

> > Maybe we agree that there's no technical issue here, and it's only a
> > matter of API design and "taste".  
> 
> Indeed, it doesn't really matter, that's why is called through
> av_hwaccel_frame_get(). This can be anything else under the hood.
> Preferably, side-data or extra "internal" field, for my personal
> taste. I'd prefer to not expose fields directly, unless really needed
> or convenient.

Well, that part is orthogonal. The user probably doesn't _have_ to use
these fields, just like an API user doing transcoding with libavcodec
likely never accesses the plane data of SW AVFrames directly.

> >  
> >> >> >  
> >> >> >> PS: other benefit of the AVHWAccelFrame side-data is that I can stuff
> >> >> >> crop information into there. Since this is only useful to hwaccel, no
> >> >> >> need to populate AVFrame with additional fields IMHO.  
> >> >> >
> >> >> > IMHO, crop information should be generally available, even for software
> >> >> > surfaces. What we currently do are terrible hacks: align the X/Y
> >> >> > coordinates to chroma boundaries and adjust the pointer (meaning
> >> >> > everyone has to do with possibly unaligned pointers, and non-mod-2
> >> >> > crops don't work correctly), and this also means you can have a
> >> >> > non-mod-2 width/height, which doesn't really make sense for chroma.  
> >> >>
> >> >> I don't really see why this would be needed for SW. AVFrame.buf[] will
> >> >> hold the buffers as in "the original allocation". Then AVFrame.data[]
> >> >> shall be filled in to fit the actual cropped/decoded region. Isn't it?  
> >> >
> >> > Yes, AVFrame.buf[] does this, but you still don't know e.g. the
> >> > original width, or even the pointer to a plane's original (0, 0) pixel
> >> > if AVFrame.buf[0] covers all planes.
> >> >
> >> > I think doing cropping as metadata would also simplify code a lot. For
> >> > example, nobody has to worry about non-mod-2 yuv420 anymore, and how it
> >> > should be handled. It's less tricky, more correct, more efficient.  
> >>
> >> OK. A crop side-data is desired then. I somehow was convinced that it
> >> wouldn't matter for SW. Though, it would actually be really need that  
> >
> > At least this is my opinion. I would like to have such cropping side
> > data, instead of half-broken ad-hoc cropping in the decoder and things
> > like coded_width.
> >
> > I don't know what most others think about this. I suspect most would
> > find such a change too intrusive. At least for hwaccel it's mandatory
> > though. (What we currently do just barely works, and I need hacks in my
> > own code to reconstruct the real surface size.)
> >  
> >> the user doesn't have to care about anything and just use .data[]
> >> appropriately. So, probably have internal_data[] fields for that SIMD
> >> alignment needs?  
> >
> > This reminds me of AVFrame.base[], which was removed 2 years ago.
> >
> > I'm fairly sure a cropping rect would be cleaner.  
> 
> I think this would be terribly painful. Really, the user should not
> care, and only use .data[], or get_pixels() beforehand. The
> AVFrame.base[] was actually a good idea, so that decoders know where
> to start. Crop info specific for hwaccel-only is reasonable to me
> since, we (I) don't aim at making direct mappings possible, and this
> is something that would only be useful internally (FFmpeg in general),
> or at least taken care implicitly through _copy() functions for
> instance.

AVFrame.base[] is even more painful IMHO, because it's not as clear as a
concept. The original base[] field was also quite ill-defined.