[FFmpeg-devel] [PATCH] libavcodec/qsv.c: Re-design session control and internal allocation

Fri Oct 23 16:12:07 CEST 2015

Hi,

2015-10-23 15:41 GMT+02:00 wm4 <nfxjfg at googlemail.com>:
> On Fri, 23 Oct 2015 14:54:56 +0200
> Gwenole Beauchesne <gb.devel at gmail.com> wrote:
>
>> Hi,
>>
>> 2015-10-07 18:40 GMT+02:00 wm4 <nfxjfg at googlemail.com>:
>> > On Wed, 7 Oct 2015 19:20:56 +0300
>> > Ivan Uskov <ivan.uskov at nablet.com> wrote:
>> >
>> >> Hello Hendrik,
>> >>
>> >> Wednesday, October 7, 2015, 5:58:25 PM, you wrote:
>> >>
>> >> HL> On Wed, Oct 7, 2015 at 4:41 PM, Ivan Uskov <ivan.uskov at nablet.com> wrote:
>> >>
>> >> HL> Global static variables are not acceptable, sorry.
>> >> HL> You'll have to find another way to solve your problem, but global
>> >> HL> state is not the way to go.
>> >> Unfortunately   I   do   not   have   ideas  how to provide single and common
>> >> memory  block  for  separate    modules   by  another  way.   Memory  mapping
>> >> files  looks not too portable and much more bulky solution then one global variable.
>> >> I  do  not  see  the  way to use appropriate field of AVCodecContext to share
>> >> global data.
>> >> Has anybody any ideas?
>> >> Is  me  missed  something?  There is really the necessary to have a global common
>> >> structure shared between decoder, vpp and decoder.
>> >>
>> >
>> > There's no automagic way to get this done.
>> >
>> > Hardware accelerators like vaapi and vdpau need the same thing. These
>> > have special APIs to set an API context on the decoder. Some APIs use
>> > hwaccel_context, vdpau uses av_vdpau_bind_context().
>> >
>> > The hwaccel_context pointer is untyped (just a void*), and what it means
>> > is implicit to the hwaccel or the decoder that is used. It could point
>> > to some sort of qsv state, which will have to be explicitly allocated
>> > and set by the API user (which might be ffmpeg.c).
>> >
>> > For filters there is no such thing yet. New API would have to be
>> > created. For filters in particular, we will have to make sure that the
>> > API isn't overly qsv-specific, and that it is extendable to other APIs
>> > (like for example vaapi or vdpau).
>>
>> I have been looking into a slightly different way: the common object
>> being transported is an AVFrame. So, my initial approach is to create
>> an AV_FRAME_DATA_HWACCEL metadata. Lookups could be optimized by
>> keeping around an AVFrameInternal struct that resides in the same
>> allocation unit as the AVFrame. But, this is a detail.
>>
>> From there, there are at least two usage models, when it comes to filters too:
>>
>> 1. Make the AVHWAccelFrame struct hold multiple hwaccel-specific info,
>> with a master one, and slave ones for various different APIs.
>> Advantage: a single AVFrame can be used and impersonified whenever
>> necessary. e.g. a VA surface master could be exported/re-used with an
>> mfxSurface, a dma_buf (for OpenCL), or userptr buffer. Drawback:
>> terribly tedious to manage.
>>
>> 2. Simpler approach: the AVHWAccelFrame holds a unique struct to
>> hwaccel specific data. Should we need to export that for use with
>> another API, it's not too complicated to av_frame_ref() + add new
>> hwaccel-specific metadata.
>
> It could be something like an API identifier (an enum or so) + API
> specific pointer to a struct.

That's exactly that:

/**
 * Hardware Accelerator identifier.
 *
 * @note
 * A hardware accelerator can be device-less. This means that only the
 * underlying hardware resource, e.g. a Linux dma_buf handle, is being
 * transported in the AVFrame. That hardware resource could be mapped
 * through standard OS-dependent calls, e.g. mmap() on Linux.
 */
enum AVHWAccelId {
    AV_HWACCEL_ID_NONE = -1,
    AV_HWACCEL_ID_VAAPI,
    AV_HWACCEL_ID_NB,           ///< Not part of ABI
};

Name to be improved if people have better suggestions, as this really
is to be seen as HW resource, not necessarily attached to a particular
HW device. i.e. this could be a dma_buf handle from a V4L2 buffer or
VA surface.

I am reworking the patch series as I changed my mind again: current
map strategy was overly complex (and required to be). There were at
least 4 map flags: AV_FRAME_MAP_{READ,WRITE,READ_FIRST,USWC_MEMORY}. I
am now preferring a unique av_hwaccel_frame_get_pixels() defined as
follow:

/**
 * Returns AVFrame pixels into linear memory
 *
 * This function takes a snapshot of the underlying HW surface and
 * exposes it to SW backed memory. This may involve a copy from GPU
 * memory to CPU memory.
 *
 * @note
 * There is no effort underway to commit the modified pixels back to
 * GPU memory when the \ref dst AVFrame is released.
 *
 * @param[in] src       the source frame to read
 * @param[inout] dst    the target frame to update, or create if NULL
 * @param[in] flags     an optional combination of AV_FRAME_FLAG_xxx flags
 * @return 0 on success, an AVERROR code on failure.
 */
int
av_hwaccel_frame_get_pixels(AVFrame *src, AVFrame **dst, unsigned flags);

i.e. the cost of allocating and copying AVFrame metadata should be
less than the actual work needed behind the scene. So, it looks like a
better interim solution for starters.

>
>> For VA-API specific purposes, I then have:
>> - AVVADisplay, which is refcounted, and that can handle automatic
>> initialize/terminate calls ;
>> - AVVAFrameBuffer that holds an { AVVADisplay, VASurfaceID, and
>> possibly VAImageID } tuple (for mappings in non-USWC memory), and that
>> populates AVFrame.buf[0].
>
> Sounds good.
>
>> I am undecided yet on whether I'd create an AV_PIX_FMT_HWACCEL format
>> and allow hwaccel specific AVFrame to absorb an existing AVFrame, as a
>> transitional method from existing vaapi hwaccel to "new" vaapi
>> hwaccel. In that new generic hwaccel format model, buf[0] et al. would
>> be clearly defined, and data[i] not supposed to be touched by user
>> application. For now, the trend towards that option is low in my mind.
>
> I'd still have different pixfmts for each hwaccel, even if they behave
> the same. (The main reason being that hw accel init via get_format()
> requires it.)
>
> IMHO, one AVFrame plane pointer should point to your suggested
> AVHWAccelFrame. This would make more sense with how AVFrame
> refcounting works in the general case.
>
> AVFrame specifically demands that AVFrame.buf[] covers all memory
> pointed to by AVFrame.data. This doesn't make much sense with the
> current API, where AVFrame.data[3] is an API specific surface ID
> (usually an integer casted to a pointer), and the AVBufferRef doesn't
> really make any sense.

Yes, that's also what I wanted to get rid of, at least for VA-API with
lavc managed objects.

> With the new suggestion, the AVBufferRef would cover the memory used by
> AVHWAccelFrame. While not particularly useful, this is consistent, and
> also frees API users from worrying about what the AVBufferRef data/size
> fields should be set to.
>
> As for compatiblity with old code: maybe AVFrame.data[1] could
> point to something like this. But I think it's better to make a clean
> break.

When is the next major bump scheduled to happen BTW? :)

For compatibility, that's also the idea behind another generic
AV_PIX_FMT_HWACCEL that would enforce data[i] to be clear of any
user-supplied pointers, and buf[i] shall be filled in by appropriate
accessors, or while creating the side-data, e.g.
av_vaapi_frame_create_side_data(). i.e. when lavc swallows up an
AVFrame with new-style hwaccel, this is where the AV_PIX_FMT_VAAPI
would be replaced with AV_PIX_FMT_HWACCEL. Replace "swallows up" with
e.g. av_vaapi_frame_convert_in_place() if you prefer. Otherwise, IMHO,
the old-style fields should live untouched, hence the need to keep the
hwaccel side-data around.

>
>> PS: other benefit of the AVHWAccelFrame side-data is that I can stuff
>> crop information into there. Since this is only useful to hwaccel, no
>> need to populate AVFrame with additional fields IMHO.
>
> IMHO, crop information should be generally available, even for software
> surfaces. What we currently do are terrible hacks: align the X/Y
> coordinates to chroma boundaries and adjust the pointer (meaning
> everyone has to do with possibly unaligned pointers, and non-mod-2
> crops don't work correctly), and this also means you can have a
> non-mod-2 width/height, which doesn't really make sense for chroma.

I don't really see why this would be needed for SW. AVFrame.buf[] will
hold the buffers as in "the original allocation". Then AVFrame.data[]
shall be filled in to fit the actual cropped/decoded region. Isn't it?

>
> Libav has a patchset for passing around cropping, but IMO it's too
> complex (allows multiple cropping rects...).
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Regards,
-- 
Gwenole Beauchesne
Intel Corporation SAS / 2 rue de Paris, 92196 Meudon Cedex, France
Registration Number (RCS): Nanterre B 302 456 199