[FFmpeg-devel] [RFC] amfenc: Add support for OpenCL input

Mironov, Mikhail Mikhail.Mironov at amd.com
Tue Jan 23 17:50:27 EET 2018


> -----Original Message-----
> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On Behalf
> Of Mark Thompson
> Sent: January 23, 2018 10:41 AM
> To: ffmpeg-devel at ffmpeg.org
> Subject: Re: [FFmpeg-devel] [RFC] amfenc: Add support for OpenCL input
> 
> On 23/01/18 15:14, Mironov, Mikhail wrote:
> >> -----Original Message-----
> >> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On
> Behalf
> >> Of Mironov, Mikhail
> >> Sent: January 23, 2018 10:04 AM
> >> To: FFmpeg development discussions and patches <ffmpeg-
> >> devel at ffmpeg.org>
> >> Subject: Re: [FFmpeg-devel] [RFC] amfenc: Add support for OpenCL
> >> input
> >>
> >>> -----Original Message-----
> >>> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On
> >>> Behalf Of Mark Thompson
> >>> Sent: January 22, 2018 6:57 PM
> >>> To: FFmpeg development discussions and patches <ffmpeg-
> >>> devel at ffmpeg.org>
> >>> Subject: [FFmpeg-devel] [RFC] amfenc: Add support for OpenCL input
> >>>
> >>> ---
> >>> This allows passing OpenCL frames to AMF without a download/upload
> >>> step to get around AMD's lack of support for D3D11 mapping.
> >>>
> >>> For example:
> >>>
> >>> ./ffmpeg -hwaccel dxva2 -hwaccel_output_format dxva2_vld -i input.mp4 -an -vf 'hwmap=derive_device=opencl,program_opencl=source=examples.cl:kernel=rotate_image' -c:v h264_amf output.mp4
> >>>
> >>> * I can't find any documentation or examples for these functions, so
> >>> I'm guessing a bit exactly how they are meant to work.  In
> >>> particular, there are some locking functions which I have ignored
> >>> because I have no idea under what circumstances something might want
> >>> to be locked.
> >>> * I tried to write common parts with D3D11, but I might well have
> >>> broken D3D11 support in the process - it doesn't work at all for me
> >>> so I can't test it.
> >>> * Not sure how to get non-NV12 to work.  I may be missing something,
> >>> or it may just not be there - the trace messages suggest it doesn't
> >>> like the width of RGB0 or the second plane of GRAY8.
> >>>
> >>> - Mark
> >>>
> >>>
> >>>  libavcodec/amfenc.c | 178
> >>> +++++++++++++++++++++++++++++++++++---------
> >>> --------
> >>>  libavcodec/amfenc.h |   1 +
> >>>  2 files changed, 123 insertions(+), 56 deletions(-)
> >>>
> >>> diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c index
> >>> 89a10ff253..220cdd278f 100644
> >>> --- a/libavcodec/amfenc.c
> >>> +++ b/libavcodec/amfenc.c
> >>> @@ -24,6 +24,9 @@
> >>>  #if CONFIG_D3D11VA
> >>>  #include "libavutil/hwcontext_d3d11va.h"
> >>>  #endif
> >>> +#if CONFIG_OPENCL
> >>> +#include "libavutil/hwcontext_opencl.h"
> >>> +#endif
> >>>  #include "libavutil/mem.h"
> >>>  #include "libavutil/pixdesc.h"
> >>>  #include "libavutil/time.h"
> >>> @@ -51,6 +54,9 @@ const enum AVPixelFormat ff_amf_pix_fmts[] = {
> >>> #if CONFIG_D3D11VA
> >>>      AV_PIX_FMT_D3D11,
> >>>  #endif
> >>> +#if CONFIG_OPENCL
> >>> +    AV_PIX_FMT_OPENCL,
> >>> +#endif
> >>>      AV_PIX_FMT_NONE
> >>>  };
> >>>
> >>> @@ -69,6 +75,7 @@ static const FormatMap format_map[] =
> >>>      { AV_PIX_FMT_YUV420P,    AMF_SURFACE_YUV420P },
> >>>      { AV_PIX_FMT_YUYV422,    AMF_SURFACE_YUY2 },
> >>>      { AV_PIX_FMT_D3D11,      AMF_SURFACE_NV12 },
> >>> +    { AV_PIX_FMT_OPENCL,     AMF_SURFACE_NV12 },
> >>>  };
> >>>
> >>>
> >>> @@ -154,8 +161,9 @@ static int amf_load_library(AVCodecContext
> >>> *avctx)
> >>>
> >>>  static int amf_init_context(AVCodecContext *avctx)  {
> >>> -    AmfContext         *ctx = avctx->priv_data;
> >>> -    AMF_RESULT          res = AMF_OK;
> >>> +    AmfContext *ctx = avctx->priv_data;
> >>> +    AMF_RESULT res;
> >>> +    AVHWDeviceContext *hwdev = NULL;
> >>>
> >>>      // configure AMF logger
> >>>      // the return of these functions indicates old state and do not
> >>> affect behaviour @@ -173,59 +181,91 @@ static int
> >>> amf_init_context(AVCodecContext *avctx)
> >>>
> >>>      res = ctx->factory->pVtbl->CreateContext(ctx->factory, &ctx->context);
> >>>      AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR_UNKNOWN,
> >>> "CreateContext() failed with error %d\n", res);
> >>> -    // try to reuse existing DX device
> >>> -#if CONFIG_D3D11VA
> >>> +
> >>> +    // Attempt to initialise from an existing D3D11 or OpenCL device.
> >>>      if (avctx->hw_frames_ctx) {
> >>> -        AVHWFramesContext *device_ctx = (AVHWFramesContext*)avctx-
> >>>> hw_frames_ctx->data;
> >>> -        if (device_ctx->device_ctx->type == AV_HWDEVICE_TYPE_D3D11VA)
> {
> >>> -            if (amf_av_to_amf_format(device_ctx->sw_format) !=
> >>> AMF_SURFACE_UNKNOWN) {
> >>> -                if (device_ctx->device_ctx->hwctx) {
> >>> -                    AVD3D11VADeviceContext *device_d3d11 =
> >>> (AVD3D11VADeviceContext *)device_ctx->device_ctx->hwctx;
> >>> -                    res = ctx->context->pVtbl->InitDX11(ctx->context,
> >> device_d3d11-
> >>>> device, AMF_DX11_1);
> >>> -                    if (res == AMF_OK) {
> >>> -                        ctx->hw_frames_ctx = av_buffer_ref(avctx-
> >hw_frames_ctx);
> >>> -                        if (!ctx->hw_frames_ctx) {
> >>> -                            return AVERROR(ENOMEM);
> >>> -                        }
> >>> -                    } else {
> >>> -                        if(res == AMF_NOT_SUPPORTED)
> >>> -                            av_log(avctx, AV_LOG_INFO, "avctx->hw_frames_ctx has
> >>> D3D11 device which doesn't have D3D11VA interface, switching to
> >>> default\n");
> >>> -                        else
> >>> -                            av_log(avctx, AV_LOG_INFO, "avctx->hw_frames_ctx has
> >>> non-AMD device, switching to default\n");
> >>> -                    }
> >>> -                }
> >>> -            } else {
> >>> -                av_log(avctx, AV_LOG_INFO, "avctx->hw_frames_ctx has
> format
> >>> not uspported by AMF, switching to default\n");
> >>> -            }
> >>> +        AVHWFramesContext *hwfc =
> >>> + (AVHWFramesContext*)avctx->hw_frames_ctx->data;
> >>> +
> >>> +        if (amf_av_to_amf_format(hwfc->sw_format) ==
> >>> AMF_SURFACE_UNKNOWN) {
> >>> +            av_log(avctx, AV_LOG_VERBOSE, "Input hardware frame
> >>> + format (%s)
> >>> is not supported.\n",
> >>> +                   av_get_pix_fmt_name(hwfc->sw_format));
> >>> +        } else {
> >>> +            hwdev = hwfc->device_ctx;
> >>> +
> >>> +            ctx->hw_frames_ctx = av_buffer_ref(avctx->hw_frames_ctx);
> >>> +            if (!ctx->hw_frames_ctx)
> >>> +                return AVERROR(ENOMEM);
> >>>          }
> >>> -    } else if (avctx->hw_device_ctx) {
> >>> -        AVHWDeviceContext *device_ctx = (AVHWDeviceContext*)(avctx-
> >>>> hw_device_ctx->data);
> >>> -        if (device_ctx->type == AV_HWDEVICE_TYPE_D3D11VA) {
> >>> -            if (device_ctx->hwctx) {
> >>> -                AVD3D11VADeviceContext *device_d3d11 =
> >>> (AVD3D11VADeviceContext *)device_ctx->hwctx;
> >>> -                res = ctx->context->pVtbl->InitDX11(ctx->context,
> device_d3d11-
> >>>> device, AMF_DX11_1);
> >>> +    }
> >>> +    if (!hwdev && avctx->hw_device_ctx) {
> >>> +        hwdev = (AVHWDeviceContext*)avctx->hw_device_ctx->data;
> >>> +
> >>> +        ctx->hw_device_ctx = av_buffer_ref(avctx->hw_device_ctx);
> >>> +        if (!ctx->hw_device_ctx)
> >>> +            return AVERROR(ENOMEM);
> >>> +    }
> >>> +    if (hwdev) {
> >>> +#if CONFIG_D3D11VA
> >>> +        if (hwdev->type == AV_HWDEVICE_TYPE_D3D11VA) {
> >>> +            AVD3D11VADeviceContext *d3d11dev = hwdev->hwctx;
> >>> +
> >>> +            res = ctx->context->pVtbl->InitDX11(ctx->context,
> >>> +                                                d3d11dev->device, AMF_DX11_1);
> >>> +            if (res == AMF_OK) {
> >>> +                av_log(avctx, AV_LOG_VERBOSE, "Initialised from "
> >>> +                       "external D3D11 device.\n");
> >>> +                return 0;
> >>> +            }
> >>> +
> >>> +            av_log(avctx, AV_LOG_INFO, "Failed to initialise from "
> >>> +                   "external D3D11 device: %d.\n", res);
> >>> +        } else
> >>> +#endif
> >>> +#if CONFIG_OPENCL
> >>> +        if (hwdev->type == AV_HWDEVICE_TYPE_OPENCL) {
> >>> +            AVOpenCLDeviceContext *cldev = hwdev->hwctx;
> >>> +            cl_int cle;
> >>> +
> >>> +            ctx->cl_command_queue =
> >>> +                clCreateCommandQueue(cldev->context,
> >>> + cldev->device_id, 0,
> >>> &cle);
> >>> +            if (!ctx->cl_command_queue) {
> >>> +                av_log(avctx, AV_LOG_INFO, "Failed to create OpenCL "
> >>> +                       "command queue: %d.\n", cle);
> >>> +            } else {
> >>> +                res = ctx->context->pVtbl->InitOpenCL(ctx->context,
> >>> +
> >>> + ctx->cl_command_queue);
> >>>                  if (res == AMF_OK) {
> >>> -                    ctx->hw_device_ctx = av_buffer_ref(avctx->hw_device_ctx);
> >>> -                    if (!ctx->hw_device_ctx) {
> >>> -                        return AVERROR(ENOMEM);
> >>> -                    }
> >>> -                } else {
> >>> -                    if (res == AMF_NOT_SUPPORTED)
> >>> -                        av_log(avctx, AV_LOG_INFO, "avctx->hw_device_ctx has
> >> D3D11
> >>> device which doesn't have D3D11VA interface, switching to default\n");
> >>> -                    else
> >>> -                        av_log(avctx, AV_LOG_INFO, "avctx->hw_device_ctx has
> non-
> >>> AMD device, switching to default\n");
> >>> +                    av_log(avctx, AV_LOG_VERBOSE, "Initialised from "
> >>> +                           "external OpenCL device.\n");
> >>> +                    return 0;
> >>>                  }
> >>> +                av_log(avctx, AV_LOG_INFO, "Failed to initialise from "
> >>> +                       "external OpenCL device: %d.\n", res);
> >>>              }
> >>> +        } else
> >>> +#endif
> >>> +        {
> >>> +            av_log(avctx, AV_LOG_INFO, "Input device type %s is not
> >>> supported.\n",
> >>> +                   av_hwdevice_get_type_name(hwdev->type));
> >>>          }
> >>>      }
> >>> -#endif
> >>> -    if (!ctx->hw_frames_ctx && !ctx->hw_device_ctx) {
> >>> -        res = ctx->context->pVtbl->InitDX11(ctx->context, NULL,
> >> AMF_DX11_1);
> >>> -        if (res != AMF_OK) {
> >>> -            res = ctx->context->pVtbl->InitDX9(ctx->context, NULL);
> >>> -            AMF_RETURN_IF_FALSE(ctx, res == AMF_OK,
> AVERROR_UNKNOWN,
> >>> "InitDX9() failed with error %d\n", res);
> >>> +
> >>> +    // Initialise from a new D3D11 device, or D3D9 if D3D11 is not
> available.
> >>> +    res = ctx->context->pVtbl->InitDX11(ctx->context, NULL,
> AMF_DX11_1);
> >>> +    if (res == AMF_OK) {
> >>> +        av_log(avctx, AV_LOG_VERBOSE, "Initialised from internal
> >>> + D3D11
> >>> device.\n");
> >>> +    } else {
> >>> +        av_log(avctx, AV_LOG_VERBOSE, "Failed to initialise from
> >>> + internal
> >>> D3D11 device: %d.\n", res);
> >>> +        res = ctx->context->pVtbl->InitDX9(ctx->context, NULL);
> >>> +        if (res == AMF_OK) {
> >>> +            av_log(avctx, AV_LOG_VERBOSE, "Initialised from
> >>> + internal
> >>> + D3D9
> >>> device.\n");
> >>> +        } else {
> >>> +            av_log(avctx, AV_LOG_VERBOSE, "Failed to initialise
> >>> + from internal
> >>> D3D9 device: %d.\n", res);
> >>> +            av_log(avctx, AV_LOG_ERROR, "Unable to initialise AMF.\n");
> >>> +            return AVERROR_UNKNOWN;
> >>>          }
> >>>      }
> >>> +
> >>>      return 0;
> >>>  }
> >>>
> >>> @@ -279,6 +319,11 @@ int av_cold
> ff_amf_encode_close(AVCodecContext
> >>> *avctx)
> >>>      av_buffer_unref(&ctx->hw_device_ctx);
> >>>      av_buffer_unref(&ctx->hw_frames_ctx);
> >>>
> >>> +#if CONFIG_OPENCL
> >>> +    if (ctx->cl_command_queue)
> >>> +        clReleaseCommandQueue(ctx->cl_command_queue);
> >>> +#endif
> >>> +
> >>>      if (ctx->trace) {
> >>>          ctx->trace->pVtbl->UnregisterWriter(ctx->trace,
> >>> FFMPEG_AMF_WRITER_ID);
> >>>      }
> >>> @@ -485,17 +530,38 @@ int ff_amf_send_frame(AVCodecContext
> *avctx,
> >>> const AVFrame *frame)
> >>>              (AVHWDeviceContext*)ctx->hw_device_ctx->data)
> >>>          )) {
> >>>  #if CONFIG_D3D11VA
> >>> -            static const GUID AMFTextureArrayIndexGUID = { 0x28115527,
> >>> 0xe7c3, 0x4b66, { 0x99, 0xd3, 0x4f, 0x2a, 0xe6, 0xb4, 0x7f, 0xaf } };
> >>> -            ID3D11Texture2D *texture = (ID3D11Texture2D*)frame->data[0];
> //
> >>> actual texture
> >>> -            int index = (int)(size_t)frame->data[1]; // index is a slice in texture
> >>> array is - set to tell AMF which slice to use
> >>> -            texture->lpVtbl->SetPrivateData(texture,
> >>> &AMFTextureArrayIndexGUID, sizeof(index), &index);
> >>> -
> >>> -            res = ctx->context->pVtbl->CreateSurfaceFromDX11Native(ctx-
> >>>> context, texture, &surface, NULL); // wrap to AMF surface
> >>> -            AMF_RETURN_IF_FALSE(ctx, res == AMF_OK,
> AVERROR(ENOMEM),
> >>> "CreateSurfaceFromDX11Native() failed  with error %d\n", res);
> >>> -
> >>> -            // input HW surfaces can be vertically aligned by 16; tell AMF the
> >> real
> >>> size
> >>> -            surface->pVtbl->SetCrop(surface, 0, 0, frame->width, frame-
> >>> height);
> >>> +            if (frame->format == AV_PIX_FMT_D3D11) {
> >>> +                static const GUID AMFTextureArrayIndexGUID = {
> >>> + 0x28115527,
> >>> 0xe7c3, 0x4b66, { 0x99, 0xd3, 0x4f, 0x2a, 0xe6, 0xb4, 0x7f, 0xaf }
> >>> };
> >>> +                ID3D11Texture2D *texture =
> >>> + (ID3D11Texture2D*)frame->data[0];
> >>> // actual texture
> >>> +                int index = (int)(size_t)frame->data[1]; // index
> >>> + is a slice in texture
> >>> array is - set to tell AMF which slice to use
> >>> +                texture->lpVtbl->SetPrivateData(texture,
> >>> + &AMFTextureArrayIndexGUID, sizeof(index), &index);
> >>> +
> >>> +                res =
> >>> + ctx->context->pVtbl->CreateSurfaceFromDX11Native(ctx-
> >>>> context, texture, &surface, NULL); // wrap to AMF surface
> >>> +                AMF_RETURN_IF_FALSE(ctx, res == AMF_OK,
> >>> + AVERROR(ENOMEM), "CreateSurfaceFromDX11Native() failed  with
> error
> >>> + %d\n", res);
> >>> +
> >>> +                // input HW surfaces can be vertically aligned by
> >>> + 16; tell AMF the
> >>> real size
> >>> +                surface->pVtbl->SetCrop(surface, 0, 0,
> >>> + frame->width,
> >>> + frame-
> >>>> height);
> >>> +            } else
> >>> +#endif
> >>> +#if CONFIG_OPENCL
> >>> +            if (frame->format == AV_PIX_FMT_OPENCL) {
> >>> +                void *planes[AV_NUM_DATA_POINTERS];
> >>> +                AMF_SURFACE_FORMAT format;
> >>> +                int i;
> >>> +
> >>> +                for (i = 0; i < AV_NUM_DATA_POINTERS; i++)
> >>> +                    planes[i] = frame->data[i];
> >>> +
> >>> +                format = amf_av_to_amf_format(frame->format);
> >>> +
> >>> +                res =
> >>> + ctx->context->pVtbl->CreateSurfaceFromOpenCLNative(ctx-
> >>>> context, format,
> >>> +                                                                         frame->width, frame->height,
> >>> +                                                                         planes, &surface, NULL);
> >>> +                AMF_RETURN_IF_FALSE(ctx, res == AMF_OK,
> >>> AVERROR_UNKNOWN,
> >>> +
> >>> + "CreateSurfaceFromOpenCLNative() failed with error
> >>> %d\n", res);
> >>> +            } else
> >>>  #endif
> >>> +            av_assert0(0 && "Invalid hardware input format.");
> >>>          } else {
> >>>              res = ctx->context->pVtbl->AllocSurface(ctx->context,
> >>> AMF_MEMORY_HOST, ctx->format, avctx->width, avctx->height,
> &surface);
> >>>              AMF_RETURN_IF_FALSE(ctx, res == AMF_OK,
> >>> AVERROR(ENOMEM),
> >>> "AllocSurface() failed  with error %d\n", res); diff --git
> >>> a/libavcodec/amfenc.h b/libavcodec/amfenc.h index
> >>> 84f0aad2fa..bb8fd1807a 100644
> >>> --- a/libavcodec/amfenc.h
> >>> +++ b/libavcodec/amfenc.h
> >>> @@ -61,6 +61,7 @@ typedef struct AmfContext {
> >>>
> >>>      AVBufferRef        *hw_device_ctx; ///< pointer to HW accelerator
> >>> (decoder)
> >>>      AVBufferRef        *hw_frames_ctx; ///< pointer to HW accelerator
> (frame
> >>> allocator)
> >>> +    void               *cl_command_queue; ///< Command queue for use
> with
> >>> OpenCL input
> >>>
> >>>      // helpers to handle async calls
> >>>      int                 delayed_drain;
> >>> --
> >>> 2.11.0
> >>> _______________________________________________
> >>> ffmpeg-devel mailing list
> >>> ffmpeg-devel at ffmpeg.org
> >>> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >>
> >> The AMF encoder works via D3D9 or D3D11 only. AMF OpenCL support exists
> >> for possible integration with external image processing. Passing
> >> regular OpenCL 2D images will cause mapping to system memory and a copy.
> >> The fast way is to use interop:
> >> - allocate the last processing NV12 surface as a D3D11 texture
> >> - interop it into OpenCL
> >> - use it as output for the last OCL kernel
> >> - un-interop back to D3D11
> >> - submit to AMF.
> >> There is not much value in initializing AMF with OpenCL unless the AMF
> >> color space converter is used.
> >> The converter would do the sequence described above.
> >>
> >> If AMF CSC is used, a few things have to be done:
> >> 1. The OpenCL device should be created by passing the D3D11 device as
> >> a parameter. It is done in hwcontext_opencl.c via
> >> clGetDeviceIDsFromD3D11KHR().
> >> 2. The D3D11 device used there should be passed to AMF via InitDX11(),
> >> preferably before the InitOpenCL() call.
> >> 3. Add RGB formats for submission.
> >> Mikhail
> >>
> >
> > Alternatively we could just allocate D3D11 surface, interop to OCL, copy using OCL, un-interop, and submit to AMF:
> > Context->InitD3D11(device used for OCL device creation)
> > Context->InitOpenCL(queue)
> > Context->AllocSurface(AMF_MEMORY_D3D11,AMF_SURFACE_NV12,, &surface);
> > surface->Convert(AMF_MEMORY_OPENCL); //interop
> > cl_mem planeY = surface->GetPlaneAt(0)->GetNative();
> > cl_mem planeUV = surface->GetPlaneAt(1)->GetNative();
> >
> > clEnqueueCopyImage() // Y
> > clEnqueueCopyImage() // UV
> > surface->Convert(AMF_MEMORY_D3D11); //un-interop
> > encoder->SubmitInput(surface);
> 
> Right, that sequence would work; I might try it with D3D9.
> 
> Is there a reason why the driver doesn't use this path (or some equivalent)
> internally?  Implementing the download/upload sequence inside the driver
> feels just as bad, and is significantly more misleading to the user.  (I
> assume the reason why the OpenCL images aren't usable directly is due to a
> restriction on tiling modes or some similar layout issue, so at least one
> copy is definitely required.)
> 
> - Mark

That would obscure a suboptimal path. We would like to promote fully optimal pipelines like the one I described above, with D3D11 allocation and interop.
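
To make the ordering constraint of that pipeline concrete, here is a minimal sketch in plain C. It is only a toy model: every type and function in it (MockSurface, mock_convert(), mock_cl_copy_plane(), mock_submit() and so on) is hypothetical and stands in for the corresponding AMF/OpenCL step; none of it is the real AMF API. It assumes that OpenCL may write planes only while it owns the surface, and that the encoder accepts only surfaces handed back to D3D11.

```c
#include <assert.h>

/* Hypothetical stand-ins: these model the idea only, they are NOT the AMF API. */
typedef enum { MEM_D3D11, MEM_OPENCL } MemType;

typedef struct {
    MemType mem;            /* which API currently owns the surface */
    int     planes_written; /* NV12 planes filled by OpenCL (Y and UV) */
} MockSurface;

/* Allocate the surface on the D3D11 side, which is what the encoder consumes. */
static MockSurface mock_alloc_d3d11_nv12(void)
{
    MockSurface s = { MEM_D3D11, 0 };
    return s;
}

/* Interop / un-interop: pass ownership to the other API without a copy. */
static void mock_convert(MockSurface *s, MemType target)
{
    s->mem = target;
}

/* OpenCL may write a plane only while it owns the surface. 0 on success. */
static int mock_cl_copy_plane(MockSurface *s)
{
    if (s->mem != MEM_OPENCL)
        return -1;
    s->planes_written++;
    return 0;
}

/* The encoder takes only D3D11-owned surfaces with both planes filled. */
static int mock_submit(const MockSurface *s)
{
    if (s->mem != MEM_D3D11 || s->planes_written != 2)
        return -1;
    return 0;
}

/* Full sequence: allocate, interop, copy Y and UV, un-interop, submit. */
int run_pipeline(void)
{
    MockSurface s = mock_alloc_d3d11_nv12();
    assert(s.mem == MEM_D3D11);

    mock_convert(&s, MEM_OPENCL);           /* interop */
    if (mock_cl_copy_plane(&s) < 0)         /* Y plane */
        return -1;
    if (mock_cl_copy_plane(&s) < 0)         /* UV plane */
        return -1;
    mock_convert(&s, MEM_D3D11);            /* un-interop */
    return mock_submit(&s);
}

/* Same sequence with the un-interop step skipped: submission is rejected. */
int run_without_uninterop(void)
{
    MockSurface s = mock_alloc_d3d11_nv12();

    mock_convert(&s, MEM_OPENCL);
    if (mock_cl_copy_plane(&s) < 0 || mock_cl_copy_plane(&s) < 0)
        return -1;
    return mock_submit(&s);                 /* still OpenCL-owned */
}
```

In this model run_pipeline() succeeds, while run_without_uninterop() fails at submission because the surface was never handed back to D3D11.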

Speaking of D3D9: AMF should work the same way as with D3D11, with one exception you should be aware of. For fast interop with OCL, the D3D9 surface should be allocated with a shared handle.
Unfortunately, DXVA2 surface allocation doesn't allow this. But regular surface allocation via a D3D9 device can be done with shared handles, and AMF always allocates surfaces with shared handles.
Mikhail

