[FFmpeg-devel] [PATCH 1/2] libavutil/libavfilter: opencl wrapper based on comments on 20130327

Thu Mar 28 00:08:59 CET 2013

On date Wednesday 2013-03-27 21:21:09 +0800, Wei Gao encoded:
> 

> From 544da77d67c6c27e415363c3ebd2f1894f98932e Mon Sep 17 00:00:00 2001
> From: highgod0401 <highgod0401 at gmail.com>
> Date: Wed, 27 Mar 2013 21:17:29 +0800
> Subject: [PATCH] opencl wrapper based on comments on 20130327
> 
> ---
>  configure          |   4 +
>  libavutil/Makefile |   3 +
>  libavutil/opencl.c | 674 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  libavutil/opencl.h | 197 ++++++++++++++++
>  4 files changed, 878 insertions(+)
>  create mode 100644 libavutil/opencl.c
>  create mode 100644 libavutil/opencl.h
> 
> diff --git a/configure b/configure
> index 8443db4..9c42a85 100755
> --- a/configure
> +++ b/configure
> @@ -233,6 +233,7 @@ External library support:
>    --enable-libxvid         enable Xvid encoding via xvidcore,
>                             native MPEG-4/Xvid encoder exists [no]
>    --enable-openal          enable OpenAL 1.1 capture support [no]
> +  --enable-opencl          enable OpenCL code
>    --enable-openssl         enable openssl [no]
>    --enable-x11grab         enable X11 grabbing [no]
>    --enable-zlib            enable zlib [autodetect]
> @@ -1178,6 +1179,7 @@ EXTERNAL_LIBRARY_LIST="
>      libxavs
>      libxvid
>      openal
> +    opencl
>      openssl
>      x11grab
>      zlib
> @@ -3982,6 +3984,7 @@ enabled openal     && { { for al_libs in "${OPENAL_LIBS}" "-lopenal" "-lOpenAL32
>                          die "ERROR: openal not found"; } &&
>                        { check_cpp_condition "AL/al.h" "defined(AL_VERSION_1_1)" ||
>                          die "ERROR: openal must be installed and version must be 1.1 or compatible"; }
> +enabled opencl     && require2 opencl CL/cl.h clEnqueueNDRangeKernel -lOpenCL
>  enabled openssl    && { check_lib openssl/ssl.h SSL_library_init -lssl -lcrypto ||
>                          check_lib openssl/ssl.h SSL_library_init -lssl32 -leay32 ||
>                          check_lib openssl/ssl.h SSL_library_init -lssl -lcrypto -lws2_32 -lgdi32 ||
> @@ -4350,6 +4353,7 @@ echo "network support           ${network-no}"
>  echo "threading support         ${thread_type-no}"
>  echo "safe bitstream reader     ${safe_bitstream_reader-no}"
>  echo "SDL support               ${sdl-no}"
> +echo "opencl enabled            ${opencl-no}"
>  echo "texi2html enabled         ${texi2html-no}"
>  echo "perl enabled              ${perl-no}"
>  echo "pod2man enabled           ${pod2man-no}"
> diff --git a/libavutil/Makefile b/libavutil/Makefile
> index 103ce5e..6375e10 100644
> --- a/libavutil/Makefile
> +++ b/libavutil/Makefile
> @@ -52,6 +52,8 @@ HEADERS = adler32.h                                                     \
>  
>  HEADERS-$(CONFIG_LZO)                   += lzo.h
>  
> +HEADERS-$(CONFIG_OPENCL)                += opencl.h
> +
>  ARCH_HEADERS = bswap.h                                                  \
>                 intmath.h                                                \
>                 intreadwrite.h                                           \
> @@ -115,6 +117,7 @@ SKIPHEADERS-$(HAVE_MACHINE_RW_BARRIER)          += atomic_suncc.h
>  SKIPHEADERS-$(HAVE_MEMORYBARRIER)               += atomic_win32.h
>  SKIPHEADERS-$(HAVE_SYNC_VAL_COMPARE_AND_SWAP)   += atomic_gcc.h
>  
> +OBJS-$(CONFIG_OPENCL)                   += opencl.o
>  TESTPROGS = adler32                                                     \
>              aes                                                         \
>              atomic                                                      \
> diff --git a/libavutil/opencl.c b/libavutil/opencl.c
> new file mode 100644
> index 0000000..1888a34
> --- /dev/null
> +++ b/libavutil/opencl.c
> @@ -0,0 +1,674 @@
> +/*
> + * Copyright (C) 2012 Peng Gao <peng at multicorewareinc.com>
> + * Copyright (C) 2012 Li   Cao <li at multicorewareinc.com>
> + * Copyright (C) 2012 Wei  Gao <weigao at multicorewareinc.com>
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include "opencl.h"
> +#include "avstring.h"
> +#include "log.h"
> +#include "avassert.h"
> +

> +#if !HAVE_MEMORYBARRIER && !HAVE_SYNC_VAL_COMPARE_AND_SWAP && !HAVE_MACHINE_RW_BARRIER
> +#if HAVE_PTHREADS
> +
> +#include <pthread.h>
> +static pthread_mutex_t atomic_opencl_lock = PTHREAD_MUTEX_INITIALIZER;
> +
> +static void lock_opencl(void)
> +{
> +    pthread_mutex_lock(&atomic_opencl_lock);
> +}
> +
> +static void unlock_opencl(void)
> +{
> +    pthread_mutex_unlock(&atomic_opencl_lock);
> +}
> +
> +#elif !HAVE_THREADS
> +
> +static void lock_opencl(void)
> +{
> +}
> +
> +static void unlock_opencl(void)
> +{
> +}
> +
> +#endif
> +#else
> +static void lock_opencl(void)
> +{
> +}
> +
> +static void unlock_opencl(void)
> +{
> +}
> +
> +#endif
> +
> +
> +#define MAX_KERNEL_FUNCTION_NAME_LEN  64
> +#define MAX_KERNEL_FUNCTIONS_NUM 200
> +
> +typedef struct GPUEnv {
> +    int opencl_is_inited;
> +    cl_platform_id platform;
> +    cl_device_type device_type;
> +    cl_context context;
> +    cl_device_id *device_ids;
> +    cl_device_id device_id;
> +    cl_command_queue command_queue;
> +    cl_program program;

> +    char kernel_entry_points[MAX_KERNEL_FUNCTIONS_NUM][MAX_KERNEL_FUNCTION_NAME_LEN+1];
> +    av_opencl_kernel_function kernel_functions[MAX_KERNEL_FUNCTIONS_NUM];
> +    const char *kernel_code[MAX_KERNEL_FUNCTIONS_NUM];
> +    int kernel_function_count;

I'm sorry if I misguided you, from my current understanding, a kernel
is a bunch of code which is compiled by OpenCL, and with a name (which
is associated to the kernel, not necessarily to the entry point
declared when creating the corresponding kernel handle with
clCreateKernel). So:

char                      kernel_names    [MAX_KERNEL_NUM][MAX_KERNEL_NAME_LEN+1];
const char               *kernel_code     [MAX_KERNEL_NUM];
av_opencl_kernel_function kernel_functions[MAX_KERNEL_NUM];
int                       kernel_count;

like in the previous patch seems acceptable.

> +    int runtime_kernel_function_count;
> +    int is_user_created; // 1: the opencl env is created by user and use AVOpenCLExternalInfo to pass to ffmpeg ,0:created by opencl wrapper.
> +} GPUEnv;
> +
> +typedef struct OpenclErrorMsg {
> +    int err_code;
> +    const char *err_str;
> +} OpenclErrorMsg;
> +
> +static const OpenclErrorMsg opencl_err_msg[] = {
> +        {CL_DEVICE_NOT_FOUND,                               "DEVICE NOT FOUND"},
> +        {CL_DEVICE_NOT_AVAILABLE,                           "DEVICE NOT AVAILABLE"},
> +        {CL_COMPILER_NOT_AVAILABLE,                         "COMPILER NOT AVAILABLE"},
> +        {CL_MEM_OBJECT_ALLOCATION_FAILURE,                  "MEM OBJECT ALLOCATION FAILURE"},
> +        {CL_OUT_OF_RESOURCES,                               "OUT OF RESOURCES"},
> +        {CL_OUT_OF_HOST_MEMORY,                             "OUT OF HOST MEMORY"},
> +        {CL_PROFILING_INFO_NOT_AVAILABLE,                   "PROFILING INFO NOT AVAILABLE"},
> +        {CL_MEM_COPY_OVERLAP,                               "MEM COPY OVERLAP"},
> +        {CL_IMAGE_FORMAT_MISMATCH,                          "IMAGE FORMAT MISMATCH"},
> +        {CL_IMAGE_FORMAT_NOT_SUPPORTED,                     "IMAGE FORMAT NOT_SUPPORTED"},
> +        {CL_BUILD_PROGRAM_FAILURE,                          "BUILD PROGRAM FAILURE"},
> +        {CL_MAP_FAILURE,                                    "MAP FAILURE"},
> +        {CL_MISALIGNED_SUB_BUFFER_OFFSET,                   "MISALIGNED SUB BUFFER OFFSET"},
> +        {CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST,      "EXEC STATUS ERROR FOR EVENTS IN WAIT LIST"},
> +        {CL_COMPILE_PROGRAM_FAILURE,                        "COMPILE PROGRAM FAILURE"},
> +        {CL_LINKER_NOT_AVAILABLE,                           "LINKER NOT AVAILABLE"},
> +        {CL_LINK_PROGRAM_FAILURE,                           "LINK PROGRAM FAILURE"},
> +        {CL_DEVICE_PARTITION_FAILED,                        "DEVICE PARTITION FAILED"},
> +        {CL_KERNEL_ARG_INFO_NOT_AVAILABLE,                  "KERNEL ARG INFO NOT AVAILABLE"},
> +        {CL_INVALID_VALUE,                                  "INVALID VALUE"},
> +        {CL_INVALID_DEVICE_TYPE,                            "INVALID DEVICE TYPE"},
> +        {CL_INVALID_PLATFORM,                               "INVALID PLATFORM"},
> +        {CL_INVALID_DEVICE,                                 "INVALID DEVICE"},
> +        {CL_INVALID_CONTEXT,                                "INVALID CONTEXT"},
> +        {CL_INVALID_QUEUE_PROPERTIES,                       "INVALID QUEUE PROPERTIES"},
> +        {CL_INVALID_COMMAND_QUEUE,                          "INVALID COMMAND QUEUE"},
> +        {CL_INVALID_HOST_PTR,                               "INVALID HOST PTR"},
> +        {CL_INVALID_MEM_OBJECT,                             "INVALID MEM OBJECT"},
> +        {CL_INVALID_IMAGE_FORMAT_DESCRIPTOR,                "INVALID IMAGE FORMAT DESCRIPTOR"},
> +        {CL_INVALID_IMAGE_SIZE,                             "INVALID IMAGE SIZE"},
> +        {CL_INVALID_SAMPLER,                                "INVALID SAMPLER"},
> +        {CL_INVALID_BINARY,                                 "INVALID BINARY"},
> +        {CL_INVALID_BUILD_OPTIONS,                          "INVALID BUILD OPTIONS"},
> +        {CL_INVALID_PROGRAM,                                "INVALID PROGRAM"},
> +        {CL_INVALID_PROGRAM_EXECUTABLE,                     "INVALID PROGRAM EXECUTABLE"},
> +        {CL_INVALID_KERNEL_NAME,                            "INVALID KERNEL NAME"},
> +        {CL_INVALID_KERNEL_DEFINITION,                      "INVALID KERNEL DEFINITION"},
> +        {CL_INVALID_KERNEL,                                 "INVALID KERNEL"},
> +        {CL_INVALID_ARG_INDEX,                              "INVALID ARG INDEX"},
> +        {CL_INVALID_ARG_VALUE,                              "INVALID ARG VALUE"},
> +        {CL_INVALID_ARG_SIZE,                               "INVALID ARG_SIZE"},
> +        {CL_INVALID_KERNEL_ARGS,                            "INVALID KERNEL ARGS"},
> +        {CL_INVALID_WORK_DIMENSION,                         "INVALID WORK DIMENSION"},
> +        {CL_INVALID_WORK_GROUP_SIZE,                        "INVALID WORK GROUP SIZE"},
> +        {CL_INVALID_WORK_ITEM_SIZE,                         "INVALID WORK ITEM SIZE"},
> +        {CL_INVALID_GLOBAL_OFFSET,                          "INVALID GLOBAL OFFSET"},
> +        {CL_INVALID_EVENT_WAIT_LIST,                        "INVALID EVENT WAIT LIST"},
> +        {CL_INVALID_EVENT,                                  "INVALID EVENT"},
> +        {CL_INVALID_OPERATION,                              "INVALID OPERATION"},
> +        {CL_INVALID_GL_OBJECT,                              "INVALID GL OBJECT"},
> +        {CL_INVALID_BUFFER_SIZE,                            "INVALID BUFFER SIZE"},
> +        {CL_INVALID_MIP_LEVEL,                              "INVALID MIP LEVEL"},
> +        {CL_INVALID_GLOBAL_WORK_SIZE,                       "INVALID GLOBAL WORK SIZE"},
> +        {CL_INVALID_PROPERTY,                               "INVALID PROPERTY"},
> +        {CL_INVALID_IMAGE_DESCRIPTOR,                       "INVALID IMAGE DESCRIPTOR"},
> +        {CL_INVALID_COMPILER_OPTIONS,                       "INVALID COMPILER OPTIONS"},
> +        {CL_INVALID_LINKER_OPTIONS,                         "INVALID LINKER OPTIONS"},
> +        {CL_INVALID_DEVICE_PARTITION_COUNT,                 "INVALID DEVICE PARTITION COUNT"},
> +};
> +
> +typedef struct OpenclUtils {
> +    const AVClass *class;
> +    int log_offset;
> +    void *log_ctx;
> +} OpenclUtils;
> +
> +static const AVClass openclutils_class = {"OPENCLUTILS", av_default_item_name,
> +                                                   NULL, LIBAVUTIL_VERSION_INT,
> +                                                   offsetof(OpenclUtils, log_offset),
> +                                                   offsetof(OpenclUtils, log_ctx)};
> +static OpenclUtils openclutils = {&openclutils_class};
> +static GPUEnv gpu_env;
> +

> +int av_opencl_register_kernel(const char *kernel_entry_point, const char *kernel_code)

Same here, the previous name seems less misleading, since this is
registering a kernel code with its name.

> +{
> +    int ret = 0;
> +    lock_opencl();
> +    if (gpu_env.kernel_function_count >= MAX_KERNEL_FUNCTIONS_NUM) {
> +        av_log(&openclutils, AV_LOG_ERROR,
> +         "Could not register kernel with name '%s', maximum number of registered kernels %d already reached\n",
> +         kernel_entry_point, MAX_KERNEL_FUNCTIONS_NUM);
> +        ret = AVERROR(EINVAL);
> +        goto end;
> +    }
> +    if (strlen(kernel_entry_point) > MAX_KERNEL_FUNCTION_NAME_LEN) {
> +         av_log(&openclutils, AV_LOG_ERROR, "Registered kernel name %s is too long\n", kernel_entry_point);
> +         ret = AVERROR(EINVAL);
> +         goto end;
> +    }
> +    gpu_env.kernel_code[gpu_env.kernel_function_count] = kernel_code;
> +    av_strlcpy(gpu_env.kernel_entry_points[gpu_env.kernel_function_count], kernel_entry_point, MAX_KERNEL_FUNCTION_NAME_LEN+1);
> +    gpu_env.kernel_function_count++;
> +end:
> +    unlock_opencl();
> +    return ret;
> +}
> +static const char *opencl_errstr(cl_int status)
> +{
> +    int i;
> +    for (i = 0; i < sizeof(opencl_err_msg); i++) {
> +        if (opencl_err_msg[i].err_code == status)
> +            return opencl_err_msg[i].err_str;
> +    }
> +    return "unknown error";
> +}
> +
> +int av_opencl_create_kernel(AVOpenCLKernelEnv *env, const char *kernel_entry_point)
> +{
> +    cl_int status;
> +    if (strlen(kernel_entry_point) > sizeof(env->kernel_entry_point)) {
> +        av_log(&openclutils, AV_LOG_ERROR, "Created kernel name %s is too long\n", kernel_entry_point);
> +        return AVERROR(EINVAL);
> +    }
> +    if (!env->kernel) {
> +        env->kernel = clCreateKernel(gpu_env.program, kernel_entry_point, &status);
> +        if (status != CL_SUCCESS) {
> +            av_log(&openclutils, AV_LOG_ERROR, "Could not create OpenCL kernel: %s\n", opencl_errstr(status));
> +            return AVERROR_EXTERNAL;
> +        }
> +        env->context       = gpu_env.context;
> +        env->command_queue = gpu_env.command_queue;
> +        env->program       = gpu_env.program;
> +        av_strlcpy(env->kernel_entry_point, kernel_entry_point, sizeof(env->kernel_entry_point));
> +    }
> +    return 0;
> +}
> +
> +void av_opencl_release_kernel(AVOpenCLKernelEnv *env)
> +{
> +    cl_int status;
> +    if (!env->kernel)
> +        return;
> +    status = clReleaseKernel(env->kernel);
> +    if (status != CL_SUCCESS) {
> +        av_log(&openclutils, AV_LOG_ERROR, "Could not release kernel: %s\n", opencl_errstr(status));
> +    }
> +}
> +
> +static int init_opencl_env(GPUEnv *gpu_env, AVOpenCLExternalInfo *ext_opencl_info)
> +{
> +    size_t device_length;
> +    cl_int status;
> +    cl_uint num_platforms, num_devices;
> +    cl_platform_id *platform_ids = NULL;
> +    cl_context_properties cps[3];
> +    char platform_name[100];
> +    int i;
> +    int ret = 0;
> +    cl_device_type device_type[] = {CL_DEVICE_TYPE_GPU, CL_DEVICE_TYPE_CPU, CL_DEVICE_TYPE_DEFAULT};
> +    if (ext_opencl_info) {
> +        if (gpu_env->is_user_created)
> +            return 0;
> +        gpu_env->platform        = ext_opencl_info->platform;
> +        gpu_env->is_user_created = 1;
> +        gpu_env->command_queue   = ext_opencl_info->command_queue;
> +        gpu_env->context         = ext_opencl_info->context;
> +        gpu_env->device_ids      = ext_opencl_info->device_ids;
> +        gpu_env->device_id       = ext_opencl_info->device_id;
> +        gpu_env->device_type     = ext_opencl_info->device_type;
> +    } else {
> +        if (!gpu_env->is_user_created) {
> +            status = clGetPlatformIDs(0, NULL, &num_platforms);
> +            if (status != CL_SUCCESS) {
> +                av_log(&openclutils, AV_LOG_ERROR, "Could not get OpenCL platform ids: %s\n", opencl_errstr(status));
> +                return AVERROR_EXTERNAL;
> +            }
> +            if (num_platforms > 0) {
> +                platform_ids = av_mallocz(num_platforms * sizeof(cl_platform_id));
> +                if (!platform_ids) {
> +                    ret = AVERROR(ENOMEM);
> +                    goto end;
> +                }
> +                status = clGetPlatformIDs(num_platforms, platform_ids, NULL);
> +                if (status != CL_SUCCESS) {
> +                    av_log(&openclutils, AV_LOG_ERROR, "Could not get OpenCL platform ids: %s\n", opencl_errstr(status));
> +                    ret = AVERROR_EXTERNAL;
> +                    goto end;
> +                }
> +                for (i = 0; i < num_platforms; i++) {
> +                    status = clGetPlatformInfo(platform_ids[i], CL_PLATFORM_VENDOR,
> +                                               sizeof(platform_name), platform_name,
> +                                               NULL);
> +
> +                    if (status != CL_SUCCESS) {
> +                        av_log(&openclutils, AV_LOG_ERROR, "Could not get OpenCL platform info: %s\n", opencl_errstr(status));
> +                        ret = AVERROR_EXTERNAL;
> +                        goto end;
> +                    }
> +                    gpu_env->platform = platform_ids[i];
> +                    status = clGetDeviceIDs(gpu_env->platform, CL_DEVICE_TYPE_GPU,
> +                                            0, NULL, &num_devices);
> +                    if (status != CL_SUCCESS) {
> +                        av_log(&openclutils, AV_LOG_ERROR, "Could not get OpenCL device number:%s\n", opencl_errstr(status));
> +                        ret = AVERROR_EXTERNAL;
> +                        goto end;
> +                    }
> +                    if (num_devices == 0) {
> +                        //find CPU device
> +                        status = clGetDeviceIDs(gpu_env->platform, CL_DEVICE_TYPE_CPU,
> +                                             0, NULL, &num_devices);
> +                    }
> +                    if (status != CL_SUCCESS) {
> +                        av_log(&openclutils, AV_LOG_ERROR, "Could not get OpenCL device ids: %s\n", opencl_errstr(status));
> +                        ret = AVERROR_EXTERNAL;
> +                        goto end;
> +                    }
> +                    if (num_devices)
> +                       break;
> +
> +                }
> +            }
> +            if (!gpu_env->platform) {
> +                av_log(&openclutils, AV_LOG_ERROR, "Could not get OpenCL platforms\n");
> +                ret = AVERROR_EXTERNAL;
> +                goto end;
> +            }
> +
> +           /*
> +                 * Use available platform.
> +                 */
> +            av_log(&openclutils, AV_LOG_VERBOSE, "Platform Name: %s\n", platform_name);
> +            cps[0] = CL_CONTEXT_PLATFORM;
> +            cps[1] = (cl_context_properties)gpu_env->platform;
> +            cps[2] = 0;
> +            /* Check for GPU. */
> +
> +            for (i = 0; i < sizeof(device_type); i++) {
> +                gpu_env->device_type = device_type[i];
> +                gpu_env->context     = clCreateContextFromType(cps, gpu_env->device_type,
> +                                                               NULL, NULL, &status);
> +                if (status != CL_SUCCESS) {
> +                    av_log(&openclutils, AV_LOG_ERROR, "Could not get OpenCL context from device type: %s\n", opencl_errstr(status));
> +                    ret = AVERROR_EXTERNAL;
> +                    goto end;
> +                }
> +                if (gpu_env->context)
> +                    break;
> +            }
> +            if (!gpu_env->context) {
> +                av_log(&openclutils, AV_LOG_ERROR, "Could not get OpenCL context from device type\n");
> +                ret = AVERROR_EXTERNAL;
> +                goto end;
> +            }
> +            /* Detect OpenCL devices. */
> +            /* First, get the size of device list data */
> +            status = clGetContextInfo(gpu_env->context, CL_CONTEXT_DEVICES,
> +                                      0, NULL, &device_length);
> +            if (status != CL_SUCCESS) {
> +                av_log(&openclutils, AV_LOG_ERROR, "Could not get OpenCL device length: %s\n", opencl_errstr(status));
> +                ret = AVERROR_EXTERNAL;
> +                goto end;
> +            }
> +            if (device_length == 0) {
> +                av_log(&openclutils, AV_LOG_ERROR, "Could not get OpenCL device length\n");
> +                ret = AVERROR_EXTERNAL;
> +                goto end;
> +            }
> +            /* Now allocate memory for device list based on the size we got earlier */
> +            gpu_env->device_ids = av_mallocz(device_length);
> +            if (!gpu_env->device_ids) {
> +                ret = AVERROR(ENOMEM);
> +                goto end;
> +            }
> +            /* Now, get the device list data */
> +            status = clGetContextInfo(gpu_env->context, CL_CONTEXT_DEVICES, device_length,
> +                                      gpu_env->device_ids, NULL);
> +            if (status != CL_SUCCESS) {
> +                av_log(&openclutils, AV_LOG_ERROR, "Could not get OpenCL context info: %s\n", opencl_errstr(status));
> +                ret = AVERROR_EXTERNAL;
> +                goto end;
> +            }
> +            /* Create OpenCL command queue. */
> +            gpu_env->command_queue = clCreateCommandQueue(gpu_env->context, gpu_env->device_ids[0],
> +                                                          0, &status);
> +            if (status != CL_SUCCESS) {
> +                av_log(&openclutils, AV_LOG_ERROR, "Could not create OpenCL command queue: %s\n", opencl_errstr(status));
> +                ret = AVERROR_EXTERNAL;
> +                goto end;
> +            }
> +        }
> +    }
> +end:
> +    av_free(platform_ids);
> +    return ret;
> +}
> +
> +

> +int av_opencl_register_kernel_function(const char *kernel_entry_point, av_opencl_kernel_function function)

again, kernel_name (like in the previous patch), seems more sensible,
since this is a kernel name (to which you're associating a function).

> +{
> +    int i;
> +    for (i = 0; i < gpu_env.kernel_function_count; i++) {
> +        if (av_strcasecmp(kernel_entry_point, gpu_env.kernel_entry_points[i]) == 0) {
> +            lock_opencl();
> +            gpu_env.kernel_functions[i] = function;
> +            gpu_env.runtime_kernel_function_count++;
> +            unlock_opencl();
> +            return 0;
> +        }
> +    }

> +    return AVERROR(EINVAL);

An error message could be helpful (why did you remove it?).

> +
> +}
> +
> +static int compile_kernel_file(GPUEnv *gpu_env, const char *build_option)
> +{
> +    cl_int status;
> +    char *temp, *source_str = NULL;
> +    size_t source_str_len = 0;
> +    int i, ret = 0;
> +
> +    if (gpu_env->program)
> +        return ret;
> +
> +    for (i = 0; i < gpu_env->kernel_function_count; i++) {
> +        source_str_len += strlen(gpu_env->kernel_code[i]);
> +    }
> +    source_str = av_mallocz(source_str_len + 1);
> +    if (!source_str) {
> +        return AVERROR(ENOMEM);
> +    }
> +    temp = source_str;
> +    for (i = 0; i < gpu_env->kernel_function_count; i++) {
> +        memcpy(temp, gpu_env->kernel_code[i], strlen(gpu_env->kernel_code[i]));
> +        temp += strlen(gpu_env->kernel_code[i]);
> +    }
> +    /* create a CL program using the kernel source */
> +    gpu_env->program = clCreateProgramWithSource(gpu_env->context, 1, (const char **)(&source_str),
> +                                                 &source_str_len, &status);
> +    if(status != CL_SUCCESS) {
> +        av_log(&openclutils, AV_LOG_ERROR, "Could not create OpenCL program with source code: %s\n",
> +               opencl_errstr(status));
> +        ret = AVERROR_EXTERNAL;
> +        goto end;
> +    }
> +    if (!gpu_env->program) {
> +        av_log(&openclutils, AV_LOG_ERROR, "Created program is NULL\n");
> +        ret = AVERROR_EXTERNAL;
> +        goto end;
> +    }
> +    /* create a cl program executable for all the devices specified */
> +    if (!gpu_env->is_user_created)
> +        status = clBuildProgram(gpu_env->program, 1, gpu_env->device_ids,
> +                                build_option, NULL, NULL);
> +    else
> +        status = clBuildProgram(gpu_env->program, 1, &(gpu_env->device_id),
> +                                 build_option, NULL, NULL);
> +
> +    if (status != CL_SUCCESS) {
> +        av_log(&openclutils, AV_LOG_ERROR, "Could not compile OpenCL kernel: %s\n", opencl_errstr(status));
> +        ret = AVERROR_EXTERNAL;
> +        goto end;
> +    }
> +end:
> +    av_free(source_str);
> +    return ret;
> +}
> +
> +int av_opencl_run_kernel(const char *kernel_entry_point, void **userdata)
> +{
> +    av_opencl_kernel_function function = NULL;
> +    int i;
> +    for (i = 0; i < gpu_env.kernel_function_count; i++) {
> +        if (av_strcasecmp(kernel_entry_point, gpu_env.kernel_entry_points[i]) == 0) {
> +            function = gpu_env.kernel_functions[i];
> +            break;
> +        }
> +    }
> +    if (!function) {
> +        av_log(&openclutils, AV_LOG_ERROR, "Could not find kernel: %s\n", kernel_entry_point);
> +        return AVERROR(EINVAL);
> +    }
> +    return(function(userdata));
> +}
> +

> +int av_opencl_init(const char *build_option, AVOpenCLExternalInfo *ext_opencl_info, char *dictionary)

Uhm no, I meant something like this:

int av_opencl_init(AVDictionary *options, AVOpenCLExternalInfo *ext_opencl_info)

then you read from options the build_option (and eventually abort in
case it is mandatory.

> +{
> +    int ret = 0;
> +    lock_opencl();
> +    if (!gpu_env.opencl_is_inited) {
> +        /*initialize devices, context, command_queue*/
> +        ret = init_opencl_env(&gpu_env, ext_opencl_info);
> +        if (ret < 0)
> +            goto end;
> +        /*initialize program, kernel_entry_point, kernel_function_count*/
> +        ret = compile_kernel_file(&gpu_env, build_option);
> +        if (ret < 0)
> +            goto end;
> +        av_assert1(gpu_env.kernel_function_count > 0);
> +        gpu_env.opencl_is_inited = 1;
> +    }
> +end:
> +    unlock_opencl();
> +    return ret;
> +}
> +
> +void av_opencl_uninit(void)
> +{
> +    cl_int status;
> +    lock_opencl();
> +    if (!gpu_env.opencl_is_inited)
> +        goto end;
> +    if (gpu_env.is_user_created)
> +        goto end;

> +    gpu_env.runtime_kernel_function_count--;

int overflow in case gpu_env.runtime_kernel_function_count is 0.

> +    if (!gpu_env.runtime_kernel_function_count) {
> +        if (gpu_env.program) {
> +            status = clReleaseProgram(gpu_env.program);
> +            if (status != CL_SUCCESS) {
> +                av_log(&openclutils, AV_LOG_ERROR, "Could not release OpenCL program: %s\n", opencl_errstr(status));
> +            }
> +            gpu_env.program = NULL;
> +        }
> +        if (gpu_env.command_queue) {
> +            status = clReleaseCommandQueue(gpu_env.command_queue);
> +            if (status != CL_SUCCESS) {
> +                av_log(&openclutils, AV_LOG_ERROR, "Could not release OpenCL command queue: %s\n", opencl_errstr(status));
> +            }
> +            gpu_env.command_queue = NULL;
> +        }
> +        if (gpu_env.context) {
> +            status = clReleaseContext(gpu_env.context);
> +            if (status != CL_SUCCESS) {
> +                av_log(&openclutils, AV_LOG_ERROR, "Could not release OpenCL context: %s\n", opencl_errstr(status));
> +            }
> +            gpu_env.context = NULL;
> +        }
> +        av_freep(&(gpu_env.device_ids));
> +        gpu_env.opencl_is_inited = 0;
> +    }
> +end:
> +    unlock_opencl();
> +}
> +
> +void av_opencl_get_kernel_env(AVOpenCLKernelEnv *env)
> +{
> +    env->context       = gpu_env.context;
> +    env->command_queue = gpu_env.command_queue;
> +    env->program       = gpu_env.program;
> +}
> +
> +int av_opencl_buffer_create(cl_mem *cl_buf, size_t cl_buf_size, int flags, void *host_ptr)
> +{
> +    cl_int status;
> +    *cl_buf = clCreateBuffer(gpu_env.context, flags, cl_buf_size, host_ptr, &status);
> +    if (status != CL_SUCCESS) {
> +        av_log(&openclutils, AV_LOG_ERROR, "Could not create OpenCL buffer: %s\n", opencl_errstr(status));
> +        return AVERROR_EXTERNAL;
> +    }
> +    return 0;
> +}
> +
> +void av_opencl_buffer_release(cl_mem cl_buf)
> +{
> +    cl_int status = 0;
> +    if (!cl_buf)
> +        return;
> +    status = clReleaseMemObject(cl_buf);
> +    if (status != CL_SUCCESS) {
> +        av_log(&openclutils, AV_LOG_ERROR, "Could not release OpenCL buffer: %s\n", opencl_errstr(status));
> +    }
> +}
> +
> +int av_opencl_buffer_write(cl_mem dst_cl_buf, uint8_t *src_buf, size_t buf_size)
> +{
> +    cl_int status;
> +    void *mapped = clEnqueueMapBuffer(gpu_env.command_queue, dst_cl_buf,
> +                                      CL_TRUE,CL_MAP_WRITE, 0, sizeof(uint8_t) * buf_size,
> +                                      0, NULL, NULL, &status);
> +
> +    if (status != CL_SUCCESS) {
> +        av_log(&openclutils, AV_LOG_ERROR, "Could not map OpenCL buffer: %s\n", opencl_errstr(status));
> +        return AVERROR_EXTERNAL;
> +    }
> +    memcpy(mapped, src_buf, buf_size);
> +
> +    status = clEnqueueUnmapMemObject(gpu_env.command_queue, dst_cl_buf, mapped, 0, NULL, NULL);
> +    if (status != CL_SUCCESS) {
> +        av_log(&openclutils, AV_LOG_ERROR, "Could not unmap OpenCL buffer: %s\n", opencl_errstr(status));
> +        return AVERROR_EXTERNAL;
> +    }
> +    return 0;
> +}
> +
> +int av_opencl_buffer_read(uint8_t *dst_buf, cl_mem src_cl_buf, size_t buf_size)
> +{
> +    cl_int status;
> +    void *mapped = clEnqueueMapBuffer(gpu_env.command_queue, src_cl_buf,
> +                                      CL_TRUE,CL_MAP_READ, 0, buf_size,
> +                                      0, NULL, NULL, &status);
> +
> +    if (status != CL_SUCCESS) {
> +        av_log(&openclutils, AV_LOG_ERROR, "Could not map OpenCL buffer: %s\n", opencl_errstr(status));
> +        return AVERROR_EXTERNAL;
> +    }
> +    memcpy(dst_buf, mapped, buf_size);
> +
> +    status = clEnqueueUnmapMemObject(gpu_env.command_queue, src_cl_buf, mapped, 0, NULL, NULL);
> +    if (status != CL_SUCCESS) {
> +        av_log(&openclutils, AV_LOG_ERROR, "Could not unmap OpenCL buffer: %s\n", opencl_errstr(status));
> +        return AVERROR_EXTERNAL;
> +    }
> +    return 0;
> +}
> +
> +int av_opencl_buffer_write_image(cl_mem dst_cl_buf, size_t cl_buffer_size,
> +                                        uint8_t **src_data, int *plane_size, int plane_num, int offset)
> +{
> +    int i, buffer_size = 0;
> +    uint8_t *temp;
> +    cl_int status;
> +    void *mapped;
> +    if ((unsigned int)plane_num > 8) {
> +        return AVERROR(EINVAL);
> +    }
> +    for (i = 0;i < plane_num;i++) {
> +        buffer_size += plane_size[i];
> +    }
> +    if (buffer_size > cl_buffer_size) {
> +        av_log(&openclutils, AV_LOG_ERROR, "Cannot write image to OpenCL buffer: buffer too small\n");
> +        return AVERROR(EINVAL);
> +    }
> +    mapped = clEnqueueMapBuffer(gpu_env.command_queue, dst_cl_buf,
> +                                      CL_TRUE,CL_MAP_WRITE, 0, buffer_size + offset,
> +                                      0, NULL, NULL, &status);
> +    if (status != CL_SUCCESS) {
> +        av_log(&openclutils, AV_LOG_ERROR, "Could not map OpenCL buffer: %s\n", opencl_errstr(status));
> +        return AVERROR_EXTERNAL;
> +    }
> +    temp = mapped;
> +    temp += offset;
> +    for (i = 0; i < plane_num; i++) {
> +        memcpy(temp, src_data[i], plane_size[i]);
> +        temp += plane_size[i];
> +    }
> +    status = clEnqueueUnmapMemObject(gpu_env.command_queue, dst_cl_buf, mapped, 0, NULL, NULL);
> +    if (status != CL_SUCCESS) {
> +        av_log(&openclutils, AV_LOG_ERROR, "Could not unmap OpenCL buffer: %s\n", opencl_errstr(status));
> +        return AVERROR_EXTERNAL;
> +    }
> +    return 0;
> +}
> +
> +int av_opencl_buffer_read_image(uint8_t **dst_data, int *plane_size, int plane_num,
> +                                       cl_mem src_cl_buf, size_t cl_buffer_size)
> +{
> +    int i,buffer_size = 0,ret = 0;
> +    uint8_t *temp;
> +    void *mapped;
> +    cl_int status;
> +    if ((unsigned int)plane_num > 8) {
> +        return AVERROR(EINVAL);
> +    }
> +    for (i = 0;i < plane_num;i++) {
> +        buffer_size += plane_size[i];
> +    }
> +    if (buffer_size > cl_buffer_size) {
> +        av_log(&openclutils, AV_LOG_ERROR, "Cannot write image to CPU buffer: OpenCL buffer too small\n");
> +        return AVERROR(EINVAL);
> +    }
> +    mapped = clEnqueueMapBuffer(gpu_env.command_queue, src_cl_buf,
> +                                      CL_TRUE,CL_MAP_READ, 0, buffer_size,
> +                                      0, NULL, NULL, &status);
> +
> +    if (status != CL_SUCCESS) {
> +        av_log(&openclutils, AV_LOG_ERROR, "Could not map OpenCL buffer: %s\n", opencl_errstr(status));
> +        return AVERROR_EXTERNAL;
> +    }
> +    temp = mapped;
> +    if (ret >= 0) {
> +        for (i = 0;i < plane_num;i++) {
> +            memcpy(dst_data[i], temp, plane_size[i]);
> +            temp += plane_size[i];
> +        }
> +    }
> +    status = clEnqueueUnmapMemObject(gpu_env.command_queue, src_cl_buf, mapped, 0, NULL, NULL);
> +    if (status != CL_SUCCESS) {
> +        av_log(&openclutils, AV_LOG_ERROR, "Could not unmap OpenCL buffer: %s\n", opencl_errstr(status));
> +        return AVERROR_EXTERNAL;
> +    }
> +    return 0;
> +}
> +
> diff --git a/libavutil/opencl.h b/libavutil/opencl.h
> new file mode 100644
> index 0000000..38e0327
> --- /dev/null
> +++ b/libavutil/opencl.h
> @@ -0,0 +1,197 @@
> +/*
> + * Copyright (C) 2012 Peng Gao <peng at multicorewareinc.com>
> + * Copyright (C) 2012 Li   Cao <li at multicorewareinc.com>
> + * Copyright (C) 2012 Wei  Gao <weigao at multicorewareinc.com>
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include "config.h"
> +
> +#ifndef LIBAVUTIL_OPENCLWRAPPER_H
> +#define LIBAVUTIL_OPENCLWRAPPER_H
> +
> +#include <CL/cl.h>
> +
> +#define AV_OPENCL_KERNEL( ... )# __VA_ARGS__
> +
> +#define AV_OPENCL_MAX_KERNEL_NAME_SIZE 150
> +
> +typedef struct AVOpenCLKernelEnv {
> +    cl_context context;
> +    cl_command_queue command_queue;
> +    cl_program program;
> +    cl_kernel kernel;
> +    char kernel_entry_point[AV_OPENCL_MAX_KERNEL_NAME_SIZE];
> +} AVOpenCLKernelEnv;
> +
> +typedef struct AVOpenCLExternalInfo {
> +    cl_platform_id platform;
> +    cl_device_type device_type;
> +    cl_context context;
> +    cl_device_id *device_ids;
> +    cl_device_id  device_id;
> +    cl_command_queue command_queue;
> +    char *platform_name;
> +} AVOpenCLExternalInfo;

About the overall design. It is pretty difficult for me to provide a
meaningful design since I have no serious knowledge about the OpenCL
architecture, but I'll try to sketch something which could match the
FFmpeg architecture and conventions.

Let's start with something simple, which we can extend later.

First of all, you need a context for working with OpenCL related
structures, let's call it AVOpenCLContext, which you could create for
example with:

int av_opencl_open(AVOpenCLContext **ctx, AVDictionary *options);

you set the options, and the function allocates the context and fills
it with the required fields.

Alternatively, suppose you want to reuse some structures already
created, you could have:
int av_opencl_open(AVOpenCLContext *ctx, AVDictionary *options);

in this case you create and fill the context before opening it with
the function (this could be useful to pass pointers to binary
structures).

At this point you need to register a kernel:

int av_opencl_register_kernel(AVOpenCLContext *ctx,
                              const char *kernel_name, const char *kernel_code);

I wonder if it makes sense to create a program *per kernel* and
compile the program during kernel registration.

Alternatively you could bookkeep several programs, and specify the
program where you want to register the kernel, for example:

int av_opencl_register_kernel(AVOpenCLContext *ctx,
                              const char *program_name,
                              const char *kernel_name,  
                              const char *kernel_code);

then you could have a function which does:
av_opencl_compile_program(AVOpenCLContext *ctx, const char *program_name);

which compiles the program with the associated kernels (I don't know
which are the drawbacks of this approach, and if it is feasible for
your use case).

At this point you need a function to create a kernel handle associated
to a kernel (possibly already compiled in a given program).

This could be:
int av_opencl_create_kernel(AVOpenCLContext *ctx,
                            const char *program_name,
                            const char *kernel_name,
                            const char *kernel_entry_point,
                            av_opencl_kernel_function function);

which has also the advantage that you specify the kernel function, so
you don't need a separate registration function for that.

Then you finally do:
int av_opencl_run_kernel(AVOpenCLContext *ctx,
                         const char *program_name, const char *kernel_name,
                         void **userdata)

which executes the function associated to a program and a kernel, by
passing the data in userdata.

When you are done with a program (and all the related kernels), you do:
av_opencl_destroy_program(AVOpenCLContext *ctx, const char *program_name);

Alternatively, we could have a single program per AVOpenCLContext, if
this seems overkill.

Same for the buffer API, all the related functions could reference a
specific AVOpenCL context.

To destroy the context, you finally do:
void av_opencl_close(AVOpenCLContext *ctx);

which also cleans up all the associated structures.

...

The good thing with this approach is that you don't need a global per
application context (which is usually brittle or just broken by
design, especially for a complex library collection like FFmpeg), and
you can eventually share the same context amongst several components.

Do you think that such design could fit your use case?

Could it be actually implemented within the OpenCL framework?

We can extend this design in order to accomodate your specific use
cases.

[...]
-- 
FFmpeg = Fanciful & Foolish Mystic Political Empowered Gnome