[FFmpeg-devel] [PATCH] lavfi: add colorkey_opencl filter

Mark Thompson sw at jkqxz.net
Thu Apr 11 01:10:25 EEST 2019


On 10/04/2019 04:37, Jarek Samic wrote:
> This is a direct port of the CPU filter.
> 
> Signed-off-by: Jarek Samic <cldfire3 at gmail.com>
> ---
> This is my submission for the GSoC OpenCL video filters project qualification task.
> 
> Command you can use to try it out:
> 
> ./ffmpeg -i some_video -i some_img -init_hw_device opencl=gpu -filter_hw_device gpu -filter_complex "[0:v]format=rgba, hwupload, colorkey_opencl=yellow:0.4:0.2, hwdownload, format=rgba[over];[1:v][over]overlay" output
> 
> Based on simple observation of that command running vs. an equivalent one with the CPU
> colorkey filter, it would appear that the OpenCL version is ~10-20% faster including
> the overhead to upload / download from the GPU (at least with my test input and
> hardware).
> 
> You will notice that I am using overlay rather than overlay_opencl above. I am not
> sure what's going on, but the overlay_opencl filter is not working for me: every time
> I try to use it it either stops after the first couple of frames or spits out
> duplicate frames forever. Even just running this command that is basically a copy of
> the example command to overlay an image logo on the top-left corner of an input video:
> 
> ./ffmpeg -i ../video.mp4 -i ../img.png -init_hw_device opencl=gpu -filter_hw_device gpu -filter_complex "[0:v]hwupload[a], [1:v]format=yuv420p, hwupload[b], [a][b]overlay_opencl, hwdownload, format=yuv420p" ../vid_test.mp4
> 
> Results in an infinite number of duplicate frames. (The format of the input video is
> yuv420p, to be clear.)
> 
> Before I take the time to dig in and investigate what's going on, is anyone else
> aware of what could be causing this (or any existing known issues)?

The lack of checks in overlay_opencl generally leaves a lot to be desired.  It is intended for (and only really tested with) hardware video in YUV formats (primarily overlaying YUV or YUVA on YUV, for things like subtitles or pips), but is rather variable in how well it works at that.

A tiny bit of hacking can get me to something which works with RGBA here.  Changing:

diff --git a/libavfilter/vf_overlay_opencl.c b/libavfilter/vf_overlay_opencl.c
index e9c853203b..09070e106a 100644
--- a/libavfilter/vf_overlay_opencl.c
+++ b/libavfilter/vf_overlay_opencl.c
@@ -81,7 +81,7 @@ static int overlay_opencl_load(AVFilterContext *avctx,
     }
 
     if (main_planes == overlay_planes) {
-        if (main_desc->nb_components == overlay_desc->nb_components)
+        if (main_desc->nb_components == overlay_desc->nb_components && 0)
             kernel = "overlay_no_alpha";
         else
             kernel = "overlay_internal_alpha";

to force the upper layer alpha to take effect directly I can make it work to blend one RGBA video on top of another with:

./ffmpeg_g -y -init_hw_device vaapi=va:/dev/dri/renderD128 -init_hw_device opencl=ocl@va -hwaccel vaapi -hwaccel_output_format vaapi -i in1.mp4 -hwaccel vaapi -hwaccel_output_format vaapi -i in2.mp4 -an -filter_hw_device ocl -filter_complex '[0:v]scale_vaapi=format=rgba,hwmap,colorkey_opencl=white:0.4:0.0[over];[1:v]scale_vaapi=format=rgba,hwmap[under];[under][over]overlay_opencl,hwmap=derive_device=vaapi:reverse=1,scale_vaapi=format=nv12[out]' -map '[out]' -c:v h264_vaapi out.mp4

(Intel GPU running Beignet for interop, the whole thing manages close to 100fps at 1080p.)

Patches welcome to make overlay_opencl not be so flaky, I guess :)

>  configure                        |   1 +
>  doc/filters.texi                 |  33 +++++
>  libavfilter/Makefile             |   2 +
>  libavfilter/allfilters.c         |   1 +
>  libavfilter/opencl/colorkey.cl   |  45 ++++++
>  libavfilter/opencl_source.h      |   1 +
>  libavfilter/vf_colorkey_opencl.c | 234 +++++++++++++++++++++++++++++++
>  7 files changed, 317 insertions(+)
>  create mode 100644 libavfilter/opencl/colorkey.cl
>  create mode 100644 libavfilter/vf_colorkey_opencl.c

git says:

$ cat \[FFmpeg-devel\]\ \[PATCH\]\ lavfi\:\ add\ colorkey_opencl\ filter.eml | git am
Applying: lavfi: add colorkey_opencl filter
.git/rebase-apply/patch:234: trailing whitespace.
static int colorkey_opencl_init(AVFilterContext* avctx) 
.git/rebase-apply/patch:169: new blank line at EOF.
+
.git/rebase-apply/patch:421: new blank line at EOF.
+
warning: 3 lines add whitespace errors.

> diff --git a/configure b/configure
> index f6123f53e5..a4dd9ee167 100755
> --- a/configure
> +++ b/configure
> @@ -3410,6 +3410,7 @@ boxblur_filter_deps="gpl"
>  boxblur_opencl_filter_deps="opencl gpl"
>  bs2b_filter_deps="libbs2b"
>  colormatrix_filter_deps="gpl"
> +colorkey_opencl_filter_deps="opencl"
>  convolution_opencl_filter_deps="opencl"
>  convolve_filter_deps="avcodec"
>  convolve_filter_select="fft"
> diff --git a/doc/filters.texi b/doc/filters.texi
> index 867607d870..390c8b97cf 100644
> --- a/doc/filters.texi
> +++ b/doc/filters.texi
> @@ -19030,6 +19030,39 @@ Apply erosion filter with threshold0 set to 30, threshold1 set 40, threshold2 se
>  @end example
>  @end itemize
>  
> +@section colorkey_opencl
> +RGB colorspace color keying.
> +
> +The filter accepts the following options:
> +
> +@table @option
> +@item color
> +The color which will be replaced with transparency.
> +
> +@item similarity
> +Similarity percentage with the key color.
> +
> +0.01 matches only the exact key color, while 1.0 matches everything.
> +
> +@item blend
> +Blend percentage.
> +
> +0.0 makes pixels either fully transparent, or not transparent at all.
> +
> +Higher values result in semi-transparent pixels, with a higher transparency
> +the more similar the pixels color is to the key color.
> +@end table
> +
> +@subsection Examples
> +
> +@itemize
> +@item
> +Make every semi-green pixel in the input transparent with some slight blending:
> +@example
> +-i INPUT -vf "hwupload, colorkey_opencl=green:0.3:0.1, hwdownload" OUTPUT
> +@end example
> +@end itemize
> +
>  @section overlay_opencl
>  
>  Overlay one video on top of another.
> diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> index fef6ec5c55..9589dd8747 100644
> --- a/libavfilter/Makefile
> +++ b/libavfilter/Makefile
> @@ -176,6 +176,8 @@ OBJS-$(CONFIG_CODECVIEW_FILTER)              += vf_codecview.o
>  OBJS-$(CONFIG_COLORBALANCE_FILTER)           += vf_colorbalance.o
>  OBJS-$(CONFIG_COLORCHANNELMIXER_FILTER)      += vf_colorchannelmixer.o
>  OBJS-$(CONFIG_COLORKEY_FILTER)               += vf_colorkey.o
> +OBJS-$(CONFIG_COLORKEY_OPENCL_FILTER)        += vf_colorkey_opencl.o opencl.o \
> +                                                opencl/colorkey.o
>  OBJS-$(CONFIG_COLORLEVELS_FILTER)            += vf_colorlevels.o
>  OBJS-$(CONFIG_COLORMATRIX_FILTER)            += vf_colormatrix.o
>  OBJS-$(CONFIG_COLORSPACE_FILTER)             += vf_colorspace.o colorspace.o colorspacedsp.o
> diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> index c51ae0f3c7..ff4eb5bf6b 100644
> --- a/libavfilter/allfilters.c
> +++ b/libavfilter/allfilters.c
> @@ -165,6 +165,7 @@ extern AVFilter ff_vf_codecview;
>  extern AVFilter ff_vf_colorbalance;
>  extern AVFilter ff_vf_colorchannelmixer;
>  extern AVFilter ff_vf_colorkey;
> +extern AVFilter ff_vf_colorkey_opencl;
>  extern AVFilter ff_vf_colorlevels;
>  extern AVFilter ff_vf_colormatrix;
>  extern AVFilter ff_vf_colorspace;
> diff --git a/libavfilter/opencl/colorkey.cl b/libavfilter/opencl/colorkey.cl
> new file mode 100644
> index 0000000000..5d8a0bb8df
> --- /dev/null
> +++ b/libavfilter/opencl/colorkey.cl
> @@ -0,0 +1,45 @@
> +/*
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +__kernel void colorkey(
> +    __read_only  image2d_t src,
> +    __write_only image2d_t dst,
> +    uchar4 colorkey_rgba,

This is only ever used as a float, so it should probably be a float on input to the kernel to avoid doing the conversion billions of times.
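
A minimal host-side sketch of that idea, assuming the key is still parsed as four 8-bit RGBA components (which is what AV_OPT_TYPE_COLOR stores). The helper name key_to_float is hypothetical and not part of the patch; the point is only that the normalisation can happen once on the host, so the kernel could take a float4 argument directly:

```c
#include <stdint.h>

/* Hypothetical helper (not in the patch): convert an 8-bit RGBA key to
 * normalised floats once on the host, rather than once per pixel in the
 * kernel. */
static void key_to_float(const uint8_t rgba[4], float out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (float)rgba[i] / 255.0f;
}
```

The resulting four floats would then be what gets passed as kernel argument 2 in place of the uchar4.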

> +    float similarity,
> +    float blend
> +) {
> +    const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE |
> +                              CLK_FILTER_NEAREST;
> +    int2 loc = (int2)(get_global_id(0), get_global_id(1));
> +    float4 pixel = read_imagef(src, sampler, loc);
> +
> +    float dr = pixel.s0 - (float)colorkey_rgba.s0 / 255.0;
> +    float dg = pixel.s1 - (float)colorkey_rgba.s1 / 255.0;
> +    float db = pixel.s2 - (float)colorkey_rgba.s2 / 255.0;
> +
> +    double diff = sqrt(dr * dr + dg * dg + db * db);

No need to split into scalars:

float diff = distance(pixel.xyz, convert_float3(colorkey_rgba.xyz) / 255.0f);

If the key is already normalised float then you can drop the division as well.

> +
> +    if (blend > 0.0001) {
> +        pixel.s3 = clamp((diff - similarity) / blend, 0.0, 1.0);

This resolves as clamp in double rather than float.  More generally, I think all the constants want f suffix.  (GPUs are much better at float than double, especially in consumer ones with nobbled double support so those naughty scientists can't use the cheap ones.)
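
Plain C has the same promotion rule for unsuffixed constants, so the effect is easy to check on the host. The sketch below only illustrates the typing rule and is not kernel code; both function names are made up for the example:

```c
#include <stddef.h>

/* An unsuffixed floating constant has type double, so the float operand
 * is promoted and the expression has type double. */
static size_t size_with_double_constant(float x)
{
    return sizeof(x / 255.0);   /* float / double -> double */
}

/* With the f suffix both operands are float and the expression stays
 * in float. */
static size_t size_with_float_constant(float x)
{
    return sizeof(x / 255.0f);  /* float / float -> float */
}
```

sizeof does not evaluate its operand here; it only reports the type the expression would have.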

> +    } else {
> +        pixel.s3 = (diff > similarity) ? 1.0 : 0.0;
> +    }

Since blend is a constant per filter invocation, is there any gain to making two separate kernels and removing the if-branch entirely?
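
As a rough sketch of how the host side might look if the kernel were split: pick the kernel name once at init time and pass it to clCreateKernel. The name "colorkey_blend" is hypothetical (the patch as posted defines only "colorkey"), and whether this is actually faster would need measuring:

```c
/* Hypothetical kernel selection, done once per filter instance.
 * "colorkey_blend" is an assumed name for a variant that always applies
 * the clamp; "colorkey" would keep only the hard threshold. */
static const char *pick_colorkey_kernel(float blend)
{
    /* Same threshold the posted kernel uses for its per-pixel branch. */
    return blend > 0.0001f ? "colorkey_blend" : "colorkey";
}
```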

> +
> +    write_imagef(dst, loc, pixel);
> +}
> +
> diff --git a/libavfilter/opencl_source.h b/libavfilter/opencl_source.h
> index 4118138c30..51f7178cf2 100644
> --- a/libavfilter/opencl_source.h
> +++ b/libavfilter/opencl_source.h
> @@ -20,6 +20,7 @@
>  #define AVFILTER_OPENCL_SOURCE_H
>  
>  extern const char *ff_opencl_source_avgblur;
> +extern const char *ff_opencl_source_colorkey;
>  extern const char *ff_opencl_source_colorspace_common;
>  extern const char *ff_opencl_source_convolution;
>  extern const char *ff_opencl_source_neighbor;
> diff --git a/libavfilter/vf_colorkey_opencl.c b/libavfilter/vf_colorkey_opencl.c
> new file mode 100644
> index 0000000000..4769c529b0
> --- /dev/null
> +++ b/libavfilter/vf_colorkey_opencl.c
> @@ -0,0 +1,234 @@
> +/*
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include "libavutil/opt.h"
> +#include "libavutil/imgutils.h"
> +#include "avfilter.h"
> +#include "formats.h"
> +#include "internal.h"
> +#include "opencl.h"
> +#include "opencl_source.h"
> +#include "video.h"
> +
> +typedef struct ColorkeyOpenCLContext {
> +    OpenCLFilterContext ocf;
> +    // Whether or not the above `OpenCLFilterContext` has been initialized
> +    int initialized;
> +
> +    cl_command_queue command_queue;
> +    cl_kernel kernel_colorkey;
> +
> +    // The color we are supposed to replace with transparency
> +    cl_uchar4 colorkey_rgba[4];
> +    // Similarity percentage compared to `colorkey_rgba`, ranging from `0.01` to `1.0`
> +    // where `0.01` matches only the key color and `1.0` matches all colors
> +    float similarity;
> +    // Blending percentage where `0.0` results in fully transparent pixels, `1.0` results
> +    // in fully opaque pixels, and numbers in between result in transparency that varies
> +    // based on the similarity to the key color
> +    float blend;
> +} ColorkeyOpenCLContext;
> +
> +static int colorkey_opencl_init(AVFilterContext* avctx) 
> +{
> +    ColorkeyOpenCLContext *ctx = avctx->priv;
> +    cl_int cle;
> +    int err;
> +
> +    err = ff_opencl_filter_load_program(avctx, &ff_opencl_source_colorkey, 1);
> +    if (err < 0)
> +        goto fail;
> +
> +    ctx->command_queue = clCreateCommandQueue(
> +        ctx->ocf.hwctx->context,
> +        ctx->ocf.hwctx->device_id,
> +        0, &cle
> +    );
> +
> +    CL_FAIL_ON_ERROR(AVERROR(EIO), "Failed to create OpenCL "
> +                     "command queue %d.\n", cle);
> +
> +    ctx->kernel_colorkey = clCreateKernel(ctx->ocf.program, "colorkey", &cle);
> +    CL_FAIL_ON_ERROR(AVERROR(EIO), "Failed to create horizontal "
> +                     "kernel %d.\n", cle);
> +
> +    ctx->initialized = 1;
> +    return 0;
> +
> +fail:
> +    if (ctx->command_queue)
> +        clReleaseCommandQueue(ctx->command_queue);
> +    if (ctx->kernel_colorkey)
> +        clReleaseKernel(ctx->kernel_colorkey);
> +    return err;
> +}
> +
> +static int filter_frame(AVFilterLink* link, AVFrame* input_frame)
> +{
> +    AVFilterContext* avctx = link->dst;
> +    AVFilterLink* outlink = avctx->outputs[0];
> +    ColorkeyOpenCLContext* colorkey_ctx = avctx->priv;
> +    AVFrame* output_frame = NULL;
> +    int err;
> +    cl_int cle;
> +    size_t global_work[2];
> +    cl_mem src, dst;
> +
> +    if (!input_frame->hw_frames_ctx)
> +        return AVERROR(EINVAL);
> +
> +    if (!colorkey_ctx->initialized) {
> +        AVHWFramesContext *input_frames_ctx =
> +            (AVHWFramesContext*)input_frame->hw_frames_ctx->data;
> +        int fmt = input_frames_ctx->sw_format;
> +
> +        // Make sure the input is a format we support
> +        if (fmt != AV_PIX_FMT_ARGB &&
> +            fmt != AV_PIX_FMT_RGBA &&
> +            fmt != AV_PIX_FMT_ABGR &&
> +            fmt != AV_PIX_FMT_BGRA &&
> +            fmt != AV_PIX_FMT_NONE

Why NONE here?

> +        ) {
> +            av_log(avctx, AV_LOG_ERROR, "unsupported (non-RGB) format in colorkey_opencl.\n");
> +            err = AVERROR(ENOSYS);
> +            goto fail;
> +        }
> +
> +        err = colorkey_opencl_init(avctx);
> +        if (err < 0)
> +            goto fail;
> +    }
> +
> +    // This filter only operates on RGB data and we know that will be on the first plane
> +    src = (cl_mem)input_frame->data[0];
> +    output_frame = ff_get_video_buffer(outlink, outlink->w, outlink->h);
> +    if (!output_frame) {
> +        err = AVERROR(ENOMEM);
> +        goto fail;
> +    }
> +    dst = (cl_mem)output_frame->data[0];
> +
> +    CL_SET_KERNEL_ARG(colorkey_ctx->kernel_colorkey, 0, cl_mem, &src);
> +    CL_SET_KERNEL_ARG(colorkey_ctx->kernel_colorkey, 1, cl_mem, &dst);
> +    CL_SET_KERNEL_ARG(colorkey_ctx->kernel_colorkey, 2, cl_uchar4, &colorkey_ctx->colorkey_rgba);
> +    CL_SET_KERNEL_ARG(colorkey_ctx->kernel_colorkey, 3, float, &colorkey_ctx->similarity);
> +    CL_SET_KERNEL_ARG(colorkey_ctx->kernel_colorkey, 4, float, &colorkey_ctx->blend);
> +
> +    err = ff_opencl_filter_work_size_from_image(avctx, global_work, input_frame, 0, 0);
> +    if (err < 0)
> +        goto fail;
> +
> +    cle = clEnqueueNDRangeKernel(
> +        colorkey_ctx->command_queue,
> +        colorkey_ctx->kernel_colorkey,
> +        2,
> +        NULL,
> +        global_work,
> +        NULL,
> +        0,
> +        NULL,
> +        NULL
> +    );
> +
> +    CL_FAIL_ON_ERROR(AVERROR(EIO), "Failed to enqueue colorkey kernel: %d.\n", cle);
> +
> +    // Run queued kernel
> +    cle = clFinish(colorkey_ctx->command_queue);
> +    CL_FAIL_ON_ERROR(AVERROR(EIO), "Failed to finish command queue: %d.\n", cle);
> +
> +    err = av_frame_copy_props(output_frame, input_frame);
> +    if (err < 0)
> +        goto fail;
> +
> +    av_frame_free(&input_frame);
> +
> +    return ff_filter_frame(outlink, output_frame);
> +
> +fail:
> +    clFinish(colorkey_ctx->command_queue);
> +    av_frame_free(&input_frame);
> +    av_frame_free(&output_frame);
> +    return err;
> +}
> +
> +static av_cold void colorkey_opencl_uninit(AVFilterContext* avctx)
> +{
> +    ColorkeyOpenCLContext *ctx = avctx->priv;
> +    cl_int cle;
> +
> +    if (ctx->kernel_colorkey) {
> +        cle = clReleaseKernel(ctx->kernel_colorkey);
> +        if (cle != CL_SUCCESS)
> +            av_log(avctx, AV_LOG_ERROR, "Failed to release "
> +                   "kernel: %d.\n", cle);
> +    }
> +
> +    if (ctx->command_queue) {
> +        cle = clReleaseCommandQueue(ctx->command_queue);
> +        if (cle != CL_SUCCESS)
> +            av_log(avctx, AV_LOG_ERROR, "Failed to release "
> +                   "command queue: %d.\n", cle);
> +    }
> +
> +    ff_opencl_filter_uninit(avctx);
> +}
> +
> +static const AVFilterPad colorkey_opencl_inputs[] = {
> +    {
> +        .name = "default",
> +        .type = AVMEDIA_TYPE_VIDEO,
> +        .filter_frame = filter_frame,
> +        .config_props = &ff_opencl_filter_config_input,
> +    },
> +    { NULL }
> +};
> +
> +static const AVFilterPad colorkey_opencl_outputs[] = {
> +    {
> +        .name = "default",
> +        .type = AVMEDIA_TYPE_VIDEO,
> +        .config_props = &ff_opencl_filter_config_output,
> +    },
> +    { NULL }
> +};
> +
> +#define OFFSET(x) offsetof(ColorkeyOpenCLContext, x)
> +#define FLAGS AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM
> +
> +static const AVOption colorkey_opencl_options[] = {
> +    { "color", "set the colorkey key color", OFFSET(colorkey_rgba), AV_OPT_TYPE_COLOR, { .str = "black" }, CHAR_MIN, CHAR_MAX, FLAGS },
> +    { "similarity", "set the colorkey similarity value", OFFSET(similarity), AV_OPT_TYPE_FLOAT, { .dbl = 0.01 }, 0.01, 1.0, FLAGS },
> +    { "blend", "set the colorkey key blend value", OFFSET(blend), AV_OPT_TYPE_FLOAT, { .dbl = 0.0 }, 0.0, 1.0, FLAGS },
> +    { NULL }
> +};
> +
> +AVFILTER_DEFINE_CLASS(colorkey_opencl);
> +
> +AVFilter ff_vf_colorkey_opencl = {
> +    .name           = "colorkey_opencl",
> +    .description    = NULL_IF_CONFIG_SMALL("Turns a certain color into transparency. Operates on RGB colors."),
> +    .priv_size      = sizeof(ColorkeyOpenCLContext),
> +    .priv_class     = &colorkey_opencl_class,
> +    .init           = &ff_opencl_filter_init,
> +    .uninit         = &colorkey_opencl_uninit,
> +    .query_formats  = &ff_opencl_filter_query_formats,
> +    .inputs         = colorkey_opencl_inputs,
> +    .outputs        = colorkey_opencl_outputs,
> +    .flags_internal = FF_FILTER_FLAG_HWFRAME_AWARE
> +};
> +
> 

Code looks good.

Thanks,

- Mark

