[FFmpeg-devel] [PATCH] Add a filter implementing HDR image reconstruction from a single exposure using deep CNNs

Wed Oct 17 10:03:45 EEST 2018

Thanks for your comments; please see my replies inline.

> -----Original Message-----
> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On Behalf
> Of Paul B Mahol
> Sent: Tuesday, October 16, 2018 5:00 PM
> To: FFmpeg development discussions and patches <ffmpeg-
> devel at ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH] Add a filter implementing HDR image
> reconstruction from a single exposure using deep CNNs
> 
> On 10/16/18, Guo, Yejun <yejun.guo at intel.com> wrote:
> > see the algorithm's paper and code below.
> >
> > the filter's parameter looks like:
> >
> > sdr2hdr=model_filename=/path_to_tensorflow_graph.pb:out_fmtname=gbrp10le
> >
> > The input of the deep CNN model is RGB24 while the output is float for
> > each color channel. This is the filter's default behavior to output
> > format with gbrpf32le. And gbrp10le is also supported as the output,
> > so we can see the rendering result in a player, as a reference.
> >
> > To generate the model file, we need modify the original script a little.
> > - set name='y' for y_final within script at
> > https://github.com/gabrieleilertsen/hdrcnn/blob/master/network.py
> > - add the following code to the script at
> > https://github.com/gabrieleilertsen/hdrcnn/blob/master/hdrcnn_predict.py
> >
> > graph = tf.graph_util.convert_variables_to_constants(sess, sess.graph_def, ["y"])
> > tf.train.write_graph(graph, '.', 'graph.pb', as_text=False)
> >
> > The filter only works when tensorflow C api is supported in the
> > system, native backend is not supported since there are some different
> > types of layers in the deep CNN model, besides CONV and DEPTH_TO_SPACE.
> >
> > btw, as a whole solution, metadata should also be generated from the
> > sdr video, so to be encoded as a HDR video. Not supported yet.
> > This patch just focuses on this paper.
> >
> > https://arxiv.org/pdf/1710.07480.pdf:
> >   author       = "Eilertsen, Gabriel and Kronander, Joel, and Denes, Gyorgy
> > and Mantiuk, Rafal/ and Unger, Jonas",
> >   title        = "HDR image reconstruction from a single exposure using deep
> > CNNs",
> >   journal      = "ACM Transactions on Graphics (TOG)",
> >   number       = "6",
> >   volume       = "36",
> >   articleno    = "178",
> >   year         = "2017"
> >
> > https://github.com/gabrieleilertsen/hdrcnn
> > Signed-off-by: Guo, Yejun <yejun.guo at intel.com>
> > ---
> >  libavfilter/Makefile     |   1 +
> >  libavfilter/allfilters.c |   1 +
> >  libavfilter/vf_sdr2hdr.c | 283 +++++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 285 insertions(+)
> >  create mode 100644 libavfilter/vf_sdr2hdr.c
> >
> > diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> > index 62cc2f5..88e7da6 100644
> > --- a/libavfilter/Makefile
> > +++ b/libavfilter/Makefile
> > @@ -360,6 +360,7 @@ OBJS-$(CONFIG_SOBEL_OPENCL_FILTER)           += vf_convolution_opencl.o opencl.o
> >  OBJS-$(CONFIG_SPLIT_FILTER)                  += split.o
> >  OBJS-$(CONFIG_SPP_FILTER)                    += vf_spp.o
> >  OBJS-$(CONFIG_SR_FILTER)                     += vf_sr.o
> > +OBJS-$(CONFIG_SDR2HDR_FILTER)                += vf_sdr2hdr.o
> 
> Alphabetical order please.

It looks like this file is not strictly in alphabetical order. I added the entry here
because sdr2hdr is implemented with reference to vf_sr,
which first introduced TensorFlow C API support.

(I noticed that allfilters.c is in alphabetical order, and followed that there.)

> 
> >  OBJS-$(CONFIG_SSIM_FILTER)                   += vf_ssim.o framesync.o
> >  OBJS-$(CONFIG_STEREO3D_FILTER)               += vf_stereo3d.o
> >  OBJS-$(CONFIG_STREAMSELECT_FILTER)           += f_streamselect.o framesync.o
> > diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> > index 5e72803..1645c0f 100644
> > --- a/libavfilter/allfilters.c
> > +++ b/libavfilter/allfilters.c
> > @@ -319,6 +319,7 @@ extern AVFilter ff_vf_scale_npp;
> >  extern AVFilter ff_vf_scale_qsv;
> >  extern AVFilter ff_vf_scale_vaapi;
> >  extern AVFilter ff_vf_scale2ref;
> > +extern AVFilter ff_vf_sdr2hdr;
> >  extern AVFilter ff_vf_select;
> >  extern AVFilter ff_vf_selectivecolor;
> >  extern AVFilter ff_vf_sendcmd;
> > diff --git a/libavfilter/vf_sdr2hdr.c b/libavfilter/vf_sdr2hdr.c
> > new file mode 100644
> > index 0000000..52f408e
> > --- /dev/null
> > +++ b/libavfilter/vf_sdr2hdr.c
> > @@ -0,0 +1,283 @@
> > +/*
> > + * Copyright (c) 2018 Guo Yejun
> > + *
> > + * This file is part of FFmpeg.
> > + *
> > + * FFmpeg is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2.1 of the License, or (at your option) any later version.
> > + *
> > + * FFmpeg is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with FFmpeg; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > + */
> > +
> > +/**
> > + * @file
> > + * Filter implementing HDR image reconstruction from a single exposure using deep CNNs.
> > + * https://arxiv.org/pdf/1710.07480.pdf
> > + */
> > +
> > +#include "avfilter.h"
> > +#include "formats.h"
> > +#include "internal.h"
> > +#include "libavutil/opt.h"
> > +#include "libavutil/qsort.h"
> > +#include "libavformat/avio.h"
> > +#include "libswscale/swscale.h"
> > +#include "dnn_interface.h"
> > +#include <math.h>
> > +
> > +typedef struct SDR2HDRContext {
> > +    const AVClass *class;
> > +
> > +    char* model_filename;
> > +    char* out_fmtname;
> > +    DNNModule* dnn_module;
> > +    DNNModel* model;
> > +    DNNData input, output;
> > +    enum AVPixelFormat out_fmt;
> > +} SDR2HDRContext;
> > +
> > +#define OFFSET(x) offsetof(SDR2HDRContext, x)
> > +#define FLAGS AV_OPT_FLAG_FILTERING_PARAM | AV_OPT_FLAG_VIDEO_PARAM
> > +static const AVOption sdr2hdr_options[] = {
> > +    { "model_filename", "path to model file specifying network architecture and its parameters", OFFSET(model_filename), AV_OPT_TYPE_STRING, {.str=NULL}, 0, 0, FLAGS },
> > +    { "out_fmtname", "the data format of the filter's output, it could be gbrpf32le [default] or gbrp10le", OFFSET(out_fmtname), AV_OPT_TYPE_STRING, {.str=NULL}, 0, 0, FLAGS },
> 
> Please use AV_OPT_TYPE_PIX_FMT.

thanks, will fix.

> 
> > +    { NULL }
> > +};
> > +
> > +AVFILTER_DEFINE_CLASS(sdr2hdr);
> > +
> > +static av_cold int init(AVFilterContext* context) {
> > +    SDR2HDRContext* ctx = context->priv;
> > +
> > +    ctx->out_fmt = AV_PIX_FMT_GBRPF32LE;
> > +    if (ctx->out_fmtname) {
> > +        if (strncmp(ctx->out_fmtname, "gbrp10le", strlen("gbrp10le")) == 0) {
> > +            ctx->out_fmt = AV_PIX_FMT_GBRP10LE;
> > +        }
> > +    }
> 
> Please use AV_OPT_TYPE_PIX_FMT.

thanks, will fix.

> 
> > +
> > +#if (CONFIG_LIBTENSORFLOW == 1)
> > +    ctx->dnn_module = ff_get_dnn_module(DNN_TF);
> > +    if (!ctx->dnn_module){
> > +        av_log(context, AV_LOG_ERROR, "could not create DNN module for tensorflow backend\n");
> > +        return AVERROR(ENOMEM);
> > +    }
> > +    if (!ctx->model_filename){
> > +        av_log(context, AV_LOG_ERROR, "model file for network was not specified\n");
> > +        return AVERROR(EIO);
> > +    }
> > +    else{
> > +        if (!ctx->dnn_module->load_model) {
> > +            av_log(context, AV_LOG_ERROR, "load_model for network was not specified\n");
> > +            return AVERROR(EIO);
> > +        } else {
> > +            ctx->model = (ctx->dnn_module->load_model)(ctx->model_filename);
> > +        }
> > +    }
> > +    if (!ctx->model){
> > +        av_log(context, AV_LOG_ERROR, "could not load DNN model\n");
> > +        return AVERROR(EIO);
> > +    }
> > +    return 0;
> > +#else
> > +    return AVERROR(EIO);
> > +#endif
> > +}
> > +
> > +static int query_formats(AVFilterContext* context) {
> > +    const enum AVPixelFormat in_formats[] = {AV_PIX_FMT_RGB24,
> > +                                             AV_PIX_FMT_NONE};
> > +    enum AVPixelFormat out_formats[2];
> > +    SDR2HDRContext* ctx = context->priv;
> > +    AVFilterFormats* formats_list;
> > +    int ret = 0;
> > +
> > +    formats_list = ff_make_format_list(in_formats);
> > +    if (!formats_list){
> > +        av_log(context, AV_LOG_ERROR, "could not create formats list\n");
> 
> Remove this log, not needed.

Got it, will remove.
I had added it following the example of other filters.

> 
> > +        return AVERROR(ENOMEM);
> > +    }
> > +    if ((ret = ff_formats_ref(formats_list, &context->inputs[0]->out_formats)) < 0)
> > +        return ret;
> > +
> > +
> > +    out_formats[0] = ctx->out_fmt;
> > +    out_formats[1] = AV_PIX_FMT_NONE;
> > +    formats_list = ff_make_format_list(out_formats);
> > +    if (!formats_list){
> > +        av_log(context, AV_LOG_ERROR, "could not create formats list\n");
> 
> Remove this log, not needed.

Will remove; see my comment above.

> 
> > +        return AVERROR(ENOMEM);
> > +    }
> > +    if ((ret = ff_formats_ref(formats_list, &context->outputs[0]->in_formats)) < 0)
> > +        return ret;
> > +
> > +    return 0;
> > +}
> > +
> > +static int config_props(AVFilterLink* inlink) {
> > +    AVFilterContext* context = inlink->dst;
> > +    SDR2HDRContext* ctx = context->priv;
> > +    AVFilterLink* outlink = context->outputs[0];
> > +    DNNReturnType result;
> > +
> > +    // the dnn model is tied with resolution due to deconv layer of tensorflow
> > +    // now just support 1920*1080 and so the magic numbers within this file
> > +    if (inlink->w != 1920 || inlink->h != 1080)
> > +        return AVERROR(EIO);
> > +
> > +    ctx->input.width = 1920;
> > +    ctx->input.height = 1088;  //the model requires height is a multiple of 32,
> > +    ctx->input.channels = 3;
> > +
> > +    result = (ctx->model->set_input_output)(ctx->model->model, &ctx->input, &ctx->output);
> > +    if (result != DNN_SUCCESS){
> > +        av_log(context, AV_LOG_ERROR, "could not set input and output for the model\n");
> > +        return AVERROR(EIO);
> > +    }
> > +    else{
> > +        memset(ctx->input.data, 0, ctx->input.channels * ctx->input.width * ctx->input.height * sizeof(float));
> > +        outlink->h = 1080;
> > +        outlink->w = 1920;
> > +        return 0;
> > +    }
> > +}
> > +
> > +static float qsort_comparison_function_float(const void *a, const void *b) {
> > +    return *(const float *)a - *(const float *)b;
> > +}
> > +
> > +static int filter_frame(AVFilterLink* inlink, AVFrame* in) {
> > +    DNNReturnType dnn_result = DNN_SUCCESS;
> > +    AVFilterContext* context = inlink->dst;
> > +    SDR2HDRContext* ctx = context->priv;
> > +    AVFilterLink* outlink = context->outputs[0];
> > +    AVFrame* out = ff_get_video_buffer(outlink, outlink->w, outlink->h);
> > +    int total_pixels = in->height * in->width;
> > +
> > +    av_frame_copy_props(out, in);
> > +
> > +    for (int i = 0; i < total_pixels * 3; ++i) {
> > +        ctx->input.data[i] = in->data[0][i] / 255.0f;
> 
> Incorrect code. Use in->linesize[0].

Given the filter's assumption that it only supports 1080p RGB24, the code is functionally correct.
Still, in->linesize[0] appears to be the common convention, so I'll change the loop bound to in->linesize[0]*in->height.
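
For illustration only, one way to honor the stride is to walk the frame row by row, so that any padding bytes at the end of a row never reach the model input. The sketch below is standalone C, not the filter code: the helper name rgb24_to_float and its parameters are hypothetical, and it assumes packed RGB24 rows that start linesize bytes apart.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: convert packed RGB24 rows,
 * which start `linesize` bytes apart, into a tightly packed float buffer
 * with values in [0, 1]. Padding bytes after each row are skipped. */
static void rgb24_to_float(const uint8_t *src, ptrdiff_t linesize,
                           int width, int height, float *dst)
{
    for (int y = 0; y < height; y++) {
        const uint8_t *row = src + y * linesize;   /* start of source row y */
        float *out = dst + (size_t)y * width * 3;  /* start of packed output row y */
        for (int x = 0; x < width * 3; x++)
            out[x] = row[x] / 255.0f;
    }
}
```

In the filter this would roughly correspond to passing in->data[0], in->linesize[0], in->width and in->height, with ctx->input.data as the destination.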

> 
> > +    }
> > +
> > +    dnn_result = (ctx->dnn_module->execute_model)(ctx->model);
> > +    if (dnn_result != DNN_SUCCESS){
> > +        av_log(context, AV_LOG_ERROR, "failed to execute loaded model\n");
> > +        return AVERROR(EIO);
> > +    }
> > +
> > +    if (ctx->out_fmt == AV_PIX_FMT_GBRPF32LE) {
> > +        float* outg = (float*)out->data[0];
> > +        float* outb = (float*)out->data[1];
> > +        float* outr = (float*)out->data[2];
> > +        for (int i = 0; i < total_pixels; ++i) {
> > +            float r = ctx->output.data[i*3];
> > +            float g = ctx->output.data[i*3+1];
> > +            float b = ctx->output.data[i*3+2];
> > +            outr[i] = r;
> > +            outg[i] = g;
> > +            outb[i] = b;
> > +        }
> > +    } else {
> > +        // here, we just use a rough mapping to the 10bit contents
> > +        // meta data generation for HDR video encoding is not supported yet
> > +        float* converted_data = (float*)malloc(total_pixels * 3 * sizeof(float));
> > +        short* outg = (short*)out->data[0];
> > +        short* outb = (short*)out->data[1];
> > +        short* outr = (short*)out->data[2];
> > +
> > +        float max = 1.0f;
> > +        for (int i = 0; i < total_pixels * 3; ++i) {
> > +            float d = ctx->output.data[i];
> > +            d = sqrt(d);
> > +            converted_data[i] = d;
> > +            if (d > max)
> > +                max = d;
> 
> Please use FFMIN() macro.

Thanks, will use FFMIN/FFMAX here and in the following code.
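
As a rough sketch of what that could look like (standalone C, not the final patch): the two macros are repeated here with the same definitions as in libavutil/common.h so the snippet builds on its own, and the helper names max_with_lower_bound and clip_to_10bit are hypothetical.

```c
#include <stddef.h>

/* Same definitions as libavutil/common.h, repeated so the sketch builds alone. */
#ifndef FFMAX
#define FFMAX(a, b) ((a) > (b) ? (a) : (b))
#endif
#ifndef FFMIN
#define FFMIN(a, b) ((a) > (b) ? (b) : (a))
#endif

/* Hypothetical helper: largest value in buf, but never below lower_bound. */
static float max_with_lower_bound(const float *buf, size_t n, float lower_bound)
{
    float max = lower_bound;
    for (size_t i = 0; i < n; i++)
        max = FFMAX(max, buf[i]);
    return max;
}

/* Hypothetical helper: clamp each value to limit, then scale to 0..1023.
 * Assumes non-negative input and limit > 0. */
static void clip_to_10bit(const float *buf, size_t n, float limit,
                          unsigned short *dst)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (unsigned short)(FFMIN(buf[i], limit) / limit * 1023);
}
```

The first helper stands in for the "if (d > max) max = d;" search, the second for the "d < max ? d : max" clamp followed by the scaling to 10 bits.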

> 
> > +        }
> > +
> > +        if (max > 1.0f) {
> > +            AV_QSORT(converted_data, total_pixels * 3, float, qsort_comparison_function_float);
> > +            // 0.5% pixels are clipped
> > +            max = converted_data[(int)(total_pixels * 3 * 0.995)];
> > +
> > +            if (max < 1.0f)
> > +                max = 1.0f;
> > +
> > +            for (int i = 0; i < total_pixels * 3; ++i) {
> > +                float d = ctx->output.data[i];
> > +                d = sqrt(d);
> > +                d = d < max ? d : max;
> > +                converted_data[i] = d;
> > +            }
> > +        }
> > +
> > +        for (int i = 0; i < total_pixels; ++i) {
> > +            float r = converted_data[i*3];
> > +            float g = converted_data[i*3+1];
> > +            float b = converted_data[i*3+2];
> > +            outr[i] = r / max * 1023;
> > +            outg[i] = g / max * 1023;
> > +            outb[i] = b / max * 1023;
> > +        }
> > +
> > +        free(converted_data);
> > +    }
> > +
> > +    av_frame_free(&in);
> > +    return ff_filter_frame(outlink, out);
> > +}
> > +
> > +static av_cold void uninit(AVFilterContext* context) {
> > +    SDR2HDRContext* ctx = context->priv;
> > +
> > +    if (ctx->dnn_module){
> > +        (ctx->dnn_module->free_model)(&ctx->model);
> > +        av_freep(&ctx->dnn_module);
> > +    }
> > +}
> > +
> > +static const AVFilterPad sdr2hdr_inputs[] = {
> > +    {
> > +        .name         = "default",
> > +        .type         = AVMEDIA_TYPE_VIDEO,
> > +        .config_props = config_props,
> > +        .filter_frame = filter_frame,
> > +    },
> > +    { NULL }
> > +};
> > +
> > +static const AVFilterPad sdr2hdr_outputs[] = {
> > +    {
> > +        .name = "default",
> > +        .type = AVMEDIA_TYPE_VIDEO,
> > +    },
> > +    { NULL }
> > +};
> > +
> > +AVFilter ff_vf_sdr2hdr = {
> > +    .name          = "sdr2hdr",
> > +    .description   = NULL_IF_CONFIG_SMALL("HDR image reconstruction from a single exposure using deep CNNs."),
> > +    .priv_size     = sizeof(SDR2HDRContext),
> > +    .init          = init,
> > +    .uninit        = uninit,
> > +    .query_formats = query_formats,
> > +    .inputs        = sdr2hdr_inputs,
> > +    .outputs       = sdr2hdr_outputs,
> > +    .priv_class    = &sdr2hdr_class,
> > +    .flags         = AVFILTER_FLAG_SUPPORT_TIMELINE_GENERIC | AVFILTER_FLAG_SLICE_THREADS,
> > +};
> > --
> > 2.7.4
> >
> > _______________________________________________
> > ffmpeg-devel mailing list
> > ffmpeg-devel at ffmpeg.org
> > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >