[FFmpeg-devel] [PATCH] libavfilter: Add more operation supports in FFmpeg dnn native mode.

Pedro Arthur bygrandao at gmail.com
Mon Apr 29 05:42:42 EEST 2019


On Sun, Apr 28, 2019 at 23:07, Guo, Yejun <yejun.guo at intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On Behalf Of
> > xwmeng at pku.edu.cn
> > Sent: Sunday, April 28, 2019 5:27 PM
> > To: ffmpeg development discussions and patches <ffmpeg-devel at ffmpeg.org>
> > Subject: [FFmpeg-devel] [PATCH] libavfilter: Add more operation supports in
> > FFmpeg dnn native mode.
> >
> > This patch supports the derain filter project in GSoC. It adds support for
> > the following operations:
> >
> >  (1) Conv padding method: "SAME" and "VALID"
> >
> >  (2) Dilation
> >
> >  (3) Activation: "NONE" and "LEAKY_RELU"
>
> How about separating this single patch into three patches?
>
> >
> > These operations are all needed by the derain filter. If the dnn native
> > mode in FFmpeg is modified, the generation process of the Super Resolution
> > model should be changed accordingly, e.g. by adding a padding method
> > parameter (= 0) and a dilation parameter (= 1).
>
> You can create a PR at https://github.com/HighVoltageRocknRoll/sr
>
> >
> > In addition, I have a question about the Super Resolution implementation. The
> > model training process of SR uses the "VALID" method. According to my
> > understanding of "VALID" mode in TensorFlow, the output image should be
> > smaller than it is in the current SR design, because pixels near the boundary
> > are not processed in "VALID" mode. In the current dnn native mode, however,
> > these unprocessed pixels are filled with adjacent pixels. I wonder why it is
> > done this way here.
>
> I have the same concern: why is the native model not exactly the same as the TF model?
> The pad layer is missing, and the native model also changes the behavior of the pad parameter of the conv layer.
>
> It is only suitable for vf_sr and is not general enough for other models.
>
I think for training these filters the preferred method is VALID, as it
uses only the data available (without filling the borders) and gives
the best possible result.
However, for inference one usually expects to output an image with the
same size as the original (imagine the case of chained filters where
each one reduces the image by a few pixels; in the end one may have a
useless output).
Therefore it makes perfect sense to use different padding methods for
training/inference.
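
To put numbers on the chained-filter case, here is a minimal sketch (not
taken from the patch) of how much a VALID convolution shrinks one image
dimension. It assumes stride 1; valid_output_size and chained_valid_size
are hypothetical helper names used only for illustration.

```c
#include <assert.h>

/* Hypothetical helper: length of one dimension after a VALID convolution
 * with stride 1. A dilated kernel spans (kernel_size - 1) * dilation + 1
 * input samples, so that many samples minus one are lost per layer. */
int valid_output_size(int input_size, int kernel_size, int dilation)
{
    assert(kernel_size > 0 && dilation > 0);
    int span = (kernel_size - 1) * dilation + 1;
    /* If the input is smaller than the kernel span, nothing is left. */
    return input_size >= span ? input_size - span + 1 : 0;
}

/* Apply `count` VALID layers in sequence and return the final size. */
int chained_valid_size(int input_size, int kernel_size,
                       const int *dilations, int count)
{
    for (int i = 0; i < count; i++)
        input_size = valid_output_size(input_size, kernel_size, dilations[i]);
    return input_size;
}
```

Under these assumptions, a 64-pixel-wide input passed through three 3x3
layers with dilations 1, 2 and 4 comes out 50 pixels wide
(64 -> 62 -> 58 -> 50), which is the kind of shrinkage that makes VALID
awkward for inference.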

The clamp_to_edge padding was introduced before the TF backend, so it
stayed in the native backend even after the TF backend was introduced.
Indeed, clamp_to_edge is simpler than the other padding methods and
also gives a slightly better result. If I remember correctly, the student
who implemented the TF backend did not find an equivalent padding
method in TF, which is why the two backends use different paddings.
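
As a rough illustration of the difference between the two border
treatments, the following sketch compares clamp-to-edge sampling with zero
filling on a single row. It is a simplified 1-D stand-in, not the native
backend code; SketchPadMode, clamp_index, sample_1d and box3_at are
hypothetical names.

```c
#include <assert.h>

/* Hypothetical border modes for this sketch only. */
typedef enum { PAD_ZERO, PAD_CLAMP_TO_EDGE } SketchPadMode;

/* Clamp an index into the valid range [0, size - 1]. */
static int clamp_index(int pos, int size)
{
    return pos < 0 ? 0 : (pos >= size ? size - 1 : pos);
}

/* Read one sample; out-of-range positions either repeat the nearest
 * border sample (clamp to edge) or read as zero. */
float sample_1d(const float *row, int size, int pos, SketchPadMode mode)
{
    assert(size > 0);
    if (mode == PAD_CLAMP_TO_EDGE)
        return row[clamp_index(pos, size)];
    return (pos < 0 || pos >= size) ? 0.0f : row[pos];
}

/* 3-tap box filter centred at x, using the chosen border mode. */
float box3_at(const float *row, int size, int x, SketchPadMode mode)
{
    float sum = 0.0f;
    for (int k = -1; k <= 1; k++)
        sum += sample_1d(row, size, x + k, mode);
    return sum / 3.0f;
}
```

On a constant row of 3.0, the clamped filter still returns 3.0 at x = 0,
while the zero-filled one returns 2.0, so zero filling darkens the border.
That may be one reason clamp-to-edge looks slightly better in practice,
though this toy filter is only an assumption-laden example.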

> >
> >
> >
> >
> > From 4d92ef21a5acf064122c51f442d0e2f5437b3343 Mon Sep 17 00:00:00 2001
> > From: Xuewei Meng <xwmeng at pku.edu.cn>
> > Date: Sun, 28 Apr 2019 17:21:35 +0800
> > Subject: [PATCH] Add operation supports in dnn_native
> >
> > Signed-off-by: Xuewei Meng <xwmeng at pku.edu.cn>
> > ---
> >  libavfilter/dnn_backend_native.c | 36 +++++++++++++++++++++-----------
> >  libavfilter/dnn_backend_native.h |  6 +++++-
> >  2 files changed, 29 insertions(+), 13 deletions(-)
> >
> > diff --git a/libavfilter/dnn_backend_native.c b/libavfilter/dnn_backend_native.c
> > index 70d857f5f2..0e3ef5d64d 100644
> > --- a/libavfilter/dnn_backend_native.c
> > +++ b/libavfilter/dnn_backend_native.c
> > @@ -157,13 +157,15 @@ DNNModel *ff_dnn_load_model_native(const char *model_filename)
> >                  ff_dnn_free_model_native(&model);
> >                  return NULL;
> >              }
> > +            conv_params->dilation = (int32_t)avio_rl32(model_file_context);
> > +            conv_params->padding_method = (int32_t)avio_rl32(model_file_context);
> >              conv_params->activation = (int32_t)avio_rl32(model_file_context);
> >              conv_params->input_num = (int32_t)avio_rl32(model_file_context);
> >              conv_params->output_num = (int32_t)avio_rl32(model_file_context);
> >              conv_params->kernel_size = (int32_t)avio_rl32(model_file_context);
> >              kernel_size = conv_params->input_num * conv_params->output_num *
> >                            conv_params->kernel_size * conv_params->kernel_size;
> > -            dnn_size += 16 + (kernel_size + conv_params->output_num << 2);
> > +            dnn_size += 24 + (kernel_size + conv_params->output_num << 2);
> >              if (dnn_size > file_size || conv_params->input_num <= 0 ||
> >                  conv_params->output_num <= 0 || conv_params->kernel_size <= 0){
> >                  avio_closep(&model_file_context);
> > @@ -221,23 +223,28 @@ DNNModel *ff_dnn_load_model_native(const char *model_filename)
> >
> >  static void convolve(const float *input, float *output, const ConvolutionalParams *conv_params, int width, int height)
> >  {
> > -    int y, x, n_filter, ch, kernel_y, kernel_x;
> >      int radius = conv_params->kernel_size >> 1;
> >      int src_linesize = width * conv_params->input_num;
> >      int filter_linesize = conv_params->kernel_size * conv_params->input_num;
> >      int filter_size = conv_params->kernel_size * filter_linesize;
> > +    int pad_size = (conv_params->padding_method == VALID) ? (conv_params->kernel_size - 1) / 2 * conv_params->dilation : 0;
>
> With the 'valid' parameter, the size of the feature map changes, and this should be reflected in the function set_input_output_native.
> For example, the size of network->layers[layer].output should change, and we might add the size info to struct Layer.
>
> >
> > -    for (y = 0; y < height; ++y){
> > -        for (x = 0; x < width; ++x){
> > -            for (n_filter = 0; n_filter < conv_params->output_num; ++n_filter){
> > +    for (int y = pad_size; y < height - pad_size; ++y){
> > +        for (int x = pad_size; x < width - pad_size; ++x){
> > +            for (int n_filter = 0; n_filter < conv_params->output_num; ++n_filter){
> >                  output[n_filter] = conv_params->biases[n_filter];
> > -                for (ch = 0; ch < conv_params->input_num; ++ch){
> > -                    for (kernel_y = 0; kernel_y < conv_params->kernel_size; ++kernel_y){
> > -                        for (kernel_x = 0; kernel_x < conv_params->kernel_size; ++kernel_x){
> > -                            output[n_filter] += input[CLAMP_TO_EDGE(y + kernel_y - radius, height) * src_linesize +
> > -                                                      CLAMP_TO_EDGE(x + kernel_x - radius, width) * conv_params->input_num + ch]
>
> To stay compatible with vf_sr.c, as a step-by-step approach, we can keep clamp_to_edge in the first step.
>
> This means we can support three values for the conv pad parameter: same, valid, and this extra same_clamp_to_edge.
> We can remove same_clamp_to_edge after everything is settled.
>
> > *
> > -                                                conv_params->kernel[n_filter * filter_size + kernel_y * filter_linesize +
> > -                                                                    kernel_x * conv_params->input_num + ch];
> > +
> > +                for (int ch = 0; ch < conv_params->input_num; ++ch){
> > +                    for (int kernel_y = 0; kernel_y < conv_params->kernel_size; ++kernel_y){
> > +                        for (int kernel_x = 0; kernel_x < conv_params->kernel_size; ++kernel_x){
> > +                            int y_pos = y + (kernel_y - radius) * conv_params->dilation;
> > +                            int x_pos = x + (kernel_x - radius) * conv_params->dilation;
> > +
> > +                            float input_pel = (x_pos < 0 || x_pos >= width || y_pos < 0 || y_pos >= height) ? 0.0 :
> > +                                               input[y_pos * src_linesize + x_pos * conv_params->input_num + ch];
> > +
> > +                            output[n_filter] += input_pel * conv_params->kernel[n_filter * filter_size + kernel_y * filter_linesize +
> > +                                                                                kernel_x * conv_params->input_num + ch];
> >                          }
> >                      }
> >                  }
> > @@ -250,6 +257,11 @@ static void convolve(const float *input, float *output, const ConvolutionalParam
> >                      break;
> >                  case SIGMOID:
> >                      output[n_filter] = 1.0f / (1.0f + exp(-output[n_filter]));
> > +                    break;
> > +                case NONE:
> > +                    break;
> > +                case LEAKY_RELU:
> > +                    output[n_filter] = FFMAX(output[n_filter], 0.0) + 0.2 * FFMIN(output[n_filter], 0.0);
> >                  }
> >              }
> >              output += conv_params->output_num;
> > diff --git a/libavfilter/dnn_backend_native.h b/libavfilter/dnn_backend_native.h
> > index 51d4cac955..f7d4eb823b 100644
> > --- a/libavfilter/dnn_backend_native.h
> > +++ b/libavfilter/dnn_backend_native.h
> > @@ -32,7 +32,9 @@
> >
> >  typedef enum {INPUT, CONV, DEPTH_TO_SPACE} DNNLayerType;
> >
> > -typedef enum {RELU, TANH, SIGMOID} DNNActivationFunc;
> > +typedef enum {RELU, TANH, SIGMOID, NONE, LEAKY_RELU} DNNActivationFunc;
> > +
> > +typedef enum {VALID, SAME} DNNPaddingFunc;
> >
> >  typedef struct Layer{
> >      DNNLayerType type;
> > @@ -43,6 +45,8 @@ typedef struct Layer{
> >  typedef struct ConvolutionalParams{
> >      int32_t input_num, output_num, kernel_size;
> >      DNNActivationFunc activation;
> > +    DNNPaddingFunc padding_method;
> > +    int32_t dilation;
> >      float *kernel;
> >      float *biases;
> >  } ConvolutionalParams;
> > --
> > 2.17.1
> >
> >
> > _______________________________________________
> > ffmpeg-devel mailing list
> > ffmpeg-devel at ffmpeg.org
> > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >
> > To unsubscribe, visit link above, or email
> > ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".