[FFmpeg-devel] [PATCH 1/2] dnn/native: add native support for avg_pool

Mon Jul 20 12:26:15 EEST 2020

> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of Guo,
> Yejun
> Sent: Monday, July 20, 2020 01:46 PM
> To: FFmpeg development discussions and patches <ffmpeg-devel at ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH 1/2] dnn/native: add native support for
> avg_pool
> 
> 
> 
> > -----Original Message-----
> > From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of Ting
> > Fu
> > Sent: July 17, 2020 23:23
> > To: ffmpeg-devel at ffmpeg.org
> > Subject: [FFmpeg-devel] [PATCH 1/2] dnn/native: add native support for
> > avg_pool
> >
> > It can be tested with the model generated with below python script:
> >
> > import tensorflow as tf
> > import numpy as np
> > import imageio
> >
> > in_img = imageio.imread('input_odd.jpg')
> > in_img = in_img.astype(np.float32)/255.0
> > in_data = in_img[np.newaxis, :]
> >
> > x = tf.placeholder(tf.float32, shape=[1, None, None, 3], name='dnn_in')
> > x_pool = tf.nn.avg_pool(x, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME') #please alter the params as needed
> > y = tf.identity(x_pool, name='dnn_out')
> >
> > sess=tf.Session()
> > sess.run(tf.global_variables_initializer())
> >
> > graph_def = tf.graph_util.convert_variables_to_constants(sess, sess.graph_def, ['dnn_out'])
> > tf.train.write_graph(graph_def, '.', 'image_process.pb', as_text=False)
> >
> > print("image_process.pb generated, please use \
> > path_to_ffmpeg/tools/python/convert.py to generate image_process.model\n")
> >
> > output = sess.run(y, feed_dict={x: in_data})
> > imageio.imsave("out.jpg", np.squeeze(output))
> >
> > Signed-off-by: Ting Fu <ting.fu at intel.com>
> > ---
> >  libavfilter/dnn/Makefile                      |   1 +
> >  libavfilter/dnn/dnn_backend_native.h          |   2 +
> >  .../dnn/dnn_backend_native_layer_avgpool.c    | 136 ++++++++++++++++++
> >  .../dnn/dnn_backend_native_layer_avgpool.h    |  35 +++++
> >  .../dnn/dnn_backend_native_layer_conv2d.h     |   3 +-
> >  libavfilter/dnn/dnn_backend_native_layers.c   |   2 +
> >  tools/python/convert_from_tensorflow.py       |  31 +++-
> >  7 files changed, 207 insertions(+), 3 deletions(-)
> >  create mode 100644 libavfilter/dnn/dnn_backend_native_layer_avgpool.c
> >  create mode 100644 libavfilter/dnn/dnn_backend_native_layer_avgpool.h
> >
[...]
> > +    int32_t input_operand_index = input_operand_indexes[0];
> > +    int number = operands[input_operand_index].dims[0];
> > +    int height = operands[input_operand_index].dims[1];
> > +    int width = operands[input_operand_index].dims[2];
> > +    int channel = operands[input_operand_index].dims[3];
> 
> the input channel should come from here, not in AvgPoolParams.
> And so as output channel.

Hi Yejun,

Understood, in_channel should come from the operand dims here. By 'and so as output channel', do you mean out_channel = in_channel (since pooling across channels is not supported)?

> 
> > +    const float *input = operands[input_operand_index].data;
> > +    const AvgPoolParams *avgpool_params = (const AvgPoolParams *)parameters;
> > +
> > +    float kernel_strides = avgpool_params->strides;
> 
> why float?

It is a float so that height / kernel_strides yields a float result for the ceil() call that follows. Or should I keep it as an int and multiply kernel_strides by 1.0 when calling ceil()?
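
The float is only needed so that the division inside ceil() is not truncated. As a minimal sketch (in Python for brevity; `ceil_div` is a hypothetical helper, not something in the patch), the same ceiling can be computed with integers only:

```python
import math

def ceil_div(a: int, b: int) -> int:
    """Ceiling of a / b for positive integers, without floating point."""
    return (a + b - 1) // b

height, stride = 1080, 11

# What the patch does: force a float division, then round up.
out_float = math.ceil(height / float(stride))

# Integer-only alternative giving the same result for positive inputs.
out_int = ceil_div(height, stride)

print(out_float, out_int)
```

The same expression, (height + strides - 1) / strides, also works with positive int operands in C, which might let kernel_strides stay an int.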

> 
> > +    int src_linesize = width * avgpool_params->in_channels;
> > +    DnnOperand *output_operand = &operands[output_operand_index];
> > +
> > +    if (avgpool_params->padding_method == SAME) {
> > +        height_end = height;
> > +        width_end = width;
> > +        height_radius = (avgpool_params->kernel_size - ((height - 1) % (int)kernel_strides + 1));
> 
> don't need the first '(' and last ')'.

OK

> 
> why we need to consider kernel_strides here?

Because when padding_method=SAME, TensorFlow only pads as many zero pixels as the kernel still needs beyond the remainder pixels, and splits them in half between the two sides.
E.g. if the width is 1080 and strides=11, then 1080%11=2 remainder pixels.
		If ksize=5, then 5-2=3 zero columns are needed: it fills 3>>1=1 column before the image and 2 columns after the image.
		If ksize=2, then 2-2=0: the remainder pixels are exactly enough for one pooling window, so no zero pixels are filled.
This means the number of zero pixels filled depends on the remainder pixels.
Does the example make sense?
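
The arithmetic in the example above can be checked with a short Python sketch. Both function names are hypothetical. The first mirrors the remainder rule used for height_radius/width_radius in the patch (the "after" value is an assumption, since that part of the patch is not quoted here). The second follows the SAME padding formula as documented by TensorFlow.

```python
import math

def same_padding_remainder(size, ksize, stride):
    """Zero padding (before, after) using the remainder rule from the example."""
    remainder = (size - 1) % stride + 1   # pixels available to the last window
    total = max(ksize - remainder, 0)     # zero pixels still needed
    before = total >> 1                   # smaller half goes before the image
    return before, total - before

def same_padding_tf(size, ksize, stride):
    """Zero padding (before, after) using TensorFlow's documented SAME formula."""
    out = math.ceil(size / stride)
    total = max((out - 1) * stride + ksize - size, 0)
    before = total // 2
    return before, total - before

print(same_padding_remainder(1080, 5, 11))  # (1, 2)
print(same_padding_remainder(1080, 2, 11))  # (0, 0)
```

For positive sizes the two forms agree, because (out - 1) * stride equals (size - 1) - (size - 1) % stride.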

> 
> > +        width_radius = (avgpool_params->kernel_size - ((width - 1) % (int)kernel_strides + 1));
> 
> same as above.
> 
> > +        height_radius = height_radius < 0 ? 0 : height_radius >> 1;
> > +        width_radius = width_radius < 0 ? 0 : width_radius >> 1;
[...]
> > +    for (int y = 0; y < height_end; y += kernel_strides) {
> > +        for (int x = 0; x < width_end; x += kernel_strides) {
> > +            for (int n_filter = 0; n_filter < avgpool_params->out_channels; ++n_filter) {
> []
> better to use n_channel, instead of n_filter.

Sure

> 
> > +                output[n_filter] = 0.0;
> > +                kernel_area = 0;
[...]
> > +    def dump_avg_pool_to_file(self, node, f):
> > +        assert(node.op == 'AvgPool')
> > +        self.layer_number = self.layer_number + 1
> > +        self.converted_nodes.add(node.name)
> > +        node0 = self.name_node_dict[node.input[0]]
> > +        strides = node.attr['strides']
> > +        assert(strides.list.i[1]==strides.list.i[2])
> > +        strides = strides.list.i[1]
> > +        filter_node = node.attr['ksize']
> > +        input_name = node.input[0]
> []
> we can save strides[4] and ksize[4] in .model file, and do part support in .c file.

Do you mean saving all 4 values of strides and ksize in the .model file, and extracting the ones we need in the .c file?
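
One possible reading of that suggestion, sketched in Python for illustration only: the writer stores all four strides and ksize values, and the reader rejects what it cannot handle yet. The field order, the little-endian uint32 encoding, and both helper names are assumptions made for this sketch; they are not the actual .model layout or the patch's code.

```python
import struct

def pack_avg_pool_params(strides, ksize):
    """Serialize all 4 strides and all 4 ksize values as 8 uint32 (hypothetical layout)."""
    assert len(strides) == 4 and len(ksize) == 4
    return struct.pack('<8I', *strides, *ksize)

def unpack_and_check(blob):
    """Reader side: load everything, then accept only the supported subset."""
    values = struct.unpack('<8I', blob)
    strides, ksize = values[:4], values[4:]
    # Partial support: no pooling over batch or channel, square stride and kernel.
    for name, v in (('strides', strides), ('ksize', ksize)):
        if v[0] != 1 or v[3] != 1 or v[1] != v[2]:
            raise ValueError('unsupported %s: %r' % (name, v))
    return strides[1], ksize[1]

blob = pack_avg_pool_params([1, 2, 2, 1], [1, 2, 2, 1])
print(unpack_and_check(blob))  # (2, 2)
```

With this split, the converter would not need its own assert on strides; the check would live next to the code that actually implements the layer.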

> 
> > +
> > +        filter_height = filter_node.list.i[1]
> > +        filter_width = filter_node.list.i[2]
> > +
> > +        in_channels = node0.attr['shape'].shape.dim[3].size
> > +        out_channels = in_channels
> > +        padding = node.attr['padding'].s.decode("utf-8")
> > +        np.array([self.op2code[node.op], strides, self.pool_paddings[padding], in_channels, out_channels,
> > +                  filter_height],dtype=np.uint32).tofile(f)
> > +
> > +        input_operand_index = self.add_operand(input_name, Operand.IOTYPE_INPUT)
> > +        output_operand_index = self.add_operand(node.name, Operand.IOTYPE_OUTPUT)
> > +        np.array([input_operand_index, output_operand_index],dtype=np.uint32).tofile(f)
> > +
> > +
> >      def dump_layers_to_file(self, f):
> >          for node in self.nodes:
> >              if node.name in self.converted_nodes:
> > @@ -311,6 +338,8 @@ class TFConverter:
> >
> >              if node.op == 'Conv2D':
> >                  self.dump_simple_conv2d_to_file(node, f)
> > +            if node.op == 'AvgPool':
> > +                self.dump_avg_pool_to_file(node, f)
> >              elif node.op == 'DepthToSpace':
> >                  self.dump_depth2space_to_file(node, f)
> >              elif node.op == 'MirrorPad':
> > --
> > 2.17.1
> >
> > _______________________________________________
> > ffmpeg-devel mailing list
> > ffmpeg-devel at ffmpeg.org
> > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >
> > To unsubscribe, visit link above, or email
> > ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".