[FFmpeg-devel] [PATCH 3/4] avfilter/vf_dnn_processing: add format GRAY8 and GRAYF32 support

Tue Dec 17 09:05:16 EET 2019

> > > > > >
> > > > > > > From: Pedro Arthur [mailto:bygrandao at gmail.com]
> > > > > > > Sent: Friday, December 13, 2019 12:45 AM
> > > > > > > To: FFmpeg development discussions and patches
> > > > > <ffmpeg-devel at ffmpeg.org>
> > > > > > > Cc: Guo, Yejun <yejun.guo at intel.com>
> > > > > > > Subject: Re: [FFmpeg-devel] [PATCH 3/4] avfilter/vf_dnn_processing:
> add
> > > > > > > format GRAY8 and GRAYF32 support
> > > > > > > Hi,
> > > > > > >
> > > > > > > how should I test this patch?
> > > > > >
> > > > > > the fourth patch of this patch set is the fate test for this feature, so I
> > > ignored
> > > > > comments here.
> > > > > > I'll add the test descriptions back in v2.
> > > > > >
> > > > > > >
> > > > > > > Em sex., 22 de nov. de 2019 às 04:57, Guo, Yejun <yejun.guo at
> > > intel.com>
> > > > > > > escreveu:
> > > > > > >
> > > > > > > > Signed-off-by: Guo, Yejun <yejun.guo at intel.com>
> > > > > > > > ---
> > > > > > > >  doc/filters.texi                |   8 ++-
> > > > > > > >  libavfilter/vf_dnn_processing.c | 147
> > > > > > > > ++++++++++++++++++++++++++++++----------
> > > > > > > >  2 files changed, 118 insertions(+), 37 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/doc/filters.texi b/doc/filters.texi
> > > > > > > > index 1f86ae1..c3f7997 100644
> > > > > > > > --- a/doc/filters.texi
> > > > > > > > +++ b/doc/filters.texi
> > > > > > > > @@ -8992,7 +8992,13 @@ Set the input name of the dnn
> network.
> > > > > > > >  Set the output name of the dnn network.
> > > > > > > >
> > > > > > > >  @item fmt
> > > > > > > > -Set the pixel format for the Frame. Allowed values are
> > > > > > > > @code{AV_PIX_FMT_RGB24}, and @code{AV_PIX_FMT_BGR24}.
> > > > > > > > +Set the pixel format for the Frame, the value is determined by the
> > > input
> > > > > > > > of the dnn network model.
> > > > > > > >
> > > > > > > This sentence is a bit confusing, also I think this property should be
> > > > > > > removed. (I will explain bellow).
> > > > > >
> > > > > > sure, no problem.
> > > > > >
> > > > > > >
> > > > > > > +
> > > > > > > > +If the model handles RGB (or BGR) image and the data type of
> model
> > > > > input
> > > > > > > > is uint8, fmt must be @code{AV_PIX_FMT_RGB24} (or
> > > > > > > @code{AV_PIX_FMT_BGR24}.
> > > > > > > > +If the model handles RGB (or BGR) image and the data type of
> model
> > > > > input
> > > > > > > > is float, fmt must be @code{AV_PIX_FMT_RGB24} (or
> > > > > > > @code{AV_PIX_FMT_BGR24},
> > > > > > > > and this filter will do data type conversion internally.
> > > > > > > > +If the model handles GRAY image and the data type of model
> input is
> > > > > > > > uint8, fmt must be @code{AV_PIX_FMT_GRAY8}.
> > > > > > > > +If the model handles GRAY image and the data type of model
> input is
> > > > > > > > float, fmt must be @code{AV_PIX_FMT_GRAYF32}.
> > > > > > > > +
> > > > > > > >  Default value is @code{AV_PIX_FMT_RGB24}.
> > > > > > > >
> > > > > > > >  @end table
> > > > > > > > diff --git a/libavfilter/vf_dnn_processing.c
> > > > > > > > b/libavfilter/vf_dnn_processing.c
> > > > > > > > index ce976ec..963dd5e 100644
> > > > > > > > --- a/libavfilter/vf_dnn_processing.c
> > > > > > > > +++ b/libavfilter/vf_dnn_processing.c
> > > > > > > > @@ -70,10 +70,12 @@ static av_cold int init(AVFilterContext
> > > *context)
> > > > > > > >  {
> > > > > > > >      DnnProcessingContext *ctx = context->priv;
> > > > > > > >      int supported = 0;
> > > > > > > > -    // as the first step, only rgb24 and bgr24 are supported
> > > > > > > > +    // to support more formats
> > > > > > > >      const enum AVPixelFormat supported_pixel_fmts[] = {
> > > > > > > >          AV_PIX_FMT_RGB24,
> > > > > > > >          AV_PIX_FMT_BGR24,
> > > > > > > > +        AV_PIX_FMT_GRAY8,
> > > > > > > > +        AV_PIX_FMT_GRAYF32,
> > > > > > > >      };
> > > > > > > >      for (int i = 0; i < sizeof(supported_pixel_fmts) / sizeof(enum
> > > > > > > > AVPixelFormat); ++i) {
> > > > > > > >          if (supported_pixel_fmts[i] == ctx->fmt) {
> > > > > > > > @@ -156,14 +158,38 @@ static int config_input(AVFilterLink
> *inlink)
> > > > > > > >          return AVERROR(EIO);
> > > > > > > >      }
> > > > > > > >
> > > > > > > I think the filter should not check formats manually in the init
> function
> > > > > > > (unless I'm missing something), it would be best if you query for all
> the
> > > > > > > above supported formats in query_formats and later in config_input
> you
> > > > > make
> > > > > > > sure the expected model format matches the frame format.
> > > > > >
> > > > > > I'm afraid it is too late if we find the format mismatch in function
> > > > > config_input.
> > > > > >
> > > > > > For example, the dnn module is designed to accept BGR24 data, and
> the
> > > > > actual
> > > > > > format comes into config_input is RGB24 or YUV420P (we'll add yuv
> > > formats
> > > > > later in
> > > > > > supported pixel fmts) or something else such as GRAY8. We have two
> > > choices:
> > > > > >
> > > > > > - return error, and the application ends.
> > > > > >   This is not what we want.
> > > > > > - return no_error, and do format conversion at the beginning of
> function
> > > > > filter_frame.
> > > > > >   It makes this filter complex, and our implementation for the
> conversion
> > > > > might not be the best optimized.
> > > > > >   My idea is to keep this filter simple. And the users can choose what
> > > they
> > > > > want to do conversion in the filters graph.
> > > > > >
> > > > > Indeed if the filter receives the format already converted is much
> > > > > better, but if you already have to specify the format the model
> > > > > expects as a parameter one could simply use the -pix_fmt to set the
> > > > > format instead of having one more filter parameter.
> > > > > Is there any downside if it uses "-pix_fmt" that "fmt" avoids?
> > > >
> > > > I see, your meaning is to add a format conversion filter explicitly before
> > > dnn_processing.
> > > >
> > > > so, there are two options.
> > > > 1). no parameter 'fmt' for dnn_processing
> > > > We must specify the format filter (or other filters converting format)
> explicitly,
> > > the command line looks like:
> > > > -vf "filter0, filter1, ..., format=pix_fmts=rgb24,
> > > dnn_processing=model=my_model_file, ..."
> > > >
> > > > The advantage is dnn_processing is simpler without 'fmt'.
> > > > The disadvantage is we need to change parameters of two filters (format
> and
> > > dnn_processing) when a new model with different input is used.
> > > >
> > > >
> > > > 2). dnn_processing has parameter 'fmt'.
> > > > This option supports two styles of command line:
> > > > a) -vf "filter0, filter1, ...,
> > > dnn_processing=model=my_model_file:fmt=rgb24, ..."
> > > > The filter 'format=pix_fmts=rgb24' is automatically inserted by ffmpeg, so
> we
> > > just need to change the parameters
> > > > of one filter (dnn_processing) when a new model with different input is
> used.
> > > >
> > > > b) -vf "filter0, filter1, ..., format=pix_fmts=rgb24,
> > > dnn_processing=model=my_model_file:fmt=rgb24, ..."
> > > > It is redundant for the rgb24.
> > > >
> > > >
> > > > Another point is that logically the format is tied closely with the input of
> the
> > > model, I think
> > > > it's more nature to put them together in one filter, so I prefer option 2).
> > > Anyway, each option has
> > > > advantages and disadvantages, I do not find an ideal one and not insist on
> > > option 2).
> > > >
> > > >
> > >
> > > Based on the flexibility, I prefer option 1
> >
> > thanks for the discussion, I might not explain the background well.
> > For a given dnn model, the fmt is determined. 'fmt' has two meanings, the
> format of the Frame and also denotes the input format of dnn model.
> I'm not 100% sure about what you mean, but if the model determines the
> expected format, it should be present in the model file and require no
> manual setting by the user, at least for our own model file format.

yes, I agree your opinion that the format *should* be present in the model file.
But, I checked the model file of tensorflow, caffe, pytorch and openvino, such
format information is all missing. In other words, the model file contains the
channel number (3 for rgb, 1 for gray/Y, ...) and the data type (float32 or uint8, ...),
but it does not contain the meaning of the first channel is Red (RGB24 for example) or Blue (BGR24 for example)...
and it does not denote it requires gray data or Y value when channel number is 1.

This is the reason that OpenCV function (cv::dnn::blobFromImage) has a bool flag swapRB to denote
the base format is RGB or BGR, see more detail at https://docs.opencv.org/master/d6/d0f/group__dnn.html#ga29f34df9376379a603acd8df581ac8d7

We can add a command line parameter for python script generate.py to ask user to set the format
when executing the script to convert .pb file to .model file. I'll add into todo list with low priority.

> 
> >
> > I think the advantage of option 1 is simpler, but the disadvantage of option 1
> is no flexible since
> > we need to change parameters of two filters when a new model with
> different input is used.
> I agree we have one more filter with option 1 but it seems a more
> standard (and direct) way of setting formats. If it can achieve the
> same features as option 2 I would still prefer option 1.
> If you have a specific use case which would be penalized by using
> option 1 let us know.

I do not have such use case, let's change to option 1.

> 
> >
> > > _______________________________________________
> > > ffmpeg-devel mailing list
> > > ffmpeg-devel at ffmpeg.org
> > > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> > >
> > > To unsubscribe, visit link above, or email
> > > ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".