[FFmpeg-devel] [PATCH V2 08/10] libavutil: add side data AVDnnBoundingBox for dnn based detect/classify filters

Guo, Yejun yejun.guo at intel.com
Wed Feb 17 03:46:20 EET 2021



> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of Guo,
> Yejun
> Sent: February 16, 2021 18:37
> To: FFmpeg development discussions and patches <ffmpeg-devel at ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH V2 08/10] libavutil: add side data
> AVDnnBoundingBox for dnn based detect/classify filters
> 
> 
> 
> > -----Original Message-----
> > From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of Mark
> > Thompson
> > Sent: February 16, 2021 7:48
> > To: ffmpeg-devel at ffmpeg.org
> > Subject: Re: [FFmpeg-devel] [PATCH V2 08/10] libavutil: add side data
> > AVDnnBoundingBox for dnn based detect/classify filters
> >
> > On 11/02/2021 08:15, Guo, Yejun wrote:
> > >> -----Original Message-----
> > >> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of
> > >> Mark Thompson
> > >> Sent: February 11, 2021 6:19
> > >> To: ffmpeg-devel at ffmpeg.org
> > >> Subject: Re: [FFmpeg-devel] [PATCH V2 08/10] libavutil: add side
> > >> data AVDnnBoundingBox for dnn based detect/classify filters
> > >>
> > >> On 10/02/2021 09:34, Guo, Yejun wrote:
> > >>> Signed-off-by: Guo, Yejun <yejun.guo at intel.com>
> > >>> ---
> > >>>    doc/APIchanges       |  2 ++
> > >>>    libavutil/Makefile   |  1 +
> > >>>    libavutil/dnn_bbox.h | 68 ++++++++++++++++++++++++++++++++++++++++++++
> > >>>    libavutil/frame.c    |  1 +
> > >>>    libavutil/frame.h    |  7 +++++
> > >>>    libavutil/version.h  |  2 +-
> > >>>    6 files changed, 80 insertions(+), 1 deletion(-)
> > >>>    create mode 100644 libavutil/dnn_bbox.h
> > >>
> > >> What is the intended consumer of this box information?  (Is there
> > >> some other filter which will read these and do something with them,
> > >> or some sort of user program?)
> > >>
> > >> If there is no use in ffmpeg outside libavfilter then the header
> > >> should probably be in libavfilter.
> > >
> > >
> > > Thanks for the feedback.
> > >
> > > In most cases, other filters will use this box information: for
> > > example, a classify filter will recognize the car number after the
> > > car plate is detected, another filter can apply 'beauty' if a face
> > > is detected, an updated drawbox filter (planned) can draw the box
> > > for visualization, and a new filter such as bbox_to_roi can be added
> > > to apply ROI encoding for the detected result.
> > >
> > > It is possible that some others will use it, for example, the new
> > > codec is adding AI labels and so libavcodec might need it in the
> > > future, and a user program might do something special such as:
> > > 1. use libavcodec to decode
> > > 2. use filter detect
> > > 3. write his own code to handle the detect result
> > >
> > > As a first step, how about putting it in libavfilter (so it is not
> > > exposed at the API level and we are free to change it when needed)?
> > > We can move it to libavutil once that is required.
> >
> > Sure.
> >
> > >> How tied is this to the DNN implementation, and hence the DNN name?
> > >> If someone made a standalone filter doing object detection by some
> > >> other method, would it make sense for them to reuse this structure?
> > >
> > > Yes, this structure is general. I added the dnn prefix for two reasons:
> > > 1. There's already a bounding box in libavfilter/bbox.h, see below.
> > > It's simple and we could not reuse it, so we need a new name.
> > > typedef struct FFBoundingBox {
> > >      int x1, x2, y1, y2;
> > > } FFBoundingBox;
> >
> > Right, really this is just the return type for the internal
> > ff_calculate_bounding_box() function - if you want to reuse the name
> > externally then it would be fine to rename the existing stuff to get
> > it out of the way.
> 
> Yeah, I'll consider renaming it after these patches are done, since the two
> names do not currently conflict from the compiler's perspective.
> 
> >
> > > 2. DNN is currently the dominant method for object detection.
> >
> > Unless your ID values or something else about the output are
> > DNN-specific then I'm not really seeing the attraction of associating
> > them with the DNN name for external use.  If a user wants to detect
> > some objects in an image and then do something with the result then
> > maybe they know they are using DNN for the first step, but they won't care
> > about where the result came from after that.
> 
> That reminds me that we might need to save some information such as the model
> name, the name of the training data set, the name/parameters of other non-DNN
> implementations, etc., so that the user knows more about the bbox. For
> example, we can add 'char source[128]' as a box header for all the bboxes.
> 
> I'll think about it and send new patches for the side data and detect filter.

Hi, I'll push the first 7 patches of this patch set tomorrow if there are no
other comments on them, and then send a new patch set for the side data and
detect filter. Thanks.
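
A minimal sketch of the 'char source[128]' header idea quoted above, to make
the layout concrete. Everything here is an assumption for illustration: the
names BBox, BBoxList and bbox_list_alloc, and the per-box fields, are
hypothetical and are not taken from the patch, and the follow-up patch set may
look quite different.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-box payload; placeholder fields, not the patch's layout. */
typedef struct BBox {
    int x, y, w, h;      /* box position and size in pixels */
    int label_id;        /* detector-specific class id */
    float confidence;    /* detection confidence */
} BBox;

/* Hypothetical container: one shared header followed by all boxes. */
typedef struct BBoxList {
    char source[128];    /* model name, data set, or non-DNN method */
    size_t nb_bboxes;    /* number of entries in bboxes[] */
    BBox bboxes[];       /* flexible array member, one entry per box */
} BBoxList;

/* Allocate a zeroed list with room for nb_bboxes boxes.
 * The source string is truncated to fit and is always NUL-terminated. */
static BBoxList *bbox_list_alloc(const char *source, size_t nb_bboxes)
{
    BBoxList *list;

    /* Reject counts that would overflow the allocation size. */
    if (nb_bboxes > (SIZE_MAX - sizeof(*list)) / sizeof(BBox))
        return NULL;

    list = calloc(1, sizeof(*list) + nb_bboxes * sizeof(BBox));
    if (!list)
        return NULL;

    snprintf(list->source, sizeof(list->source), "%s", source);
    list->nb_bboxes = nb_bboxes;
    return list;
}
```

With a layout like this, a consumer reads list->source once and then walks
list->bboxes[0] to list->bboxes[nb_bboxes - 1] without caring which detector
produced them; the whole buffer is released with a single free().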


