[FFmpeg-devel] [PATCH V6 4/6] lavu: add side data AV_FRAME_DATA_BOUNDING_BOXES

Thu Apr 1 14:31:31 EEST 2021

> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of Anton
> Khirnov
> Sent: 2021年4月1日 16:13
> To: FFmpeg development discussions and patches <ffmpeg-devel at ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH V6 4/6] lavu: add side data
> AV_FRAME_DATA_BOUNDING_BOXES
> 
> Quoting Guo, Yejun (2021-03-26 09:09:29)
> > Signed-off-by: Guo, Yejun <yejun.guo at intel.com>
> > ---
> >  doc/APIchanges          |  2 ++
> >  libavutil/Makefile      |  1 +
> >  libavutil/boundingbox.h | 79
> +++++++++++++++++++++++++++++++++++++++++
> >  libavutil/frame.c       |  1 +
> >  libavutil/frame.h       |  7 ++++
> >  5 files changed, 90 insertions(+)
> > +
> > +#ifndef AVUTIL_BOUNDINGBOX_H
> > +#define AVUTIL_BOUNDINGBOX_H
> > +
> > +#include "libavutil/rational.h"
> > +
> > +typedef struct AVBoundingBox {
> > +    /**
> > +     * Distance in pixels from the top edge of the frame to top
> > +     * and bottom, and from the left edge of the frame to left and
> > +     * right, defining the bounding box.
> > +     */
> > +    int top;
> > +    int left;
> > +    int bottom;
> > +    int right;
> 
> Is this allowed to be negative? Can/should this be size_t?

There was a long discussion about size_t/int/uint32_t when I added
struct AVRegionOfInterest in frame.h, and finally size_t is not chosen.

> 
> > +
> > +#define AV_BBOX_LABEL_NAME_MAX_SIZE 32
> > +
> > +    /**
> > +     * Detect result with confidence
> > +     */
> > +    char detect_label[AV_BBOX_LABEL_NAME_MAX_SIZE];
> > +    AVRational detect_confidence;
> > +
> > +    /**
> > +     * At most 4 classifications based on the detected bounding box.
> > +     * For example, we can get max 4 different attributes with 4 different
> > +     * DNN models on one bounding box.
> > +     * classify_count is zero if no classification.
> > +     */
> > +#define AV_NUM_BBOX_CLASSIFY 4
> > +    uint32_t classify_count;
> 
> size_t?
> 
> > +    char
> classify_labels[AV_NUM_BBOX_CLASSIFY][AV_BBOX_LABEL_NAME_MAX_SIZE];
> > +    AVRational classify_confidences[AV_NUM_BBOX_CLASSIFY];
> > +} AVBoundingBox;
> > +
> > +typedef struct AVBoundingBoxHeader {
> 
> This structure is not extensible in an ABI-compatible way.

yes, we can add a version number (enum AVBBoxHeaderVersion, see below) at the
beginning of the struct, and the user needs to set it with AV_BBOX_HEADER_CURRENT.
It is not elegant, is this what we really want?

enum AVBBoxHeaderVersion {
  AV_BBOX_HEADER_NONE = 0,
  AV_BBOX_HEADER_V1,
  AV_BBOX_HEADER_CURRENT = AV_BBOX_HEADER_V1,
};

typedef struct AVBoundingBoxHeader {
  enum AVBBoxHeaderVersion version;
  union {
    AVBoundingBoxHeaderV1 v1;
  } header;
} AVBoundingBoxHeader;

We have carefully thought/discussed about the struct for months, and there's
little chance for later update, imho.

> 
> > +    /**
> > +     * Information about how the bounding box is generated.
> > +     * for example, the DNN model name.
> > +     */
> > +    char source[128];
> > +
> > +    /**
> > +     * The size of frame when it is detected.
> > +     */
> > +    int frame_width;
> > +    int frame_height;
> 
> Why?

so we know if the value of top/left/bottom/right in AVBoundingBox
can be used directly in other later filters.

for example, the filter chain is: dnn_detect -> scale -> dnn_classify (in plan)

In filter dnn_classify, it needs to check the frame size to know if the values
in bounding boxes (the detection result) are still valid.

> 
> > +
> > +    /**
> > +     * Number of bounding boxes.
> > +     */
> > +    uint32_t nb_bbox;
> 
> size_t?
> 
> --
> Anton Khirnov
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email ffmpeg-devel-request at ffmpeg.org
> with subject "unsubscribe".