[FFmpeg-devel] [PATCH V6 4/6] lavu: add side data AV_FRAME_DATA_BOUNDING_BOXES

Guo, Yejun yejun.guo at intel.com
Fri Apr 2 05:27:17 EEST 2021



> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of Guo,
> Yejun
> Sent: 2021年4月1日 19:32
> To: FFmpeg development discussions and patches <ffmpeg-devel at ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH V6 4/6] lavu: add side data
> AV_FRAME_DATA_BOUNDING_BOXES
> 
> 
> 
> > -----Original Message-----
> > From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of
> Anton
> > Khirnov
> > Sent: 2021年4月1日 16:13
> > To: FFmpeg development discussions and patches
> <ffmpeg-devel at ffmpeg.org>
> > Subject: Re: [FFmpeg-devel] [PATCH V6 4/6] lavu: add side data
> > AV_FRAME_DATA_BOUNDING_BOXES
> >
> > Quoting Guo, Yejun (2021-03-26 09:09:29)
> > > Signed-off-by: Guo, Yejun <yejun.guo at intel.com>
> > > ---
> > >  doc/APIchanges          |  2 ++
> > >  libavutil/Makefile      |  1 +
> > >  libavutil/boundingbox.h | 79
> > +++++++++++++++++++++++++++++++++++++++++
> > >  libavutil/frame.c       |  1 +
> > >  libavutil/frame.h       |  7 ++++
> > >  5 files changed, 90 insertions(+)
> > > +
> > > +#ifndef AVUTIL_BOUNDINGBOX_H
> > > +#define AVUTIL_BOUNDINGBOX_H
> > > +
> > > +#include "libavutil/rational.h"
> > > +
> > > +typedef struct AVBoundingBox {
> > > +    /**
> > > +     * Distance in pixels from the top edge of the frame to top
> > > +     * and bottom, and from the left edge of the frame to left and
> > > +     * right, defining the bounding box.
> > > +     */
> > > +    int top;
> > > +    int left;
> > > +    int bottom;
> > > +    int right;
> >
> > Is this allowed to be negative? Can/should this be size_t?
> 
> There was a long discussion about size_t/int/uint32_t when I added
> struct AVRegionOfInterest in frame.h, and finally size_t is not chosen.
> 
> >
> > > +
> > > +#define AV_BBOX_LABEL_NAME_MAX_SIZE 32
> > > +
> > > +    /**
> > > +     * Detect result with confidence
> > > +     */
> > > +    char detect_label[AV_BBOX_LABEL_NAME_MAX_SIZE];
> > > +    AVRational detect_confidence;
> > > +
> > > +    /**
> > > +     * At most 4 classifications based on the detected bounding box.
> > > +     * For example, we can get max 4 different attributes with 4 different
> > > +     * DNN models on one bounding box.
> > > +     * classify_count is zero if no classification.
> > > +     */
> > > +#define AV_NUM_BBOX_CLASSIFY 4
> > > +    uint32_t classify_count;
> >
> > size_t?
> >
> > > +    char
> >
> classify_labels[AV_NUM_BBOX_CLASSIFY][AV_BBOX_LABEL_NAME_MAX_SIZE];
> > > +    AVRational classify_confidences[AV_NUM_BBOX_CLASSIFY];
> > > +} AVBoundingBox;
> > > +
> > > +typedef struct AVBoundingBoxHeader {
> >
> > This structure is not extensible in an ABI-compatible way.
> 
> yes, we can add a version number (enum AVBBoxHeaderVersion, see below) at
> the
> beginning of the struct, and the user needs to set it with
> AV_BBOX_HEADER_CURRENT.
> It is not elegant, is this what we really want?
> 
> enum AVBBoxHeaderVersion {
>   AV_BBOX_HEADER_NONE = 0,
>   AV_BBOX_HEADER_V1,
>   AV_BBOX_HEADER_CURRENT = AV_BBOX_HEADER_V1,
> };
> 
> typedef struct AVBoundingBoxHeader {
>   enum AVBBoxHeaderVersion version;
>   union {
>     AVBoundingBoxHeaderV1 v1;
>   } header;
> } AVBoundingBoxHeader;
> 
> 
> We have carefully thought/discussed about the struct for months, and there's
> little chance for later update, imho.

And my opinion is to still use the code in this patch. For the worst case, if we need to
change the struct some time later, we still have a timing before the next release 4.5.

> 
> 
> >
> > > +    /**
> > > +     * Information about how the bounding box is generated.
> > > +     * for example, the DNN model name.
> > > +     */
> > > +    char source[128];
> > > +
> > > +    /**
> > > +     * The size of frame when it is detected.
> > > +     */
> > > +    int frame_width;
> > > +    int frame_height;
> >
> > Why?
> 
> so we know if the value of top/left/bottom/right in AVBoundingBox
> can be used directly in other later filters.
> 
> for example, the filter chain is: dnn_detect -> scale -> dnn_classify (in plan)
> 
> In filter dnn_classify, it needs to check the frame size to know if the values
> in bounding boxes (the detection result) are still valid.
> 
> >
> > > +
> > > +    /**
> > > +     * Number of bounding boxes.
> > > +     */
> > > +    uint32_t nb_bbox;
> >
> > size_t?
> >
> > --
> > Anton Khirnov
> > _______________________________________________
> > ffmpeg-devel mailing list
> > ffmpeg-devel at ffmpeg.org
> > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >
> > To unsubscribe, visit link above, or email ffmpeg-devel-request at ffmpeg.org
> > with subject "unsubscribe".
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".


More information about the ffmpeg-devel mailing list