[FFmpeg-devel] [PATCH V4 2/4] libavfilter/buffersink.c: unref private_ref when frame leaves libavfilter

Thu Mar 4 13:43:34 EET 2021

> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of Paul B
> Mahol
> Sent: March 4, 2021 17:26
> To: FFmpeg development discussions and patches <ffmpeg-devel at ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH V4 2/4] libavfilter/buffersink.c: unref
> private_ref when frame leaves libavfilter
> 
> On Mon, Mar 1, 2021 at 4:46 PM Guo, Yejun <yejun.guo at intel.com> wrote:
> 
> >
> >
> > > -----Original Message-----
> > > From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of
> > > Nicolas George
> > > Sent: March 1, 2021 23:07
> > > To: FFmpeg development discussions and patches
> > > <ffmpeg-devel at ffmpeg.org>
> > > Subject: Re: [FFmpeg-devel] [PATCH V4 2/4] libavfilter/buffersink.c:
> > > unref private_ref when frame leaves libavfilter
> > >
> > > Guo, Yejun (12021-03-01):
> > > > Actually, I think private_ref in libavfilter can only be used for
> > > > an exclusive usage at a time.
> > >
> > > Exactly. If we use it for this, then we cannot use it for anything
> > > else in libavfilter. This use seems too specific to warrant
> > > dedicating such a unique field to it, even though we do not have a
> > > better use in sight.
> > >
> > > > As Paul mentioned, I think AVFrame.metadata is a better choice.
> > >
> > > If you can express it as a string or set of strings with a clear
> > > syntax that can easily be parsed, then possibly, yes.
> >
> > ooo, it is not easy to express the bounding boxes as strings in
> > AVDictionaryEntry.value: the bounding box has several data members,
> > and they are binary data that is likely to contain '\0' in the
> > middle. So we might not be able to use AVFrame.metadata.
> >
> 
> What does your bounding box actually contain?
> 
> I thought it was just a few numbers and a string describing the box, no?
> 
> 

Bounding box is the term used in object detection papers for
the objects detected in a picture; there is a visualized example at
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/img/kites_detections_output.jpg

A frame can have multiple objects (or none). A bounding box
represents one detected object and includes a rectangle, an object
label, and a confidence for the detection. We can also apply a
'classification filter' to each object, so the bounding box also
contains classification labels and classification confidences. See code at
https://patchwork.ffmpeg.org/project/ffmpeg/patch/20210301132053.30264-1-yejun.guo@intel.com/.
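
To make the fields listed above concrete, here is a rough sketch of how
such a bounding box could be laid out as a C struct. This is not the code
from the patch linked above: every name here (SketchBBox, SketchRational,
the fixed label size, and the cap of 4 classification results) is a
hypothetical choice made only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_LABEL_SIZE   64 /* assumed fixed label length */
#define SKETCH_MAX_CLASSIFY 4  /* assumed cap on classification results per box */

/* Confidence kept as a fraction (num/den), similar in spirit to AVRational. */
typedef struct SketchRational {
    int num;
    int den;
} SketchRational;

/* One detected object: rectangle, detection label and confidence,
 * plus optional results added later by a classification step. */
typedef struct SketchBBox {
    int top, left, bottom, right;
    char detect_label[SKETCH_LABEL_SIZE];
    SketchRational detect_confidence;
    uint32_t classify_count;
    char classify_labels[SKETCH_MAX_CLASSIFY][SKETCH_LABEL_SIZE];
    SketchRational classify_confidences[SKETCH_MAX_CLASSIFY];
} SketchBBox;

/* Append one classification result to a box.
 * Returns 0 on success, -1 if the box already holds the maximum. */
static int sketch_bbox_add_classification(SketchBBox *box, const char *label,
                                          SketchRational confidence)
{
    uint32_t i;

    assert(box && label);
    if (box->classify_count >= SKETCH_MAX_CLASSIFY)
        return -1;

    i = box->classify_count;
    strncpy(box->classify_labels[i], label, SKETCH_LABEL_SIZE - 1);
    box->classify_labels[i][SKETCH_LABEL_SIZE - 1] = '\0';
    box->classify_confidences[i] = confidence;
    box->classify_count = i + 1;
    return 0;
}
```

A frame with several detected objects would then carry an array of such
structs, which is awkward to fit into a string-valued dictionary.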

It is possible to present them as strings, for example "bbox0.top",
"bbox1.detect_label" and "bbox2.classify_confidences1.den" etc.,
but I think such code is not elegant, and it is not close to our final
solution (putting them into side data as structs).
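
To show what the string form would look like in practice, below is a
minimal sketch that flattens one box into key/value string pairs. It does
not use any libavutil API; SketchBox, SketchKV and the helper functions
are hypothetical names, and the "detect_confidence.num/.den" keys are an
assumption patterned on the key examples above. With a real AVDictionary,
each pair would become one entry, and the reader would have to parse every
value back into a number.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One flattened entry, standing in for a dictionary key/value pair. */
typedef struct SketchKV {
    char key[64];
    char value[64];
} SketchKV;

/* Reduced box for this sketch: rectangle, label, confidence as num/den. */
typedef struct SketchBox {
    int top, left, bottom, right;
    const char *detect_label;
    int conf_num, conf_den;
} SketchBox;

static void sketch_put_int(SketchKV *kv, int idx, const char *field, int v)
{
    snprintf(kv->key, sizeof(kv->key), "bbox%d.%s", idx, field);
    snprintf(kv->value, sizeof(kv->value), "%d", v);
}

static void sketch_put_str(SketchKV *kv, int idx, const char *field, const char *v)
{
    snprintf(kv->key, sizeof(kv->key), "bbox%d.%s", idx, field);
    snprintf(kv->value, sizeof(kv->value), "%s", v);
}

/* Flatten box number idx into out[0..6].
 * Returns the number of entries written (7), or -1 if cap is too small. */
static int sketch_box_to_strings(const SketchBox *box, int idx,
                                 SketchKV *out, int cap)
{
    int n = 0;

    if (cap < 7)
        return -1;
    sketch_put_int(&out[n++], idx, "top", box->top);
    sketch_put_int(&out[n++], idx, "left", box->left);
    sketch_put_int(&out[n++], idx, "bottom", box->bottom);
    sketch_put_int(&out[n++], idx, "right", box->right);
    sketch_put_str(&out[n++], idx, "detect_label", box->detect_label);
    sketch_put_int(&out[n++], idx, "detect_confidence.num", box->conf_num);
    sketch_put_int(&out[n++], idx, "detect_confidence.den", box->conf_den);
    return n;
}
```

Even this reduced box needs seven entries, and the classification labels
and confidences would add more keys per box.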