[FFmpeg-devel] [PATCHv2] add signature filter for MPEG7 video signature
Michael Niedermayer
michael at niedermayer.cc
Mon Apr 11 14:54:57 CEST 2016
On Mon, Apr 11, 2016 at 02:30:37PM +0200, Gerion Entrup wrote:
> On Montag, 11. April 2016 12:57:17 CEST Michael Niedermayer wrote:
> > On Mon, Apr 11, 2016 at 04:25:28AM +0200, Gerion Entrup wrote:
> > > On Donnerstag, 7. April 2016 00:35:25 CEST Michael Niedermayer wrote:
> > > > On Wed, Mar 30, 2016 at 11:02:36PM +0200, Gerion Entrup wrote:
> > > > > On Mittwoch, 30. März 2016 22:57:47 CEST Gerion Entrup wrote:
> > > > > > Add improved patch.
> > > > >
> > > > > Rebased to master.
> > > > >
> > > >
> > > > > Changelog | 1
> > > > > configure | 1
> > > > > doc/filters.texi | 70 +++
> > > > > libavfilter/Makefile | 1
> > > > > libavfilter/allfilters.c | 1
> > > > > libavfilter/signature.h | 554 ++++++++++++++++++++++++++++++
> > > > > libavfilter/signature_lookup.c | 550 ++++++++++++++++++++++++++++++
> > > > > libavfilter/version.h | 4
> > > > > libavfilter/vf_signature.c | 741 +++++++++++++++++++++++++++++++++++++++++
> > > > > 9 files changed, 1921 insertions(+), 2 deletions(-)
> > > > > 9192f27ded45c607996b4e266b6746f807c9a7fd 0001-add-signature-filter-for-MPEG7-video-signature.patch
> > > > > From 9646ed6f0cf78356cf2914a60705c98d8f21fe8a Mon Sep 17 00:00:00 2001
> > > > > From: Gerion Entrup <gerion.entrup at flump.de>
> > > > > Date: Sun, 20 Mar 2016 11:10:31 +0100
> > > > > Subject: [PATCH] add signature filter for MPEG7 video signature
> > > > >
> > > > > This filter does not implement all features of MPEG7. Missing features:
> > > > > - compression of signature files
> > > > > - work only on (cropped) parts of the video
> > > > > ---
> > > > > Changelog | 1 +
> > > > > configure | 1 +
> > > > > doc/filters.texi | 70 ++++
> > > > > libavfilter/Makefile | 1 +
> > > > > libavfilter/allfilters.c | 1 +
> > > > > libavfilter/signature.h | 554 ++++++++++++++++++++++++++++++
> > > > > libavfilter/signature_lookup.c | 550 ++++++++++++++++++++++++++++++
> > > > > libavfilter/version.h | 4 +-
> > > > > libavfilter/vf_signature.c | 741 +++++++++++++++++++++++++++++++++++++++++
> > > > > 9 files changed, 1921 insertions(+), 2 deletions(-)
> > > > > create mode 100644 libavfilter/signature.h
> > > > > create mode 100644 libavfilter/signature_lookup.c
> > > > > create mode 100644 libavfilter/vf_signature.c
> > > > >
> > > > > diff --git a/Changelog b/Changelog
> > > > > index 7b0187d..8a2b7fd 100644
> > > > > --- a/Changelog
> > > > > +++ b/Changelog
> > > > > @@ -18,6 +18,7 @@ version <next>:
> > > > > - coreimage filter (GPU based image filtering on OSX)
> > > > > - libdcadec removed
> > > > > - bitstream filter for extracting DTS core
> > > > > +- MPEG-7 Video Signature filter
> > > > >
> > > > > version 3.0:
> > > > > - Common Encryption (CENC) MP4 encoding and decoding support
> > > > > diff --git a/configure b/configure
> > > > > index e550547..fe29827 100755
> > > > > --- a/configure
> > > > > +++ b/configure
> > > > > @@ -2979,6 +2979,7 @@ showspectrum_filter_deps="avcodec"
> > > > > showspectrum_filter_select="fft"
> > > > > showspectrumpic_filter_deps="avcodec"
> > > > > showspectrumpic_filter_select="fft"
> > > > > +signature_filter_deps="gpl avcodec avformat"
> > > > > smartblur_filter_deps="gpl swscale"
> > > > > sofalizer_filter_deps="netcdf avcodec"
> > > > > sofalizer_filter_select="fft"
> > > > > diff --git a/doc/filters.texi b/doc/filters.texi
> > > > > index 5d6cf52..a95f5a7 100644
> > > > > --- a/doc/filters.texi
> > > > > +++ b/doc/filters.texi
> > > > > @@ -11559,6 +11559,76 @@ saturation maximum: %@{metadata:lavfi.signalstats.SATMAX@}
> > > > > @end example
> > > > > @end itemize
> > > > >
> > > > > +@anchor{signature}
> > > > > +@section signature
> > > > > +
> > > > > +Calculates the MPEG-7 Video Signature. The filter can handle more than one
> > > > > +input. In this case the matching between the inputs can be calculated. The
> > > > > +filter passes the first input through. The output is written in XML.
> > > > > +
> > > > > +It accepts the following options:
> > > > > +
> > > > > +@table @option
> > > > > +@item mode
> > > >
> > > > > +Enable the calculation of the matching. The option value must be 0 (to disable)
> > > > > +or 1 (to enable). Optionally you can set the mode to 2. Then the detection ends
> > > > > +when the first matching sequence is reached. This should be slightly faster.
> > > > > +By default the detection is disabled.
> > > >
> > > > these should probably support named identifiers, not (only) 0/1/2
> > > done
> >
> > it should use AV_OPT_TYPE_INT and AV_OPT_TYPE_CONST not a string
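A minimal, self-contained sketch of the named-constant idea, assuming the names off/full/fast for modes 0/1/2 (those names, DetectMode and parse_mode are hypothetical). It does not use libavutil: in the filter the mode would be one AV_OPT_TYPE_INT entry plus AV_OPT_TYPE_CONST entries sharing its unit, and the option parser resolves the names. The table below only imitates that lookup so the accepted inputs are visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mode values mirroring the documented 0/1/2 semantics. */
enum DetectMode { MODE_OFF = 0, MODE_FULL = 1, MODE_FAST = 2 };

struct NamedConst { const char *name; int value; };

/* Imitates AV_OPT_TYPE_CONST entries that share one unit with the
 * AV_OPT_TYPE_INT option; this is not the libavutil API. */
static const struct NamedConst mode_consts[] = {
    { "off",  MODE_OFF  },
    { "full", MODE_FULL },
    { "fast", MODE_FAST },
};

/* Accept a named constant or a plain integer in [0, 2]; return -1 on error. */
static int parse_mode(const char *s)
{
    for (size_t i = 0; i < sizeof(mode_consts) / sizeof(mode_consts[0]); i++)
        if (!strcmp(s, mode_consts[i].name))
            return mode_consts[i].value;

    char *end;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || v < MODE_OFF || v > MODE_FAST)
        return -1;                      /* not a number, trailing junk, or out of range */
    return (int)v;
}
```

With this, mode=fast and mode=2 select the same value, so the numeric form keeps working while readable names are added. The same pattern would apply to a format=xml/bin option.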
> >
> >
> > >
> > > >
> > > >
> > > > > +
> > > > > +@item nb_inputs
> > > > > +Set the number of inputs. The option value must be a non-negative integer.
> > > > > +Default value is 1.
> > > > > +
> > > > > +@item filename
> > > > > +Set the path to which the output is written. If there is more than one input,
> > > > > +the path must be a prototype, i.e. must contain %d or %0nd (where n is a positive
> > > > > +integer), which will be replaced with the input number. If no filename is
> > > > > +specified, no output will be written. This is the default.
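A rough sketch of how a %d or %0nd prototype could be expanded per input. The helper name expand_filename is hypothetical, and it deliberately ignores "%%" escapes. libavformat already provides av_get_frame_filename() for %d-style patterns, which may be the better choice inside FFmpeg; this standalone version only shows the substitution.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: replace the first "%d" or "%0Nd" in pattern with number.
 * Returns 0 on success, -1 if no such directive exists or buf is too small. */
static int expand_filename(char *buf, size_t buf_size,
                           const char *pattern, int number)
{
    for (const char *p = pattern; (p = strchr(p, '%')) != NULL; p++) {
        const char *q = p + 1;
        int width = 0;

        if (*q == '0') {                    /* "%0Nd": zero-padded to width N */
            q++;
            if (!isdigit((unsigned char)*q))
                continue;
            while (isdigit((unsigned char)*q))
                width = width * 10 + (*q++ - '0');
        }
        if (*q != 'd')
            continue;

        /* prefix before '%', the padded number, then the rest of the pattern */
        int n = snprintf(buf, buf_size, "%.*s%0*d%s",
                         (int)(p - pattern), pattern, width, number, q + 1);
        return (n < 0 || (size_t)n >= buf_size) ? -1 : 0;
    }
    return -1;
}
```

For two inputs, a filename of sig%02d.xml would yield one file per input number.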
> > > > > +
> > > >
> > > > > +@item xml
> > > > > +Choose the output format. If set to 1 the filter will write XML, if set to 0
> > > > > +the filter will write binary output. The default is 0.
> > > >
> > > > format=xml/bin/whatever
> > > > seems better as it's more extensible
> > > done
> > >
> > > >
> > > >
> > > > > +
> > > > > +@item th_d
> > > > > +Set threshold to detect one word as similar. The option value must be an integer
> > > > > +greater than zero. The default value is 9000.
> > > > > +
> > > > > +@item th_dc
> > > > > +Set threshold to detect all words as similar. The option value must be an integer
> > > > > +greater than zero. The default value is 60000.
> > > > > +
> > > > > +@item th_xh
> > > > > +Set threshold to detect frames as similar. The option value must be an integer
> > > > > +greater than zero. The default value is 116.
> > > > > +
> > > > > +@item th_di
> > > > > +Set the minimum length of a sequence in frames to recognize it as a matching
> > > > > +sequence. The option value must be a non-negative integer value.
> > > > > +The default value is 0.
> > > > > +
> > > > > +@item th_it
> > > > > +Set the minimum ratio of matching frames to all frames.
> > > > > +The option value must be a double value between 0 and 1. The default value is 0.5.
> > > > > +@end table
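One plausible reading of th_di and th_it, written only from the option descriptions above. The function name sequence_accepted is hypothetical, and the lookup code in signature_lookup.c is not quoted in this thread, so it may apply the thresholds differently.

```c
#include <assert.h>

/* Hypothetical acceptance test for a candidate matching sequence:
 * it must be at least th_di frames long, and the fraction of matching
 * frames must reach th_it (0..1). */
static int sequence_accepted(int matched_frames, int total_frames,
                             int th_di, double th_it)
{
    if (total_frames <= 0 || total_frames < th_di)
        return 0;
    return (double)matched_frames / total_frames >= th_it;
}
```

With the defaults (th_di=0, th_it=0.5), a 10-frame sequence would need at least 5 matching frames.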
> > > > > +
> > > > > +@subsection Examples
> > > > > +
> > > > > +@itemize
> > > > > +@item
> > > > > +To calculate the signature of an input video and store it in signature.xml:
> > > > > +@example
> > > > > +ffmpeg -i input.mkv -vf signature=filename=signature.xml -map 0:v -c rawvideo -f null -
> > > > > +@end example
> > > >
> > > > the output seems to differ between 32 an 64bit x86
> > > > this would make any regression testing rather difficult
> > > > why is there a difference ? can this be avoided or would that result in
> > > > some disadvantage ?
> > > This is due to this line:
> > > sum -= ((double) blocksum)/(blocksize * denum);
> > >
> > > sum was a double. It seems this leads to different results on 32 and 64 bit
> > > (in the 5th decimal place). I have reworked the filter part so it does not use double at all.
> > > This also results in fewer divisions, but the numbers get really big. The relevant
> > > parts use int64_t.
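A small sketch of the "no doubles" idea: compare block averages exactly by scaling them to a common denominator. One common cause of 32/64-bit differences is that 32-bit x86 builds may evaluate doubles on the x87 FPU with extended precision, while x86-64 uses SSE2. Integer arithmetic is exact on both as long as nothing overflows. The helper name and parameters are hypothetical, not the patch's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper. Returns (sum_a/size_a - sum_b/size_b) * d exactly.
 * d must be a common multiple of size_a and size_b, so d / size is exact
 * and each average becomes the integer sum * (d / size). */
static int64_t scaled_avg_diff(int64_t sum_a, int64_t size_a,
                               int64_t sum_b, int64_t size_b, int64_t d)
{
    return sum_a * (d / size_a) - sum_b * (d / size_b);
}
```

For averages 10/2 and 9/3 with d = 6 this gives 30 - 18 = 12, i.e. (5 - 3) * 6. The cost is that the scaled numerators grow with d, which is the overflow concern raised next.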
> > >
> > > If the video gets really big, the numbers could overflow. Can I restrict this somehow?
> > >
> > > An upper bound can be found with:
> > > 255 * BLOCK_LCM * (width/32+1)^2 * (height/32+1)^2 < 2^63
> > > I tested it with 4K (UHD) input. This does not give any problems, but it is near the limit.
> > > (As a note: 4K in particular is a certain amount under the limit, because the width 3840 is
> > > divisible by 32, so the square in the above formula could be deleted.)
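A sketch of evaluating the bound above without itself overflowing, e.g. for a resolution check in config_input. BLOCK_LCM is defined in signature.h, which is not quoted here, so it is passed in as the parameter block_lcm; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* acc *= f only if the product stays <= limit; returns 0 if it would not. */
static int mul_within(uint64_t *acc, uint64_t f, uint64_t limit)
{
    if (f != 0 && *acc > limit / f)
        return 0;
    *acc *= f;
    return 1;
}

/* Checks 255 * block_lcm * (w/32+1)^2 * (h/32+1)^2 < 2^63. */
static int resolution_ok(int w, int h, uint64_t block_lcm)
{
    const uint64_t limit = (uint64_t)INT64_MAX;   /* 2^63 - 1 */
    uint64_t acc = 255;
    uint64_t a = (uint64_t)(w / 32 + 1), b = (uint64_t)(h / 32 + 1);

    return mul_within(&acc, block_lcm, limit) &&
           mul_within(&acc, a, limit) && mul_within(&acc, a, limit) &&
           mul_within(&acc, b, limit) && mul_within(&acc, b, limit);
}
```

Since every factor is at least 1, the running product never shrinks, so failing at any step means the full product exceeds the limit.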
> > >
> > > The filter should now generate the same signatures on 32 and 64 bit, matching what it previously generated on 64 bit.
> >
> > if you really need more than 64-bit ints you can take a look at
> > libavutil/integer.h
> > it would be better if the operations could be reshuffled to keep using
> > intXY_t
> This depends. IMHO 4K UHD is enough for now, and given that you can simply rescale a higher
> resolution to something below it without changing the function of the signature, I would simply add
> a check in config_input or so that throws an error if the resolution is too high. Would this be ok?
not optimal but ok i guess
[...]
[...]
>
> >
> >
> > >
> > > Then I added a few TODOs in the code about parts I don't know. It would be nice
> > > if you could comment there, too.
> > >
> >
> > > I attached the new (complete) patch, the diff to the last time and the updated check script.
> >
> > looks like the old patch + diff to the new
> Yes. I thought you could see the differences from the already reviewed patch much faster.
the new patch + diff is better than the old + diff
for testing that is
>
> >
> > [...]
> > > +static int filter_frame(AVFilterLink *inlink, AVFrame *picref)
> > > +{
> > > + AVFilterContext *ctx = inlink->dst;
> > > + SignatureContext *sic = ctx->priv;
> > > + StreamContext *sc = &(sic->streamcontexts[FF_INLINK_IDX(inlink)]);
> > > + FineSignature* fs;
> > > +
> > > + static const uint8_t pot3[5] = { 3*3*3*3, 3*3*3, 3*3, 3, 1 };
> > > + /* indexes of words : 210,217,219,274,334 44,175,233,270,273 57,70,103,237,269 100,285,295,337,354 101,102,111,275,296
> > > + s2usw = sorted to unsorted wordvec: 44 is at index 5, 57 at index 10...
> > > + */
> > > + static const unsigned int wordvec[25] = {44,57,70,100,101,102,103,111,175,210,217,219,233,237,269,270,273,274,275,285,295,296,334,337,354};
> > > + static const uint8_t s2usw[25] = { 5,10,11, 15, 20, 21, 12, 22, 6, 0, 1, 2, 7, 13, 14, 8, 9, 3, 23, 16, 17, 24, 4, 18, 19};
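A sketch of what pot3 and wordt2b ("word ternary to binary") appear to be for, assuming each word element is a ternary value 0, 1 or 2 and five of them are packed into one byte. This is an inference from the table, not the patch's code; the function names are hypothetical. It works because 3^5 = 243 values fit in a byte.

```c
#include <assert.h>
#include <stdint.h>

static const uint8_t pot3[5] = { 3*3*3*3, 3*3*3, 3*3, 3, 1 };

/* Pack five ternary digits (each 0..2), most significant first, into 0..242. */
static uint8_t pack_ternary5(const uint8_t t[5])
{
    unsigned v = 0;
    for (int k = 0; k < 5; k++)
        v += t[k] * pot3[k];
    return (uint8_t)v;
}

/* Inverse of pack_ternary5: recover the five digits from a packed byte. */
static void unpack_ternary5(uint8_t v, uint8_t t[5])
{
    for (int k = 0; k < 5; k++) {
        t[k] = v / pot3[k];
        v %= pot3[k];
    }
}
```

For example {0, 1, 2, 0, 1} packs to 27 + 18 + 1 = 46, and unpacking 46 returns the same digits.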
> > > +
> > > + uint8_t wordt2b[5] = { 0, 0, 0, 0, 0 }; /* word ternary to binary */
> > > + uint64_t intpic[32][32];
> > > + uint64_t rowcount;
> > > + uint8_t *p = picref->data[0];
> > > + int inti, intj;
> > > + int *intjlut;
> > > +
> > > + double conflist[DIFFELEM_SIZE];
> > > + int f = 0, g = 0, w = 0;
> > > + int dh1 = 1, dh2 = 1, dw1 = 1, dw2 = 1, denum, a, b;
> > > + int i,j,k,ternary;
> > > + uint64_t blocksum;
> > > + int blocksize;
> > > + double th; /* threshold */
> > > + double sum;
> > > +
> > > + /* initialize fs */
> > > + if(sc->curfinesig){
> > > + fs = av_mallocz(sizeof(FineSignature));
> > > + if (!fs)
> > > + return AVERROR(ENOMEM);
> > > + sc->curfinesig->next = fs;
> > > + fs->prev = sc->curfinesig;
> > > + sc->curfinesig = fs;
> > > + }else{
> > > + fs = sc->curfinesig = sc->finesiglist;
> > > + sc->curcoursesig1->first = fs;
> > > + }
> > > +
> > > + fs->pts = picref->pts;
> > > + fs->index = sc->lastindex++;
> > > +
> > > + memset(intpic, 0, sizeof(uint64_t)*32*32);
> > > + intjlut = av_malloc(inlink->w * sizeof(int));
> > > + if (!intjlut)
> > > + return AVERROR(ENOMEM);
> > > + for (i=0; i < inlink->w; i++){
> > > + intjlut[i] = (i<<5)/inlink->w;
> > > + }
> > > +
> > > + for (i=0; i < inlink->h; i++){
> > > + inti = (i<<5)/inlink->h;
> > > + for (j=0; j< inlink->w; j++){
> > > + intj = intjlut[j];
> > > + intpic[inti][intj] += p[j];
> > > + }
> > > + p += picref->linesize[0];
> > > + }
> > > + av_free(intjlut);
> > > +
> > > + /* The following calculates a summed area table (intpic) and brings the numbers
> > > + * in intpic to the same denominator.
> > > + * So you only have to handle the numerator in the following sections.
> > > + */
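A generic summed-area-table sketch over the 32x32 block grid, showing why it is useful: any rectangle of blocks can be summed with at most four lookups. The patch's own code after this comment is not quoted here, and the scaling to a common denominator is omitted, so this may differ in detail; GRID, build_sat and rect_sum are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define GRID 32

/* Convert per-block sums into an inclusive summed-area table in place:
 * afterwards sat[i][j] = sum of the original blocks [0..i][0..j]. */
static void build_sat(uint64_t sat[GRID][GRID])
{
    for (int i = 0; i < GRID; i++) {
        uint64_t rowsum = 0;
        for (int j = 0; j < GRID; j++) {
            rowsum += sat[i][j];
            sat[i][j] = rowsum + (i > 0 ? sat[i - 1][j] : 0);
        }
    }
}

/* Sum of blocks in rows r0..r1 and columns c0..c1 (inclusive).
 * Intermediate unsigned wrap-around cancels out in the final result. */
static uint64_t rect_sum(uint64_t sat[GRID][GRID], int r0, int c0, int r1, int c1)
{
    uint64_t s = sat[r1][c1];
    if (r0 > 0)
        s -= sat[r0 - 1][c1];
    if (c0 > 0)
        s -= sat[r1][c0 - 1];
    if (r0 > 0 && c0 > 0)
        s += sat[r0 - 1][c0 - 1];
    return s;
}
```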
> > > + dh1 = inlink->h/32;
> > > + if (inlink->h%32)
> > > + dh2 = dh1 + 1;
> > > + dw1 = inlink->w/32;
> > > + if (inlink->w%32)
> > > + dw2 = dw1 + 1;
> >
> > > + denum = dh1 * dh2 * dw1 * dw2;
> >
> > this will overflow if w and h are not multiples of 32 and large
> > the multiplication is done in 32bit, not 64
> I don't get it. All of these are 32-bit integers. Given an input of
> 3842x2160 (nearly 4K), this would lead to a denum of:
> 120 * 121 * 67 * 68 = 66153120
>
> This is far below the 32 bit maximum.
it will overflow with higher resolution
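A sketch of the point about 32-bit multiplication: with int operands the product overflows (undefined behaviour for signed int in C) once the frame is large and not a multiple of 32. For example 15362x8642 gives 480 * 481 * 270 * 271 = 16,893,489,600, above INT_MAX. Widening the first operand makes the whole chain 64-bit. The function names are hypothetical; the block-size split mirrors the quoted code.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Same split as the patch: d1 = size/32, d2 = d1 + 1 if size is not a
 * multiple of 32 (otherwise 1). Casting the first factor to int64_t makes
 * every following multiplication 64-bit. */
static int64_t compute_denum(int w, int h)
{
    int dh1 = h / 32, dh2 = (h % 32) ? dh1 + 1 : 1;
    int dw1 = w / 32, dw2 = (w % 32) ? dw1 + 1 : 1;
    return (int64_t)dh1 * dh2 * dw1 * dw2;
}

/* Hypothetical config_input-style guard if denum must stay an int. */
static int denum_fits_int(int w, int h)
{
    return compute_denum(w, h) <= INT_MAX;
}
```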
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
I do not agree with what you have to say, but I'll defend to the death your
right to say it. -- Voltaire