[FFmpeg-devel] [PATCHv2] add signature filter for MPEG7 video signature

Thu Apr 14 19:06:29 CEST 2016

On Monday, 11 April 2016 14:54:57 CEST Michael Niedermayer wrote:
> On Mon, Apr 11, 2016 at 02:30:37PM +0200, Gerion Entrup wrote:
> > On Monday, 11 April 2016 12:57:17 CEST Michael Niedermayer wrote:
> > > On Mon, Apr 11, 2016 at 04:25:28AM +0200, Gerion Entrup wrote:
> > > > On Thursday, 7 April 2016 00:35:25 CEST Michael Niedermayer wrote:
> > > > > On Wed, Mar 30, 2016 at 11:02:36PM +0200, Gerion Entrup wrote:
> > > > > > On Wednesday, 30 March 2016 22:57:47 CEST Gerion Entrup wrote:
> > > > > > > Add improved patch.
> > > > > > 
> > > > > > Rebased to master.
> > > > > > 
> > > > > >  Changelog                      |    1
> > > > > >  configure                      |    1
> > > > > >  doc/filters.texi               |   70 +++
> > > > > >  libavfilter/Makefile           |    1
> > > > > >  libavfilter/allfilters.c       |    1
> > > > > > >  libavfilter/signature.h        |  554 ++++++++++++++++++++++++++++++
> > > > > > >  libavfilter/signature_lookup.c |  550 ++++++++++++++++++++++++++++++
> > > > > > >  libavfilter/version.h          |    4
> > > > > > >  libavfilter/vf_signature.c     |  741 +++++++++++++++++++++++++++++++++++++++++
> > > > > > >  9 files changed, 1921 insertions(+), 2 deletions(-)
> > > > > > 
> > > > > > > 9192f27ded45c607996b4e266b6746f807c9a7fd  0001-add-signature-filter-for-MPEG7-video-signature.patch
> > > > > > > From 9646ed6f0cf78356cf2914a60705c98d8f21fe8a Mon Sep 17 00:00:00 2001
> > > > > > From: Gerion Entrup <gerion.entrup at flump.de>
> > > > > > Date: Sun, 20 Mar 2016 11:10:31 +0100
> > > > > > Subject: [PATCH] add signature filter for MPEG7 video signature
> > > > > > 
> > > > > > This filter does not implement all features of MPEG7. Missing
> > > > > > features:
> > > > > > - compression of signature files
> > > > > > - work only on (cropped) parts of the video
> > > > > > ---
> > > > > > 
> > > > > >  Changelog                      |   1 +
> > > > > >  configure                      |   1 +
> > > > > >  doc/filters.texi               |  70 ++++
> > > > > >  libavfilter/Makefile           |   1 +
> > > > > >  libavfilter/allfilters.c       |   1 +
> > > > > > >  libavfilter/signature.h        | 554 ++++++++++++++++++++++++++++++
> > > > > > >  libavfilter/signature_lookup.c | 550 ++++++++++++++++++++++++++++++
> > > > > > >  libavfilter/version.h          |   4 +-
> > > > > > >  libavfilter/vf_signature.c     | 741 +++++++++++++++++++++++++++++++++++++++++
> > > > > > >  9 files changed, 1921 insertions(+), 2 deletions(-)
> > > > > >  create mode 100644 libavfilter/signature.h
> > > > > >  create mode 100644 libavfilter/signature_lookup.c
> > > > > >  create mode 100644 libavfilter/vf_signature.c
> > > > > > 
> > > > > > diff --git a/Changelog b/Changelog
> > > > > > index 7b0187d..8a2b7fd 100644
> > > > > > --- a/Changelog
> > > > > > +++ b/Changelog
> > > > > > 
> > > > > > @@ -18,6 +18,7 @@ version <next>:
> > > > > >  - coreimage filter (GPU based image filtering on OSX)
> > > > > >  - libdcadec removed
> > > > > >  - bitstream filter for extracting DTS core
> > > > > > 
> > > > > > +- MPEG-7 Video Signature filter
> > > > > > 
> > > > > >  version 3.0:
> > > > > >  - Common Encryption (CENC) MP4 encoding and decoding support
> > > > > > 
> > > > > > diff --git a/configure b/configure
> > > > > > index e550547..fe29827 100755
> > > > > > --- a/configure
> > > > > > +++ b/configure
> > > > > > @@ -2979,6 +2979,7 @@ showspectrum_filter_deps="avcodec"
> > > > > > 
> > > > > >  showspectrum_filter_select="fft"
> > > > > >  showspectrumpic_filter_deps="avcodec"
> > > > > >  showspectrumpic_filter_select="fft"
> > > > > > 
> > > > > > +signature_filter_deps="gpl avcodec avformat"
> > > > > > 
> > > > > >  smartblur_filter_deps="gpl swscale"
> > > > > >  sofalizer_filter_deps="netcdf avcodec"
> > > > > >  sofalizer_filter_select="fft"
> > > > > > 
> > > > > > diff --git a/doc/filters.texi b/doc/filters.texi
> > > > > > index 5d6cf52..a95f5a7 100644
> > > > > > --- a/doc/filters.texi
> > > > > > +++ b/doc/filters.texi
> > > > > > @@ -11559,6 +11559,76 @@ saturation maximum:
> > > > > > %@{metadata:lavfi.signalstats.SATMAX@}> > > > > 
> > > > > >  @end example
> > > > > >  @end itemize
> > > > > > 
> > > > > > > +@anchor{signature}
> > > > > > > +@section signature
> > > > > > > +
> > > > > > > +Calculates the MPEG-7 Video Signature. The filter could handle more than one
> > > > > > > +input. In this case the matching between the inputs could be calculated. The
> > > > > > > +filter passes through the first input. The output is written in XML.
> > > > > > > +
> > > > > > > +It accepts the following options:
> > > > > > > +
> > > > > > > +@table @option
> > > > > > > +@item mode
> > > > > > > +Enable the calculation of the matching. The option value must be 0 (to disable)
> > > > > > > +or 1 (to enable). Optionally you can set the mode to 2. Then the detection ends,
> > > > > > > +if the first matching sequence is reached. This should be slightly faster.
> > > > > > > +Per default the detection is disabled.
> > > > > 
> > > > > > these should probably support named identifiers, not (only) 0/1/2
> > > > 
> > > > done
> > > 
> > > it should use AV_OPT_TYPE_INT and AV_OPT_TYPE_CONST not a string
> > > 
> > > > > > +
> > > > > > + at item nb_inputs
> > > > > > +Set the number of inputs. The option value must be a non negative
> > > > > > interger. +Default value is 1.
> > > > > > +
> > > > > > + at item filename
> > > > > > +Set the path to witch the output is written. If there is more
> > > > > > than one input, +the path must be a prototype, i.e. must contain
> > > > > > %d or %0nd (where n is a positive +integer), that will be
> > > > > > replaced with the input number. If no filename is +specified, no
> > > > > > output will be written. This is the default.
> > > > > > +
> > > > > > 
> > > > > > + at item xml
> > > > > > +Choose the output format. If set to 1 the filter will write XML,
> > > > > > if set to 0 +the filter will write binary output. The default is
> > > > > > 0.
> > > > > 
> > > > > format=xml/bin/whatever
> > > > > seems better as its more extensible
> > > > 
> > > > done
> > > > 
> > > > > > +
> > > > > > + at item th_d
> > > > > > +Set threshold to detect one word as similar. The option value
> > > > > > must be an integer +greater than zero. The default value is 9000.
> > > > > > +
> > > > > > + at item th_dc
> > > > > > +Set threshold to detect all words as similar. The option value
> > > > > > must be an integer +greater than zero. The default value is
> > > > > > 60000.
> > > > > > +
> > > > > > + at item th_xh
> > > > > > +Set threshold to detect frames as similar. The option value must
> > > > > > be an integer +greater than zero. The default value is 116.
> > > > > > +
> > > > > > + at item th_di
> > > > > > +Set the minimum length of a sequence in frames to recognize it as
> > > > > > matching
> > > > > > +sequence. The option value must be a non negative integer value.
> > > > > > +The default value is 0.
> > > > > > +
> > > > > > + at item th_it
> > > > > > +Set the minimum relation, that matching frames to all frames must
> > > > > > have.
> > > > > > +The option value must be a double value between 0 and 1. The
> > > > > > default value is 0.5. + at end table
> > > > > > +
> > > > > > + at subsection Examples
> > > > > > +
> > > > > > + at itemize
> > > > > > + at item
> > > > > > +To calculate the signature of an input video and store it in
> > > > > > signature.xml: + at example
> > > > > > +ffmpeg -i input.mkv -vf signature=filename=signature.xml -map 0:v
> > > > > > -c rawvideo -f null - + at end example
> > > > > 
> > > > > > the output seems to differ between 32 and 64bit x86
> > > > > > this would make any regression testing rather difficult
> > > > > > why is there a difference ? can this be avoided or would that result in
> > > > > > some disadvantage ?
> > > > 
> > > > This is due to this line:
> > > > sum -= ((double) blocksum)/(blocksize * denum);
> > > > 
> > > > > sum was a double. It seems the difference leads to different results
> > > > > in 32 and 64 bit (in the fifth decimal place). I have reworked the filter
> > > > > part so it does not use double at all. This also leads to somewhat fewer
> > > > > divisions, but the numbers get really big. The relevant parts use
> > > > > int64_t.
> > > > > 
> > > > > If the video gets really big, the numbers could overflow. Can I
> > > > > restrict this somehow?
> > > > 
> > > > > An upper bound could be found with:
> > > > 255 * BLOCK_LCM * (width/32+1)^2 * (height/32+1)^2 < 2^63
> > > > I tested it with 4K (UHD) input. This does not give any problems, but
> > > > it is near the limit. (As a note: Especially 4K is a certain amount
> > > > > under the limit, because the width 3840 is divisible by 32, so the
> > > > square in the above formula could be deleted)
> > > > 
> > > > The filter should generate the same signatures as in 64 bit before,
> > > > > now with 32 and 64 bit.
> > > > 
> > > > if you really need more than 64bit ints you can take a look at
> > > libavutil/integer.h
> > > it would be better if the operations can be reshuffled to keep using
> > > intXY_t
> > 
> > This depends, IMHO 4K UHD is enough for now, and given, that you can
> > simply rescale a higher resolution to somewhat below, without changing
> > the function of the signature, I would simply add a check in config_input
> > or so, that throws an error, if the resolution is too high. Would this be
> > ok?
> not optimal but ok i guess
Because this doesn't essentially change the function of the filter, what do you
think about this approach:
Check in config_input for a potential overflow. If there is one, display a
warning but don't quit. Instead divide everything by 16 (or another meaningful
power of 2). You lose precision, but the filter keeps working and this won't
affect the signature that much.
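
A minimal sketch of such a check, using the bound
255 * BLOCK_LCM * (width/32+1)^2 * (height/32+1)^2 < 2^63 that is quoted
further down in this mail. The function names are hypothetical and not part
of the patch, and BLOCK_LCM is passed in as a parameter (block_lcm) because
its value is not stated in this mail. Each multiplication is tested before it
is carried out, so the check itself cannot overflow:

```c
#include <stdint.h>

/* Multiply *acc by factor only if the result stays <= limit.
 * Returns 1 on success, 0 if the product would exceed limit. */
static int mul_within(uint64_t *acc, uint64_t factor, uint64_t limit)
{
    if (*acc != 0 && factor > limit / *acc)
        return 0;
    *acc *= factor;
    return 1;
}

/* Hypothetical helper (assumes width and height are positive):
 * returns 1 if
 *   255 * block_lcm * (width/32+1)^2 * (height/32+1)^2 < 2^63
 * and 0 otherwise. */
int signature_bound_fits(int width, int height, uint64_t block_lcm)
{
    const uint64_t limit = INT64_MAX;            /* 2^63 - 1 */
    const uint64_t bw = (uint64_t)(width  / 32 + 1);
    const uint64_t bh = (uint64_t)(height / 32 + 1);
    uint64_t acc = 255;

    return mul_within(&acc, block_lcm, limit) &&
           mul_within(&acc, bw, limit) && mul_within(&acc, bw, limit) &&
           mul_within(&acc, bh, limit) && mul_within(&acc, bh, limit);
}
```

config_input could call something like this once per input and, when it
returns 0, print the warning and switch on the division instead of failing.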

> 
> [...]
> [...]
> 
> > > > > Then I added a few TODOs in the code for the parts I am unsure about.
> > > > > It would be nice if you commented there, too.
> > > > 
> > > > 
> > > > I attached the new (complete) patch, the diff to the last time and the
> > > > > updated check script.
> > > > 
> > > > looks like the old patch + diff to the new
> > 
> > > Yes. I thought you could see the differences from the already reviewed
> > > patch much faster that way.
> the new patch + diff is better than the old + diff
> for testing that is
Sorry, this was a mistake. It was meant exactly that way.

> 
> > > [...]
> > > 
> > > > +static int filter_frame(AVFilterLink *inlink, AVFrame *picref)
> > > > +{
> > > > +    AVFilterContext *ctx = inlink->dst;
> > > > +    SignatureContext *sic = ctx->priv;
> > > > > +    StreamContext *sc = &(sic->streamcontexts[FF_INLINK_IDX(inlink)]);
> > > > +    FineSignature* fs;
> > > > +
> > > > +    static const uint8_t pot3[5] = { 3*3*3*3, 3*3*3, 3*3, 3, 1 };
> > > > > +    /* indexes of words : 210,217,219,274,334  44,175,233,270,273  57,70,103,237,269  100,285,295,337,354  101,102,111,275,296
> > > > > +    s2usw = sorted to unsorted wordvec: 44 is at index 5, 57 at index 10...
> > > > > +    */
> > > > > +    static const unsigned int wordvec[25] = {44,57,70,100,101,102,103,111,175,210,217,219,233,237,269,270,273,274,275,285,295,296,334,337,354};
> > > > > +    static const uint8_t s2usw[25]   = { 5,10,11, 15, 20, 21, 12, 22,  6,  0,  1,  2,  7, 13, 14,  8,  9,  3, 23, 16, 17, 24,  4, 18, 19};
> > > > > +
> > > > > +    uint8_t wordt2b[5] = { 0, 0, 0, 0, 0 }; /* word ternary to binary */
> > > > +    uint64_t intpic[32][32];
> > > > +    uint64_t rowcount;
> > > > +    uint8_t *p = picref->data[0];
> > > > +    int inti, intj;
> > > > +    int *intjlut;
> > > > +
> > > > +    double conflist[DIFFELEM_SIZE];
> > > > +    int f = 0, g = 0, w = 0;
> > > > +    int dh1 = 1, dh2 = 1, dw1 = 1, dw2 = 1, denum, a, b;
> > > > +    int i,j,k,ternary;
> > > > +    uint64_t blocksum;
> > > > +    int blocksize;
> > > > +    double th; /* threshold */
> > > > +    double sum;
> > > > +
> > > > +    /* initialize fs */
> > > > +    if(sc->curfinesig){
> > > > +        fs = av_mallocz(sizeof(FineSignature));
> > > > +        if (!fs)
> > > > +            return AVERROR(ENOMEM);
> > > > +        sc->curfinesig->next = fs;
> > > > +        fs->prev = sc->curfinesig;
> > > > +        sc->curfinesig = fs;
> > > > +    }else{
> > > > +        fs = sc->curfinesig = sc->finesiglist;
> > > > +        sc->curcoursesig1->first = fs;
> > > > +    }
> > > > +
> > > > +    fs->pts = picref->pts;
> > > > +    fs->index = sc->lastindex++;
> > > > +
> > > > +    memset(intpic, 0, sizeof(uint64_t)*32*32);
> > > > +    intjlut = av_malloc(inlink->w * sizeof(int));
> > > > +    if (!intjlut)
> > > > +        return AVERROR(ENOMEM);
> > > > +    for (i=0; i < inlink->w; i++){
> > > > +        intjlut[i] = (i<<5)/inlink->w;
> > > > +    }
> > > > +
> > > > +    for (i=0; i < inlink->h; i++){
> > > > +        inti = (i<<5)/inlink->h;
> > > > +        for (j=0; j< inlink->w; j++){
> > > > +            intj = intjlut[j];
> > > > +            intpic[inti][intj] += p[j];
> > > > +        }
> > > > +        p += picref->linesize[0];
> > > > +    }
> > > > +    av_free(intjlut);
> > > > +
> > > > > +    /* The following calculates a summed area table (intpic) and brings the numbers
> > > > > +     * in intpic to the same denominator.
> > > > > +     * So you only have to handle the numerator in the following sections.
> > > > > +     */
> > > > +    dh1 = inlink->h/32;
> > > > +    if (inlink->h%32)
> > > > +        dh2 = dh1 + 1;
> > > > +    dw1 = inlink->w/32;
> > > > +    if (inlink->w%32)
> > > > +        dw2 = dw1 + 1;
> > > > 
> > > > +    denum = dh1 * dh2 * dw1 * dw2;
> > > 
> > > > this will overflow if w and h are not multiples of 32 and large
> > > the multiplication is done in 32bit not 64
> > 
> > > I don't get it. All of these are 32-bit integers. Given the input is:
> > > 3842x2160 (nearly 4K), this would lead to a denum of:
> > 120 * 121 * 67 * 68 = 66153120
> > 
> > This is far below the 32 bit maximum.
> 
> it will overflow with higher resolution
With the above solution this will be divided as well.
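
For the denum computation itself, a sketch of doing the multiplication in
64 bit. The function name signature_denum64 is hypothetical; the logic mirrors
the dh1/dh2/dw1/dw2 lines quoted above. Widening the operands before
multiplying keeps the product exact for sizes where a 32-bit int would
overflow:

```c
#include <stdint.h>

/* Hypothetical helper: same formula as the quoted patch
 * (denum = dh1 * dh2 * dw1 * dw2), but every operand is int64_t,
 * so the multiplication is carried out in 64 bit. */
int64_t signature_denum64(int width, int height)
{
    int64_t dh1 = height / 32, dh2 = 1;
    int64_t dw1 = width  / 32, dw2 = 1;

    if (height % 32)
        dh2 = dh1 + 1;
    if (width % 32)
        dw2 = dw1 + 1;

    return dh1 * dh2 * dw1 * dw2;
}
```

For 3842x2160 this gives the same 120 * 121 * 67 * 68 as computed above; for
something like 8001x8001 the result no longer fits into a signed 32-bit int.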