[FFmpeg-devel] [PATCH] add signature filter for MPEG7 video signature

Tue Mar 22 02:20:24 CET 2016

On Monday, 21 March 2016 15:00:52 CET Thilo Borgmann wrote:
> On 21.03.16 at 14:15, Gerion Entrup wrote:
> > On Monday, 21 March 2016 11:53:27 CET Thilo Borgmann wrote:
> >> On 21.03.16 at 00:14, Gerion Entrup wrote:
> >>> On Sunday, 20 March 2016 17:01:17 CET Thilo Borgmann wrote:
> >>>>> On Sun, Mar 20, 2016 at 12:00:13PM +0100, Gerion Entrup wrote:
> >>>> [...]
> >>>> 
> >>>>> This filter does not implement all features of MPEG7. Missing
> >>>>> features:
> >>>>> 
> >>>>> - binary output
> >>>>> - compression of signature files
> >>>> 
> >>>> I assume these features are optional?
> >>> 
> >>> Compression is optional (it could be set as a flag in the binary
> >>> representation). I have not found out whether binary output is optional.
> >>> 
> >>> It is definitely possible to work with the XML files only.
> >> 
> >> Of course, but having an unspecified XML output is almost useless if
> >> binary output is not optional. So I think it is crucial to know what the
> >> spec says about output.
> > 
> > The spec defines the XML output my filter produces at the moment and
> > additionally specifies a binary output.
> > 
> >>>>> - work only on (cropped) parts of the video
> >>>> 
> >>>> How useful is this then? Does a fingerprint computed only on (cropped)
> >>>> parts of the video have any value outside of FFmpeg itself? Does this
> >>>> comply with the spec so that it can be compared with any other software
> >>>> generating it?
> >>> 
> >>> To clarify, the filter does not crop anything. The standard defines an
> >>> optional cropping to, I guess, concentrate on specific video parts (this
> >>> is not implemented). Assuming someone is recording a monitor, then e.g.
> >>> the unrelated part of the video could be cropped out. Besides that, the
> >>> signature itself is invariant to cropping up to a certain limit.
> >>> 
> >>> The cropping values (upper-left and bottom-right positions) are specified
> >>> in the XML, so other software could either crop the same way or compare
> >>> only with the cropped input.
> >>> (The fact that ffmpeg has a cropping filter would make such a feature
> >>> somewhat redundant.)
> >> 
> >> If I understand it correctly, the filter should not crop the image but
> >> only use the pixel information within the specified area or the whole
> >> image. Making it a filter option is useful, because the fingerprint of a
> >> part of the image can be used in a filter chain continuing with the
> >> entire image (no actual crop is required).
> > 
> > Of course it would be great if the filter supported it. It needs some
> > modifications to the summed area table. Once the binary output is ready,
> > I will try to do it.
> > 
> >>> The XML is standard compliant.
> >> 
> >> So XML output is compliant to the spec? Or is the XML itself just valid
> >> XML?
> > 
> > The XML output is compliant to the spec. The whole format is specified
> > there.
> > 
> >>> The signature is not bitexact. 3-4 (ternary)
> >>> values in the frame signature differ from the signature of the sample
> >>> files, but the conformance tests [1] allow up to 15 ternary errors.
> >> 
> >> Bitexact compared to what?
> > 
> > The institute where I am writing the filter owns the sample files
> > mentioned in the doc together with the corresponding binary and XML
> > signatures (so I could compare them).
> > 
> >> Does it allow up to 15 ternary errors when assuming two inputs are equal
> >> enough to be the same image, or does it state that the fingerprint itself
> >> may differ by 15 ternary errors for the very same image?
> > 
> > The 15 ternary errors are allowed between the sample signature and the
> > reimplementation for the same sample.
> > 
> > But ok, you seem to be right, the binary representation seems to be
> > necessary. I quote (out of the linked doc):
> > 
> > "The number of dimensions whose ternary values differ between the test and
> > the reference video signatures shall be less than or equal to 15 out of
> > 380, if the FrameConfidence values of both the test and the reference
> > video signatures are greater than or equal to 4. The ternary values of
> > the frame signature shall be decoded from the binary representation
> > according to Table E.1."
> This part of the spec seems to be dealing with comparing a reference image
> with a test image. These two images differ, and how I read it, the spec
> says they are identified as the same image if (FrameConfidence >= 4 &&
> ndiff(sig_ref, sig_test) <= 15).
This part is about comparing a reference signature with a test signature. It
does not identify them as the same image. A test signature is declared
conformant if it fulfills the condition.

(To compare two images with the signature technique and declare them as
matching, many more parts of the signature could differ.)
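
For illustration, the per-frame condition quoted above could be sketched as
follows. This is only a minimal sketch, not code from the filter or from the
reference software: the names `ndiff` (borrowed from the pseudo-expression
above) and `frame_conforms` are hypothetical, ternary values are assumed to
be stored as the integers 0, 1 and 2, and frames where either FrameConfidence
is below 4 are assumed to be exempt from the condition.

```python
from typing import Sequence

# Constants taken from the quoted conformance text.
SIGNATURE_DIMS = 380
MAX_TERNARY_ERRORS = 15
MIN_CONFIDENCE = 4


def ndiff(sig_ref: Sequence[int], sig_test: Sequence[int]) -> int:
    """Count the dimensions whose ternary values differ."""
    if len(sig_ref) != SIGNATURE_DIMS or len(sig_test) != SIGNATURE_DIMS:
        raise ValueError("frame signatures must have 380 dimensions")
    return sum(1 for r, t in zip(sig_ref, sig_test) if r != t)


def frame_conforms(sig_ref: Sequence[int], conf_ref: int,
                   sig_test: Sequence[int], conf_test: int) -> bool:
    """Check the ternary-error condition for one frame."""
    # Assumption: the condition only applies when both confidences are >= 4.
    if conf_ref < MIN_CONFIDENCE or conf_test < MIN_CONFIDENCE:
        return True
    return ndiff(sig_ref, sig_test) <= MAX_TERNARY_ERRORS
```

Under these assumptions, a frame with 3-4 differing ternary values stays well
within the limit of 15.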
> 
> That is about identifying images.
> That is not about calculation of the signature.
> 
> So again: Does your fingerprint filter produce the exact same fingerprint
> for _the exact same image_ as the reference software?
> (Input reference image -> filter -> exact same fingerprint as in the
> reference XML?)
No.

> 
> If not, the difference has to be understood and your filter has to be
> updated to match the reference fingerprint.
No. The conformance test says:
"In order for the video signature extractor being tested to pass the
conformance test, the FrameSignature and the FrameConfidence for 99.99% or
more of all frames from the specified video set, i.e. 162837 frames or
more out of 162853 frames, shall satisfy the following two conditions."
One of the conditions is the one above. I have tested my signature against
the sample files and everything is bitexact except for 3-4 ternary errors in
some frames. So the condition "The number of dimensions whose ternary values
differ between the test and the reference video signatures shall be less
than or equal to 15 out of 380" is fulfilled. I hope I get the samples
soon, so I can prove it.
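
The frame-count threshold from the conformance text can be checked with a
short sketch. The function names are hypothetical, and the per-frame results
are assumed to be computed elsewhere (the second condition is not quoted in
this mail, so it is not modelled here).

```python
from typing import Iterable


def required_frames(total_frames: int) -> int:
    """Smallest frame count that is at least 99.99% of total_frames."""
    # ceil(0.9999 * total_frames), done in integer arithmetic to avoid
    # floating-point rounding.
    return -(-9999 * total_frames // 10000)


def extractor_passes(per_frame_ok: Iterable[bool]) -> bool:
    """True if at least 99.99% of the frames satisfy the conditions."""
    frames = list(per_frame_ok)
    return sum(frames) >= required_frames(len(frames))
```

For the 162853 frames of the test set, `required_frames` gives 162837, which
matches the figure in the quoted text, so at most 16 frames may fail.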

Gerion