[FFmpeg-devel] A few filter questions
Gerion Entrup
gerion.entrup at t-online.de
Thu Jul 17 16:56:08 CEST 2014
On Thursday, 17 July 2014, 13:00:13, Clément Bœsch wrote:
> On Thu, Jul 17, 2014 at 12:33:41PM +0200, Gerion Entrup wrote:
> > Good day,
> >
> > I'm currently working on a video signature filter for ffmpeg. This allows
> > you to fingerprint videos.
>
> Oh, nice.
>
> > This fingerprint is built up of 9mb/s of bits or 2-3 mb/s bits compressed.
Argh, my mistake, sorry. I meant: 9 MB per hour of video (and 2-3 MB per hour compressed).
> >
> > In this context a few questions come into my mind:
> > - Should I print this whole bitstream to stdout/stderr at the end? Or would
> > it be a better choice to make a separate stream out of it? But what type
> > of stream would that be?
>
> What does the fingerprint look like? Could it make sense as a gray video
> output fractal, or maybe some kind of audio signal?
There are finesignatures (one per frame) and coarsesignatures (one per 90
finesignatures). A coarsesignature is a binarized histogram (only 0 or 1 is
possible as a count). A finesignature is mainly a vector of 380 difference
values between -128 and 127, which are ternarized into 0, 1 or 2.
(See the MPEG-7 standard for more details.)
I doubt this makes a good video or audio stream. Interpreting it as video
would certainly work in some way, but metadata looks more useful.
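For illustration, in C the data looks roughly like this (names and sizes here
are only placeholders derived from the description above, not the final filter
code or the exact MPEG-7 layout):

#include <stdint.h>

#define NB_HIST_BINS 256 /* placeholder; the real bin count comes from MPEG-7 */

typedef struct FineSignature {
    uint8_t words[380];  /* 380 difference values, ternarized to 0, 1 or 2 */
} FineSignature;

typedef struct CoarseSignature {
    uint8_t histogram[NB_HIST_BINS]; /* binarized histogram, each entry 0 or 1 */
} CoarseSignature;                   /* one per 90 consecutive FineSignatures */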
>
> Also, you still have the string metadata possibility (git grep SET_META
> libavfilter).
Hmm, thank you, I will take a look at it. If I understand it correctly, it is
used to fill a per-frame dictionary with some kind of data?
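If so, I imagine a minimal use in filter_frame would look roughly like this
(just my guess from grepping SET_META; the key name and value here are made
up):

#include <stdio.h>
#include "libavutil/dict.h"
#include "avfilter.h"
#include "internal.h"

static int filter_frame(AVFilterLink *inlink, AVFrame *frame)
{
    AVFilterContext *ctx = inlink->dst;
    char value[32];

    /* hypothetical key/value, just to show the mechanism */
    snprintf(value, sizeof(value), "%d", 42);
    av_dict_set(&frame->metadata, "lavfi.signature.example", value, 0);

    return ff_filter_frame(ctx->outputs[0], frame);
}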
>
> > (Btw, the video signature algorithm needs 90 consecutive frames, so
> > theoretically I can write something somewhere every 90 frames.)
>
> Do you cache all these frames or just update your caches/stats & drop
> them?
At the moment I don't cache the frames, only the whole signature. As said
above, the coarsesignatures (the part that needs the 90 frames) are calculated
only from the finesignatures (which are cached anyway).
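Simplified, the bookkeeping looks about like this (using the FineSignature
sketch from above; compute_coarsesignature() is a placeholder for the
histogram step):

#define BLOCK_SIZE 90

static void compute_coarsesignature(const FineSignature *fine, int n); /* placeholder */

typedef struct SignatureContext {
    FineSignature fine[BLOCK_SIZE]; /* cache of the last 90 finesignatures */
    int nb_fine;                    /* how many of them are filled so far  */
} SignatureContext;

static void add_finesignature(SignatureContext *sc, const FineSignature *f)
{
    sc->fine[sc->nb_fine++] = *f;
    if (sc->nb_fine == BLOCK_SIZE) {
        compute_coarsesignature(sc->fine, BLOCK_SIZE);
        sc->nb_fine = 0;
    }
}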
>
> > - If I print the whole bitstream to stdout/stderr (my current
> > implementation), is there a possibility to use this later from an external
> > program? The only other globally analyzing filter I found is volumedetect.
> > At the end, this filter prints the calculated results to the console via
> > print_stats. Is there a way within the API for an external program to use
> > these values, or do I have to grep the output?
>
> stdout/stderr really isn't a good thing. Using metadata is way better
> because you can output it from ffprobe and parse it in various output
> formats (XML, CSV, JSON, ...).
Sounds good…
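So an external program could then, for example, run something like

ffprobe -f lavfi -i "movie=input.mkv,signature" -show_frames -of json

and read the per-frame tags from the JSON output? (The filter name "signature"
is of course only what I plan to call it.)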
>
> Another solution I can now think of is to simply pass an output file as an
> option to the filter. That's typically how we do the 2-pass thing with the
> vidstab filter.
I don't like output files. If you want to write a program that looks up
signatures stored somewhere in a database, and this program uses ffmpeg
internally, then always having to write a file and read it back in is not
that elegant.
(Btw, an example of such a program is MusicBrainz Picard, but for AcoustID
;))
>
> [...]
>
> > Another thing that came to my mind: can a filter force other filters into
> > the filterchain? I see that when I only allow GRAY8 in my filter, the
> > scale filter is automatically inserted, too.
>
> Some filters are inserted automatically for conversion & constraints, but
> that's not decided by the filters but by the framework itself.
>
> > The reason I asked is the lookup for my filter. Currently my filter
> > analyzes a video and then produces a lot of numbers. To compare two videos
> > and decide whether they match or not, these numbers have to be compared.
> > I see three possibilities:
> > 1. Write a VV->V filter. Reimplement (copy) the code from the V->V
> > signature filter and give a boolean as output (match or no match).
> > 2. Take the V->V filter and write a python (or whatever) script that
> > fetches the output and then calculates the rest.
> > 3. Write a VV->V filter, but enforce that the normal signature filter is
> > executed first on both streams, use the result and then calculate the
> > matching type. Unfortunately I have no idea how to do this and whether it
> > is possible at all. Can you give me some advice?
>
> So if you output a file in the filter itself:
> ffmpeg -i video -vf fingerprint=video.sig -f null -
> ffmpeg -i another -vf fingerprint=video.sig:check=1 -f null -
>
> Or if you save the signature "stream" in a video (in gray8 for instance):
> ffmpeg -i video -vf fingerprint -c:v ffv1 sig.nut
> ffmpeg -i another -i sig.nut -vf '[0][1] fingerprint=mode=check' -f null -
>
> The 2nd method is "better" because it doesn't require file handling in the
> library, and it also allows stuff like using a diff filter (if you also
> apply fingerprint - not with mode=check - on `another`)
>
> Am I understanding your question correctly?
No ;), but thanks for your answer anyway. In your 2nd method, is your filter a
VV->V filter? Am I right that this filter can then also take only one stream?
Said another way: can a VV->V filter also behave as a V->V filter?
My original thinking was something like this (view it monospaced):
in1------>fingerprint1---.
                         |----> fingerprintcombo ---> out
in2------>fingerprint2---`
fingerprintcombo could somehow force the framework to insert fingerprint1 and
fingerprint2 into the filterchain, and then use their output to calculate the
matching.
Your second proposal is better :) (if it works as V->V, too).
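For the VV->V variant I would then declare two input pads, roughly like this
(only a sketch; whether the second pad can be made optional, so that the
filter also works as plain V->V, is exactly my question):

static const AVFilterPad signature_inputs[] = {
    {
        .name         = "main",
        .type         = AVMEDIA_TYPE_VIDEO,
        .filter_frame = filter_frame,
    },
    {
        .name         = "reference",
        .type         = AVMEDIA_TYPE_VIDEO,
        .filter_frame = filter_frame,
    },
    { NULL }
};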
>
> > The last possibility would also allow something like two-pass volume
> > normalization. Currently there are a volumedetect and a volume filter. To
> > normalize, one could run volumedetect, then fetch the output and put the
> > values into the volume filter, but I currently don't see a way to do this
> > automatically, directly in ffmpeg.
>
> Check tools/normalize.py, it's using ebur128 and the metadata system.
That's what I mean. Someone has to write an external script that calls
ffmpeg/ffprobe twice, parses the stdout of the first call and passes the
values to the filter options of the second call. As far as I can see, there
is no direct way. Something like:
ffmpeg -i foo -af volume=mode=autodetect normalized.opus
Internally:
the volume filter recognizes: I need a value from ebur128, volumedetect etc.,
and tells the framework: I cannot work, I need input from ...
the framework inserts volumedetect and says: here is your input, do what you
want
volumedetect tells volume: here is my output, do your work
volume tells the framework: I could work now, but I need the first sample
again.
The normalize script, e.g., has the disadvantage that it can only take one
sound stream (if I see it correctly).
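If such a mechanism existed, the volume filter could simply read the injected
value from the frame metadata, roughly like this (the key "lavfi.r128.I" is
what ebur128 seems to export, going by tools/normalize.py; treat it as an
example only):

#include <stdlib.h>
#include "libavutil/dict.h"
#include "libavutil/frame.h"

static double get_loudness(const AVFrame *frame)
{
    AVDictionaryEntry *e = av_dict_get(frame->metadata, "lavfi.r128.I", NULL, 0);
    return e ? strtod(e->value, NULL) : 0.0; /* fall back to 0 if missing */
}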
Anyway, thank you for your answers, they have helped a lot already.
>
> > (Once the filter is in a good state, I will try to bring it upstream.)
>
> Cool
>
> > Best,
> > Gerion