[FFmpeg-devel] [PATCH] lavfi: add thumb video filter.

Tue Dec 20 15:11:13 CET 2011

On date Tuesday 2011-12-20 10:30:06 +0100, Clément Bœsch encoded:
> On Tue, Dec 06, 2011 at 03:34:07AM +0100, Michael Niedermayer wrote:
> > On Mon, Dec 05, 2011 at 05:41:10PM +0100, Clément Bœsch wrote:
> > [...]
> > > +AVFilter avfilter_vf_thumbnail = {
> > > +    .name          = "thumbnail",
> > > +    .description   = NULL_IF_CONFIG_SMALL("Thumbnail selection filter"),
> > > +    .priv_size     = sizeof(ThumbContext),
> > > +    .init          = init,
> > > +    .uninit        = uninit,
> > > +    .query_formats = query_formats,
> > > +    .inputs        = (const AVFilterPad[]) {
> > > +        {   .name             = "default",
> > > +            .type             = AVMEDIA_TYPE_VIDEO,
> > > +            .get_video_buffer = avfilter_null_get_video_buffer,
> > > +            .start_frame      = null_start_frame,
> > > +            .draw_slice       = draw_slice,
> > > +            .end_frame        = end_frame,
> > > +        },{ .name = NULL }
> > > +    },
> > > +    .outputs       = (const AVFilterPad[]) {
> > > +        {   .name             = "default",
> > > +            .type             = AVMEDIA_TYPE_VIDEO,
> > > +            .rej_perms        = AV_PERM_REUSE2,
> > > +        },{ .name = NULL }
> > 
> > you need to implement request and poll_frame() as the defaults are
> > wrong for this filter
> > 
> > request should call the sources request in a loop until a frame is
> > output from thumbnail
> > 
> > poll_frame() is a bit more tricky, if the sources filters poll returns
> > 0, it should return 0 too
> > otherwise it has to call request frame from the source filter until
> > either its poll_frame returns 0 or the next input frame would cause a
> > frame to be output in which case it should return 1
> > see vf_yadif
> > 
> 
> OK, new patch attached.
> 
> -- 
> Clément B.

> From 81aa7d67a077b9a874e43925a449f0787a33c4ec Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Cl=C3=A9ment=20B=C5=93sch?= <clement.boesch at smartjog.com>
> Date: Mon, 24 Oct 2011 17:11:10 +0200
> Subject: [PATCH] lavfi: add thumbnail video filter.
> 
> ---
>  Changelog                  |    1 +
>  doc/filters.texi           |   12 ++
>  libavfilter/Makefile       |    1 +
>  libavfilter/allfilters.c   |    1 +
>  libavfilter/avfilter.h     |    2 +-
>  libavfilter/vf_thumbnail.c |  242 ++++++++++++++++++++++++++++++++++++++++++++
>  6 files changed, 258 insertions(+), 1 deletions(-)
>  create mode 100644 libavfilter/vf_thumbnail.c
> 
> diff --git a/Changelog b/Changelog
> index ad7fa8d..590752b 100644
> --- a/Changelog
> +++ b/Changelog
> @@ -139,6 +139,7 @@ easier to use. The changes are:
>  - SBaGen (SBG) binaural beats script demuxer
>  - OpenMG Audio muxer
>  - Simple segmenting muxer
> +- Thumbnails support (see thumbnail video filter)
>  
>  
>  version 0.8:
> diff --git a/doc/filters.texi b/doc/filters.texi
> index 699e0c1..3f50ebf 100644
> --- a/doc/filters.texi
> +++ b/doc/filters.texi
> @@ -2400,6 +2400,18 @@ For example:
>  will create two separate outputs from the same input, one cropped and
>  one padded.
>  
> +@section thumbnail
> +Select potential thumbnail frames.

"thumbnail" is not a very clear term, maybe you could say "the most
representative frame in a given sequence of consecutive frames".

> +
> +It accepts as argument the threshold of frames to analyze (default is 100). The
> +filter will pick one of these frames.

Please explain what "threshold" means; it is not clear at all from
this description.

From my reading of the code this filter reads N frames, and outputs
the one whose histogram is nearest to the global average
histogram. Maybe "batch_size" or "nb_frames" would be less confusing.
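
To make that reading concrete, here is a minimal standalone sketch of
the selection step as I understand it. It is not the patch code:
pick_representative() and HIST_SIZE are hypothetical names, the
histograms are passed as one flat array, and the sum of squared
differences is used as the distance (it ranks frames the same way the
RMSE does).

```c
#include <assert.h>

#define HIST_SIZE (3 * 256)

/* Hypothetical helper, not part of the patch: hist holds nb_frames
 * histograms of HIST_SIZE bins each, stored back to back. Returns the
 * index of the histogram closest to the mean histogram of the batch. */
static int pick_representative(const int *hist, int nb_frames)
{
    double avg[HIST_SIZE] = {0};
    double best_err = 0.0;
    int i, j, best = 0;

    assert(hist && nb_frames > 0);

    /* mean histogram of the batch */
    for (i = 0; i < nb_frames; i++)
        for (j = 0; j < HIST_SIZE; j++)
            avg[j] += hist[i * HIST_SIZE + j];
    for (j = 0; j < HIST_SIZE; j++)
        avg[j] /= nb_frames;

    /* keep the frame with the smallest squared distance to the mean */
    for (i = 0; i < nb_frames; i++) {
        double err = 0.0;
        for (j = 0; j < HIST_SIZE; j++) {
            double d = avg[j] - hist[i * HIST_SIZE + j];
            err += d * d;
        }
        if (i == 0 || err < best_err) {
            best     = i;
            best_err = err;
        }
    }
    return best;
}
```

With this picture in mind, the argument is simply the number of frames
per batch, which is why "threshold" reads oddly.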

> A bigger value will result in a slower
> +analysis and higher memory usage, but is likely to be more efficient.

Again, if I understand the code correctly, the specified N affects the
number of output frames, because it sets the number of frames in each
analyzed sequence, so this statement looks quite misleading.
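
A small sketch of the arithmetic I have in mind, assuming (as this
version of the patch appears to do) that one frame is emitted per full
batch and that a trailing partial batch produces nothing.
nb_selected_frames() is a hypothetical helper used only for
illustration.

```c
#include <assert.h>

/* Hypothetical helper: number of frames the filter would emit for
 * nb_input frames when it selects one frame per full batch of
 * batch_size frames. */
static int nb_selected_frames(int nb_input, int batch_size)
{
    assert(nb_input >= 0 && batch_size > 0);
    return nb_input / batch_size; /* partial last batch is not counted */
}
```

For a 1000-frame input, N=100 would give 10 selected frames and N=250
would give 4, so N changes the output rate and not only the quality of
the pick.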

> +
> +Example of thumbnail creation:
> +@example
> +ffmpeg -i in.avi -vf thumbnail,scale=300:200 -frames:v 1 out.png
> +@end example

A pure libavfilter example may also be useful.

[...]
> diff --git a/libavfilter/vf_thumbnail.c b/libavfilter/vf_thumbnail.c
> new file mode 100644
> index 0000000..9be871e
> --- /dev/null
> +++ b/libavfilter/vf_thumbnail.c
> @@ -0,0 +1,242 @@
[...]
> +/**
> + * @file
> + * Potential thumbnail lookup filter to reduce the risk of an inappropriate
> + * selection (such as a black frame) we could get with an absolute seek.
> + *
> + * Algorithm by Vadim Zaliva <lord at crocodile.org>.
> + * @url http://notbrainsurgery.livejournal.com/29773.html
> + */
> +

> +#include <math.h>
> +#include "libavcodec/avcodec.h"

why these?

> +#include "libavutil/imgutils.h"
> +#include "libavutil/internal.h"
> +#include "libswscale/swscale.h"
> +#include "avfilter.h"
> +

> +#define HIST_SZ (3*256)

Nit++: possibly HISTOGRAM_SIZE, or at least HIST_SIZE

> +#define DEF_FRAMES_THRESHOLD 100
> +
> +struct thumb_frame {
> +    AVFilterBufferRef *buf;     ///< cached frame
> +    int histogram[HIST_SZ];     ///< RGB color distribution histogram of the frame
> +};
> +
> +typedef struct {
> +    int n;                      ///< current frame
> +    int n_frames;               ///< threshold of frames for analysis
> +    struct thumb_frame *frames; ///< the n_frames frames
> +} ThumbContext;
> +
> +static av_cold int init(AVFilterContext *ctx, const char *args, void *opaque)
> +{
> +    ThumbContext *thumb = ctx->priv;
> +
> +    if (args)
> +        thumb->n_frames = strtol(args, NULL, 10);
> +    if (thumb->n_frames < 2) {
> +        if (args)

> +            av_log(ctx, AV_LOG_WARNING,
> +                   "Invalid frame threshold specified, fallback to "
> +                   AV_STRINGIFY(DEF_FRAMES_THRESHOLD) "\n");

uh? why not a simple %d?
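
Something along these lines is what I mean. The sketch uses snprintf()
as a stand-in so that it compiles on its own; in the filter the same
format string and argument would go to av_log(ctx, AV_LOG_WARNING,
...). format_fallback_warning() is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

#define DEF_FRAMES_THRESHOLD 100

/* Hypothetical stand-in for the av_log() call: the default is passed
 * as a plain %d argument instead of being pasted in with
 * AV_STRINGIFY(). Returns what snprintf() returns. */
static int format_fallback_warning(char *buf, size_t size)
{
    assert(buf && size > 0);
    return snprintf(buf, size,
                    "Invalid frame threshold specified, fallback to %d\n",
                    DEF_FRAMES_THRESHOLD);
}
```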

> +        thumb->n_frames = DEF_FRAMES_THRESHOLD;
> +    }
> +    thumb->frames = av_calloc(thumb->n_frames, sizeof(*thumb->frames));
> +    if (!thumb->frames) {
> +        av_log(ctx, AV_LOG_ERROR,
> +               "Allocation failure, try to lower the frames threshold\n");
> +        return AVERROR(ENOMEM);
> +    }

> +    av_log(ctx, AV_LOG_INFO, "Select thumbnail with threshold of %d frames\n",
> +           thumb->n_frames);

simpler/less cluttered: threshold:%d

> +    return 0;
> +}
> +
> +static void draw_slice(AVFilterLink *inlink, int y, int h, int slice_dir)
> +{
> +    int i, j;
> +    AVFilterContext *ctx = inlink->dst;
> +    ThumbContext *thumb = ctx->priv;
> +    int *hist = thumb->frames[thumb->n].histogram;
> +    AVFilterBufferRef *picref = inlink->cur_buf;
> +    const uint8_t *p = picref->data[0] + y * picref->linesize[0];
> +
> +    // update current frame RGB histogram
> +    for (j = 0; j < h; j++) {
> +        for (i = 0; i < inlink->w; i++) {
> +            hist[0*256 + p[i*3    ]]++;
> +            hist[1*256 + p[i*3 + 1]]++;
> +            hist[2*256 + p[i*3 + 2]]++;
> +        }
> +        p += picref->linesize[0];
> +    }
> +}
> +
> +/**
> + * @brief        compute Root-mean-square deviation to estimate "closeness"
> + * @param hist   color distribution histogram
> + * @param median average color distribution histogram
> + * @return       root mean squared error
> + */

> +static float frame_rmse(const int *hist, const float *median)
> +{
> +    int i;
> +    float err, mean_sq_err = 0;
> +    for (i = 0; i < HIST_SZ; i++) {
> +        err = median[i] - (float)hist[i];
> +        mean_sq_err += err*err / HIST_SZ;
> +    }

you can factor out the division, and gain speed (and precision)
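
A minimal sketch of the factored form, under the assumption that the
caller applies sqrtf() once to the returned value. hist_mean_sq_err()
and HIST_SIZE are hypothetical names, not the patch's.

```c
#include <assert.h>

#define HIST_SIZE (3 * 256)

/* Hypothetical variant of the loop in frame_rmse(): accumulate the
 * plain sum of squared errors and divide once at the end. */
static float hist_mean_sq_err(const int *hist, const float *mean)
{
    float err, sum_sq_err = 0;
    int i;

    assert(hist && mean);
    for (i = 0; i < HIST_SIZE; i++) {
        err = mean[i] - (float)hist[i];
        sum_sq_err += err * err;   /* no division inside the loop */
    }
    return sum_sq_err / HIST_SIZE; /* single division at the end */
}
```

Since the value is only compared between frames and the square root is
monotonic, the sqrtf() (and even the division) could be dropped from
the comparison altogether.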

> +    return sqrtf(mean_sq_err);
> +}
> +
> +static void end_frame(AVFilterLink *inlink)
> +{
> +    int i, j, best_frame = 0;

> +    float avg[HIST_SZ] = {0}, rmse, min_rmse = -1;

avg -> please use a more meaningful name, or declare it in a local scope

> +    AVFilterLink *outlink = inlink->dst->outputs[0];
> +    ThumbContext *thumb   = inlink->dst->priv;
> +    AVFilterContext *ctx  = inlink->dst;
> +
> +    // keep a reference of each frame
> +    thumb->frames[thumb->n].buf = inlink->cur_buf;
> +
> +    // no selection until the buffer of N frames is filled up
> +    if (thumb->n < thumb->n_frames - 1) {
> +        thumb->n++;
> +        return;
> +    }
> +
> +    // average histogram of the N frames
> +    for (j = 0; j < FF_ARRAY_ELEMS(avg); j++)
> +        for (i = 0; i < thumb->n_frames; i++)

> +            avg[j] += (float)thumb->frames[i].histogram[j] / thumb->n_frames;

again, you can factor out the division

> +    // find the frame closer to the average using RMSE
> +    for (i = 0; i < thumb->n_frames; i++) {
> +        rmse = frame_rmse(thumb->frames[i].histogram, avg);
> +        if (i == 0 || rmse < min_rmse)
> +            best_frame = i, min_rmse = rmse;
> +    }
> +
> +    // free and reset everything (except the best frame buffer)
> +    for (i = 0; i < thumb->n_frames; i++) {

> +        memset(thumb->frames[i].histogram, 0, sizeof(thumb->frames[i].histogram));

is this required?

> +        if (i == best_frame)
> +            continue;
> +        avfilter_unref_buffer(thumb->frames[i].buf);
> +        thumb->frames[i].buf = NULL;
> +    }
> +    thumb->n = 0;
> +
> +    // raise the chosen one
> +    av_log(ctx, AV_LOG_INFO, "frame id #%d selected\n", best_frame);
> +    avfilter_start_frame(outlink, thumb->frames[best_frame].buf);
> +    thumb->frames[best_frame].buf = NULL;
> +    avfilter_draw_slice(outlink, 0, inlink->h, 1);
> +    avfilter_end_frame(outlink);
> +}
> +
> +static av_cold void uninit(AVFilterContext *ctx)
> +{
> +    int i;
> +    ThumbContext *thumb = ctx->priv;
> +    for (i = 0; i < thumb->n_frames && thumb->frames[i].buf; i++) {
> +        avfilter_unref_buffer(thumb->frames[i].buf);
> +        thumb->frames[i].buf = NULL;
> +    }
> +    av_freep(&thumb->frames);
> +}
> +
> +static void null_start_frame(AVFilterLink *link, AVFilterBufferRef *picref) { }
> +
> +static int request_frame(AVFilterLink *link)
> +{
> +    ThumbContext *thumb = link->src->priv;
> +
> +    /* loop until a frame thumbnail is available (when a frame is queued,
> +     * thumb->n is reset to zero) */
> +    while (thumb->n) {
> +        int ret = avfilter_request_frame(link->src->inputs[0]);
> +        if (ret < 0)
> +            return ret;
> +    }
> +    return 0;
> +}
> +
> +static int poll_frame(AVFilterLink *link)
> +{
> +    ThumbContext *thumb  = link->src->priv;
> +    AVFilterLink *inlink = link->src->inputs[0];
> +    int ret, available_frames = avfilter_poll_frame(inlink);
> +
> +    /* If the input link is not able to provide any frame, we can't do anything
> +     * at the moment and thus have zero thumbnail available. */
> +    if (!available_frames)
> +        return 0;
> +
> +    /* Since at least one frame is available and the next frame will allow us
> +     * to compute a thumbnail, we can return 1 frame. */
> +    if (thumb->n == thumb->n_frames - 1)
> +        return 1;
> +
> +    /* we have some frame(s) available in the input link, but not yet enough to
> +     * output a thumbnail, so we request more */
> +    ret = avfilter_request_frame(inlink);
> +    return ret < 0 ? ret : 0;
> +}
> +

> +static int query_formats(AVFilterContext *ctx)
> +{
> +    static const enum PixelFormat pix_fmts[] = {
> +        PIX_FMT_RGB24, PIX_FMT_BGR24,
> +        PIX_FMT_NONE
> +    };
> +    avfilter_set_common_pixel_formats(ctx, avfilter_make_format_list(pix_fmts));
> +    return 0;

note: this can be easily extended to support more pixel formats

> +}
> +
> +AVFilter avfilter_vf_thumbnail = {
> +    .name          = "thumbnail",

> +    .description   = NULL_IF_CONFIG_SMALL("Thumbnail selection filter"),

Nit: description is a complete sentence describing what the filter
*does*, rather than a long name.
 -- 
FFmpeg = Fostering & Formidable Murdering Philosofic Epic Glue