[FFmpeg-devel] [PATCH] lavfi: add concat filter.
Stefano Sabatini
stefasab at gmail.com
Thu Jul 19 20:14:46 CEST 2012
On date Wednesday 2012-07-18 00:48:55 +0200, Nicolas George encoded:
>
> Signed-off-by: Nicolas George <nicolas.george at normalesup.org>
> ---
> Changelog | 1 +
> doc/filters.texi | 73 +++++++++
> libavfilter/Makefile | 1 +
> libavfilter/allfilters.c | 1 +
> libavfilter/f_concat.c | 409 ++++++++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 485 insertions(+)
> create mode 100644 libavfilter/f_concat.c
>
>
> Note: do not try it with -filter_complex yet, or be sure to only use very
> short files that can be decompressed completely in memory. Also, it will not
> work without the "ffmpeg: probe buffersinks once more after EOF" patch.
>
>
> diff --git a/Changelog b/Changelog
> index e7b8a02..073b851 100644
> --- a/Changelog
> +++ b/Changelog
> @@ -29,6 +29,7 @@ version next:
> - new option: -progress
> - 3GPP Timed Text decoder
> - GeoTIFF decoder support
> +- concat filter
>
>
> version 0.11:
> diff --git a/doc/filters.texi b/doc/filters.texi
> index 4a6c092..cd2e7a8 100644
> --- a/doc/filters.texi
> +++ b/doc/filters.texi
> @@ -3988,6 +3988,79 @@ tools.
>
> Below is a description of the currently available transmedia filters.
>
> + at section concat
> +
> +Concatenate audio and video streams, joining them together one after the
> +other.
> +
> +The filter works on segments of synchronized video and audio streams. All
> +segments must have the same number of streams of each type, and that will
> +also be the number of streams at output.
> +
> +The filter accepts the following named parameters:
> + at table @option
> +
> + at item s
> +Set the number of segments. Default is 2.
Maybe "n", so there is no conflict if subtitles are ever added. Also,
having long aliases may help script readability.
> +
> + at item v
> +Set the number of video streams. Default is 1.
output video streams, which is also the number of video streams in each
segment.
> +
> + at item a
> +Set the number of audio streams. Default is 0.
output audio streams, which is also the number of audio streams in each
segment.
> +
> + at end table
> +
> +The filter has @var{v}+ at var{a} outputs: first @var{v} video outputs, then
> + at var{a} audio outputs.
> +
> +There are @var{s}×(@var{v}+ at var{a}) inputs: first the inputs for the first
> +segment, in the same order as the outputs, then the inputs for the second
> +segment, etc.
> +
> +Related streams do not always have exactly the same duration, for various
> +reasons including codec frame size or sloppy authoring. For that reason,
> +related synchronized streams (e.g. a video and its audio track) should be
> +concatenated at once. The concat filter will use the duration of the longest
> +stream in each segment (except the last one), and if necessary pad shorter
> +audio streams with silence.
> +
> +For this filter to work correctly, all segments must start at timestamp 0.
> +
> +All corresponding streams must have the same parameters in all segments; the
> +filtering system will automatically select a common pixel format for video
> +streams, and a common sample format, sample rate and channel layout for
> +audio streams, but other settings, such as resolution, must be converted
> +explicitly by the user.
> +
> +Different frame rates are acceptable but will result in variable frame rate
> +at output; be sure to configure the output file to handle it.
> +
> +Examples:
> + at itemize
> + at item
> +Concatenate an opening, an episode and an ending, all in bilingual version
> +(video in stream 0, audio in streams 1 and 2):
> + at example
> +ffmpeg -i opening.mkv -i episode.mkv -i ending.mkv -filter_complex \
> + '[0:0] [0:1] [0:2] [1:0] [1:1] [1:2] [2:0] [2:1] [2:2]
> + concat=s=3:v=1:a=2 [v] [a1] [a2]' \
> + -map '[v]' -map '[a1]' -map '[a2]' output.mkv
> + at end example
> +
> + at item
> +Concatenate two parts, handling audio and video separately, using the
> +(a)movie sources, and adjusting the resolution:
> + at example
> +movie=part1.mp4, scale=512:288 [v1] ; amovie=part1.mp4 [a1] ;
> +movie=part2.mp4, scale=512:288 [v2] ; amovie=part2.mp4 [a2] ;
> +[v1] [v2] concat [outv] ; [a1] [a2] concat=v=0:a=1 [outa]
> + at end example
Why not:
movie=part1.mp4, scale=512:288 [v1] ; amovie=part1.mp4 [a1] ;
movie=part2.mp4, scale=512:288 [v2] ; amovie=part2.mp4 [a2] ;
[v1][a1] [v2][a2] concat=v=1:a=1 [outv][outa]
?
> +Note that a desync will happen at the stitch if the audio and video streams
> +do not have exactly the same duration in the first file.
> +
> + at end itemize
> +
> @section showwaves
>
> Convert input audio to a video output, representing the samples waves.
> diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> index b094f59..642a105 100644
> --- a/libavfilter/Makefile
> +++ b/libavfilter/Makefile
> @@ -197,6 +197,7 @@ OBJS-$(CONFIG_MP_FILTER) += libmpcodecs/vf_yvu9.o
> OBJS-$(CONFIG_MP_FILTER) += libmpcodecs/pullup.o
>
> # transmedia filters
> +OBJS-$(CONFIG_CONCAT_FILTER) += f_concat.o
> OBJS-$(CONFIG_SHOWWAVES_FILTER) += avf_showwaves.o
maybe I should rename avf_showwaves -> f_showwaves (or the other way
around, f_concat -> avf_concat).
>
> TOOLS = graph2dot
> diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> index 706405e..cae4c99 100644
> --- a/libavfilter/allfilters.c
> +++ b/libavfilter/allfilters.c
> @@ -134,6 +134,7 @@ void avfilter_register_all(void)
> REGISTER_FILTER (NULLSINK, nullsink, vsink);
>
> /* transmedia filters */
> + REGISTER_FILTER (CONCAT, concat, avf);
> REGISTER_FILTER (SHOWWAVES, showwaves, avf);
>
> /* those filters are part of public or internal API => registered
> diff --git a/libavfilter/f_concat.c b/libavfilter/f_concat.c
> new file mode 100644
> index 0000000..7a8912f
> --- /dev/null
> +++ b/libavfilter/f_concat.c
> @@ -0,0 +1,409 @@
> +/*
> + * Copyright (c) 2012 Nicolas George
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> + * See the GNU Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public License
> + * along with FFmpeg; if not, write to the Free Software Foundation, Inc.,
> + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +/**
> + * @file
> + * concat audio-video filter
> + */
> +
> +#include "libavutil/avassert.h"
> +#include "libavutil/opt.h"
> +#include "avfilter.h"
> +#define FF_BUFQUEUE_SIZE 256
> +#include "bufferqueue.h"
> +#include "internal.h"
> +#include "video.h"
> +#include "audio.h"
> +
> +#define TYPE_ALL 2
> +
> +typedef struct {
> + const AVClass *class;
> + unsigned nb_streams[TYPE_ALL];
Nit: this is ambiguous, what about nb_segment_streams (or simply add a ///< doxy comment)?
> + unsigned nb_seg;
nit: nb_segments
> + unsigned cur_idx;
> + int64_t delta_ts;
please explain in doxy what these fields mean
> + unsigned nb_in_active;
comment: number of active input segments, or number of active inputs?
Also I suppose a segment is considered "active" as long as none of its
streams has reached EOF, right?
> + struct concat_in {
> + int64_t pts;
> + int64_t nb_frames;
> + unsigned eof;
> + struct FFBufQueue queue;
> + } *in;
> +} ConcatContext;
> +
> +#define OFFSET(x) offsetof(ConcatContext, x)
> +
> +static const AVOption concat_options[] = {
> + { "s", "specify the number of segments", OFFSET(nb_seg),
> + AV_OPT_TYPE_INT, { .dbl = 2 }, 2, INT_MAX },
> + { "v", "specify the number of video streams",
> + OFFSET(nb_streams[AVMEDIA_TYPE_VIDEO]),
> + AV_OPT_TYPE_INT, { .dbl = 1 }, 1, INT_MAX },
> + { "a", "specify the number of audio streams",
> + OFFSET(nb_streams[AVMEDIA_TYPE_AUDIO]),
> + AV_OPT_TYPE_INT, { .dbl = 0 }, 0, INT_MAX },
> + { 0 }
> +};
Nit+: one line per option?
> +
> +AVFILTER_DEFINE_CLASS(concat);
> +
> +static av_cold void uninit(AVFilterContext *ctx)
I'd find it more logical to have init before uninit.
> +{
> + ConcatContext *cat = ctx->priv;
> + unsigned i;
> +
> + for (i = 0; i < ctx->nb_inputs; i++) {
> + av_freep(&ctx->input_pads[i].name);
> + ff_bufqueue_discard_all(&cat->in[i].queue);
> + }
> + for (i = 0; i < ctx->nb_outputs; i++)
> + av_freep(&ctx->output_pads[i].name);
> + av_free(cat->in);
> +}
> +
> +static int query_formats(AVFilterContext *ctx)
> +{
> + ConcatContext *cat = ctx->priv;
> + unsigned type, nb_str, idx0 = 0, idx, str, seg;
Nit++: "str" is usually associated with strings; I suggest renaming
nb_str -> nb_streams and str -> stream_idx, and moving them to the inner
loop where they're used.
> + AVFilterFormats *formats, *rates;
> + AVFilterChannelLayouts *layouts;
> +
> + for (type = 0; type < TYPE_ALL; type++) {
> + nb_str = cat->nb_streams[type];
> + for (str = 0; str < nb_str; str++) {
> + idx = idx0;
maybe add a comment here along the lines of: set the output stream formats
> + formats = ff_all_formats(AVMEDIA_TYPE_VIDEO);
> + if (!formats)
> + return AVERROR(ENOMEM);
> + ff_formats_ref(formats, &ctx->outputs[idx]->in_formats);
Uhm... why is ff_all_formats(AVMEDIA_TYPE_VIDEO) not under an if
(type == AVMEDIA_TYPE_VIDEO) block?
> + if (type == AVMEDIA_TYPE_AUDIO) {
> + rates = ff_all_samplerates();
> + layouts = ff_all_channel_layouts();
> + if (!rates || !layouts)
> + return AVERROR(ENOMEM);
leak if rates && !layouts
> + ff_formats_ref(rates, &ctx->outputs[idx]->in_samplerates);
> + ff_channel_layouts_ref(layouts, &ctx->outputs[idx]->in_channel_layouts);
> + }
comment: set the formats for each corresponding segment input
> + for (seg = 0; seg < cat->nb_seg; seg++) {
> + ff_formats_ref(formats, &ctx->inputs[idx]->out_formats);
> + if (type == AVMEDIA_TYPE_AUDIO) {
> + ff_formats_ref(rates, &ctx->inputs[idx]->out_samplerates);
> + ff_channel_layouts_ref(layouts, &ctx->inputs[idx]->out_channel_layouts);
> + }
> + idx += ctx->nb_outputs;
> + }
> + idx0++;
> + }
> + }
> + return 0;
> +}
> +
> +static unsigned find_link(AVFilterLink *link, AVFilterLink **links, unsigned n)
> +{
> + unsigned i;
> +
> + for (i = 0; i < n; i++)
> + if (link == links[i])
> + return i;
> + av_assert0(!"link found");
> +}
> +
> +static int config_output(AVFilterLink *outlink)
> +{
> + AVFilterContext *ctx = outlink->src;
> + ConcatContext *cat = ctx->priv;
> + unsigned out_no = find_link(outlink, ctx->outputs, ctx->nb_outputs);
> + unsigned in_no = out_no, seg;
> + AVFilterLink *inlink = ctx->inputs[in_no];
> +
> + /* enhancement: find a common one */
> + outlink->time_base = AV_TIME_BASE_Q;
> + outlink->w = inlink->w;
> + outlink->h = inlink->h;
> + outlink->sample_aspect_ratio = inlink->sample_aspect_ratio;
> + outlink->format = inlink->format;
> + for (seg = 1; seg < cat->nb_seg; seg++) {
> + inlink = ctx->inputs[in_no += ctx->nb_outputs];
> + /* possible enhancement: unsafe mode, do not check */
what would be the advantage of such a mode?
> + /* possible enhancement: be more specific */
Yes please do (see below)
> + if (outlink->w != inlink->w ||
> + outlink->h != inlink->h ||
> + outlink->sample_aspect_ratio.num != inlink->sample_aspect_ratio.num ||
> + outlink->sample_aspect_ratio.den != inlink->sample_aspect_ratio.den) {
> + av_log(ctx, AV_LOG_ERROR, "Input %s has different parameters.\n",
> + ctx->input_pads[in_no].name);
I suggest: "Input link %s parameters size:%dx%d sar:%d/%d do not match output link %s parameters size:%dx%d sar:%d/%d\n"
Nit++: no need for the final dot, it is inconsistent with the other log messages
> + return AVERROR(EINVAL);
> + }
> + }
> +
> + return 0;
> +}
> +
> +static void push_frame(AVFilterContext *ctx, unsigned in_no,
> + AVFilterBufferRef *buf)
> +{
> + ConcatContext *cat = ctx->priv;
> + unsigned out_no = in_no % ctx->nb_outputs;
> + AVFilterLink * inlink = ctx-> inputs[ in_no];
> + AVFilterLink *outlink = ctx->outputs[out_no];
> + struct concat_in *in = &cat->in[in_no];
> +
> + buf->pts = av_rescale_q(buf->pts, inlink->time_base, outlink->time_base);
> + in->pts = buf->pts;
> + in->nb_frames++;
> + /* add duration to input PTS */
> + if (inlink->sample_rate)
when can it happen that sample_rate is not set?
> + /* use number of audio samples */
> + in->pts += av_rescale_q(buf->audio->nb_samples,
> + (AVRational){ 1, inlink->sample_rate },
> + outlink->time_base);
> + else if (in->nb_frames >= 2)
> + /* use mean duration */
> + in->pts = av_rescale(in->pts, in->nb_frames, in->nb_frames - 1);
> +
> + buf->pts += cat->delta_ts;
> + switch (buf->type) {
> + case AVMEDIA_TYPE_VIDEO:
> + ff_start_frame(outlink, buf);
> + ff_draw_slice(outlink, 0, outlink->h, 1);
> + ff_end_frame(outlink);
> + break;
> + case AVMEDIA_TYPE_AUDIO:
> + ff_filter_samples(outlink, buf);
> + break;
> + }
> +}
> +
> +static void incoming_frame(AVFilterLink *inlink, AVFilterBufferRef *buf)
I'm slightly against the use of a nominal phrase for a function name,
what about "process_frame()"?
> +{
> + AVFilterContext *ctx = inlink->dst;
> + ConcatContext *cat = ctx->priv;
> + unsigned in_no = find_link(inlink, ctx->inputs, ctx->nb_inputs);
> +
> + if (in_no < cat->cur_idx) {
> + av_log(ctx, AV_LOG_ERROR, "Frame after EOF on input %s\n",
> + ctx->input_pads[in_no].name);
> + avfilter_unref_buffer(buf);
> +    } else if (in_no >= cat->cur_idx + ctx->nb_outputs) {
> + ff_bufqueue_add(ctx, &cat->in[in_no].queue, buf);
> + } else {
> + push_frame(ctx, in_no, buf);
> + }
> +}
> +
> +static void start_frame(AVFilterLink *inlink, AVFilterBufferRef *buf)
> +{
> +}
> +
> +static void draw_slice(AVFilterLink *inlink, int y, int h, int dir)
> +{
> +}
Nit+++: func(...) { }
saves some lines and looks more readable (to me).
> +
> +static void end_frame(AVFilterLink *inlink)
> +{
> + incoming_frame(inlink, inlink->cur_buf);
> +}
> +
> +static int filter_samples(AVFilterLink *inlink, AVFilterBufferRef *buf)
> +{
> + incoming_frame(inlink, buf);
> + return 0; /* enhancement: handle error return */
> +}
> +
> +static void close_input(AVFilterContext *ctx, unsigned in_no)
> +{
> + ConcatContext *cat = ctx->priv;
> +
> + cat->in[in_no].eof = 1;
> + cat->nb_in_active--;
> + av_log(ctx, AV_LOG_VERBOSE, "EOF on %s, %d streams left in segment.\n",
> + ctx->input_pads[in_no].name, cat->nb_in_active);
> +}
> +
> +static void find_next_delta_ts(AVFilterContext *ctx)
> +{
> + ConcatContext *cat = ctx->priv;
> + unsigned i = cat->cur_idx;
> + unsigned imax = i + ctx->nb_outputs;
> + int64_t pts;
> +
> + pts = cat->in[i++].pts;
> + for (; i < imax; i++)
> + pts = FFMAX(pts, cat->in[i].pts);
> + cat->delta_ts += pts;
> +}
> +
> +static void send_silence(AVFilterContext *ctx, unsigned in_no, unsigned out_no)
> +{
> + ConcatContext *cat = ctx->priv;
> + AVFilterLink *outlink = ctx->outputs[out_no];
> + int64_t base_pts = cat->in[in_no].pts;
> + int64_t nb_samples, sent = 0;
> + int frame_size;
> + AVRational stb = { 1, ctx->inputs[in_no]->sample_rate };
nit++: tb or time_base seems less ambiguous at first reading (silence_tb, stream_tb?)
> + AVFilterBufferRef *buf;
> +
> + if (!stb.den)
> + return;
> + nb_samples = av_rescale_q(cat->delta_ts - base_pts,
> + outlink->time_base, stb);
> + frame_size = FFMAX(9600, stb.den / 5); /* arbitrary */
> + while (nb_samples) {
> + frame_size = FFMIN(frame_size, nb_samples);
> + buf = ff_get_audio_buffer(outlink, AV_PERM_WRITE, frame_size);
> + if (!buf)
> + return;
> + buf->pts = base_pts + av_rescale_q(sent, stb, outlink->time_base);
> + ff_filter_samples(outlink, buf);
> + sent += frame_size;
> + nb_samples -= frame_size;
> + }
> +}
[...]
> +AVFilter avfilter_avf_concat = {
> + .name = "concat",
> + .description = NULL_IF_CONFIG_SMALL("concatenate audio and video streams."),
Concatenate ...
[...]
--
FFmpeg = Fast & Foolish Minimal Pacific Earthshaking God