[FFmpeg-devel] [PATCH 2/2] lavfi: EBU R.128 scanner.

Mon Oct 1 22:14:56 CEST 2012

On Tue, Sep 25, 2012 at 10:39:36AM +0200, Stefano Sabatini wrote:
> On date Saturday 2012-09-22 10:16:02 +0200, Clément Bœsch encoded:
> > TODO:
> >  - lavfi minor bump
> >  - Changelog
> > ---
> >  configure                |   1 +
> >  doc/filters.texi         |  44 +++
> >  libavfilter/Makefile     |   1 +
> 
> >  libavfilter/af_ebur128.c | 689 +++++++++++++++++++++++++++++++++++++++++++++++
> 
> f_ebur128.c may be more suited, given the hybrid nature of the filter.
> 

Renamed

> >  libavfilter/allfilters.c |   1 +
> >  5 files changed, 736 insertions(+)
> >  create mode 100644 libavfilter/af_ebur128.c
> > 
> > diff --git a/configure b/configure
> > index c001c5f..394ccf4 100755
> > --- a/configure
> > +++ b/configure
> > @@ -1900,6 +1900,7 @@ decimate_filter_deps="gpl avcodec"
> >  delogo_filter_deps="gpl"
> >  deshake_filter_deps="avcodec"
> >  drawtext_filter_deps="libfreetype"
> > +ebur128_filter_deps="gpl"
> >  flite_filter_deps="libflite"
> >  frei0r_filter_deps="frei0r dlopen"
> >  frei0r_filter_extralibs='$ldl'
> > diff --git a/doc/filters.texi b/doc/filters.texi
> > index e202d38..01f2778 100644
> > --- a/doc/filters.texi
> > +++ b/doc/filters.texi
> > @@ -4349,6 +4349,50 @@ setpts=PTS+10/TB
> >  @end example
> >  @end itemize
> >  
> > + at section ebur128
> > +
> > +EBU R.128 scanner filter. This filter takes an audio stream as input and
> > +outputs it unchanged. By default, it logs a message at a frequency of 10Hz with
> > +the Momentary loudness (identified by @code{M}), Short-term loudness
> > +(@code{S}), Integrated loudness (@code{I}) and Loudness Range (@code{LRA}).
> 
> Maybe add a link to the official spec document.
> 

Okay, added.

> > +The filter also has a video output (see the @var{video} option) with a real
> > +time graph to observe the loudness evolution. The graphic contains the logged
> > +message mentioned above, so it is not printed anymore when this option is set,
> > +unless the verbose logging is set. The main graphing area contains the
> > +short-term loudness (3 seconds of analysis), and the gauge on the right is for
> > +the momentary loudness (400 milliseconds).
> > +
> > +The filter accepts the following named parameters:
> > +
> > + at table @option
> > +
> 
> > + at item video
> > +Activate the video output. The audio stream is passed unchanged whether this
> > +option is set or no. The video stream will appear in first position if
> > +activated. Default is @code{0}.
> 
> Since the video is optional, I'd rather expect the video stream to
> appear on the *second* output (and "appear in first position" is not
> very clear).
> 

I prefer that order since it's common to have video in first. Also, the
two different usages of the filter (with or without video) have quite
different purpose; one is essentially for humans, while the other is for
processing, so I don't see much point in making obvious the switch between
the two.

Nevertheless, I reworded that sentence.

[...]
> 
> > +static const uint8_t graph_colors[] = {
> > +    0xdd, 0x66, 0x66,   // value above 0LU non reached
> > +    0x66, 0x66, 0xdd,   // value below 0LU non reached
> > +    0x96, 0x33, 0x33,   // value above 0LU reached
> > +    0x33, 0x33, 0x96,   // value below 0LU reached
> > +    0xdd, 0x96, 0x96,   // value above 0LU line non reached
> > +    0x96, 0x96, 0xdd,   // value below 0LU line non reached
> > +    0xdd, 0x33, 0x33,   // value above 0LU line reached
> > +    0x33, 0x33, 0xdd,   // value below 0LU line reached
> > +};
> 
> Idle question: are those values specified by the spec?
> 

Nope, totally arbitrary. The recommendation doesn't specify how the data
should be represented, it just states what information must appear and at
which refresh rate.

So if you don't like these colors, feel free to change them :)

[...]
> > +static int config_video_output(AVFilterLink *outlink)
> > +{
> > +    int i, x, y;
> > +    uint8_t *p;
> > +    AVFilterContext *ctx = outlink->src;
> > +    EBUR128Context *ebur128 = ctx->priv;
> > +    AVFilterBufferRef *outpicref;
> > +
> 
> > +    if (ebur128->w < 640 || ebur128->h < 480) {
> > +        av_log(ctx, AV_LOG_ERROR, "Video size %dx%d is too small, "
> > +               "minimum size is 640x480\n", ebur128->w, ebur128->h);
> > +        return AVERROR(EINVAL);
> > +    }
> 
> Are these constraints required by the standard? Or is there an
> implementation-level reason for them?
> 

Added the following comment:
  /* check if there is enough space to represent everything decently */

It will hopefully answer your question.

[...]
> > +            /* push one video frame */
> > +            if (ebur128->do_video) {
> > +                int x, y;
> > +                uint8_t *p;
> > +                AVFilterLink *outlink = ctx->outputs[0];
> > +
> > +                const int y_loudness_lu_graph = lu_to_y(ebur128, loudness_3000 + 23);
> > +                const int y_loudness_lu_gauge = lu_to_y(ebur128, loudness_400  + 23);
> > +
> > +                /* draw the graph using the short-term loudness */
> > +                p = pic->data[0] + ebur128->graph.y*pic->linesize[0] + ebur128->graph.x*3;
> > +                for (y = 0; y < ebur128->graph.h; y++) {
> > +                    const uint8_t *c = get_graph_color(ebur128, y_loudness_lu_graph, y);
> > +
> > +                    memmove(p, p + 3, (ebur128->graph.w - 1) * 3);
> > +                    memcpy(p + (ebur128->graph.w - 1) * 3, c, 3);
> > +                    p += pic->linesize[0];
> > +                }
> > +
> > +                /* draw the gauge using the momentary loudness */
> > +                p = pic->data[0] + ebur128->gauge.y*pic->linesize[0] + ebur128->gauge.x*3;
> > +                for (y = 0; y < ebur128->gauge.h; y++) {
> > +                    const uint8_t *c = get_graph_color(ebur128, y_loudness_lu_gauge, y);
> > +
> > +                    for (x = 0; x < ebur128->gauge.w; x++)
> > +                        memcpy(p + x*3, c, 3);
> > +                    p += pic->linesize[0];
> > +                }
> > +
> > +                /* draw textual info */
> > +                drawtext(pic, PAD, PAD - PAD/2, FONT16, font_colors,
> > +                         LOG_FMT "     ", // padding to erase trailing characters
> > +                         loudness_400, loudness_3000,
> > +                         integrated_loudness, loudness_range);
> > +
> > +                /* set pts and push frame */
> > +                pic->pts = insamples->pts +
> > +                    av_rescale_q(i, (AVRational){ 1, inlink->sample_rate },
> > +                                 outlink->time_base);
> 
> > +                ff_start_frame(outlink, avfilter_ref_buffer(pic, ~AV_PERM_WRITE));
> > +                ff_draw_slice(outlink, 0, outlink->h, 1);
> > +                ff_end_frame(outlink);
> 
> missing checks on return values
> 

The 3 return values are now checked.

> > +            }
> > +
> > +            av_log(ctx, ebur128->do_video ? AV_LOG_VERBOSE : AV_LOG_INFO,
> > +                   LOG_FMT "\n", loudness_400, loudness_3000,
> > +                   integrated_loudness, loudness_range);
> > +        }
> > +    }
> > +
> > +    return ff_filter_samples(ctx->outputs[ebur128->do_video], insamples);
> > +}
> > +
> > +static int query_formats(AVFilterContext *ctx)
> > +{
> > +    EBUR128Context *ebur128 = ctx->priv;
> > +    AVFilterFormats *formats;
> > +    AVFilterChannelLayouts *layouts;
> > +    AVFilterLink *inlink = ctx->inputs[0];
> > +    AVFilterLink *outlink = ctx->outputs[0];
> > +
> > +    static const enum AVSampleFormat sample_fmts[] = { AV_SAMPLE_FMT_DBL, -1 };
> > +    static const int input_srate[] = {48000, -1}; // ITU-R BS.1770 provides coeff only for 48kHz
> > +    static const enum PixelFormat pix_fmts[] = { PIX_FMT_RGB24, -1 };
> > +
> > +    /* set input audio formats */
> > +    formats = ff_make_format_list(sample_fmts);
> > +    if (!formats)
> > +        return AVERROR(ENOMEM);
> > +    ff_formats_ref(formats, &inlink->out_formats);
> > +
> > +    layouts = ff_all_channel_layouts();
> > +    if (!layouts)
> > +        return AVERROR(ENOMEM);
> > +    ff_channel_layouts_ref(layouts, &inlink->out_channel_layouts);
> > +
> > +    formats = ff_make_format_list(input_srate);
> > +    if (!formats)
> > +        return AVERROR(ENOMEM);
> > +    ff_formats_ref(formats, &inlink->out_samplerates);
> > +
> > +    /* set optional output video format */
> > +    if (ebur128->do_video) {
> > +        formats = ff_make_format_list(pix_fmts);
> > +        if (!formats)
> > +            return AVERROR(ENOMEM);
> > +        ff_formats_ref(formats, &outlink->in_formats);
> > +        outlink = ctx->outputs[1];
> > +    }
> > +
> > +    /* set audio output formats (same as input since it's just a passthrough) */
> > +    formats = ff_make_format_list(sample_fmts);
> > +    if (!formats)
> > +        return AVERROR(ENOMEM);
> > +    ff_formats_ref(formats, &outlink->in_formats);
> > +
> > +    layouts = ff_all_channel_layouts();
> > +    if (!layouts)
> > +        return AVERROR(ENOMEM);
> > +    ff_channel_layouts_ref(layouts, &outlink->in_channel_layouts);
> > +
> > +    formats = ff_make_format_list(input_srate);
> > +    if (!formats)
> > +        return AVERROR(ENOMEM);
> > +    ff_formats_ref(formats, &outlink->in_samplerates);
> > +
> > +    return 0;
> > +}
> 
> This could be defined more logically before filter_samples().
> 

I like having this function available easily (so at the bottom or the top
of the file). But the top of the file is convoluted with various stuff and
I'd better keep that function away from that specific code. So it looks
like a good place to me here :)

> > +static av_cold void uninit(AVFilterContext *ctx)
> > +{
> > +    int i;
> > +    EBUR128Context *ebur128 = ctx->priv;
> > +
> > +    av_freep(&ebur128->y_line_ref);
> > +    av_freep(&ebur128->ch_weighting);
> > +    av_freep(&ebur128->i400.powers_hist);
> > +    av_freep(&ebur128->i3000.powers_hist);
> > +    for (i = 0; i < ebur128->nb_channels; i++) {
> > +        av_freep(&ebur128->i400.cache[i]);
> > +        av_freep(&ebur128->i3000.cache[i]);
> > +    }
> > +    for (i = 0; i < ctx->nb_outputs; i++)
> > +        av_freep(&ctx->output_pads[i].name);
> > +    avfilter_unref_bufferp(&ebur128->outpicref);
> > +}
> > +
> > +AVFilter avfilter_af_ebur128 = {
> > +    .name          = "ebur128",
> > +    .description   = NULL_IF_CONFIG_SMALL("EBU R.128 scanner."),
> > +    .priv_size     = sizeof(EBUR128Context),
> > +    .init          = init,
> > +    .uninit        = uninit,
> > +    .query_formats = query_formats,
> > +
> > +    .inputs = (const AVFilterPad[]) {
> > +        { .name             = "default",
> > +          .type             = AVMEDIA_TYPE_AUDIO,
> > +          .get_audio_buffer = ff_null_get_audio_buffer,
> > +          .filter_samples   = filter_samples, },
> > +        { .name = NULL }
> > +    },
> > +    .outputs = (const AVFilterPad[]) { { .name = NULL } },
> > +};
> 
> Looks great otherwise, I didn't read it very carefully, but I think we
> can commit and let it mature then.
> 

Right, then I'll push in a moment.

Note: the version I'm going to push is slightly improved. I can't remember
all the various changes but the main ones are:
 - Integrated loudness results fixed
 - I & LRA should be faster, with an histogram storing more info
 - added a summary at the end

> I also wonder if we could make it a bit more flexible, for example to
> make the short-term loudness and the momentary loudness windows
> configurable (possibly useful for other purposes), but I don't
> consider this blocking.

The recommendation is quite explicit:

    The momentary loudness uses a sliding rectangular time window of
    length 0.4 s. The measurement is not gated.

    The short-term loudness uses a sliding rectangular time window of
    length 3 s. The measurement is not gated. The update rate for ‘live
    meters’ shall be at least 10 Hz.

I'm sure we can add various customisation options, but I wouldn't play too
much with these settings. OTOH Maybe the accuracy could be configurable
for instance (see HIST_GRAIN), at the cost of a likely slower code. It
would also needs some extra checks to make sure the range is sane (you
need to take into account various stuff like the range of energies in
get_histogram()).

Anyway, the next step is to add the True Peak feature, and maybe some more
fancy information on the graph. I'll work on this later.

But now, if anyone wants to play with the math behind the two filters (PRE
and RLB) to merge them, as well as the support of other frequencies than
48kHz to avoid resampling, that would very welcome. It involves some kind
of complex math according to what I read on HydrogenAudio. Here are some
references I found over the threads:

  http://www.musicdsp.org/files/Audio-EQ-Cookbook.txt
  http://www.scribd.com/doc/6531763/DirectForm-Filter-Parameter-Quantization

Oh and BTW, thanks a lot to cbsrobot for providing me some references
results which greatly helped me to fix the integrated loudness.

-- 
Clément B.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 490 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20121001/ff93cf02/attachment.asc>