[FFmpeg-devel] [PATCH V1] lavfi/normalize: improve the performance

Reimar Döffinger Reimar.Doeffinger at gmx.de
Fri Jun 21 11:39:05 EEST 2019

On Thu, Jun 20, 2019 at 09:43:46PM +0800, Jun Zhao wrote:
> From: Jun Zhao <barryjzhao at tencent.com>
> Remove unnecessary max value found, it's will improve the performance
> about 10%. Used the test command like:
> ffmpeg -i 1080P.mp4 -an -vf normalize -f null /dev/null, the FPS change
> from 96fps to 107fps.
> Signed-off-by: Jun Zhao <barryjzhao at tencent.com>
> ---
>  libavfilter/vf_normalize.c |    7 +++----
>  1 files changed, 3 insertions(+), 4 deletions(-)
> diff --git a/libavfilter/vf_normalize.c b/libavfilter/vf_normalize.c
> index 48eea59..40dc031 100644
> --- a/libavfilter/vf_normalize.c
> +++ b/libavfilter/vf_normalize.c
> @@ -143,14 +143,13 @@ static void normalize(NormalizeContext *s, AVFrame *in, AVFrame *out)
>          min[c].in = max[c].in = in->data[0][s->co[c]];
>      for (y = 0; y < in->height; y++) {
>          uint8_t *inp = in->data[0] + y * in->linesize[0];
> -        uint8_t *outp = out->data[0] + y * out->linesize[0];
>          for (x = 0; x < in->width; x++) {
>              for (c = 0; c < 3; c++) {
> -                min[c].in = FFMIN(min[c].in, inp[s->co[c]]);
> -                max[c].in = FFMAX(max[c].in, inp[s->co[c]]);
> +                uint8_t val = inp[s->co[c]];
> +                if (val < min[c].in)      min[c].in = val;
> +                else if (val > max[c].in) max[c].in = val;
>              }
>              inp += s->step;
> -            outp += s->step;

outp should just be removed, a patch for that certainly would be
But I am very skeptical towards the rest of the change.
In a way I would expect the compiler should be able to
optimize this.
However this code is clearly completely unoptimized, and
thus I don't think making it more complex for a minor optimization
that gives just 10% is worth it.
A proper optimization would get rid of the s->co indirection in the
inner loop - it really is not necessary for the limited format
support the code has.
This would then also allow for SIMD optimization, which should give
a vastly larger speedup - and possibly the compiler would even be able
to auto-vectorize.
This applies even more to the processing step - the transform
calculation done in SIMD is almost certain to be cheaper than
a LUT lookup - at least if switch to fixed-point arithmetic
instead of float.
Side note: I haven't really figured out what the point
of all this s->co complexity is supposed to be.
The only thing the code seems to actually care is whether
there is an alpha channel, and if so where it is.
But the quickest path to a significant speedup would
be to write a SIMD version of the normalize function specific
to that format and the SIMD instruction set available
and call it instead of normalize if applicable.
I would expect that you should see at least a 3x speedup
if using e.g. SSE2 (which is why I don't consider a
10% speedup worth changing the code for really).

Best regards,
Reimar Döffinger

More information about the ffmpeg-devel mailing list