[FFmpeg-devel] [PATCH] avutil/mathematics: speed up av_gcd by using Stein's binary GCD algorithm

Ronald S. Bultje rsbultje at gmail.com
Sun Oct 11 00:32:06 CEST 2015


Hi,

On Sat, Oct 10, 2015 at 6:07 PM, Ganesh Ajjanagadde <gajjanag at mit.edu>
wrote:

> On Sat, Oct 10, 2015 at 5:45 PM, Ganesh Ajjanagadde <gajjanag at mit.edu>
> wrote:
> > On Sat, Oct 10, 2015 at 5:32 PM, Henrik Gramner <henrik at gramner.com>
> wrote:
> >> On Sat, Oct 10, 2015 at 11:06 PM, Ganesh Ajjanagadde
> >> <gajjanagadde at gmail.com> wrote:
> >>> This uses Stein's binary GCD algorithm:
> >>> https://en.wikipedia.org/wiki/Binary_GCD_algorithm
> >>> to get a reported 1.7-2.1x speedup over Euclidean GCD on standard
> architectures.
> >>> Have not benchmarked, so I can't comment
> >>
> >> Before you submit a patch that's supposed to make something faster,
> >> you should benchmark it to verify that it is in fact faster. Do this
> >> with inputs of various sizes on both 32- and 64-bit architectures and
> >> both with and without compilers that support __builtin_ctzll(v).
> >
> > I did not do this because I don't know how to write a benchmark in C
> > easily: for instance, printing out can lead to I/O time loss, and on
> > O3, unless there is I/O, a function may not be evaluated at all due to
> > the "as if" optimization rules. Another problem is that the function
> > is small, so I can't just time a single call, there are cache issues
> > with arrays, etc etc. Yes, I can do it - but it will take some effort.
> >
> > Also, the only compilers I have are clang and gcc, both of which have
> > the builtin. I figured someone with a better array of tools and more
> > knowledgable in writing benchmarks can trivially do this. Furthermore,
> > at the moment, comparisons are incomplete: I don't care about Windows
> > myself but one could trivially add the lines if one so desired (see my
> > commit message).
> >
> > Lastly, I can't find any links on how to do benchmarks cleanly,
> > efficiently, and accurately. I hoped that FFmpeg has some clean way of
> > doing this, but all the methods I see are terribly ad-hoc.
>
> So I hacked up a quick and dirty benchmark. I fill up two input arrays
> with 10^8 entries each via random() calls. I then start a timer, and
> compute using the old method the gcd, inputs drawn from the arrays.
> Then I used the new method, repeating the above.
> Results (on my Intel Haswell, GCC 5.2):
> Old method: 22.755636
> New method: 6.495941


Search for usages of START_TIMER and STOP_TIMER, it's how we typically
measure these things :). Nice numbers!

Ronald


More information about the ffmpeg-devel mailing list