[FFmpeg-devel] [PATCH] avutil/mathematics: speed up av_gcd by using Stein's binary GCD algorithm

Ganesh Ajjanagadde gajjanag at mit.edu
Sun Oct 11 01:45:17 CEST 2015


On Sat, Oct 10, 2015 at 7:14 PM, Michael Niedermayer
<michael at niedermayer.cc> wrote:
> On Sat, Oct 10, 2015 at 11:32:06PM +0200, Henrik Gramner wrote:
>> On Sat, Oct 10, 2015 at 11:06 PM, Ganesh Ajjanagadde
>> <gajjanagadde at gmail.com> wrote:
>> > This uses Stein's binary GCD algorithm:
>> > https://en.wikipedia.org/wiki/Binary_GCD_algorithm
>> > to get a reported 1.7-2.1x speedup over Euclidean GCD on standard architectures.
>> > Have not benchmarked, so I can't comment
>>
>> Before you submit a patch that's supposed to make something faster,
>> you should benchmark it to verify that it is in fact faster. Do this
>> with inputs of various sizes on both 32- and 64-bit architectures and
>> both with and without compilers that support __builtin_ctzll(v).
>
> without __builtin_ctzll() the old code seems faster in a simple test
> like:
> make -j12 && ./ffmpeg -i matrixbench_mpeg2.mpg test.mov

>
> also it seems a simple
>     while (u != v) {
>         if (u > v)
>             FFSWAP(int64_t, v, u);
>         v -= u;
>         do {
>             v >>= 1;
>         } while(!(v & 1));
>     }
> is faster than the non builtin ctzll in its place but still not as
> fast as the current code

So I checked out the various implementation of (non built in) ctzll at
https://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightBinSearch.
The funny thing is, among the looping ones, the fastest is the simplest :):
static inline int ff_ctzll_c(long long v)
{
    int c = 0;
    while (!(v & 1)) {
        c++;
        v >>= 1;
    }
    return c;
}

The De Bruijn based approach is intriguing:
https://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightMultLookup,
since there is no loop there, and assuming multiplies are fast, it
should be good.

I guess one could in principle simply write the asm instruction, but I
can't do that with my current knowledge.

>
> [...]
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> What does censorship reveal? It reveals fear. -- Julian Assange
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>


More information about the ffmpeg-devel mailing list