[FFmpeg-devel] [PATCHv2] avutil/mathematics: speed up av_gcd by using Stein's binary GCD algorithm
Ganesh Ajjanagadde
gajjanagadde at gmail.com
Tue Oct 20 17:14:22 CEST 2015
On Tue, Oct 20, 2015 at 12:52 AM, James Almer <jamrial at gmail.com> wrote:
> On 10/11/2015 12:45 AM, Michael Niedermayer wrote:
>> On Sat, Oct 10, 2015 at 09:58:47PM -0400, Ganesh Ajjanagadde wrote:
>>> This uses Stein's binary GCD algorithm:
>>> https://en.wikipedia.org/wiki/Binary_GCD_algorithm
>>> to get a roughly 4x speedup over Euclidean GCD on standard architectures
>>> with a compiler intrinsic for ctzll, and a roughly 2x speedup otherwise.
>>> At the moment, the compiler intrinsic is used on GCC and Clang due to
>>> its easy availability.
>>>
>>> Quick note regarding overflow: yes, subtractions on int64_t can, but the
>>> llabs takes care of that. The llabs is also guaranteed to be safe, with
>>> no annoying INT64_MIN business since INT64_MIN being a power of 2, is
>>> shifted down before being sent to llabs.
>>>
>>> The binary GCD needs ff_ctzll, an extension of ff_ctz for long long (int64_t). On
>>> GCC, this is provided by a built-in. On Microsoft, there is a
>>> BitScanForward64 analog of BitScanForward that should work; but I can't confirm.
>>> Apparently it is not available on 32 bit builds; so this may or may not
>>> work correctly. On Intel, per the documentation there is only an
>>> intrinsic for _bit_scan_forward and people have posted on forums
>>> regarding _bit_scan_forward64, but often their documentation is
>>> woeful. Again, I don't have it, so I can't test.
>>>
>>> As such, to be safe, for now only the GCC/Clang intrinsic is added, the rest
>>> use a compiled version based on the De-Bruijn method of Leiserson et al:
>>> http://supertech.csail.mit.edu/papers/debruijn.pdf.
>>>
>>> Tested with FATE, sample benchmark (x86-64, GCC 5.2.0, Haswell)
>>> with a START_TIMER and STOP_TIMER in libavutil/rationsl.c, followed by a
>>> make fate.
>>>
>>> aac-am00_88.err:
>>> builtin:
>>> 714 decicycles in av_gcd, 4095 runs, 1 skips
>>>
>>> de-bruijn:
>>> 1440 decicycles in av_gcd, 4096 runs, 0 skips
>>>
>>> previous:
>>> 2889 decicycles in av_gcd, 4096 runs, 0 skips
>>>
>>> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
>>> ---
>>> libavutil/intmath.h | 19 +++++++++++++++++++
>>> libavutil/mathematics.c | 26 +++++++++++++++++++++-----
>>> 2 files changed, 40 insertions(+), 5 deletions(-)
>>
>> applied
>>
>> thanks
>
> This broke fate with -ftrapv
> http://fate.ffmpeg.org/history.cgi?slot=x86_64-freebsd10-clang33-ftrapv
> http://fate.ffmpeg.org/history.cgi?slot=x86_64-openbsd5.6-gcc4.2-ftrapv
> I can reproduce this with GCC 5 on x86_64 ArchLinux as well.
Can't reproduce on GCC 5.2, ftrapv, x86_64 ArchLinux (checkasm fails
on ftrapv before and after the commit though). Are you sure it is this
commit that broke the ftrapv build? At least the "last good rev" on
fate.ffmpeg.org does not point to this commit or its parent.
>
> The problem seems to be in av_gcd() and not the De-Bruijn version of
> ff_ctzll since the gcc builtin is being used.
>
More information about the ffmpeg-devel
mailing list