[FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
gajjanag at mit.edu
Mon Oct 12 22:57:27 CEST 2015
On Mon, Oct 12, 2015 at 7:59 AM, Ganesh Ajjanagadde <gajjanag at mit.edu> wrote:
> On Mon, Oct 12, 2015 at 7:46 AM, Carl Eugen Hoyos <cehoyos at ag.or.at> wrote:
>> Ganesh Ajjanagadde <gajjanag <at> mit.edu> writes:
>>> It is well known that fabs and fabsf are at least as fast and usually
>>> faster than the FFABS macro, at least on the gcc+glibc combination.
>> I wasn't aware of this.
>> And I believe we support other compilers and other
>> libc implementations.
> Indeed, which is why performance comparisons are welcome. I argue
> below why any sane configuration should not regress performance-wise.
> This is also "relevant information" in my view.
>>> For instance, see the reference:
>>> This was a patch to glibc in order to remove their usages. Given their
>>> general performance obsession (more than FFmpeg in many cases), they
>>> have ensured that fabs and fabsf never perform worse than FFABS.
>> Ok but is this really related?
> The reference is; the comment may not be. I was slightly annoyed at
> FFABS usage when libc provides fabs and fabsf on all our platforms, and
> wanted a justification that would appeal to the FFmpeg crowd, namely
> performance, to move away from it.
>>> I have tested on x86-64 Haswell with GCC 5.2 - even with no strict IEEE
>>> mode enabled, and just the standard -O3 optimizations, there is a
>>> performance benefit.
>> This is the only relevant information imo.
>> Please provide (very, very short) information
>> on what you tested.
> Random integers, same style as before. I have not posted numbers,
> since my numbers are anyway meaningless: I lack non
> x86-64+(gcc/clang)+glibc configurations.
> As for that being the only relevant message, I do intend to shorten
> the message. The long stuff was simply my own personal motivation to
> make people understand why I did this stuff. Otherwise, I would have
> sent a separate message anyway in the patch thread, let me know what
> style you prefer.
>> Since you mention libc so often: Does the patch
>> work on win*, aix and other strange platforms?
> Why not? Any standard-conformant fabs/fabsf should work. Again, I lack the
> configurations and am just a university student with a single laptop.
> fabs and fabsf are already being used elsewhere. If anything, they
> are far better specified under IEEE 754 than FFABS: behavior with NaN,
> Inf, etc.
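To make the NaN/Inf/signed-zero point concrete, here is a minimal sketch in C. The FFABS definition is written from memory of libavutil/common.h and should be treated as an assumption. The helper names (ffabs_sign_bit, fabs_sign_bit, check_fabs_clears_sign) are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <math.h>

/* Assumed to match the comparison-based macro in libavutil/common.h. */
#define FFABS(a) ((a) >= 0 ? (a) : (-(a)))

/* Returns 1 if FFABS(x) still has its sign bit set, 0 otherwise. */
static int ffabs_sign_bit(double x)
{
    double r = FFABS(x);
    return signbit(r) != 0;
}

/* Returns 1 if fabs(x) has its sign bit set, 0 otherwise. */
static int fabs_sign_bit(double x)
{
    return signbit(fabs(x)) != 0;
}

/* Debug check: fabs clears the sign bit for any input,
 * including -0.0, -Inf and negative NaN. */
static void check_fabs_clears_sign(double x)
{
    assert(!signbit(fabs(x)));
}
```

With default compiler flags on an IEEE 754 platform, I would expect ffabs_sign_bit(-0.0) to return 1, because -0.0 >= 0 is true and the macro hands back its argument unchanged, while fabs_sign_bit(-0.0) returns 0. Flags such as -ffast-math may change this.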
Bench from libavfilter/astats on a 15 min clip. The difference is
slight, but it exists. The best case is the same, but look at the
difference in the worst cases (as mentioned in the glibc link I gave,
I suspect some trickery for subnormal floats/Inf/0.0). By the way, I
can show results skewed even more heavily in favor of fabs by using
"random" floating point numbers, random in the sense of being a random
64 bit pattern (same style as my old crude bench: fill a large array,
and test). There, believe it or not, I was getting an improvement of
nearly 1.5-2x.
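A crude bench in the "fill a large array, and test" style could look roughly like the sketch below. This is not the code used for the numbers in this mail. The xorshift64 generator, the clock()-based timing, and all function names are assumptions chosen to keep the example self-contained, and FFABS is again assumed to be the comparison-based macro.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Assumed to match the comparison-based macro in libavutil/common.h. */
#define FFABS(a) ((a) >= 0 ? (a) : (-(a)))

/* Fill buf with doubles whose bits are raw xorshift64 output, so the
 * array contains NaNs, infinities and subnormals as well as normals. */
static void fill_random_bits(double *buf, size_t n, uint64_t seed)
{
    uint64_t x = seed;
    assert(seed != 0);                  /* xorshift64 is stuck at zero */
    for (size_t i = 0; i < n; i++) {
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        memcpy(&buf[i], &x, sizeof x);  /* reinterpret the bit pattern */
    }
}

static void abs_ffabs(const double *src, double *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = FFABS(src[i]);
}

static void abs_fabs(const double *src, double *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = fabs(src[i]);
}

typedef void (*abs_fn)(const double *, double *, size_t);

/* CPU seconds spent in one pass of fn over the array. */
static double time_pass(abs_fn fn, const double *src, double *dst, size_t n)
{
    clock_t t0 = clock();
    fn(src, dst, n);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Calling time_pass(abs_ffabs, ...) and time_pass(abs_fabs, ...) on the same source array, repeated enough times to get past clock() granularity, gives a rough comparison. An optimizing compiler may vectorize or reorder either loop, so any ratio from a harness like this is only indicative.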
Anyway, here it is:
4230 decicycles in abs, 1 runs, 0 skips
2520 decicycles in abs, 2 runs, 0 skips
1635 decicycles in abs, 4 runs, 0 skips
967 decicycles in abs, 8 runs, 0 skips
635 decicycles in abs, 16 runs, 0 skips
473 decicycles in abs, 32 runs, 0 skips
389 decicycles in abs, 64 runs, 0 skips
350 decicycles in abs, 128 runs, 0 skips
331 decicycles in abs, 256 runs, 0 skips
321 decicycles in abs, 512 runs, 0 skips
319 decicycles in abs, 1024 runs, 0 skips
318 decicycles in abs, 2048 runs, 0 skips
315 decicycles in abs, 4096 runs, 0 skips
317 decicycles in abs, 8192 runs, 0 skips
335 decicycles in abs, 16384 runs, 0 skips
335 decicycles in abs, 32768 runs, 0 skips
333 decicycles in abs, 65536 runs, 0 skips
342 decicycles in abs, 131072 runs, 0 skips
340 decicycles in abs, 262144 runs, 0 skips
345 decicycles in abs, 524285 runs, 3 skips
348 decicycles in abs, 1048565 runs, 11 skips
351 decicycles in abs, 2097129 runs, 23 skips
352 decicycles in abs, 4194252 runs, 52 skips
350 decicycles in abs, 8388498 runs, 110 skips
351 decicycles in abs,16776993 runs, 223 skips
352 decicycles in abs,33553999 runs, 433 skips
351 decicycles in abs,67108036 runs, 828 skips
3540 decicycles in abs, 1 runs, 0 skips
2160 decicycles in abs, 2 runs, 0 skips
1447 decicycles in abs, 4 runs, 0 skips
881 decicycles in abs, 8 runs, 0 skips
594 decicycles in abs, 16 runs, 0 skips
455 decicycles in abs, 32 runs, 0 skips
382 decicycles in abs, 64 runs, 0 skips
361 decicycles in abs, 128 runs, 0 skips
356 decicycles in abs, 256 runs, 0 skips
334 decicycles in abs, 512 runs, 0 skips
322 decicycles in abs, 1024 runs, 0 skips
317 decicycles in abs, 2048 runs, 0 skips
315 decicycles in abs, 4096 runs, 0 skips
341 decicycles in abs, 8192 runs, 0 skips
363 decicycles in abs, 16383 runs, 1 skips
342 decicycles in abs, 32767 runs, 1 skips
354 decicycles in abs, 65532 runs, 4 skips
348 decicycles in abs, 131068 runs, 4 skips
354 decicycles in abs, 262138 runs, 6 skips
356 decicycles in abs, 524277 runs, 11 skips
356 decicycles in abs, 1048560 runs, 16 skips
354 decicycles in abs, 2097120 runs, 32 skips
354 decicycles in abs, 4194251 runs, 53 skips
353 decicycles in abs, 8388504 runs, 104 skips
353 decicycles in abs,16777006 runs, 210 skips
353 decicycles in abs,33553993 runs, 439 skips
352 decicycles in abs,67107951 runs, 913 skips
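For anyone reading the log above: each line is a running average, printed whenever runs + skips reaches a power of two (e.g. 524285 + 3 = 524288). The sketch below models that bookkeeping as I understand the START_TIMER/STOP_TIMER macros in libavutil/timer.h. The struct, the function names, and the exact outlier thresholds are assumptions made for illustration, not a copy of the FFmpeg source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the static counters kept by STOP_TIMER. */
typedef struct TimerStat {
    uint64_t tsum;   /* sum of accepted cycle counts */
    int tcount;      /* accepted measurements ("runs") */
    int tskip;       /* rejected outliers ("skips") */
} TimerStat;

/* Record one measurement. Returns 1 when a report line would be due,
 * i.e. when runs + skips is a power of two. */
static int timer_add(TimerStat *s, uint64_t elapsed)
{
    /* Assumed outlier rule: accept the first two samples, anything
     * below 8x the running mean, and anything below 2000 cycles. */
    if (s->tcount < 2 || elapsed < 8 * s->tsum / s->tcount || elapsed < 2000) {
        s->tsum += elapsed;
        s->tcount++;
    } else {
        s->tskip++;
    }
    unsigned total = (unsigned)(s->tcount + s->tskip);
    return (total & (total - 1)) == 0;
}

/* Mean of accepted samples in tenths of a cycle ("decicycles"). */
static uint64_t timer_decicycles(const TimerStat *s)
{
    assert(s->tcount > 0);
    return s->tsum * 10 / s->tcount;
}
```

Under this model the early lines (1, 2, 4 runs) are dominated by cold-cache cost, and only the later lines, where the average has settled, are worth comparing between the two runs.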
>> Carl Eugen