[FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

Fri Oct 16 13:53:36 CEST 2015

On Fri, Oct 16, 2015 at 7:30 AM, Michael Niedermayer
<michael at niedermayer.cc> wrote:
> On Thu, Oct 15, 2015 at 06:38:10AM -0400, Ganesh Ajjanagadde wrote:
>> On Wed, Oct 14, 2015 at 6:53 AM, Hendrik Leppkes <h.leppkes at gmail.com> wrote:
>> > On Wed, Oct 14, 2015 at 12:49 PM, Carl Eugen Hoyos <cehoyos at ag.or.at> wrote:
>> >> Ganesh Ajjanagadde <gajjanag <at> mit.edu> writes:
>> >>
>> >>> What? My numbers actually show that the new code may be faster -
>> >>
>> >> No, you are misunderstanding the numbers you posted.
>> >> (Or I misunderstand them but nobody said so yet.)
>> >>
>> >> Highest runs are most relevant, skips have to be
>> >> avoided (afaik).
>> >>
>> >> [...]
>> >>
>> >>> If you continue to post such stuff that has no basis, I might actually
>> >>> get tempted into finding out for which floating point values the new
>> >>> code is significantly faster, craft a relevant audio file, and post it
>> >>> showing a huge performance difference - my random numbers benchmark
>> >>> shows there must exist such values.
>> >>
>> >> Please do so!
>> >>
>> >>> > The more important question is if you can see the same
>> >>> > changes in the disassembly of af_astats.o as what
>> >>> > ubitux posted here for a short test function?
>> >>>
>> >>> I do. He uses clang/gcc, so do I.
>> >>
>> >> Sorry, my understanding fails here (I am not a native speaker):
>> >> You did look at the disassembly of af_astats.o and there is
>> >> inlined code instead of a function call?
>> >>
>> >>> The reason (irrelevant) is that both
>> >>> of us run Arch.
>> >>>
>> >>> What is "more relevant" is if _you_ can see the changes
>> >>> on some non Linux platform.
>> >>
>> >> If you could show that it is faster on any platform
>> >> I would already be happy!
>> >>
>> >
>> > A more important check would be that it's not significantly slower on
>> > any other platform. Just because one compiler/glibc combination
>> > manages to produce an efficient inlined function doesn't necessarily
>> > mean that some other compiler or libc couldn't produce a full function
>> > call with all the overhead that comes with it, becoming significantly
>> > slower.
>>
>> As I point out, all a libc implementer needs to do to be on par with
>> the macro is to add the inline keyword. This was added in C99. If said
>> libc does not, then it is fundamentally broken from a performance
>> perspective. A beginning programmer can do that in a couple of
>> minutes. Fix upstream and complain to them if it does not inline.
>
> I don't know how the latest compilers handle "inline", but a few years
> ago gcc was rather dumb about inlining, and I think it's not easy for
> a compiler to actually not be "dumb".
>
> A compiler cannot inline everything that has the inline keyword;
> for some source code it would lead to an explosion in size and
> compile time.
> And a good compiler will want to inline some functions even if they
> do not have the inline keyword.
> Also, it's not easy for a compiler to know what to
> inline and what not: there could be 10 functions a1(), a2(), a3(), ...
> each calling the previous one 10 times ...
> The way gcc handled this (in the past at least, AFAIK) is to have
> various complicated thresholds that limit the amount of inlining.
> The big annoyance with this (years ago at least) was that if you
> forced a function to be inlined, gcc would then stop
> inlining something else, and you ended up either forcing every single
> function you needed inlined or having to tune the thresholds.
>
> It would be interesting to check if replacing FFABS by fabs causes
> any big changes to inlining behavior (maybe that can be done by
> comparing the list of symbols in the object files, as fully inlined
> functions wouldn't show up, but maybe there are other ways).
>
> Anyway, I am not against using fabs() for float/double FFABS().
> I just think some assumptions in this thread are possibly too
> optimistic, but it's quite possible these replacements are all fine
> and the changes in inlining, if any, have no performance impact.

I am not "optimistic" myself: I think that most of the time this will
make no difference at all. All I am saying is that in cases where
there is a difference, it will likely be in favor of fabs, etc. and
not the macro, for the reasons I gave in the long commit message I
posted.

>
> Also, if a *abs is implemented using a branch (as in: if it's positive,
> jump over a negate instruction), then branch prediction can play a
> significant role in performance; that is, random values would be a lot
> slower than the same values ordered.

Maybe this is why I get such a large difference between fabs and FFABS
in favor of fabs - I just keep random numbers with no ordering. If
true, this is definitely in fabs's favor.
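
To make the random-versus-ordered comparison concrete, here is a minimal sketch of the two loops one would time. The FFABS definition is quoted from memory of libavutil/common.h; fill_random(), sort_values(), sum_ffabs() and sum_fabs() are hypothetical helpers invented for this illustration, not FFmpeg code. Timing is left to the caller, and whether the macro compiles to a branch at all depends on the compiler and flags.

```c
#include <math.h>
#include <stddef.h>
#include <stdlib.h>

/* FFABS as I recall it from libavutil/common.h (an assumption). */
#define FFABS(a) ((a) >= 0 ? (a) : (-(a)))

/* Hypothetical helper: fill v with n values in [-1, 1), signs in random order. */
static void fill_random(double *v, size_t n, unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < n; i++)
        v[i] = 2.0 * rand() / ((double)RAND_MAX + 1.0) - 1.0;
}

/* Comparison callback for qsort(): ascending order. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: sort ascending, so all negatives precede all positives
 * and a sign test becomes trivially predictable. */
static void sort_values(double *v, size_t n)
{
    qsort(v, n, sizeof(*v), cmp_double);
}

/* Loop using the macro; may or may not compile to a branch. */
static double sum_ffabs(const double *v, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += FFABS(v[i]);
    return s;
}

/* Same loop using the libm function. */
static double sum_fabs(const double *v, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += fabs(v[i]);
    return s;
}
```

Timing sum_ffabs() and sum_fabs() on the same buffer, once after fill_random() and once after sort_values(), would show whether ordering matters for a given compiler. Both functions return the same sum for the same input order, so any difference is purely in speed.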

> A good implementation should not use a branch, though: abs for floats
> and doubles is basically just clearing the sign bit. Platforms should
> have a dedicated instruction for that, or in some cases an integer
> AND/OR could maybe even be used.

That was the point of the original libc link - I am somewhat annoyed
that some dismissed it as "irrelevant" in a cavalier manner.
Basically, what the glibc people observed was that the compiler was
not always optimizing FFABS correctly (as compared to fabs etc). Maybe
this leads to a performance difference.
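
As an illustration of the sign-bit point, here is a minimal sketch assuming double is a 64-bit IEEE 754 binary64 value. abs_mask() and sign_of() are hypothetical helpers written for this mail, and the FFABS definition is again quoted from memory of libavutil/common.h. It also shows one case where the macro and fabs are not interchangeable: for -0.0 the macro returns its argument unchanged, since -0.0 >= 0 is true, while fabs clears the sign. That is one plausible reason (my assumption, not something established in this thread) why a compiler cannot always reduce the macro to a plain mask.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* FFABS as I recall it from libavutil/common.h (an assumption). */
#define FFABS(a) ((a) >= 0 ? (a) : (-(a)))

/* Hypothetical helper: branch-free absolute value that clears the sign bit
 * with an integer AND. Assumes double is 64-bit IEEE 754 binary64. */
static double abs_mask(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof(bits));      /* well-defined type punning */
    bits &= UINT64_C(0x7FFFFFFFFFFFFFFF); /* clear bit 63, the sign bit */
    memcpy(&x, &bits, sizeof(x));
    return x;
}

/* Hypothetical helper: 1 if the sign bit of x is set, else 0. */
static int sign_of(double x)
{
    return signbit(x) != 0;
}
```

With these, sign_of(fabs(-0.0)) and sign_of(abs_mask(-0.0)) are both 0, whereas sign_of(FFABS(-0.0)) is 1. For ordinary nonzero finite inputs all three agree.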

>
> [...]
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Why not whip the teacher when the pupil misbehaves? -- Diogenes of Sinope