[FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

Ganesh Ajjanagadde gajjanag at mit.edu
Thu Oct 22 22:17:31 CEST 2015


On Fri, Oct 16, 2015 at 7:53 AM, Ganesh Ajjanagadde <gajjanag at mit.edu> wrote:
> On Fri, Oct 16, 2015 at 7:30 AM, Michael Niedermayer
> <michael at niedermayer.cc> wrote:
>> On Thu, Oct 15, 2015 at 06:38:10AM -0400, Ganesh Ajjanagadde wrote:
>>> On Wed, Oct 14, 2015 at 6:53 AM, Hendrik Leppkes <h.leppkes at gmail.com> wrote:
>>> > On Wed, Oct 14, 2015 at 12:49 PM, Carl Eugen Hoyos <cehoyos at ag.or.at> wrote:
>>> >> Ganesh Ajjanagadde <gajjanag <at> mit.edu> writes:
>>> >>
>>> >>> What? My numbers actually show that the new code may be faster -
>>> >>
>>> >> No, you are misunderstanding the numbers you posted.
>>> >> (Or I misunderstand them but nobody said so yet.)
>>> >>
>>> >> Highest runs are most relevant, skips have to be
>>> >> avoided (afaik).
>>> >>
>>> >> [...]
>>> >>
>>> >>> If you continue to post such stuff that has no basis, I might actually
>>> >>> get tempted into finding out for which floating point values the new
>>> >>> code is significantly faster, craft a relevant audio file, and post it
>>> >>> showing a huge performance difference - my random numbers benchmark
>>> >>> shows there must exist such values.
>>> >>
>>> >> Please do so!
>>> >>
>>> >>> > The more important question is if you can see the same
>>> >>> > changes in the disassembly of af_astats.o as what
>>> >>> > ubitux posted here for a short test function?
>>> >>>
>>> >>> I do. He uses clang/gcc, so do I.
>>> >>
>>> >> Sorry, my understanding fails here (I am not a native speaker):
>>> >> You did look at the disassembly of af_astats.o and there is
>>> >> inlined code instead of a function call?
>>> >>
>>> >>> The reason (irrelevant) is that both
>>> >>> of us run Arch.
>>> >>>
>>> >>> What is "more relevant" is if _you_ can see the changes
>>> >>> on some non Linux platform.
>>> >>
>>> >> If you could show that it is faster on any platform
>>> >> I would already be happy!
>>> >>
>>> >
>>> > A more important check would be that it's not significantly slower on
>>> > any other platform. Just because one compiler/glibc combination
>>> > manages to produce an efficient inlined function doesn't necessarily
>>> > mean that some other compiler or libc couldn't produce a full function
>>> > call with all the overhead that comes with it, becoming significantly
>>> > slower.
>>>
>>> As I point out, all a libc implementer needs to do to be on par with
>>> the macro is to add the inline keyword. This was added in C99. If said
>>> libc does not, then it is fundamentally broken from a performance
>>> perspective. A beginning programmer can do that in a couple of
>>> minutes. Fix upstream and complain to them if it does not inline.
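For readers comparing the two forms discussed above, here is a minimal sketch. The macro has the same shape as FFABS in libavutil/common.h; the wrapper names abs_macro and abs_libm are hypothetical and exist only for illustration. Whether either one is inlined depends on the compiler and libc, so this shows semantics, not performance. One observable difference, assuming IEEE 754 doubles and default (non fast-math) compiler flags, is the handling of negative zero.

```c
#include <math.h>

/* Same shape as the FFABS macro in libavutil/common.h. */
#define FFABS(a) ((a) >= 0 ? (a) : (-(a)))

/* Hypothetical wrappers, named here only for illustration. */
static double abs_macro(double x)
{
    /* -0.0 >= 0 is true, so the macro returns -0.0 unchanged. */
    return FFABS(x);
}

static double abs_libm(double x)
{
    /* fabs() always returns a value with the sign bit cleared. */
    return fabs(x);
}
```

For ordinary nonzero inputs both wrappers return the same value; only the sign of zero differs.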
>>
>> I don't know how the latest compilers handle "inline", but a few years
>> ago gcc was rather dumb about inlining, and I think it's not easy for
>> a compiler to actually not be "dumb".
>>
>> A compiler cannot inline everything that has the inline keyword;
>> for some source code that would lead to an explosion in size and
>> compile time.
>> And a good compiler will want to inline some functions even if they
>> do not have the inline keyword.
>> Also, it's not easy for a compiler to know what to
>> inline and what not: there could be 10 functions a1(), a2(), a3(), ...
>> each calling the previous one 10 times ...
>> The way gcc handled this (in the past at least, AFAIK) is to have
>> various complicated thresholds that limit the amount of inlining.
>> The big annoyance with this (years ago at least) was that if you
>> forced a function to be inlined, gcc would then stop
>> inlining something else, and you ended up either forcing every single
>> function you needed inlined or having to tune the thresholds.
>>
>> It would be interesting to check whether replacing FFABS by fabs causes
>> any big changes to inlining behavior (maybe that can be done by
>> comparing the list of symbols in the object files, as fully inlined
>> functions wouldn't show up, but maybe there are other ways).
>>
>> Anyway, I am not against using fabs() for float/double FFABS().
>> I just think some assumptions in this thread are possibly too
>> optimistic, but it's quite possible these replacements are all fine
>> and the changes in inlining, if any, have no performance impact.
>
> I myself am not "optimistic" in the sense that I think most of the
> time this will have zero change. All I am saying is that in cases
> where there is a difference, it will likely be in favor of fabs, etc
> and not the macro due to reasons I mentioned in the long commit
> message I posted.
>
>>
>> Also, if an *abs is implemented using a branch (as in: if it's positive,
>> jump over a negate instruction), then branch prediction can play a
>> significant role in performance; that is, random values would be a lot
>> slower than the same values ordered.
>
> Maybe this is why I get such a large difference between fabs and FFABS
> in favor of fabs - I just keep random numbers with no ordering. If
> true, this is definitely in fabs's favor.
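The two loop shapes being compared here can be sketched as follows. This is only an illustration under stated assumptions: the function names sum_abs_branch and sum_abs_fabsf are hypothetical, and a compiler is free to turn the explicit test into branch-free code, so any timing difference between randomly signed and sorted input has to be measured on the actual toolchain rather than assumed.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: absolute value via an explicit sign test.
 * If this compiles to a real branch, its cost depends on how
 * predictable the signs of the input are. */
static double sum_abs_branch(const float *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        float x = v[i];
        if (x < 0)
            x = -x;
        sum += x;
    }
    return sum;
}

/* Hypothetical helper: the same reduction using fabsf(). */
static double sum_abs_fabsf(const float *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += fabsf(v[i]);
    return sum;
}
```

Running both over the same buffer, once with shuffled signs and once sorted, and timing each run would show whether branch prediction matters on a given platform.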
>
>> A good implementation should not use a branch though; abs for floats
>> and doubles is basically just clearing the sign bit. Platforms should
>> have a dedicated instruction for that, or in some cases an integer
>> AND/OR could maybe even be used.
>
> That was the point of the original libc link - I am somewhat annoyed
> that some dismissed it as "irrelevant" in a cavalier manner.
> Basically, what the glibc people observed was that the compiler was
> not always optimizing FFABS correctly (as compared to fabs etc). Maybe
> this leads to a performance difference.

To put an end to a long and tortuous thread, and due to the lack of
relevant outstanding objections, pushed.

>
>>
>> [...]
>> --
>> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>>
>> Why not whip the teacher when the pupil misbehaves? -- Diogenes of Sinope
>>
