[FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

Michael Niedermayer michael at niedermayer.cc
Fri Oct 16 13:30:53 CEST 2015

On Thu, Oct 15, 2015 at 06:38:10AM -0400, Ganesh Ajjanagadde wrote:
> On Wed, Oct 14, 2015 at 6:53 AM, Hendrik Leppkes <h.leppkes at gmail.com> wrote:
> > On Wed, Oct 14, 2015 at 12:49 PM, Carl Eugen Hoyos <cehoyos at ag.or.at> wrote:
> >> Ganesh Ajjanagadde <gajjanag <at> mit.edu> writes:
> >>
> >>> What? My numbers actually show that the new code may be faster -
> >>
> >> No, you are misunderstanding the numbers you posted.
> >> (Or I misunderstand them but nobody said so yet.)
> >>
> >> Highest runs are most relevant, skips have to be
> >> avoided (afaik).
> >>
> >> [...]
> >>
> >>> If you continue to post such stuff that has no basis, I might actually
> >>> get tempted into finding out for which floating point values the new
> >>> code is significantly faster, craft a relevant audio file, and post it
> >>> showing a huge performance difference - my random numbers benchmark
> >>> shows there must exist such values.
> >>
> >> Please do so!
> >>
> >>> > The more important question is if you can see the same
> >>> > changes in the disassembly of af_astats.o as what
> >>> > ubitux posted here for a short test function?
> >>>
> >>> I do. He uses clang/gcc, so do I.
> >>
> >> Sorry, my understanding fails here (I am not a native speaker):
> >> You did look at the disassembly of af_astats.o and there is
> >> inlined code instead of a function call?
> >>
> >>> The reason (irrelevant) is that both
> >>> of us run Arch.
> >>>
> >>> What is "more relevant" is if _you_ can see the changes
> >>> on some non Linux platform.
> >>
> >> If you could show that it is faster on any platform
> >> I would already be happy!
> >>
> >
> > A more important check would be that it's not significantly slower on
> > any other platform. Just because one compiler/glibc combination
> > manages to produce an efficient inlined function doesn't necessarily
> > mean that some other compiler or libc couldn't produce a full function
> > call with all the overhead that comes with it, becoming significantly
> > slower.
> As I point out, all a libc implementer needs to do to be on par with
> the macro is to add the inline keyword. This was added in c99. If said
> libc does not, then it is fundamentally broken from a performance
> perspective. A beginning programmer can do that in a couple of
> minutes. Fix upstream and complain to them if it does not inline.

I don't know how the latest compilers handle "inline", but a few years
ago gcc was rather dumb about inlining, and I think it's not easy for
a compiler to actually not be "dumb".

A compiler cannot inline everything that has the inline keyword;
for some source code that would lead to an explosion in size and
compile time.
And a good compiler will want to inline some functions even if they
do not have the inline keyword.
Also, it's not easy for a compiler to know what to
inline and what not: there could be 10 functions a1(), a2(), a3(), ...
each calling the previous one 10 times ...
The way gcc handled this (in the past at least, AFAIK) is to have
various complicated thresholds that limit the amount of inlining.
The big annoyance with this (years ago at least) was that if you
forced a function to be inlined, gcc would then stop
inlining something else, and you ended up either forcing every single
function you needed inlined or having to tune the thresholds.
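
To make the hint-versus-force distinction concrete, here is a minimal
sketch, assuming a GCC- or clang-compatible compiler. The names
hinted_absf, forced_absf and SKETCH_ALWAYS_INLINE are made up for
illustration and are not taken from the patch.

```c
/* Plain "inline" is only a hint: the compiler's own heuristics and
 * thresholds decide whether a call is actually inlined. */
static inline float hinted_absf(float x)
{
    return x >= 0 ? x : -x;
}

/* Hypothetical macro: on GCC/clang the always_inline attribute asks the
 * compiler to bypass those heuristics; elsewhere fall back to the hint. */
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_ALWAYS_INLINE static inline __attribute__((always_inline))
#else
#  define SKETCH_ALWAYS_INLINE static inline
#endif

SKETCH_ALWAYS_INLINE float forced_absf(float x)
{
    return x >= 0 ? x : -x;
}
```

Both functions compute the same value; only the inlining decision
differs, and forcing it for one function may affect what the compiler
chooses to inline around it.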

It would be interesting to check if replacing FFABS by fabs causes
any big changes to inlining behavior (maybe that can be done by
comparing the list of symbols in the object files, as fully inlined
functions wouldn't show up, but maybe there are other ways).
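
A rough sketch of such a symbol check, assuming a Unix-like toolchain
with cc and nm; the file name sketch.c and all function names here are
hypothetical and do not come from af_astats.c.

```c
/* Build and inspect (assumed commands):
 *   cc -O2 -c sketch.c
 *   nm sketch.o
 * A "U fabs" line in the nm output would mean a real call into libm was
 * emitted. If helper_abs does not appear, every call to it was inlined. */
#include <math.h>

/* Static helper written in the ternary style of an abs macro. */
static double helper_abs(double x)
{
    return x >= 0 ? x : -x;
}

/* External functions keep their symbols, so they show up in nm either way. */
double sum_abs_ternary(const double *v, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += helper_abs(v[i]);
    return s;
}

double sum_abs_libm(const double *v, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += fabs(v[i]);
    return s;
}
```

Comparing the nm output before and after a change only shows which
calls survived as calls; it does not say anything about speed.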

Anyway, I am not against using fabs() for float/double FFABS().
I just think some assumptions in this thread are possibly too
optimistic, but it's quite possible these replacements are all fine
and the changes in inlining, if any, have no performance impact.

Also, if a *abs is implemented using a branch (as in: if it's positive,
jump over a negate instruction), then branch prediction can play a
significant role in performance; that is, random values would be a lot
slower than the same values ordered.
A good implementation should not use a branch though: abs for floats
and doubles is basically just clearing the sign bit. Platforms should
have a dedicated instruction for that, or in some cases an integer
AND/OR could maybe even be used.
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Why not whip the teacher when the pupil misbehaves? -- Diogenes of Sinope