[FFmpeg-devel] [PATCH] lavc/aacenc_utils: replace powf(x, y) by expf(logf(x), y)

Ganesh Ajjanagadde gajjanag at gmail.com
Sat Mar 12 15:17:42 CET 2016


On Sat, Mar 12, 2016 at 9:15 AM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> Hi,
>
> On Sat, Mar 12, 2016 at 9:03 AM, Ganesh Ajjanagadde <gajjanag at gmail.com>
> wrote:
>
>> On Fri, Mar 11, 2016 at 9:30 AM, Ronald S. Bultje <rsbultje at gmail.com>
>> wrote:
>> > Hi,
>> >
>> > On Thu, Mar 10, 2016 at 9:23 PM, Ganesh Ajjanagadde <gajjanag at gmail.com>
>> > wrote:
>> >
>> >> On Thu, Mar 10, 2016 at 8:56 AM, Ronald S. Bultje <rsbultje at gmail.com>
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > On Thu, Mar 10, 2016 at 2:37 AM, Reimar Döffinger <
>> >> Reimar.Doeffinger at gmx.de>
>> >> > wrote:
>> >> >
>> >> >> On 10.03.2016, at 03:06, Ganesh Ajjanagadde <gajjanag at gmail.com>
>> wrote:
>> >> >>
>> >> >> > On Wed, Mar 9, 2016 at 2:16 AM, Reimar Döffinger
>> >> >> > <Reimar.Doeffinger at gmx.de> wrote:
>> >> >> >> On 08.03.2016, at 04:48, Ganesh Ajjanagadde <gajjanag at gmail.com>
>> >> wrote:
>> >> >> >>
>> >> >> >>> +                    nzl += expf(logf(s / ethresh) * nzslope);
>> >> >> >>
>> >> >> >> Shouldn't log2f/exp2f be faster?
>> >> >> >> log2f at least has CPU support on x86 AFAICT.
>> >> >> >
>> >> >> > I had tested this, and no, though it is still faster than powf.
>> >> >> >
>> >> >> > It still seems to rely on libm, note that we don't use -ffast-math
>> and
>> >> >> > a look at
>> >> >> https://github.com/lattera/glibc/tree/master/sysdeps/x86_64/fpu
>> >> >> > as well seems to say no. Problem is, GNU people like to prioritize
>> >> >> > "correctly rounded" behavior over fast, reasonably accurate code,
>> >> >> > sometimes to ludicruous degrees.
>> >> >> >
>> >> >> > Personally, I don't know why we don't use -ffast-math, not many
>> seem
>> >> >> > to care that heavily on strict IEEE semantics. Maybe it leads to
>> too
>> >> >> > much variation across platforms?
>> >> >>
>> >> >> You lose some guarantees. In particular, the compiler will assume
>> NaNs
>> >> do
>> >> >> not happen and you cannot predict which code path (after a comparison
>> >> for
>> >> >> example) they take.
>> >> >> But some code for either security or correctness reasons needs them
>> to
>> >> be
>> >> >> handled a certain way.
>> >> >> I guess in theory you could try to make sure fisnan is used in all
>> those
>> >> >> cases, but then you need to find them, and I think if you take
>> >> -ffast-math
>> >> >> description literally there is no guarantee that even fisnan
>> continues
>> >> to
>> >> >> work... I am also not sure none of the code relies on order of
>> >> operations
>> >> >> to get the precision it needs.
>> >> >> So it is simply too dangerous.
>> >> >> Some more specific options might be possible to use though (but I
>> think
>> >> >> even full -ffast-math gains you almost nothing? Does it even help
>> >> here?).
>> >>
>> >> Yes, sorry, I meant some specific things from -ffast-math. I checked
>> >> configure, most of the unambiguously clear ones are already being
>> >> turned on. As such, it seems ok.
>> >>
>> >> >
>> >> >
>> >> > One could also consider writing some customized assembly (calling the
>> >> > relevant instructions instead of C wrappers) in cases where it is
>> >> > speed-sensitive. It's sort of the inverse of what Ganesh is
>> suggesting, I
>> >> > guess, maybe some more effort involved but it can't be that much. You
>> >> could
>> >> > even use av_always_inline functions and inline assembly to call the
>> >> > relevant instruction and otherwise keep things in C. That's identical
>> to
>> >> > what -ffast-math does but turns on only when specifically calling the
>> new
>> >> > API function name...
>> >>
>> >> So seems like everything wrt this patch is fine, right?
>> >
>> >
>> > Not really. Your patch still does two things, and I don't like the
>> explicit
>> > exp(log(a)*b).
>>
>> Well, both are needed for the speedup. Without the 2.0 check, there is
>> a speed regression. I don't understand why it is "two things" in that
>> case.
>>
>> > What I'm thinking is that you should have a static inline
>> > function, let's call it fast_pow(a, b), which can internally (in the C
>> > version) be implemented as exp+log. Just as you found for pow, we might
>> > find that for exp/log, the system lib is not very optimized and we can do
>> > it faster ourselves by doing whatever -ffast-math is doing for these
>> > functions. Those would be more specifically optimized and that would be
>> > part of the fast_pow implementation. This way, the code in aacenc remains
>> > easy to follow and the optimization is accessible for other parts of
>> ffmpeg
>> > also.
>>
>> Ok, changed locally.
>
>
> Please submit a new patch, this is not a minor cosmetic change.

I definitely will, just was addressing your first point and making it
clear why I did it that way, so that I don't waste effort.

>
> Ronald
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


More information about the ffmpeg-devel mailing list