[FFmpeg-devel] [PATCH] avcodec/cbrt_tablegen: avoid pow and speed up cbrt_tableinit

Ronald S. Bultje rsbultje at gmail.com
Thu Nov 26 14:27:27 CET 2015


Hi,

On Wed, Nov 25, 2015 at 10:46 PM, Ganesh Ajjanagadde <gajjanag at mit.edu>
wrote:

> On Wed, Nov 25, 2015 at 10:13 PM, Ronald S. Bultje <rsbultje at gmail.com>
> wrote:
> > Hi,
> >
> > On Wed, Nov 25, 2015 at 8:48 PM, Ganesh Ajjanagadde <gajjanag at mit.edu>
> > wrote:
> >>
> >> On Wed, Nov 25, 2015 at 8:29 PM, Ganesh Ajjanagadde <gajjanag at mit.edu>
> >> wrote:
> >> > On Wed, Nov 25, 2015 at 8:19 PM, Ronald S. Bultje <rsbultje at gmail.com
> >
> >> > wrote:
> >> >> Hi,
> >> >>
> >> >> On Wed, Nov 25, 2015 at 7:36 PM, Ganesh Ajjanagadde <
> gajjanag at mit.edu>
> >> >> wrote:
> >> >>
> >> >>> On Wed, Nov 25, 2015 at 6:49 PM, James Almer <jamrial at gmail.com>
> >> >>> wrote:
> >> >>> > On 11/25/2015 8:32 PM, Ganesh Ajjanagadde wrote:
> >> >>> >> On Wed, Nov 25, 2015 at 6:19 PM, Ronald S. Bultje
> >> >>> >> <rsbultje at gmail.com>
> >> >>> wrote:
> >> >>> >>> Hi,
> >> >>> >>>
> >> >>> >>> On Wed, Nov 25, 2015 at 5:17 PM, Ganesh Ajjanagadde <
> >> >>> gajjanagadde at gmail.com>
> >> >>> >>> wrote:
> >> >>> >>>>
> >> >>> >>>> On systems having cbrt, there is no reason to use the slow pow
> >> >>> function.
> >> >>> >>>>
> >> >>> >>>> Sample benchmark (x86-64, Haswell, GNU/Linux):
> >> >>> >>>> new:
> >> >>> >>>> 5124920 decicycles in cbrt_tableinit,       1 runs,      0
> skips
> >> >>> >>>>
> >> >>> >>>> old:
> >> >>> >>>> 12321680 decicycles in cbrt_tableinit,       1 runs,      0
> skips
> >> >>> >>>>
> >> >>> >>>> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
> >> >>> >>>>
> >> >>> >>>>
> >> >>>
> >> >>>
> -------------------------------------------------------------------------------
> >> >>> >>>> What I wonder about is why --enable-hardcoded-tables is not the
> >> >>> default
> >> >>> >>>> for
> >> >>> >>>> FFmpeg. Unless I am missing something, static storage is anyway
> >> >>> allocated
> >> >>> >>>> even
> >> >>> >>>> if hardcoded tables are not used, and the cost is deferred to
> >> >>> >>>> runtime
> >> >>> >>>> instead of
> >> >>> >>>> build time. Thus binary size should not be affected, but users
> >> >>> >>>> burn
> >> >>> cycles
> >> >>> >>>> unnecessarily for every codec having these kinds of tables. I
> >> >>> >>>> have
> >> >>> these
> >> >>> >>>> patches,
> >> >>> >>>> simply because at the moment users are paying a price for the
> >> >>> >>>> typical
> >> >>> >>>> default.
> >> >>> >>>> ---
> >> >>> >>>>  libavcodec/cbrt_tablegen.h | 6 +++---
> >> >>> >>>>  1 file changed, 3 insertions(+), 3 deletions(-)
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> This has been discussed extensively in the past...
> >> >>> >>
> >> >>> >> Can you please give a link and/or timeframe to search for?
> >> >>> >>
> >> >>> >>>
> >> >>> >>> As for the patch, don't forget that tablegen runs on the host
> >> >>> >>> (build),
> >> >>> not
> >> >>> >>> target (runtime), whereas libm.h is for target (runtime) and may
> >> >>> >>> not be
> >> >>> >>> compatible. I believe that's why we don't use libm.h in tablegen
> >> >>> >>> files.
> >> >>> >>
> >> >>> >> I don't understand this, it seems to me like any other code (at
> >> >>> >> least
> >> >>> >> in the default configure), it gets called, and like all other
> such
> >> >>> >> things, we use libavutil/libm for hackery. This host/target
> >> >>> >> business
> >> >>> >> affects other things as well. What is the issue?
> >> >>> >
> >> >>> > libavutil/libm.h uses defines from config.h, which are based on
> the
> >> >>> tests run
> >> >>> > by configure for the target, and not the host where compilation
> >> >>> > takes
> >> >>> place.
> >> >>> > The tablegen applications all run at compile time. What is
> available
> >> >>> > on
> >> >>> the
> >> >>> > target may not be on the host.
> >> >>>
> >> >>> Ok. So I would like an answer to two simple questions that are
> outside
> >> >>> my knowledge or interest.
> >> >>>
> >> >>> Is it possible with some hackery to get this change through, or not?
> >> >>> If so, what is it?
> >> >>
> >> >>
> >> >> You need to understand the issue before you can evaluate hacks.
> >> >>
> >> >> The issue is:
> >> >> - I'm using a linux x86-64 machine using gcc as a compiler, with
> >> >> libc=glibc
> >> >> 2.18 (A);
> >> >> - to build a binary that will run on a Windows Mobile ARMv7 machine,
> >> >> with
> >> >> libC=something-from-Microsoft (B).
> >> >>
> >> >> tablegen runs on A, but ffmpeg.exe runs on B. libavutil/libm.h only
> >> >> works
> >> >> for B. If you want a version of libm.h on A, you need to generate a
> >> >> version
> >> >> of libm.h that works on A. There is no relationship between A and B,
> >> >> and
> >> >> thus there can not possibly ever be any relationship between A's
> libm.h
> >> >> and
> >> >> B's libavutil/libm.h.
> >> >>
> >> >> It's probably possible to generate a version of libm.h for A, but
> >> >> that's
> >> >> not so much a coding issue, as it is an issue of understanding the
> >> >> build
> >> >> system and including detection for stuff on machine A, as opposed to
> >> >> machine B (which is what most of configure does).
> >> >
> >> > Thanks a lot for the detail. So how about using a local
> >> > #ifndef cbrt
> >> > #define cbrt(x) pow(x, 1 / 3.0)
> >> > code...
> >> > #undef cbrt // at the very end of the file
> >> > #endif
> >>
> >> Not that simple, something more like
> >> #ifndef cbrt
> >> #define ff_cbrt(x) pow(x, 1/3.0)
> >> #else
> >> #define ff_cbrt(x) cbrt(x)
> >> code...
> >> #undef ff_cbrt // at the very end of the file
> >> #endif
> >>
> >> - this will resolve a glitch with the above in not undef'ing an
> >> important symbol (all this is of course without including
> >> libavutil/libm.h).
> >
> >
> > I don't think cbrt is a macro? But anyway, I don't think any of this is
> > meaningful, you're basically creating a crappy per-file libm.h for
> tablegen
> > files which will be duplicated all over the place. How about we do this
> > correctly, if we do it at all?
>
> I lack the knowledge for doing this in configure, all this avutil/libm
> and compat business is black magic to me, and worse still I can hardly
> test any of it since I run exclusively a GNU/Linux setup, as evidenced
> by some recent patches. I can put in an effort if someone is willing
> to review and if it is reasonably simple in configure.
>
> >
> > (Don't forget these table inits run once per process, so the cost of
> > increased code messiness and code complexity has a much higher weight
> than
> > it does in normal situations. Runtime isn't that important because it
> only
> > runs once.)
>
> I doubt the patch itself qualifies as complex/messy as is. It is all
> this cross compiling business and associated hackery that is.
>
> An extra 0.1 ms of CPU burning across 1000's of clients across the
> globe is significant, especially when it is trivially avoided, going
> back to the above point.


You're getting in this argument/defensive mode again, don't do that.
Cross-compiling is not a hack. Your patch is a hack. Our code as it is
right now works perfectly fine when cross-compiling, and lots of people use
it like that - including me.

I understand that the configure code is convoluted, and I don't fully
understand it either. However, to correctly solve this problem, you need to
do it right.

Ronald


More information about the ffmpeg-devel mailing list