[FFmpeg-devel] [PATCH] libavutil: add x86 optimized av_popcount

Ronald S. Bultje rsbultje at gmail.com
Wed Feb 25 18:36:10 CET 2015


Hi James,

On Wed, Feb 25, 2015 at 12:25 PM, James Almer <jamrial at gmail.com> wrote:

> On 25/02/15 9:41 AM, Ronald S. Bultje wrote:
> > Hi,
> >
> > On Tue, Feb 24, 2015 at 8:05 PM, James Almer <jamrial at gmail.com> wrote:
> >>
> >> +#if HAVE_FAST_POPCNT
> >> +#if AV_GCC_VERSION_AT_LEAST(4,5)
> >> +#ifndef av_popcount
> >> +    #define av_popcount   __builtin_popcount
> >> +#endif /* av_popcount */
> >> +#if HAVE_FAST_64BIT
> >> +#ifndef av_popcount64
> >> +    #define av_popcount64 __builtin_popcountll
> >> +#endif /* av_popcount64 */
> >> +#endif /* HAVE_FAST_64BIT */
> >> +#endif /* AV_GCC_VERSION_AT_LEAST(4,5) */
> >> +#endif /* HAVE_FAST_POPCNT */
> >>
> >
> > Is this just to get the sse4 popcnt instruction if we compile with
> > -mcpu=sse4? The slightly odd thing is that we're using a built-in, yet
> > configure still does an arch/cpu check. I'd expect the built-in/compiler
> > to do that for us based on -mcpu, and we could always unconditionally use
> > this (as long as gcc >= 4.5); alternatively, you could use inline asm and
> > then have the configure check (HAVE_FAST_POPCNT). But doing both seems a
> > little odd. I have no objection to it, patch is still fine, just odd.
> >
> > Ronald
>
> I purposely made the checks for gcc 4.5 and in configure for cpus that
> support popcnt because otherwise __builtin_popcount (at least gcc's) is
> slower than our generic av_popcount_c function from lavu/common.h.
> When the CPU supports popcnt the builtin becomes a single inlined
> instruction.
>
> I tried the __asm__ approach, but the code generated by the builtin seemed
> better.


That's interesting, can you show the code you tried?

Ronald
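
For readers following the thread, here is a rough sketch of the three
variants being compared: a portable bit-twiddling count, the GCC builtin,
and an inline asm wrapper. This is not James's code. The function names
(popcount_generic, popcount_builtin, popcount_asm) are made up for
illustration, the generic version is only written in the style of a C
fallback like av_popcount_c, and the asm variant is guarded by GCC's
__POPCNT__ macro so it is compiled only when the target enables the
instruction (for example with -mpopcnt).

```c
#include <stdint.h>

/* Portable SWAR bit count: sums bits in 2-, 4-, 8- and 16-bit groups.
 * Hypothetical name; illustrates a generic C fallback. */
static inline int popcount_generic(uint32_t x)
{
    x -= (x >> 1) & 0x55555555u;                       /* 2-bit sums  */
    x  = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums  */
    x  = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit sums  */
    x += x >> 8;                                       /* 16-bit sums */
    return (int)((x + (x >> 16)) & 0x3Fu);             /* final total */
}

/* GCC/Clang builtin: the compiler decides how to implement it. With
 * popcnt enabled for the target it can emit the single instruction,
 * otherwise it falls back to its own generic code. */
static inline int popcount_builtin(uint32_t x)
{
    return __builtin_popcount(x);
}

#if defined(__POPCNT__) && (defined(__i386__) || defined(__x86_64__))
/* Inline asm variant: forces the popcnt instruction. The compiler treats
 * the asm statement as opaque, so it cannot fold constants through it or
 * combine it with neighbouring operations. */
static inline int popcount_asm(uint32_t x)
{
    uint32_t r;
    __asm__ ("popcnt %1, %0" : "=r"(r) : "r"(x) : "cc");
    return (int)r;
}
#endif
```

Comparing the output of "gcc -O2 -S" with and without -mpopcnt for a small
caller of each function is one way to see what the builtin expands to in
each configuration.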

