[FFmpeg-devel] [PATCH] Demuxer for Leitch/Harris' VR native stream format (LXF)

Måns Rullgård mans
Mon Sep 13 21:47:37 CEST 2010


Michael Niedermayer <michaelni at gmx.at> writes:

> On Mon, Sep 13, 2010 at 04:56:40PM +0100, M?ns Rullg?rd wrote:
>> Daniel Verkamp <daniel at drv.nu> writes:
>> 
>> > 2010/9/13 M?ns Rullg?rd <mans at mansr.com>:
>> >> Tomas H?rdin <tomas.hardin at codemill.se> writes:
>> >>
>> >>>> > > > +//returns number of bits set in value
>> >>>> > > > +static int num_set_bits(uint32_t value) {
>> >>>> > > > + ? ?int ret;
>> >>>> > > > +
>> >>>> > > > + ? ?for(ret = 0; value; ret += (value & 1), value >>= 1);
>> >>>> > > > +
>> >>>> > > > + ? ?return ret;
>> >>>> > > > +}
>> >>>> > >
>> >>>> > > if we dont have a population count function yet, than one should be added
>> >>>> > > to some header in libavutil
>> >>>> >
>> >>>> > I couldn't find one. That probably belongs in its own thread though.
>> >>>> >
>> >>>> > Which files would such a function belong in - intmath.h/c, common.h or
>> >>>> > somewhere else? Also, which name would be best: ff_count_bits(),
>> >>>> > av_count_bits() or something else?
>> >>>>
>> >>>> av_popcount()
>> >>>> would be similar to gccs __builtin_popcount()
>> >>>
>> >>> OK. I attached popcount.patch which adds such a function to common.h.
>> >>> Also bumped minor of lavu. The implementation uses a 16-byte LUT and
>> >>> therefore counts four bits at a time. I suspect there are better
>> >>> solutions though. I did verify that it returns exactly the same number
>> >>> the other implementation does for all 2^32 possible input values.
>> >>
>> >> I can't think of a better generic solution off the top of my head.
>> >
>> > There is at least one algorithm to do this without loops or lookup
>> > tables using SWAR tricks, but I haven't benchmarked it:
>> > http://aggregate.org/MAGIC/#Population Count (Ones Count)
>> 
>> That method will be several times slower on any modern hardware.
>
> hardly
> the patch needs 32 operations (i assume its unrolled) 8 of which are
> table lookups which might be less than fast on some hw
> the aggregate.org code needs 15 operations the suggested modification for
> athlons (aka modernb cpus with fast multipler) would reduce that to 12

Did you count the operations needed to construct the constants?  I
think not.

-- 
M?ns Rullg?rd
mans at mansr.com



More information about the ffmpeg-devel mailing list