[FFmpeg-devel] [PATCH] Demuxer for Leitch/Harris' VR native stream format (LXF)

Michael Niedermayer michaelni
Mon Sep 13 22:04:51 CEST 2010


On Mon, Sep 13, 2010 at 08:47:37PM +0100, Måns Rullgård wrote:
> Michael Niedermayer <michaelni at gmx.at> writes:
> 
> > On Mon, Sep 13, 2010 at 04:56:40PM +0100, Måns Rullgård wrote:
> >> Daniel Verkamp <daniel at drv.nu> writes:
> >> 
> >> > 2010/9/13 Måns Rullgård <mans at mansr.com>:
> >> >> Tomas Härdin <tomas.hardin at codemill.se> writes:
> >> >>
> >> >>>> > > > +//returns number of bits set in value
> >> >>>> > > > +static int num_set_bits(uint32_t value) {
> >> >>>> > > > +    int ret;
> >> >>>> > > > +
> >> >>>> > > > +    for(ret = 0; value; ret += (value & 1), value >>= 1);
> >> >>>> > > > +
> >> >>>> > > > +    return ret;
> >> >>>> > > > +}
> >> >>>> > >
> >> >>>> > > if we don't have a population count function yet, then one should be added
> >> >>>> > > to some header in libavutil
> >> >>>> >
> >> >>>> > I couldn't find one. That probably belongs in its own thread though.
> >> >>>> >
> >> >>>> > Which files would such a function belong in - intmath.h/c, common.h or
> >> >>>> > somewhere else? Also, which name would be best: ff_count_bits(),
> >> >>>> > av_count_bits() or something else?
> >> >>>>
> >> >>>> av_popcount()
> >> >>>> would be similar to gcc's __builtin_popcount()
> >> >>>
> >> >>> OK. I attached popcount.patch which adds such a function to common.h.
> >> >>> Also bumped minor of lavu. The implementation uses a 16-byte LUT and
> >> >>> therefore counts four bits at a time. I suspect there are better
> >> >>> solutions though. I did verify that it returns exactly the same number
> >> >>> the other implementation does for all 2^32 possible input values.
> >> >>
> >> >> I can't think of a better generic solution off the top of my head.
> >> >
> >> > There is at least one algorithm to do this without loops or lookup
> >> > tables using SWAR tricks, but I haven't benchmarked it:
> >> > http://aggregate.org/MAGIC/#Population Count (Ones Count)
> >> 
> >> That method will be several times slower on any modern hardware.
> >
> > hardly
> > the patch needs 32 operations (i assume it's unrolled), 8 of which are
> > table lookups, which might be less than fast on some hw
> > the aggregate.org code needs 15 operations; the suggested modification for
> > athlons (aka modern cpus with a fast multiplier) would reduce that to 12
> 
> Did you count the operations needed to construct the constants?  I
> think not.

I used simple mathematical counting of operations, as you surely see, and as you
know, constants don't count there. Counting CPU cycles on your specific
embedded CPU, on which constant construction is expensive, is something I could
not do, I am sorry.
But if you doubt that the aggregate algorithm is faster on modern desktop
hardware, I can benchmark that. I cannot benchmark it on ARM though, as I have
none.
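
For reference, the three approaches discussed in this thread can be put side
by side. This is only a sketch under assumptions: the function names
(popcount_loop, popcount_lut, popcount_swar, check_popcounts) are made up for
illustration, the LUT variant is a guess at what a "16-byte LUT, four bits at
a time" version looks like rather than the attached popcount.patch, and the
SWAR variant follows the general shift-and-mask scheme described at the
aggregate.org link. No claim about relative speed is made here; that would
need a benchmark on the hardware in question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time loop, same idea as num_set_bits() in the quoted patch. */
static int popcount_loop(uint32_t v)
{
    int n = 0;
    for (; v; v >>= 1)
        n += v & 1;
    return n;
}

/* Hypothetical 16-entry LUT: number of set bits in each 4-bit value. */
static const uint8_t nibble_bits[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

/* LUT variant: eight lookups, one per nibble of the 32-bit input. */
static int popcount_lut(uint32_t v)
{
    int n = 0;
    for (int i = 0; i < 8; i++, v >>= 4)
        n += nibble_bits[v & 0xF];
    return n;
}

/* SWAR variant: no loop and no table, only shifts, masks and adds. */
static int popcount_swar(uint32_t v)
{
    v -= (v >> 1) & 0x55555555;                      /* 2-bit partial sums  */
    v  = ((v >> 2) & 0x33333333) + (v & 0x33333333); /* 4-bit partial sums  */
    v  = ((v >> 4) + v) & 0x0f0f0f0f;                /* 8-bit partial sums  */
    v += v >> 8;                                     /* fold bytes together */
    v += v >> 16;
    return (int)(v & 0x3f);                          /* result is 0..32     */
}

/* Sanity check: all three variants must agree on a few sample inputs. */
static void check_popcounts(void)
{
    static const uint32_t samples[] = {
        0x00000000u, 0x00000001u, 0x80000000u,
        0x12345678u, 0xF0F0F0F0u, 0xFFFFFFFFu
    };
    for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        int ref = popcount_loop(samples[i]);
        assert(popcount_lut(samples[i])  == ref);
        assert(popcount_swar(samples[i]) == ref);
    }
}
```

Calling check_popcounts() only confirms the variants agree on a handful of
values; an exhaustive comparison over all 2^32 inputs, as was done for the
patch, is the stronger test.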

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Those who are best at talking realize last or never when they are wrong.