[FFmpeg-devel] Trans.: a64multienc.c and drawutils.c optimisations

Mon Jan 2 05:32:46 CET 2012

On Thu, Dec 29, 2011 at 12:47:37AM +0100, yann.lepetitcorps at free.fr wrote:
> > > +    for(i=0;i<num;i++)
> > > +        dst[i] = set16;
> >
> > If you don't trust the compiler, this variant should make it more
> > explicit what you want the end-result to look like:
> > int16_t *end = dst + num;
> > while (dst < end)
> >   *dst++ = set16;
> > Disadvantage: compiler potentially will not recognize it as a loop
> > and thus not do advanced optimizations like auto-vectorization etc.
> > Of course depending on the specifics it might make a little to a lot
> > more sense to unroll the loop.
> 
> Something like this ?
> 
>     #define LOOP_UNROLL_SIZE 8
> 
>     int16_t *end = dst + num;
> 
>     while (num > LOOP_UNROLL_SIZE)
>     {
>         dst[0] = set16;
>         dst[1] = set16;
>         ...
>         dst[LOOP_UNROLL_SIZE-1] = set16;
>         dst += LOOP_UNROLL_SIZE;
>         num -= LOOP_UNROLL_SIZE;
>     }
> 
>     while (dst < end)
>         *dst++ = set16;
> 
> 
> (the while (num > LOOP_UNROLL_SIZE) block could also use MMX/SSE registers to
> do the copy in blocks of four/eight set16 values)

I think if a function matters speedwise then it should be written in
asm. If it doesn't then it should be written to be easy to understand
by humans.

I've not recently seen a test of the auto-vectorization of current
compilers, but the last I've heard of did not give me any wish to be
dependent on it ...
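
For comparison, here is a minimal, self-contained sketch of the two C
variants discussed in the quoted text: the plain indexed loop and a manual
unroll by 8 followed by a scalar tail. The function names fill16_simple and
fill16_unrolled are hypothetical and are not taken from a64multienc.c or
drawutils.c; the unroll factor of 8 simply mirrors the quoted
LOOP_UNROLL_SIZE. Whether the unrolled form is actually faster depends on
the compiler and target and would have to be measured.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the straightforward, easy-to-read loop. */
static void fill16_simple(int16_t *dst, int16_t set16, size_t num)
{
    for (size_t i = 0; i < num; i++)
        dst[i] = set16;
}

/* Hypothetical helper: manual unroll by 8, then a scalar tail loop. */
static void fill16_unrolled(int16_t *dst, int16_t set16, size_t num)
{
    int16_t *end = dst + num;   /* one past the last element to write */

    /* Main body: eight stores per iteration while at least 8 remain. */
    while (num >= 8) {
        dst[0] = set16;
        dst[1] = set16;
        dst[2] = set16;
        dst[3] = set16;
        dst[4] = set16;
        dst[5] = set16;
        dst[6] = set16;
        dst[7] = set16;
        dst += 8;
        num -= 8;
    }

    /* Tail: at most 7 leftover elements. */
    while (dst < end)
        *dst++ = set16;
}
```

Both functions write exactly num elements, so either can be swapped in for
the other when comparing generated code or timings.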

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

DNS cache poisoning attacks, popular search engine, Google internet authority
don't be evil, please