[FFmpeg-devel] Fwd: a64multienc.c and drawutils.c optimisations
Michael Niedermayer
michaelni at gmx.at
Mon Jan 2 05:32:46 CET 2012
On Thu, Dec 29, 2011 at 12:47:37AM +0100, yann.lepetitcorps at free.fr wrote:
> > > + for(i=0;i<num;i++)
> > > +     dst[i] = set16;
> >
> > If you don't trust the compiler, this variant should make it more
> > explicit what you want the end-result to look like:
> > int16_t *end = dst + num;
> > while (dst < end)
> >     *dst++ = set16;
> > Disadvantage: compiler potentially will not recognize it as a loop
> > and thus not do advanced optimizations like auto-vectorization etc.
> > Of course depending on the specifics it might make a little to a lot
> > more sense to unroll the loop.
>
> Something like this ?
>
> #define LOOP_UNROLL_SIZE 8
>
> int16_t *end = dst + num;
>
> while (num > LOOP_UNROLL_SIZE)
> {
>     dst[0] = set16;
>     dst[1] = set16;
>     ...
>     dst[LOOP_UNROLL_SIZE-1] = set16;
>     dst += LOOP_UNROLL_SIZE;
>     num -= LOOP_UNROLL_SIZE;
> }
>
> while (dst < end)
>     *dst++ = set16;
>
>
> (the while (num > LOOP_UNROLL_SIZE) block could also use MMX/SSE registers to
> make the copy in blocks of four/eight set16 values)
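For reference, a compilable version of the unrolled loop quoted above might look like the following. It is a minimal sketch, not code from the thread: the function name fill16_unrolled and the size_t type of num are assumptions, the eight stores are written out where the quote has "...", the stray closing brace is dropped, and the condition uses >= so that a final full block of eight is also handled by the unrolled part.

```c
#include <stddef.h>
#include <stdint.h>

/* Elements written per unrolled iteration (from the quoted proposal).
 * The loop body below hardcodes eight stores, so keep the two in sync. */
#define LOOP_UNROLL_SIZE 8

/* Hypothetical helper: fill dst[0..num-1] with set16. */
void fill16_unrolled(int16_t *dst, int16_t set16, size_t num)
{
    int16_t *end = dst + num;

    /* main part: eight stores per iteration */
    while (num >= LOOP_UNROLL_SIZE) {
        dst[0] = set16;
        dst[1] = set16;
        dst[2] = set16;
        dst[3] = set16;
        dst[4] = set16;
        dst[5] = set16;
        dst[6] = set16;
        dst[7] = set16;
        dst += LOOP_UNROLL_SIZE;
        num -= LOOP_UNROLL_SIZE;
    }

    /* tail: fewer than LOOP_UNROLL_SIZE elements remain */
    while (dst < end)
        *dst++ = set16;
}
```

The result is identical to the plain for loop at the top of the thread; the only difference is how many loop-condition checks are executed.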
I think if a function matters speed-wise then it should be written in
asm. If it doesn't, then it should be written to be easy for humans to
understand.
I've not recently seen a test of the auto-vectorization of current
compilers, but the last one I've heard of did not give me any wish to be
dependent on it...
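Explicit SIMD avoids depending on auto-vectorization. As an illustration of the MMX/SSE idea in the quoted message, here is a sketch using SSE2 intrinsics that writes eight set16 values per store. It is an assumption-laden example, not FFmpeg code and not the hand-written asm suggested above; fill16_sse2 is a hypothetical name, and on compilers that do not define __SSE2__ (for example MSVC) it silently falls back to the scalar loop.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Hypothetical helper: fill dst[0..num-1] with set16, eight 16-bit
 * lanes per SSE2 store when available, scalar stores for the rest. */
void fill16_sse2(int16_t *dst, int16_t set16, size_t num)
{
    size_t i = 0;
#ifdef __SSE2__
    /* broadcast set16 into all eight 16-bit lanes of a 128-bit register */
    __m128i v = _mm_set1_epi16(set16);
    for (; i + 8 <= num; i += 8)
        _mm_storeu_si128((__m128i *)(dst + i), v); /* unaligned store */
#endif
    /* tail (or whole array when SSE2 is not available) */
    for (; i < num; i++)
        dst[i] = set16;
}
```

Unaligned stores are used so dst needs no particular alignment; an aligned variant would need a scalar prologue before the vector loop.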
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
DNS cache poisoning attacks, popular search engine, Google internet authority
dont be evil, please