[FFmpeg-devel] Trans.: a64multienc.c and drawutils.c optimisations
yann.lepetitcorps at free.fr
Thu Dec 29 00:47:37 CET 2011
> > + for(i=0;i<num;i++)
> > + dst[i] = set16;
>
> If you don't trust the compiler, this variant should make it more
> explicit what you want the end-result to look like:
> int16_t *end = dst + num;
> while (dst < end)
> *dst++ = set16;
> Disadvantage: compiler potentially will not recognize it as a loop
> and thus not do advanced optimizations like auto-vectorization etc.
> Of course depending on the specifics it might make a little to a lot
> more sense to unroll the loop.
Something like this?
#define LOOP_UNROLL_SIZE 8
int16_t *end = dst + num;
/* unrolled part: LOOP_UNROLL_SIZE stores per iteration */
while (num >= LOOP_UNROLL_SIZE)
{
dst[0] = set16;
dst[1] = set16;
...
dst[LOOP_UNROLL_SIZE-1] = set16;
dst += LOOP_UNROLL_SIZE;
num -= LOOP_UNROLL_SIZE;
}
/* tail: the remaining values, fewer than LOOP_UNROLL_SIZE */
while (dst < end)
*dst++ = set16;
(the unrolled while block could also use MMX/SSE registers to store the
set16 values in blocks of four/eight)
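For reference, here is the unrolled idea written out as a complete function. It is only a sketch: the name fill16_unrolled and the size_t count are assumptions made for illustration, not anything from the patch, and whether it beats the plain loop depends on the compiler and target.

```c
#include <stddef.h>
#include <stdint.h>

#define LOOP_UNROLL_SIZE 8

/* Hypothetical helper (name and signature are illustrative only):
 * store set16 into num consecutive int16_t slots starting at dst. */
static void fill16_unrolled(int16_t *dst, int16_t set16, size_t num)
{
    int16_t *end = dst + num;

    /* Unrolled part: the eight stores below must match LOOP_UNROLL_SIZE. */
    while (num >= LOOP_UNROLL_SIZE) {
        dst[0] = set16;
        dst[1] = set16;
        dst[2] = set16;
        dst[3] = set16;
        dst[4] = set16;
        dst[5] = set16;
        dst[6] = set16;
        dst[7] = set16;
        dst += LOOP_UNROLL_SIZE;
        num -= LOOP_UNROLL_SIZE;
    }

    /* Tail: fewer than LOOP_UNROLL_SIZE values are left. */
    while (dst < end)
        *dst++ = set16;
}
```

Calling fill16_unrolled(buf, value, 19) would take the unrolled path twice and then store the last three values in the tail loop.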
> > +void ff_memset_sized(char *dst, char *src, int num, int stepsize)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < num; i++, dst += stepsize)
> > + memcpy(dst, src, stepsize);
> > +}
>
> Of course there's the question if one single macro (or av_always_inline
> function) with this content wouldn't serve the same purpose as all those
> different functions.
> For non-x86 alignment might be a bit of an issue though (as in, this
> variant doesn't tell the compiler that dst will always be aligned to
> stepsize).
Perhaps we could use some #defines to handle the problematic platforms?
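A rough sketch of what such a #define could look like, so it is clearer what I mean. Everything here is an assumption for illustration: FILL_ALIGNED, FILL_NO_ALIGN_HINT and fill_sized are made-up names, and the alignment hint relies on __builtin_assume_aligned, which exists in GCC 4.7 and later and in Clang. On other compilers, or with FILL_NO_ALIGN_HINT defined, the macro does nothing and the plain memcpy loop is kept.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical switch: tell the compiler that dst is aligned to the
 * step size where the builtin is available, do nothing elsewhere. */
#if defined(__GNUC__) && !defined(FILL_NO_ALIGN_HINT)
#define FILL_ALIGNED(p, n) __builtin_assume_aligned((p), (n))
#else
#define FILL_ALIGNED(p, n) (p)
#endif

/* Hypothetical single inline function replacing the per-size variants.
 * For stepsize 2 and 4 the caller must really pass a dst aligned to
 * stepsize, otherwise the hint is a lie and behaviour is undefined. */
static inline void fill_sized(uint8_t *dst, const uint8_t *src,
                              int num, int stepsize)
{
    int i;

    switch (stepsize) {
    case 2: {
        uint8_t *d = FILL_ALIGNED(dst, 2);
        for (i = 0; i < num; i++)
            memcpy(d + 2 * i, src, 2);  /* constant size, can become one store */
        break;
    }
    case 4: {
        uint8_t *d = FILL_ALIGNED(dst, 4);
        for (i = 0; i < num; i++)
            memcpy(d + 4 * i, src, 4);
        break;
    }
    default:
        /* Generic path: no alignment assumption at all. */
        for (i = 0; i < num; i++, dst += stepsize)
            memcpy(dst, src, stepsize);
        break;
    }
}
```

The memcpy calls with a constant size leave it to the compiler to pick the store instruction, so there is no pointer cast and no aliasing problem, and the platform-specific part stays inside one macro.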
See you later,
Yannoo