[FFmpeg-devel] [PATCH] swscale: round on planar2x C code

Sun Sep 12 12:41:58 CEST 2010

On Sun, Sep 12, 2010 at 10:30:18AM +0100, Måns Rullgård wrote:
> Ramiro Polla <ramiro.polla at gmail.com> writes:
> 
> > Hi,
> >
> > The C and MMX2/3dnow code differ in planar2x due to pavgb's rounding.
> > Attached patch makes the output similar.
> >
> > I couldn't measure any speed difference. gcc ends up using "leal
> > 3(xxx)" instead of "leal (xxx)" which doesn't seem to have a speed
> > penalty.
> 
> What it does on x86 is irrelevant since the mmx code will always be
> used in practice.

No, the C code is used for the right border on MMX chips when the
width is not a multiple of 8, and the C code used for that must
round the same way as the MMX code.
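
A minimal sketch of why the +3 bias lines up with pavgb, for readers who
want to check it. pavgb computes (a + b + 1) >> 1 per unsigned byte. The
sketch assumes the SIMD path builds the 3:1 blend by chaining two such
averages, avg(a, avg(a, b)); the thread itself only says the difference
comes from pavgb's rounding. All function names below are hypothetical
and not taken from rgb2rgb_template.c.

```c
#include <stdint.h>

/* Emulates one byte lane of pavgb: unsigned average, rounded up. */
static uint8_t pavgb_lane(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Assumed SIMD form of the 3:1 blend: avg(a, avg(a, b)). */
static uint8_t blend31_chained(uint8_t a, uint8_t b)
{
    return pavgb_lane(a, pavgb_lane(a, b));
}

/* Scalar form from the patch: (3*a + b + 3) >> 2. */
static uint8_t blend31_patched(uint8_t a, uint8_t b)
{
    return (uint8_t)((3 * a + b + 3) >> 2);
}

/* Scalar form before the patch: truncating shift. */
static uint8_t blend31_truncating(uint8_t a, uint8_t b)
{
    return (uint8_t)((3 * a + b) >> 2);
}

/* Counts byte pairs where f and g disagree, over all 65536 inputs. */
static int count_mismatches(uint8_t (*f)(uint8_t, uint8_t),
                            uint8_t (*g)(uint8_t, uint8_t))
{
    int a, b, n = 0;
    for (a = 0; a < 256; a++)
        for (b = 0; b < 256; b++)
            if (f((uint8_t)a, (uint8_t)b) != g((uint8_t)a, (uint8_t)b))
                n++;
    return n;
}
```

Under that assumption, count_mismatches(blend31_chained, blend31_patched)
comes out as zero, while the truncating form disagrees on some inputs
(for example a = 0, b = 1), which is the kind of mismatch that would show
up at the right border.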

> 
> > Otherwise we could put the MMX2/3dnow code under some if(flags &bitexact).
> >
> > Ramiro Polla
> >
> > Index: rgb2rgb_template.c
> > ===================================================================
> > --- rgb2rgb_template.c	(revision 32166)
> > +++ rgb2rgb_template.c	(working copy)
> > @@ -1820,10 +1820,10 @@ static inline void RENAME(planar2x)(const uint8_t
> >          dst[dstStride]= (  src[0] + 3*src[srcStride])>>2;
> >  
> >          for (x=mmxSize-1; x<srcWidth-1; x++) {
> > -            dst[2*x          +1]= (3*src[x+0] +   src[x+srcStride+1])>>2;
> > -            dst[2*x+dstStride+2]= (  src[x+0] + 3*src[x+srcStride+1])>>2;
> > -            dst[2*x+dstStride+1]= (  src[x+1] + 3*src[x+srcStride  ])>>2;
> > -            dst[2*x          +2]= (3*src[x+1] +   src[x+srcStride  ])>>2;
> > +            dst[2*x          +1]= ((3*src[x+0] +   src[x+srcStride+1])+3)>>2;
> > +            dst[2*x+dstStride+2]= ((  src[x+0] + 3*src[x+srcStride+1])+3)>>2;
> > +            dst[2*x+dstStride+1]= ((  src[x+1] + 3*src[x+srcStride  ])+3)>>2;
> > +            dst[2*x          +2]= ((3*src[x+1] +   src[x+srcStride  ])+3)>>2;
> 
> WTF +3?  Does mmx round like that?  Most other CPUs with a rounding
> average instruction do +4.

Then most CPUs are crap, as that's just adding +1 afterwards, with overflows
on odd days. But I guess it's not so; you just randomly rant
every time grep x86 matches, and honestly this is tiring.
If you would first read what you reply to and contribute something, I
wouldn't mind, but please keep your senseless x86 rants to yourself; they are
off topic here.
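
The "+1 afterwards" remark can be checked with a short sketch. It treats
the weighted sum 3*a + b as an int in the range 0..1020 and compares a
bias of +3 with a bias of +4 before the shift by 2. The helper names are
hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* 3:1 weighted sum of two bytes; ranges over 0..1020 and is kept in an int. */
static int weighted_sum(int a, int b)
{
    return 3 * a + b;
}

/* Bias +3: the rounding used in the patch. */
static int shift_bias3(int sum)
{
    return (sum + 3) >> 2;
}

/* Bias +4: the alternative raised in the quoted question. */
static int shift_bias4(int sum)
{
    return (sum + 4) >> 2;
}

/* Returns 1 if (sum + 4) >> 2 equals (sum >> 2) + 1 for every reachable sum. */
static int bias4_is_plus_one(void)
{
    int sum;
    for (sum = 0; sum <= 1020; sum++)
        if (shift_bias4(sum) != (sum >> 2) + 1)
            return 0;
    return 1;
}
```

With a = b = 255 the sum is 1020, so the +4 form yields 256, which no
longer fits in a uint8_t, while the +3 form yields 255.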

> IMO the C version should be easily
> implemented exactly on the majority of systems, not bow to the quirks
> of intel.

The most widespread CPU architecture our users use cannot be ignored, even if
there were a problem, but I am not seeing one at the moment. Other SIMD systems
are quite likely to round like MMX, so it makes sense to consider changing the
C code to match. If you know of some CPU architecture that can't handle that
way of rounding, you should tell us so we can consider that in the decision
of what to do with the C code.

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Everything should be made as simple as possible, but not simpler.
-- Albert Einstein