[MPlayer-dev-eng] [PATCH] SwScaler YV12 to BGR32 having zeroed alpha

Mon May 30 00:31:14 CEST 2005

Hi

On Sunday 29 May 2005 23:57, Jason Tackaberry wrote:
> On Sun, 2005-05-29 at 23:02 +0200, Michael Niedermayer wrote:
> > if it would be clean, won't cause any slowdown and is preferably optional
> > then yes,
>
> Well, there are a couple of extra additions and shifts in the C case, and a
> pcmpeqb instead of pxor in the MMX case, so it will naturally be slower,
> but I'd be shocked if it was measurable. :)
>
> I can't imagine a case where alpha being initialized to 0xFF would break
> anything.  Is there any existing code anywhere that relies on it being
> 0x00?  If there is, then I suppose it would have to be optional, and
> disabled by default.  But it would complicate the code quite a bit over
> what I'm attaching to this email.  Well, if that's the only way it would
> get merged, then I'll implement it that way.  But I think setting
> the alpha to fully opaque is a more sane default.
>
> At any rate, the attached patch makes the alpha bytes 0xFF in SwScaler
> for unscaled and scaled conversions to BGR32 or RGB32 in both C and MMX
> cases.  All cases have been tested and work.  The only thing I'm not
> sure about in terms of cleanliness is the unscaled C case (yuv2rgb.c).
> The same macros are used for both 32bpp and 16bpp cases (DST1(i) and
> DST2(i)).  I'm adding (255 << 24) to the values.  This isn't really any
> problem since for the 16bpp case it will just overflow and not affect
> the value.  But perhaps a better solution would be to add another
> argument to the macro that's used for the 32bpp case.  I think it's a
> matter of taste, but let me know.
>
[...]
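
To illustrate the 16bpp point quoted above: a minimal sketch, not code from the patch, of why adding 255 << 24 is harmless when the result is stored into a 16-bit destination. The helper names store16 and store32 are hypothetical, and the constant is written as 255u so the shift is done in unsigned arithmetic (an assumption of this sketch; the patch uses a plain int constant).

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* 16bpp store: the alpha constant occupies bits 24..31, so the
 * conversion to uint16_t (modulo 2^16) discards it completely. */
static uint16_t store16(uint32_t packed16)
{
    return (uint16_t)(packed16 + (255u << 24));
}

/* 32bpp store: the same addition sets the top byte to 0xFF,
 * provided the top byte of the input is zero. */
static uint32_t store32(uint32_t packed32)
{
    return packed32 + (255u << 24);
}
```

With these helpers, store16(0x1234) still yields 0x1234, while store32(0x00123456) yields 0xFF123456.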

> Anyway, please review the patch and let me know of any necessary
> changes.

> -                       ((uint32_t*)dest)[i2+0]= r[Y1] + g[Y1] + b[Y1];\
> -                       ((uint32_t*)dest)[i2+1]= r[Y2] + g[Y2] + b[Y2];\
> +                       ((uint32_t*)dest)[i2+0]= r[Y1] + g[Y1] + b[Y1] + (255 << 24);\
> +                       ((uint32_t*)dest)[i2+1]= r[Y2] + g[Y2] + b[Y2] + (255 << 24);\

Why not add the 255 << 24 into one of the r, g, b arrays? This would avoid
the slowdown.
(Or add 85 << 24 to all three, since 3 * 85 = 255.)
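
A minimal sketch of that idea, assuming simplified per-channel tables indexed by an 8-bit value and a 0xAARRGGBB pixel layout. The names r_tab, g_tab, b_tab, init_tables and pack_pixel are hypothetical; the real tables in yuv2rgb.c are built and indexed differently. The point is only that the alpha constant moves into table setup, so the per-pixel sum stays at three loads and two additions.

```c
#include <stdint.h>

/* Hypothetical lookup tables, one per channel (illustration only). */
static uint32_t r_tab[256], g_tab[256], b_tab[256];

/* Fill the tables once. The opaque alpha byte is folded into the
 * red table, so no extra addition is needed per pixel. */
static void init_tables(void)
{
    for (int i = 0; i < 256; i++) {
        r_tab[i] = ((uint32_t)i << 16) | (255u << 24);
        g_tab[i] = (uint32_t)i << 8;
        b_tab[i] = (uint32_t)i;
    }
}

/* Per-pixel work is unchanged: the sum already carries alpha 0xFF. */
static uint32_t pack_pixel(uint8_t r, uint8_t g, uint8_t b)
{
    return r_tab[r] + g_tab[g] + b_tab[b];
}
```

After init_tables(), pack_pixel(0x12, 0x34, 0x56) gives 0xFF123456. The alternative of putting 85u << 24 into each of the three tables gives the same top byte, because the three contributions sum to 255 << 24 without a carry.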


[...]
-- 
Michael