[FFmpeg-devel] [RFC] Alpha support

Sun Feb 1 10:10:57 CET 2009

2009/1/31 Michael Niedermayer <michaelni at gmx.at>

> On Wed, Jan 28, 2009 at 09:10:23PM +0100, C?dric Schieli wrote:
> > Hello,
> >
> > Attached are two versions of the patch :
> > > sws_use_alpha_bloated.patch ensures that no extra code is reached for
> the
> > > non alpha case, at the cost of more bloated code (swscale.o is 9kB
> bigger in
> > > my build)
> > > sws_use_alpha_slower.patch only eliminates extra code when the
> destination
> > > format is not alpha capable, and uses a runtime test for the remaining
> cases
> > >
> >
> > Attached version of sws_use_alpha.patch combines those two cases. The
> > "bloated" case is the default, the "slower" one is selected with
> > --enable-small
> >
> > The cloberring of %ebp in yuv2packed2 has also been addressed.
>
>
> changelog entry missing

I'm not sure what you mean.

[...]

{
> placement is inconsistant and not K&R style (yes i know sws is
> somewhat messy in terms of following any common style, but new code should
> be
> approximately K&R style, doesnt need to follow all fineprint but the {}
> should
> be placed in K&R style)

Yes, I tried to follow the style of the code around the different patch
sites.

also this shouldnt be using c->alpPixBuf not in either case because
> the c-> needs an pointer dereference (unless the compiler is smart but
> experience says the compiler generally is not)

Do you mean I should use a local variable to avoid this dereference ?

> > +            A1 = 1<<18;\
> > +            A2 = 1<<18;\
> > +            for (j=0; j<lumFilterSize; j++)\
> > +            {\
> > +                A1 += alpSrc[j][i2  ] * lumFilter[j];\
> > +                A2 += alpSrc[j][i2+1] * lumFilter[j];\
> > +            }\
> > +            A1>>=19;\
> > +            A2>>=19;\
> > +        }\
>
> the int A1,A2 can be moved in the if()
>

No, A1 and A2 are used outside the if, in the main switch

[...]

> > @@ -588,14 +601,22 @@
> >              else if (U<0) U=0;    \
> >              if (V>255)    V=255;  \
> >              else if (V<0) V=0;    \
> > +        }\
> > +        if (alpha && (!CONFIG_SMALL || c->alpPixBuf) && ((A1|A2)&256))\
> > +        {\
>
> > +            if (A1>255)   A1=255; \
> > +            else if (A1<0)A1=0;   \
> > +            if (A2>255)   A2=255; \
> > +            else if (A2<0)A2=0;   \
>
> av_clip_uint8()
>

Ok.

> > @@ -625,6 +653,11 @@
> >              if (B>=(256<<22))   B=(256<<22)-1; \
> >              else if (B<0)B=0;   \
> >          }\
> > +        if (alpha && (!CONFIG_SMALL || c->alpPixBuf) && (A&0xC0000000))\
> > +        {\
> > +            if (A>=(256<<22))   A=(256<<22)-1; \
> > +            else if (A<0)A=0;   \
>
> these 2 can be verically aligned prettier

BTW, that should be av_clip()

> > -    case PIX_FMT_RGB32:\
> > -    case PIX_FMT_BGR32:\
> > -    case PIX_FMT_RGB32_1:\
> > -    case PIX_FMT_BGR32_1:\
> > -        func(uint32_t)\
> > -            ((uint32_t*)dest)[i2+0]= r[Y1] + g[Y1] + b[Y1];\
> > -            ((uint32_t*)dest)[i2+1]= r[Y2] + g[Y2] + b[Y2];\
> > -        }                \
> > +    case PIX_FMT_RGBA:\
> > +    case PIX_FMT_BGRA:\
> > +        if (!CONFIG_SMALL && c->alpPixBuf)\
> > +        {\
> > +            func(uint32_t,1)\
> > +                ((uint32_t*)dest)[i2+0]= r[Y1] + g[Y1] + b[Y1] +
> (A1<<24);\
> > +                ((uint32_t*)dest)[i2+1]= r[Y2] + g[Y2] + b[Y2] +
> (A2<<24);\
> > +            }\
> > +        }else{\
> > +            func(uint32_t,CONFIG_SMALL)\
> > +                ((uint32_t*)dest)[i2+0]= r[Y1] + g[Y1] + b[Y1] +
> (CONFIG_SMALL ? (A1<<24) : 0);\
> > +                ((uint32_t*)dest)[i2+1]= r[Y2] + g[Y2] + b[Y2] +
> (CONFIG_SMALL ? (A2<<24) : 0);\
> > +            }\
> > +        }\
> > +        break;\
> > +    case PIX_FMT_ARGB:\
> > +    case PIX_FMT_ABGR:\
>
> is it faster the way you wrote it compared to a table that does <<24 vs.
> <<0 ?
> iam asking because the table would lead to simpler and less duplicated code
>

Yes, that would be simpler. I'll try that way.

> > -        YSCALE_YUV_2_RGBX_FULL_C(1<<21)
> > -            dest[aidx]= 0;
> > +        if (!CONFIG_SMALL && c->alpPixBuf){
> > +            YSCALE_YUV_2_RGBX_FULL_C(1<<21,1)
> > +                dest[aidx]= A>>22;
> > +                dest[0]= B>>22;
> > +                dest[1]= G>>22;
> > +                dest[2]= R>>22;
> > +                dest+= step;
> > +            }
> > +        }else{
> > +        YSCALE_YUV_2_RGBX_FULL_C(1<<21,CONFIG_SMALL)
> > +            dest[aidx]= CONFIG_SMALL ? A>>22 : 0;
> >              dest[0]= B>>22;
> >              dest[1]= G>>22;
> >              dest[2]= R>>22;
> >              dest+= step;
> >          }
> > +        }
>
> Why dont you write it like
>
> +        if (!CONFIG_SMALL && c->alpPixBuf){
> +            YSCALE_YUV_2_RGBX_FULL_C(1<<21,1)
> +                dest[aidx]= A>>22;
> +                dest[0]= B>>22;
> +                dest[1]= G>>22;
> +                dest[2]= R>>22;
> +                dest+= step;
> +            }
> +        }else{
> +        YSCALE_YUV_2_RGBX_FULL_C(1<<21,CONFIG_SMALL ? c->alpPixBuf : 0)
> +            dest[aidx]= CONFIG_SMALL ? A>>22 : 0;
>              dest[0]= B>>22;
>             dest[1]= G>>22;
>             dest[2]= R>>22;
>              dest+= step;
>         }
> +        }
>
> and
>
> +#define YSCALE_YUV_2_RGBX_FULL_C(rnd,alpha) \
> +    YSCALE_YUV_2_PACKEDX_FULL_C(alpha)\
>          Y-= c->yuv2rgb_y_offset;\
>         Y*= c->yuv2rgb_y_coeff;\
>         Y+= rnd;\
> @@ -625,6 +653,11 @@
>              if (B>=(256<<22))   B=(256<<22)-1; \
>              else if (B<0)B=0;   \
>         }\
> +        if (alpha)\
>
> ?
>

Indeed, it's simpler and more elegant while totally equivalent.

Regards,
C?dric Schieli
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel at mplayerhq.hu
https://lists.mplayerhq.hu/mailman/listinfo/ffmpeg-devel