[FFmpeg-devel] [RFC] Alpha support
Cédric Schieli
cschieli
Sun Feb 1 10:10:57 CET 2009
2009/1/31 Michael Niedermayer <michaelni at gmx.at>
> On Wed, Jan 28, 2009 at 09:10:23PM +0100, C?dric Schieli wrote:
> > Hello,
> >
> > Attached are two versions of the patch :
> > > sws_use_alpha_bloated.patch ensures that no extra code is reached for
> the
> > > non alpha case, at the cost of more bloated code (swscale.o is 9kB
> bigger in
> > > my build)
> > > sws_use_alpha_slower.patch only eliminates extra code when the
> destination
> > > format is not alpha capable, and uses a runtime test for the remaining
> cases
> > >
> >
> > Attached version of sws_use_alpha.patch combines those two cases. The
> > "bloated" case is the default, the "slower" one is selected with
> > --enable-small
> >
> > The cloberring of %ebp in yuv2packed2 has also been addressed.
>
>
> changelog entry missing
I'm not sure what you mean.
[...]
{
> placement is inconsistant and not K&R style (yes i know sws is
> somewhat messy in terms of following any common style, but new code should
> be
> approximately K&R style, doesnt need to follow all fineprint but the {}
> should
> be placed in K&R style)
Yes, I tried to follow the style of the code around the different patch
sites.
also this shouldnt be using c->alpPixBuf not in either case because
> the c-> needs an pointer dereference (unless the compiler is smart but
> experience says the compiler generally is not)
Do you mean I should use a local variable to avoid this dereference ?
> > + A1 = 1<<18;\
> > + A2 = 1<<18;\
> > + for (j=0; j<lumFilterSize; j++)\
> > + {\
> > + A1 += alpSrc[j][i2 ] * lumFilter[j];\
> > + A2 += alpSrc[j][i2+1] * lumFilter[j];\
> > + }\
> > + A1>>=19;\
> > + A2>>=19;\
> > + }\
>
> the int A1,A2 can be moved in the if()
>
No, A1 and A2 are used outside the if, in the main switch
[...]
> > @@ -588,14 +601,22 @@
> > else if (U<0) U=0; \
> > if (V>255) V=255; \
> > else if (V<0) V=0; \
> > + }\
> > + if (alpha && (!CONFIG_SMALL || c->alpPixBuf) && ((A1|A2)&256))\
> > + {\
>
> > + if (A1>255) A1=255; \
> > + else if (A1<0)A1=0; \
> > + if (A2>255) A2=255; \
> > + else if (A2<0)A2=0; \
>
> av_clip_uint8()
>
Ok.
> > @@ -625,6 +653,11 @@
> > if (B>=(256<<22)) B=(256<<22)-1; \
> > else if (B<0)B=0; \
> > }\
> > + if (alpha && (!CONFIG_SMALL || c->alpPixBuf) && (A&0xC0000000))\
> > + {\
> > + if (A>=(256<<22)) A=(256<<22)-1; \
> > + else if (A<0)A=0; \
>
> these 2 can be verically aligned prettier
BTW, that should be av_clip()
> > - case PIX_FMT_RGB32:\
> > - case PIX_FMT_BGR32:\
> > - case PIX_FMT_RGB32_1:\
> > - case PIX_FMT_BGR32_1:\
> > - func(uint32_t)\
> > - ((uint32_t*)dest)[i2+0]= r[Y1] + g[Y1] + b[Y1];\
> > - ((uint32_t*)dest)[i2+1]= r[Y2] + g[Y2] + b[Y2];\
> > - } \
> > + case PIX_FMT_RGBA:\
> > + case PIX_FMT_BGRA:\
> > + if (!CONFIG_SMALL && c->alpPixBuf)\
> > + {\
> > + func(uint32_t,1)\
> > + ((uint32_t*)dest)[i2+0]= r[Y1] + g[Y1] + b[Y1] +
> (A1<<24);\
> > + ((uint32_t*)dest)[i2+1]= r[Y2] + g[Y2] + b[Y2] +
> (A2<<24);\
> > + }\
> > + }else{\
> > + func(uint32_t,CONFIG_SMALL)\
> > + ((uint32_t*)dest)[i2+0]= r[Y1] + g[Y1] + b[Y1] +
> (CONFIG_SMALL ? (A1<<24) : 0);\
> > + ((uint32_t*)dest)[i2+1]= r[Y2] + g[Y2] + b[Y2] +
> (CONFIG_SMALL ? (A2<<24) : 0);\
> > + }\
> > + }\
> > + break;\
> > + case PIX_FMT_ARGB:\
> > + case PIX_FMT_ABGR:\
>
> is it faster the way you wrote it compared to a table that does <<24 vs.
> <<0 ?
> iam asking because the table would lead to simpler and less duplicated code
>
Yes, that would be simpler. I'll try that way.
> > - YSCALE_YUV_2_RGBX_FULL_C(1<<21)
> > - dest[aidx]= 0;
> > + if (!CONFIG_SMALL && c->alpPixBuf){
> > + YSCALE_YUV_2_RGBX_FULL_C(1<<21,1)
> > + dest[aidx]= A>>22;
> > + dest[0]= B>>22;
> > + dest[1]= G>>22;
> > + dest[2]= R>>22;
> > + dest+= step;
> > + }
> > + }else{
> > + YSCALE_YUV_2_RGBX_FULL_C(1<<21,CONFIG_SMALL)
> > + dest[aidx]= CONFIG_SMALL ? A>>22 : 0;
> > dest[0]= B>>22;
> > dest[1]= G>>22;
> > dest[2]= R>>22;
> > dest+= step;
> > }
> > + }
>
> Why dont you write it like
>
> + if (!CONFIG_SMALL && c->alpPixBuf){
> + YSCALE_YUV_2_RGBX_FULL_C(1<<21,1)
> + dest[aidx]= A>>22;
> + dest[0]= B>>22;
> + dest[1]= G>>22;
> + dest[2]= R>>22;
> + dest+= step;
> + }
> + }else{
> + YSCALE_YUV_2_RGBX_FULL_C(1<<21,CONFIG_SMALL ? c->alpPixBuf : 0)
> + dest[aidx]= CONFIG_SMALL ? A>>22 : 0;
> dest[0]= B>>22;
> dest[1]= G>>22;
> dest[2]= R>>22;
> dest+= step;
> }
> + }
>
> and
>
> +#define YSCALE_YUV_2_RGBX_FULL_C(rnd,alpha) \
> + YSCALE_YUV_2_PACKEDX_FULL_C(alpha)\
> Y-= c->yuv2rgb_y_offset;\
> Y*= c->yuv2rgb_y_coeff;\
> Y+= rnd;\
> @@ -625,6 +653,11 @@
> if (B>=(256<<22)) B=(256<<22)-1; \
> else if (B<0)B=0; \
> }\
> + if (alpha)\
>
> ?
>
Indeed, it's simpler and more elegant while totally equivalent.
Regards,
C?dric Schieli
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel at mplayerhq.hu
https://lists.mplayerhq.hu/mailman/listinfo/ffmpeg-devel
More information about the ffmpeg-devel
mailing list