[FFmpeg-devel] [PATCH] Use av_clip_uint8 in swscale.
Michael Niedermayer
michaelni
Sun Aug 16 16:46:33 CEST 2009
On Sun, Aug 16, 2009 at 01:19:39AM +0100, Måns Rullgård wrote:
> Michael Niedermayer <michaelni at gmx.at> writes:
>
> > On Sat, Aug 15, 2009 at 05:53:49PM +0100, Måns Rullgård wrote:
> >> Reimar Döffinger <Reimar.Doeffinger at gmx.de> writes:
> >>
> >> > On Sat, Aug 15, 2009 at 12:27:49PM -0300, Ramiro Polla wrote:
> >> >> diff --git a/swscale.c b/swscale.c
> >> >> index c513066..340acfc 100644
> >> >> --- a/swscale.c
> >> >> +++ b/swscale.c
> >> >> @@ -688,21 +688,12 @@ static inline void yuv2nv12XinC(const int16_t *lumFilter, const int16_t **lumSrc
> >> >>
> >> >> #define YSCALE_YUV_2_PACKEDX_C(type,alpha) \
> >> >> YSCALE_YUV_2_PACKEDX_NOCLIP_C(type,alpha)\
> >> >> - if ((Y1|Y2|U|V)&256)\
> >> >> - {\
> >> >> - if (Y1>255) Y1=255; \
> >> >> - else if (Y1<0)Y1=0; \
> >> >> - if (Y2>255) Y2=255; \
> >> >> - else if (Y2<0)Y2=0; \
> >> >> - if (U>255) U=255; \
> >> >> - else if (U<0) U=0; \
> >> >> - if (V>255) V=255; \
> >> >> - else if (V<0) V=0; \
> >> >> - }\
> >> >> - if (alpha && ((A1|A2)&256)){\
> >> >> - A1=av_clip_uint8(A1);\
> >> >> - A2=av_clip_uint8(A2);\
> >> >> - }
> >> >> + Y1 = av_clip_uint8(Y1); \
> >> >> + Y2 = av_clip_uint8(Y2); \
> >> >> + U = av_clip_uint8(U ); \
> >> >> + V = av_clip_uint8(V ); \
> >> >> + A1 = av_clip_uint8(A1); \
> >> >> + A2 = av_clip_uint8(A2); \
> >> >
> >> > This
> >> >
> >> >> - if ((u|v)&256){
> >> >> - if (u<0) u=0;
> >> >> - else if (u>255) u=255;
> >> >> - if (v<0) v=0;
> >> >> - else if (v>255) v=255;
> >> >> - }
> >> >> -
> >> >> - uDest[i]= u;
> >> >> - vDest[i]= v;
> >> >> + uDest[i]= av_clip_uint8((chrSrc[i ]+64)>>7);
> >> >> + vDest[i]= av_clip_uint8((chrSrc[i + VOFW]+64)>>7);
> >> >
> >> > And this needs to be benchmarked (well, or at least have a look at the
> >> > generated code).
> >> > If clipping is very, very rare the original code might be faster.
> >>
> >> Depends on hardware. On processors with fast clipping instructions,
> >> always clipping is likely to be faster.
> >
> > if they are fast enough, sure, but which cpu would that be?
>
> ARM and AVR32 to name two.
I don't really know ARM & AVR32 asm ...
but I must admit that I am surprised that some CPUs have clipping instructions
that match a simple bitwise OR in throughput. I guess I should spend
more time with non-x86 asm.
>
> > besides which compiler would turn the pure C av_clip_uint8 into such
> > instructions ?
>
> We could write an asm version of it.
Yes, but that brings us back to the issue of CPU-specific optimizations
in libavutil headers ...
Besides, we would need more than an optimized av_clip_uint8(), because on
x86 4 ORs and 1 clip check are faster than 4 clipping checks.
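
To make the two variants concrete, here is a minimal sketch of "always clip"
versus "OR the values together and test once". It is not code from swscale.c:
clip_uint8() is a local stand-in for av_clip_uint8(), and clip4_always(),
clip4_guarded() and clip4_variants_agree() are hypothetical names used only
for illustration. It says nothing about which variant is faster; that still
has to be benchmarked per CPU.

```c
#include <stdint.h>

/* Local stand-in for libavutil's av_clip_uint8(); an assumption for this
 * sketch, not the library implementation. Clamps a to 0..255. */
static inline int clip_uint8(int a)
{
    if (a & ~0xFF)
        return a < 0 ? 0 : 255;
    return a;
}

/* Variant A: clip every value unconditionally (what the patch does). */
static inline void clip4_always(int v[4])
{
    for (int i = 0; i < 4; i++)
        v[i] = clip_uint8(v[i]);
}

/* Variant B: 3 ORs and one test of bit 8 guard the four clips (the
 * original code). Bit 8 is set for every value in -256..-1 and 256..511
 * and clear for 0..255, so the guard is only reliable if all inputs lie
 * within -256..511. */
static inline void clip4_guarded(int v[4])
{
    if ((v[0] | v[1] | v[2] | v[3]) & 256) {
        for (int i = 0; i < 4; i++)
            v[i] = clip_uint8(v[i]);
    }
}

/* Returns 1 if both variants produce the same result for every x in
 * [lo, hi] placed next to three in-range values, 0 otherwise. */
int clip4_variants_agree(int lo, int hi)
{
    for (int x = lo; x <= hi; x++) {
        int a[4] = { 16, x, 128, 240 };
        int b[4] = { 16, x, 128, 240 };
        clip4_always(a);
        clip4_guarded(b);
        for (int i = 0; i < 4; i++)
            if (a[i] != b[i])
                return 0;
    }
    return 1;
}
```

The two variants agree as long as the inputs stay within -256..511; a value
such as 512 has bit 8 clear, so the guarded variant would let it through
unclipped. Whether the swscale intermediates can ever leave that range is
not settled by this sketch.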
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> ... defining _GNU_SOURCE...
For the love of all that is holy, and some that is not, don't do that.
-- Luca & Mans