[FFmpeg-devel] [PATCH] Use av_clip_uint8 in swscale.
Måns Rullgård
mans
Sun Aug 16 17:18:02 CEST 2009
Michael Niedermayer <michaelni at gmx.at> writes:
> On Sun, Aug 16, 2009 at 01:19:39AM +0100, Måns Rullgård wrote:
>> Michael Niedermayer <michaelni at gmx.at> writes:
>>
>> > On Sat, Aug 15, 2009 at 05:53:49PM +0100, Måns Rullgård wrote:
>> >> Reimar Döffinger <Reimar.Doeffinger at gmx.de> writes:
>> >>
>> >> > On Sat, Aug 15, 2009 at 12:27:49PM -0300, Ramiro Polla wrote:
>> >> >> diff --git a/swscale.c b/swscale.c
>> >> >> index c513066..340acfc 100644
>> >> >> --- a/swscale.c
>> >> >> +++ b/swscale.c
>> >> >> - if ((u|v)&256){
>> >> >> - if (u<0) u=0;
>> >> >> - else if (u>255) u=255;
>> >> >> - if (v<0) v=0;
>> >> >> - else if (v>255) v=255;
>> >> >> - }
>> >> >> -
>> >> >> - uDest[i]= u;
>> >> >> - vDest[i]= v;
>> >> >> + uDest[i]= av_clip_uint8((chrSrc[i ]+64)>>7);
>> >> >> + vDest[i]= av_clip_uint8((chrSrc[i + VOFW]+64)>>7);
>> >> >
>> >> > And this needs to be benchmarked (or at least someone should have a
>> >> > look at the generated code).
>> >> > If clipping is very, very rare, the original code might be faster.
>> >>
>> >> Depends on hardware. On processors with fast clipping instructions,
>> >> always clipping is likely to be faster.
>> >
>> > if they are fast enough, sure, but which cpu would that be?
>>
>> ARM and AVR32 to name two.
>
> I don't really know ARM & AVR32 asm ...
> but I must admit that I am surprised that some CPU has clipping instructions
> that match a simple bitwise OR in throughput. I guess I should spend
> more time with non-x86 asm
ARM can shift and saturate in one cycle. On AVR32 shift+sat has one
issue cycle and two cycles latency. On either architecture, two of
those is definitely faster than some bitwise logic and a conditional
branch.
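
For readers following the thread, the two variants under discussion can be sketched in plain C as below. This is an illustration, not code from the patch: the function names store_guarded and store_always are hypothetical, clip_uint8 is a local stand-in for av_clip_uint8, the vofw parameter stands in for the VOFW constant, and the definitions of u and v in the guarded variant are assumed from the "+" lines of the diff, since the quoted hunk does not show them.

```c
#include <stdint.h>

/* Local stand-in for av_clip_uint8 (assumption: same clamping behaviour). */
static inline uint8_t clip_uint8(int a)
{
    if (a & ~0xFF)                 /* any bit outside 0..255 set? */
        return a < 0 ? 0 : 255;
    return (uint8_t)a;
}

/* Variant 1: test once per pair, clamp only when the guard fires.
 * The "& 256" guard is sufficient only because (int16 + 64) >> 7 lies in
 * [-256, 256]; bit 8 is set for every out-of-range value in that interval.
 * Assumes >> on negative ints is an arithmetic shift, as on common compilers. */
static void store_guarded(uint8_t *uDest, uint8_t *vDest,
                          const int16_t *chrSrc, int vofw, int n)
{
    for (int i = 0; i < n; i++) {
        int u = (chrSrc[i]        + 64) >> 7;
        int v = (chrSrc[i + vofw] + 64) >> 7;

        if ((u | v) & 256) {
            if (u < 0)        u = 0;
            else if (u > 255) u = 255;
            if (v < 0)        v = 0;
            else if (v > 255) v = 255;
        }
        uDest[i] = (uint8_t)u;
        vDest[i] = (uint8_t)v;
    }
}

/* Variant 2: always clamp, as in the proposed patch. On a CPU with a
 * saturating instruction this can become shift+saturate with no branch. */
static void store_always(uint8_t *uDest, uint8_t *vDest,
                         const int16_t *chrSrc, int vofw, int n)
{
    for (int i = 0; i < n; i++) {
        uDest[i] = clip_uint8((chrSrc[i]        + 64) >> 7);
        vDest[i] = clip_uint8((chrSrc[i + vofw] + 64) >> 7);
    }
}
```

Both functions should produce identical output for any int16_t input; which one is faster is exactly the hardware-dependent question being argued here, and the sketch makes no claim either way.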
>> > besides, which compiler would turn the pure C av_clip_uint8 into such
>> > instructions?
>>
>> We could write an asm version of it.
>
> yes but that brings us back to the issue of cpu specific optimizations
> in libavutil headers ...
... which we need to find an acceptable solution to.
> besides, we would need more than an optimized av_clip_uint8(), because on
> x86 4 ORs and 1 clip check are faster than 4 clipping checks
Shocking.
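
The "several ORs, one check" pattern mentioned in the quoted text can be sketched as follows. This is only an illustration of the idea, assuming four values are handled per step; clip4_batched and clamp_u8 are hypothetical names, not functions from swscale or libavutil.

```c
#include <stdint.h>

/* Plain clamp to 0..255; hypothetical helper for this sketch. */
static inline int clamp_u8(int a)
{
    return a < 0 ? 0 : (a > 255 ? 255 : a);
}

/* OR the four values together and test the result once. If no bit outside
 * 0..255 is set in the OR, none of the inputs is out of range (negative
 * values always have high bits set), so the clamps can be skipped. */
static void clip4_batched(uint8_t dst[4], const int src[4])
{
    int a = src[0], b = src[1], c = src[2], d = src[3];

    if ((a | b | c | d) & ~0xFF) {   /* rare path: at least one needs clamping */
        a = clamp_u8(a);
        b = clamp_u8(b);
        c = clamp_u8(c);
        d = clamp_u8(d);
    }
    dst[0] = (uint8_t)a;
    dst[1] = (uint8_t)b;
    dst[2] = (uint8_t)c;
    dst[3] = (uint8_t)d;
}
```

The mask ~0xFF is used here instead of the 256 in the original hunk so that the check holds for any int input, not only for values known to lie in [-256, 511].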
--
Måns Rullgård
mans at mansr.com