[FFmpeg-devel] [PATCH] Use av_clip_uint8 in swscale.
Frank Barchard
fbarchard
Tue Aug 18 00:46:11 CEST 2009
2009/8/17 Måns Rullgård <mans at mansr.com>
> And you might know it's unbounded.
>
Yes. So a general-purpose function has to do it the slow way. But this
function is applied within swscaler on YUV data with a known range.
> > If you combined 3 bytes, its 768 values.
>
> "combined"?
Image operations are usually a function of 2 images, or 4 channels, which
puts a limit on how far out of range they can be.
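To make the bounded-range point concrete, here is a rough sketch that brute-forces the range of one converted channel. The coefficients (298, 516, scaled by 256) are a commonly used integer approximation of BT.601 YUV-to-RGB conversion. They are an assumption for illustration, not taken from swscale or from my code, and the names BlueFromYU and BlueRange are hypothetical.

```cpp
#include <cstdint>

// Unclipped blue channel from 8-bit Y and U, using an assumed BT.601-style
// integer approximation: B = (298*(Y-16) + 516*(U-128) + 128) >> 8.
// Relies on arithmetic right shift for negative values, which mainstream
// compilers provide.
static inline int32_t BlueFromYU(int y, int u) {
  return (298 * (y - 16) + 516 * (u - 128) + 128) >> 8;
}

struct Range {
  int32_t min;
  int32_t max;
};

// Try every 8-bit (Y, U) pair and record the smallest and largest result.
static Range BlueRange() {
  Range r = {BlueFromYU(0, 0), BlueFromYU(0, 0)};
  for (int y = 0; y < 256; ++y) {
    for (int u = 0; u < 256; ++u) {
      int32_t b = BlueFromYU(y, u);
      if (b < r.min) r.min = b;
      if (b > r.max) r.max = b;
    }
  }
  return r;
}
```

Under these assumed coefficients the result leaves 0..255 on both sides, but only by a few hundred, so the clip never has to handle the full 32-bit range.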
>
>
> > I know if statements are increasingly efficient, and memory less
> efficient,
> > but the original code had 4 to 6 instructions and potentially 2 branches
> > taken per clipped value.
> > av_clip_uint8() can be optimized to a single instruction on most CPU's
>
> Yes, on those with dedicated clip instructions. Others will need
> several instructions to support the full 32-bit range. Even if the
> range is known to be smaller, a table lookup can be slower than a few
> compares and conditional instructions, and it poisons the cache
> needlessly.
Here's a benchmark of my own code, which is very similar. This version,
including YUV conversion, runs in 2.97 ms:
static inline uint32 clip(int32 value) {
  if (value < 0) return 0u;
  if (value > 65535) return 255u;
  return static_cast<uint32>(value >> 8);
}
*This code runs in 2.11ms*
static inline uint32 clip(int32 value) {
  return static_cast<uint32>(g_rgb_clip_table[(value >> 8) + kClipOverflow]);
}
The table is read-only, so its cache lines are never dirty, and image
data tends to be coherent, so only a portion of the table is used. The
tables for simple YUV clipping are 832 bytes.
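For anyone who wants to try the comparison, here is a minimal self-contained sketch of the two variants. The guard size of 288 is an assumption, chosen only because 288 + 256 + 288 gives the 832 bytes mentioned above. The actual kClipOverflow value and the table setup in my code are not shown in this thread, and MakeClipTable, clip_branch and clip_table are hypothetical names.

```cpp
#include <array>
#include <cstdint>

// Assumed layout: kClipOverflow guard entries below 0 and above 255.
constexpr int kClipOverflow = 288;  // hypothetical value
constexpr int kClipTableSize = kClipOverflow + 256 + kClipOverflow;  // 832

// Entry i holds clamp(i - kClipOverflow, 0, 255).
static std::array<uint8_t, kClipTableSize> MakeClipTable() {
  std::array<uint8_t, kClipTableSize> table{};
  for (int i = 0; i < kClipTableSize; ++i) {
    int v = i - kClipOverflow;
    table[i] = static_cast<uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
  }
  return table;
}

static const std::array<uint8_t, kClipTableSize> g_rgb_clip_table =
    MakeClipTable();

// Branching version: value is 8.8 fixed point, any 32-bit input is safe.
static inline uint32_t clip_branch(int32_t value) {
  if (value < 0) return 0u;
  if (value > 65535) return 255u;
  return static_cast<uint32_t>(value >> 8);
}

// Table version: the caller must guarantee that (value >> 8) lies in
// [-kClipOverflow, 255 + kClipOverflow], otherwise the index is out of
// bounds. Relies on arithmetic right shift for negative values.
static inline uint32_t clip_table(int32_t value) {
  return g_rgb_clip_table[(value >> 8) + kClipOverflow];
}
```

Within the guarded range both functions return the same result. The table version trades the two compares for one indexed load, and it is only valid because the input range is known in advance.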
>
> >> > On x86, there is cmov, but in the above code it would take cmp,
> >> > cmov, cmp, cmov to do each value, whereas the table method takes
> >> > one mov instruction.
> >>
> >> You're forgetting the address calculation.
> >
> > movzx eax,cliptbl[eax*4]
>
> Now you're back at the 4GB table. And where did the value of
> "cliptbl" come from? It would have to be loaded from somewhere.
cliptbl is an array. You can index directly off arrays on x86.
>
>
> --
> Måns Rullgård
> mans at mansr.com