[FFmpeg-devel] [PATCH] Use av_clip_uint8 in swscale.

Mon Aug 17 20:41:56 CEST 2009

Frank Barchard <fbarchard at google.com> writes:

> 2009/8/17 Måns Rullgård <mans at mansr.com>
>
>> Frank Barchard <fbarchard at google.com> writes:
>>
>> > The table method works well on all platforms... better than if statements
>> > anyway.
>>
>> Depends on the range of inputs.  If you want to allow the full 32-bit
>> range, well...  Even a smaller range could put significant pressure on
>> the cache.
>
> In practice you know the range of values.

And you might know it's unbounded.

> If you combined 3 bytes, it's 768 values.

"combined"?

> I know if statements are increasingly efficient, and memory less efficient,
> but the original code had 4 to 6 instructions and potentially 2 branches
> taken per clipped value.
> av_clip_uint8() can be optimized to a single instruction on most CPUs.

Yes, on those with dedicated clip instructions.  Others will need
several instructions to support the full 32-bit range.  Even if the
range is known to be smaller, a table lookup can be slower than a few
compares and conditional instructions, and it poisons the cache
needlessly.
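
As a rough illustration of the "few compares" approach, here is a minimal
sketch of a clip to 0..255 that is valid for any 32-bit input. It follows
the idea behind av_clip_uint8() but is not copied from FFmpeg; the name
clip_uint8_branch is hypothetical. A compiler may turn the inner select
into a conditional instruction where the target has one.

```c
#include <stdint.h>

/* Clip a 32-bit value to the range 0..255 without any table.
 * Hypothetical helper, written for illustration only. */
static inline uint8_t clip_uint8_branch(int32_t a)
{
    /* Any bit set outside the low 8 means a < 0 or a > 255. */
    if (a & ~0xFF)
        return a < 0 ? 0 : 255;
    return (uint8_t)a;
}
```

The common in-range case costs one test and one branch that is not taken,
and no data memory is touched.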

>> > On x86, there is cmov, but in the above code it would take cmp,
>> > cmov, cmp, cmov to do each value, whereas the table method takes
>> > one mov instruction.
>>
>> You're forgetting the address calculation.
>
> movzx eax,cliptbl[eax*4]

Now you're back at the 4GB table.  And where did the value of
"cliptbl" come from?  It would have to be loaded from somewhere.

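For comparison, a minimal sketch of the table method with a bounded input
range. The range -256..511 (768 entries) is an assumption chosen for
illustration, and CLIP_MIN, CLIP_SIZE, clip_table and both function names
are hypothetical. Note the subtraction of CLIP_MIN and the reference to
the table's address: both are part of the address calculation, and any
input outside the assumed range reads out of bounds.

```c
#include <stdint.h>

#define CLIP_MIN  (-256)  /* assumed smallest input value */
#define CLIP_SIZE 768     /* assumed table size, covers -256..511 */

static uint8_t clip_table[CLIP_SIZE];

/* Fill the table once before use. */
static void clip_table_init(void)
{
    for (int i = 0; i < CLIP_SIZE; i++) {
        int v = i + CLIP_MIN;
        clip_table[i] = v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
    }
}

/* Only valid for CLIP_MIN <= a < CLIP_MIN + CLIP_SIZE. */
static inline uint8_t clip_uint8_table(int a)
{
    return clip_table[a - CLIP_MIN];
}
```
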
-- 
Måns Rullgård
mans at mansr.com