[Ffmpeg-devel] a little optim for a SSE version of H263_LOOP_FILTER
Kostya
kostya.shishkov
Sun Nov 12 05:54:06 CET 2006
On Fri, Nov 10, 2006 at 11:48:16PM +0100, skal wrote:
> btw, while i have the mike:
>
> seems to me the following replacement functions for
> vc1_v_overlap_c() and vc1_h_overlap_c() in vc1dsp.c:31
> are likely to be faster (and bitwise equivalent of course)
>
> static void vc1_v_overlap_c(uint8_t* src, int stride, int rnd)
> {
>     int i;
>     for(i = 0; i < 8; i++) {
>         const int a = src[-2*stride];
>         const int b = src[-stride];
>         const int c = src[0];
>         const int d = src[stride];
>         const int d1 = ( a-d + 3 + rnd ) >> 3;
>         const int d2 = ( a-d + b-c + 4 - rnd ) >> 3;
>         src[-2*stride] = clip_uint8(a-d1);
>         src[-stride] = clip_uint8(b+d2);
>         src[0] = clip_uint8(c-d2);
>         src[stride] = clip_uint8(d+d1);
>         src++;
>     }
> }
>
> but i might of course be wrong...
They are almost correct (it should read 'b-d2' and 'c+d2' instead), except for the rounding:
original:
4-rnd
3+rnd
4-rnd
3+rnd
yours:
-3-rnd
-4-rnd
4+rnd
3+rnd
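
One way to settle the rounding question is to compare both forms over a grid of inputs. The sketch below is only that, a sketch: the weighted-sum function is reconstructed from the rounding constants listed above and its 7/1 weights are an assumption, not a copy of vc1dsp.c; the names overlap_weighted, overlap_delta and count_mismatches are made up for illustration; and >> on negative values is assumed to be an arithmetic shift. Under that last assumption a - (x >> 3) equals (8*a - x + 7) >> 3, so the subtracted terms may well round the same way as the original after all.

```c
#include <stdint.h>

/* Clamp to 0..255, standing in for clip_uint8() from the quoted code. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* ASSUMED weighted-sum form; only the rounding constants come from the
 * "original" list (4-rnd, 3+rnd, 4-rnd, 3+rnd), the weights are a guess. */
static void overlap_weighted(const int in[4], int rnd, uint8_t out[4])
{
    const int a = in[0], b = in[1], c = in[2], d = in[3];
    out[0] = clip_uint8((7*a + d + 4 - rnd) >> 3);
    out[1] = clip_uint8((-a + 7*b + c + d + 3 + rnd) >> 3);
    out[2] = clip_uint8((a + b + 7*c - d + 4 - rnd) >> 3);
    out[3] = clip_uint8((a + 7*d + 3 + rnd) >> 3);
}

/* Delta form from the proposal, with the b-d2 / c+d2 sign fix applied. */
static void overlap_delta(const int in[4], int rnd, uint8_t out[4])
{
    const int a = in[0], b = in[1], c = in[2], d = in[3];
    const int d1 = (a - d + 3 + rnd) >> 3;
    const int d2 = (a - d + b - c + 4 - rnd) >> 3;
    out[0] = clip_uint8(a - d1);
    out[1] = clip_uint8(b - d2);
    out[2] = clip_uint8(c + d2);
    out[3] = clip_uint8(d + d1);
}

/* Count (a, b, c, d, rnd) combinations on which the two forms disagree.
 * step thins the 0..255 grid; step = 1 is exhaustive but slow. */
long count_mismatches(int step)
{
    long bad = 0;
    int in[4], rnd, k;
    uint8_t w[4], s[4];

    for (rnd = 0; rnd <= 1; rnd++)
        for (in[0] = 0; in[0] < 256; in[0] += step)
            for (in[1] = 0; in[1] < 256; in[1] += step)
                for (in[2] = 0; in[2] < 256; in[2] += step)
                    for (in[3] = 0; in[3] < 256; in[3] += step) {
                        overlap_weighted(in, rnd, w);
                        overlap_delta(in, rnd, s);
                        for (k = 0; k < 4; k++)
                            if (w[k] != s[k]) {
                                bad++;
                                break;
                            }
                    }
    return bad;
}
```

Calling count_mismatches(5) from a small main() and printing the result covers all residues modulo 8 in a fraction of a second; a nonzero count would mean the two forms are not bitwise equivalent for the assumed weights.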
>
> bye!
>
> Skal
>
>