[Ffmpeg-devel] a little optim for a SSE version of H263_LOOP_FILTER

Kostya kostya.shishkov
Wed Nov 15 13:52:33 CET 2006


On Sun, Nov 12, 2006 at 09:15:13PM +0100, skal wrote:
> 
>    Hi Konstantin and all,
> 
>    Hmm... I don't think so. The minus sign ("-d1") matters here.
> 
>    Btw, it's pretty obvious that the new values for 'a' and 'd' don't need [0..255]
>    clipping, since the kernel only has positive coefficients.
>    And it's also obvious that no update is needed if d1 or d2 is zero.
> 
> e.g. =>
> 
> static void vc1_v_overlap_c(uint8_t* src, int stride, int rnd)
> {
>     int i;
>     for(i = 0; i < 8; i++) {
>         const int a = src[-2*stride];
>         const int b = src[-stride];
>         const int c = src[0];
>         const int d = src[stride];
>         const int d1 = ( a-d       + 3 + rnd ) >> 3;
>         const int d2 = ( a-d + b-c + 4 - rnd ) >> 3;
>         if (d1) {
>             src[-2*stride] = a-d1;
>             src[stride]    = d+d1;
>         }
>         if (d2) {
>             src[-stride]   = clip_uint8(b-d2);
>             src[0]         = clip_uint8(c+d2);
>         }
>         src++;
>     }
> }
> 
> 
> 
>    bye!
> 
> Skal
> 
> 
> for the record, let's be pragmatic:
> 
> void Test_Overlap()

Tested, works. Another case where practice does not match theory.
I'll change and test everything (including mspel_mc) on the weekend.
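
For anyone who wants to reproduce the check, here is a minimal sketch of an
exhaustive range test for the quoted filter. It is not skal's Test_Overlap()
(which is cut off in the quote above); all function names below are made up
for illustration, and floor_shift3() is only a portable stand-in for ">> 3"
on possibly negative ints, assuming the usual arithmetic-shift behaviour.

```c
#include <assert.h>

/* Portable floor(t / 8); mirrors an arithmetic ">> 3" for negative t. */
static int floor_shift3(int t)
{
    return t >= 0 ? t >> 3 : -((-t + 7) >> 3);
}

/* Count (a, d, rnd) combinations where the unclipped outer taps
 * a-d1 or d+d1 would leave [0..255]. d1 depends only on a, d and rnd. */
static int count_outer_overflows(void)
{
    int a, d, rnd, bad = 0;
    for (rnd = 0; rnd <= 1; rnd++)
        for (a = 0; a < 256; a++)
            for (d = 0; d < 256; d++) {
                const int d1 = floor_shift3(a - d + 3 + rnd);
                if (a - d1 < 0 || a - d1 > 255 ||
                    d + d1 < 0 || d + d1 > 255)
                    bad++;
            }
    return bad;
}

/* Unclipped value of the inner tap b-d2, to show why clip_uint8 stays. */
static int inner_b_unclipped(int a, int b, int c, int d, int rnd)
{
    const int d2 = floor_shift3(a - d + b - c + 4 - rnd);
    return b - d2;
}

/* Hypothetical self-check: outer taps never overflow, inner taps can. */
void check_overlap_ranges(void)
{
    assert(count_outer_overflows() == 0);
    assert(inner_b_unclipped(0, 255, 255, 255, 0) > 255);
}
```

Calling check_overlap_ranges() from a small main() should pass silently if
the claim about 'a' and 'd' holds, while the second assert shows one input
where the 'b' tap does need clip_uint8.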



