[Ffmpeg-devel] a little optim for a SSE version of H263_LOOP_FILTER
Kostya
kostya.shishkov
Wed Nov 15 13:52:33 CET 2006
On Sun, Nov 12, 2006 at 09:15:13PM +0100, skal wrote:
>
> Hi Konstantin and all,
>
> hmm... i don't think so. The minus sign ("-d1") has its importance here.
>
> Btw, it's pretty obvious that the new values for 'a' and 'd' don't need
> [0..255] clipping, since the kernel only has positive coeffs.
> And it's also obvious that no update is needed if d1 or d2 is zero.
>
> e.g. =>
>
> static void vc1_v_overlap_c(uint8_t* src, int stride, int rnd)
> {
>     int i;
>     for(i = 0; i < 8; i++) {
>         const int a = src[-2*stride];
>         const int b = src[-stride];
>         const int c = src[0];
>         const int d = src[stride];
>         const int d1 = ( a-d + 3 + rnd ) >> 3;
>         const int d2 = ( a-d + b-c + 4 - rnd ) >> 3;
>         if (d1) {
>             src[-2*stride] = a-d1;
>             src[stride] = d+d1;
>         }
>         if (d2) {
>             src[-stride] = clip_uint8(b-d2);
>             src[0] = clip_uint8(c+d2);
>         }
>         src++;
>     }
> }
>
>
>
> bye!
>
> Skal
>
>
> for the record, let's be pragmatic:
>
> void Test_Overlap()
Tested, works. Another case where practice does not match theory.
I'll change and test everything (including mspel_mc) over the weekend.
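
For anyone who wants to check the no-clipping claim for 'a' and 'd' themselves, here is a minimal sketch of an exhaustive test. It is not the Test_Overlap() from the quoted mail (that one is cut off above); the names asr3, count_outer_overflows and inner_b_unclipped are made up for illustration. It walks every (a, d, rnd) combination and counts results outside [0..255], and it also exposes the unclipped inner update so one can see why 'b' and 'c' still need clip_uint8().

```c
/* Floor-rounding shift by 3, written out so the result does not depend on
 * how the compiler shifts negative ints (the quoted code uses plain >> 3,
 * which behaves the same way on the usual targets). */
static int asr3(int x)
{
    return x >= 0 ? x >> 3 : -((-x + 7) >> 3);
}

/* Hypothetical helper: counts the (a, d, rnd) triples for which the
 * unclipped outer-pixel updates a - d1 or d + d1 leave [0, 255]. */
static int count_outer_overflows(void)
{
    int bad = 0;
    for (int rnd = 0; rnd <= 1; rnd++) {
        for (int a = 0; a <= 255; a++) {
            for (int d = 0; d <= 255; d++) {
                const int d1 = asr3(a - d + 3 + rnd);
                const int na = a - d1;  /* new value for src[-2*stride] */
                const int nd = d + d1;  /* new value for src[stride]    */
                if (na < 0 || na > 255 || nd < 0 || nd > 255)
                    bad++;
            }
        }
    }
    return bad;
}

/* Hypothetical helper: the inner update b - d2 before any clipping. */
static int inner_b_unclipped(int a, int b, int c, int d, int rnd)
{
    const int d2 = asr3(a - d + b - c + 4 - rnd);
    return b - d2;
}
```

count_outer_overflows() coming back as 0 means the outer pixels never leave the valid range. For the inner pixels, an input like a=255, b=c=d=0, rnd=0 gives b - d2 below zero, so the clip has to stay there.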