[Ffmpeg-devel] a little optim for a SSE version of H263_LOOP_FILTER

skal skal65535
Fri Nov 10 23:48:16 CET 2006



  Hi Michael and all,

> Message of 09/11/06 16:59

> > skal wrote:
> > >   Hi everybody, sorry for the delay
> > 
> > Having your mail held for moderation for 2 days surely didn't help ;-)
> 
> subscribe all your email addresses 
> and optionally set all but one to nomail :)

   well, i don't understand what's going on.. this address is
   subscribed (this i'm sure, considering the traffic:), but i
   can't send.. oh well.

> > >>>>Please send patch, I'll try to benchmark the speed change.
> > >>>>
> > >>>>Note that movq is very slow on P4, so any code that removes
> > >>>>mov(q|dqu|..) provides an interesting speed-up.
> > >>>>
> > >>>>
> > >>>>
> > >>>>> Well, this is just for the fun of it, since the speed-up
> > >>>>> (if any) might not be worth a special version...
> > >>>>
> > >>>>Once I have a patch to play with, I can benchmark it on P4, PM, and K8... :)
> > >>>
> > >>>   sure, attached is the diff (test only!)
> > >>
> > >>Mmm.. I thought it would be slightly more complicated since you said
> > >>"+maybe a little re-org of the loop (mm3 is gone)."
> > >>
> > >>Ok, I've tried your patch. Regression tests pass, however, I have
> > >>trouble testing your patch beyond that. I lack a sample with proper
> > >>inloop filter it seems.
> > >>I've tried this sample
> > >>http://samples.mplayerhq.hu/V-codecs/h263/100374.mov and a couple of
> > >>others, without any luck...
> > >>Maybe I'm not benchmarking the relevant parts of the inloop filter
> > >>(see attachment to see what I was benchmarking)?
> > >>Or could you provide a sample to run my benches?
> > > 
> > > 
> > >    You can try:
> > > 
> > > ftp://ftp.mplayerhq.hu/MPlayer/samples/FLV/flv1.1/sheep_MM_FLV1.1.flv
> > > 
> > >    which is using h263's in-loop deblocking iirc.
> > 
> > 
> > Mmm... maybe I'm very unlucky or dumb (or both), but it doesn't look
> > like there's an inloop filter in this one, as the benchmark code is
> > not getting executed. I've tried to decode with this command:
> > 
> > ./ffmpeg/ffmpeg_g -v 9 -i sheep_MM_FLV1.1.flv -y -f rawvideo /dev/null
> > 
> > which I know works fine as it's the command I used to benchmark cabac
> > code.
> > 
> > the patch bench_h263.diff that I sent earlier has the spot where I put
> > the START/STOP macros...
> 
> without looking at the flv spec i do think that h263-flv doesnt use loop
> filtering

   oh, indeed, the 'deblocking_flag' of h263.c:6268 is ignored!
   No surprise there was no effect in the benchmarks.
 
> 
> but rv20 does i think ...
> you could also just set loop_filter=1 and use any h263 video (it might
> even look better that way :)


   btw, while i have the mike:

   seems to me the following replacement functions for 
   vc1_v_overlap_c() and vc1_h_overlap_c() in vc1dsp.c:31
   are likely to be faster (and bitwise equivalent of course)

static void vc1_v_overlap_c(uint8_t* src, int stride, int rnd)
{
    int i;
    for(i = 0; i < 8; i++) {
        const int a = src[-2*stride];
        const int b = src[-stride];
        const int c = src[0];
        const int d = src[stride];
        const int d1 = ( a-d       + 3 + rnd ) >> 3;
        const int d2 = ( a-d + b-c + 4 - rnd ) >> 3;
        src[-2*stride] = clip_uint8(a-d1);
        src[-stride]   = clip_uint8(b-d2);
        src[0]         = clip_uint8(c+d2);
        src[stride]    = clip_uint8(d+d1);
        src++;
    }
}

static void vc1_h_overlap_c(uint8_t* src, int stride, int rnd)
{
    int i;
    for(i = 0; i < 8; i++) {
        const int a = src[-2];
        const int b = src[-1];
        const int c = src[0];
        const int d = src[1];
        const int d1 = ( a-d       + 3 + rnd ) >> 3;
        const int d2 = ( a-d + b-c + 4 - rnd ) >> 3;
        src[-2] = clip_uint8(a-d1);
        src[-1] = clip_uint8(b-d2);
        src[0]  = clip_uint8(c+d2);
        src[1]  = clip_uint8(d+d1);
        src += stride;
    }
}

   but i might of course be wrong...
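
   a quick way to check the "bitwise equivalent" claim, as a sketch
   only: it compares the d1/d2 form used in vc1_h_overlap_c() above
   against weighted sums of the 7*a + d kind. Those weighted sums are
   my assumption of what the current C code computes (not quoted from
   vc1dsp.c), and count_mismatches() is a made-up helper name.

```c
#include <stdint.h>

/* Clamp an int to the 0..255 range, like the clip_uint8() used above. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Count the inputs for which the delta form (d1/d2) disagrees with the
 * weighted-sum form. The weighted sums are an ASSUMPTION about what the
 * existing C code computes; they are not quoted from vc1dsp.c.
 * Both forms rely on >> being an arithmetic shift for negative values.
 */
static long count_mismatches(void)
{
    long bad = 0;
    for (int rnd = 0; rnd <= 1; rnd++) {
        /* Outer taps depend only on a and d: check every pair. */
        for (int a = 0; a < 256; a++) {
            for (int d = 0; d < 256; d++) {
                const int d1 = (a - d + 3 + rnd) >> 3;
                bad += clip_uint8(a - d1) != clip_uint8((7*a + d + 4 - rnd) >> 3);
                bad += clip_uint8(d + d1) != clip_uint8((a + 7*d + 3 + rnd) >> 3);
            }
        }
        /* Inner taps depend on b, c and x = a - d: check every triple. */
        for (int x = -255; x <= 255; x++) {
            for (int b = 0; b < 256; b++) {
                for (int c = 0; c < 256; c++) {
                    const int d2 = (x + b - c + 4 - rnd) >> 3;
                    bad += clip_uint8(b - d2) != clip_uint8((-x + 7*b + c + 3 + rnd) >> 3);
                    bad += clip_uint8(c + d2) != clip_uint8((x + b + 7*c + 4 - rnd) >> 3);
                }
            }
        }
    }
    return bad;
}
```

   if the assumption about the weighted sums holds, count_mismatches()
   should return 0 on compilers where >> on a negative int is an
   arithmetic shift, since -floor(y/8) equals floor((7-y)/8) for any
   integer y.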

   bye!

Skal




