[FFmpeg-devel] [PATCH] RV40 Loop Filter

Sun Oct 26 14:41:09 CET 2008

On Sat, Oct 25, 2008 at 11:14:25AM +0200, Michael Niedermayer wrote:
> On Sat, Oct 25, 2008 at 10:08:44AM +0300, Kostya wrote:
> > On Wed, Oct 22, 2008 at 10:53:23AM +0200, Michael Niedermayer wrote:
> > > On Tue, Oct 21, 2008 at 09:23:21AM +0300, Kostya wrote:
> [...]
> > > [...]
> > > > +static int rv40_set_deblock_coef(RV34DecContext *r)
> > > > +{
> > > > +    MpegEncContext *s = &r->s;
> > > > +    int mvmask = 0, i, j, dx, dy;
> > > > +    int midx = s->mb_x * 2 + s->mb_y * 2 * s->b8_stride;
> > > 
> > > > +    if(s->pict_type == FF_I_TYPE)
> > > > +        return 0;
> > > 
> > > why is this even called for i frames?
> > 
> > I intend to use it for calculating macroblock-specific deblock
> > strength in RV30.
> 
> fine but how is that related to having the pict_type check inside the
> function compared to outside?
 
For RV30, deblock coefficients would be set for I-frames as well,
so the frame type check belongs inside the RV40-specific function
rather than in the shared caller.
 
> [...]
> > > > +                if(dx > 3 || dy > 3){
> > > > +                    mvmask |= 0x03 << (i*2 + j*8);
> > > > +                }
> > > > +            }
> > > > +        }
> > > > +        midx += s->b8_stride;
> > > > +    }
> > > 
> > > i think the if() can be moved out of the loop like
> > > if(first_slice_line)
> > >     mvmask &= 123;
> > 
> > IMO it can't.
> > It constructs mask based on motion vectors difference in the
> > horizontal/vertical neighbouring blocks after all. 
> 
> one way (there are surely a thousand others)
> 
> get_mask(int delta)
>     for()
>         for()
>             v0= motion_val[x+y*stride]
>             v1= motion_val[x+y*stride+delta]
>             if(FFABS(v0[0]-v1[0])>3 || FFABS(v0[1]-v1[1])>3)
>                 mask |= 1<<(2*x+8*y);
>     return mask
> 
> hmask= get_mask(1     );
> vmask= get_mask(stride);
> if(!mb_x)
>     hmask &= 0x...
> if(first_slice_line)
>     vmask &= 0x...
> mask = hmask | (hmask<<1) | vmask | (vmask<<4);
> 
> besides, the way mask bits are combined looks strange/wrong

Per my understanding it sets edges for 2x2 groups of 4x4 subblocks. 
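
To illustrate what I mean, here is a standalone sketch (not part of the
patch; the helper names are made up for this mail). It assumes that bit
(x + 4*y) of the mask stands for the 4x4 subblock in column x, row y, which
is how the cbp masks in the patch are laid out. With that layout 0x11 marks
the two subblocks along the left edge of an 8x8 block and 0x03 marks the two
along its top edge:

```c
#include <assert.h>

/* Assumed layout: bit (x + 4*y) of the 16-bit mask is the 4x4 subblock
 * in column x, row y of the macroblock; (i, j) selects the 8x8 block. */

/* The two subblocks in the left column of 8x8 block (i, j). */
static unsigned left_edge_bits(int i, int j)
{
    assert(i >= 0 && i < 2 && j >= 0 && j < 2);
    return 0x11u << (i * 2 + j * 8);
}

/* The two subblocks in the top row of 8x8 block (i, j). */
static unsigned top_edge_bits(int i, int j)
{
    assert(i >= 0 && i < 2 && j >= 0 && j < 2);
    return 0x03u << (i * 2 + j * 8);
}

/* Render a mask as a 4x4 grid, one text row per subblock row.
 * out receives 4 rows of 4 chars plus '\n' each, then a NUL. */
static void mask_to_grid(unsigned mask, char out[21])
{
    int x, y, n = 0;
    for (y = 0; y < 4; y++) {
        for (x = 0; x < 4; x++)
            out[n++] = (mask & (1u << (x + 4 * y))) ? 'X' : '.';
        out[n++] = '\n';
    }
    out[n] = '\0';
}
```

For the top-right 8x8 block (i = 1, j = 0) the left-edge mask is 0x44, which
marks the subblocks at column 2 in rows 0 and 1.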
 
> >  
> > > > +    return mvmask;
> > > > +}
> > > > +
> > > > +static void rv40_loop_filter(RV34DecContext *r)
> > > > +{
> > > > +    MpegEncContext *s = &r->s;
> > > > +    int mb_pos;
> > > > +    int i, j;
> > > > +    uint8_t *Y, *C;
> > > > +    int alpha, beta, betaY, betaC;
> > > > +    int q;
> > > > +    // 0 - cur block, 1 - top, 2 - left, 3 - bottom
> > > > +    int btype[4], clip[4], mvmasks[4], cbps[4], uvcbps[4][2];
> > > > +
> > > 
> > > > +    if(s->pict_type == FF_B_TYPE)
> > > > +        return;
> > > 
> > > why is this even called for b frames?
> > 
> > Because the spec says so :)
> > RV40 has many special cases for B-frame loop filter which
> > I didn't care to implement.
> 
> :/
> i hope it cannot use B frames as reference?

Looks like it does not 
 
> [...]
> > [lots of loop filter invoking] 
> > > 
> > > the word mess is probably the best way to describe this
> > > as far as i can tell you are packing all the bits related to deblocking
> > > and then later duplicate code each with hardcoded masks to extract them
> > > again.
> > 
> > We have a saying here, "To make a candy from crap", which I think describes
> > the current situation. I'd like to shoot the group of men who proposed the
> > loop filter in the form RV40 has it.
> 
> there aren't many codecs around that are cleanly designed ...
> Some things here and there are ok but terrible messes like this are more
> common.
> We don't have too much of a choice, to support things the mess has to be
> implemented. If it can be done cleaner/simpler that's a big advantage in the
> long term, easier to maintain, understand, optimize; smaller and faster, ...

Also I think that forcing someone to understand it counts as
psychological abuse, and the sentence for it should be
debugging X8 frames or implementing interlaced mode in VC-1
(sorry, can't remember more evil codecs).
 
> > 
> > The problem is that edges should be filtered in that order, with clipping
> > values selected depending on whether the neighbouring block is coded or
> > not and whether it belongs to the same MB or not.
> > It's possible to put all of that into a loop, but it will have too many
> > additional conditions for my taste. I've merged some of them though.
> 
> iam not suggesting to build a complex and ugly loop, rather something like
> storing all the numbers that might differ in a 2d array and then
> having a loop go over this.
> the mb edge flags, coded info and all that would be in the array so that
> reading it is a matter of coded[y][x], mb_edge[y][x], mb_type[y][x]
> i think this would be cleaner IMHO

done
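
Roughly, the idea is the following (a simplified standalone sketch with
made-up names; the real table in the patch below carries more fields, such
as neighbour conditions, clip masks and dither). Each table entry describes
one edge, and a single loop tests the matching bit of the horizontal or
vertical deblock pattern:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down edge descriptor for illustration only. */
typedef struct EdgeCond {
    int x, y;       /* edge start inside the macroblock, in pixels */
    int vertical;   /* 1: test the vertical pattern, 0: the horizontal one */
    unsigned mask;  /* bit to test in the selected deblock pattern */
} EdgeCond;

static const EdgeCond demo_edges[] = {
    { 0, 0, 1, 0x0001 },
    { 0, 0, 0, 0x0001 },
    { 4, 0, 1, 0x0002 },
    { 4, 0, 0, 0x0002 },
};

#define DEMO_EDGES (sizeof(demo_edges) / sizeof(demo_edges[0]))

/* Walk the table in order and collect the indices of the edges that
 * would be filtered. out must have room for DEMO_EDGES entries.
 * Returns the number of indices stored. */
static size_t select_edges(unsigned h_pattern, unsigned v_pattern, size_t *out)
{
    size_t k, n = 0;
    assert(out != NULL);
    for (k = 0; k < DEMO_EDGES; k++) {
        const EdgeCond *e = &demo_edges[k];
        unsigned pattern = e->vertical ? v_pattern : h_pattern;
        if (pattern & e->mask)
            out[n++] = k;
    }
    return n;
}
```

The filtering order is then just the order of the table entries, and the
special cases become data instead of duplicated code.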
 
> Ill review the new patch soon

here it is

> [...]
> -- 
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
-------------- next part --------------
Index: libavcodec/rv40.c
===================================================================

--- libavcodec/rv40.c	(revision 15305)
+++ libavcodec/rv40.c	(working copy)
@@ -247,7 +247,462 @@
     return 0;
 }
 
+#define CLIP_SYMM(a, b) av_clip(a, -(b), b)
 /**
+ * weaker deblocking very similar to the one described in 4.4.2 of JVT-A003r1
+ */
+static inline void rv40_weak_loop_filter(uint8_t *src, const int step,
+                                         const int flag0, const int flag1,
+                                         const int alpha,
+                                         const int lim0, const int lim1,
+                                         const int difflim, const int beta,
+                                         const int S0, const int S1,
+                                         const int S2, const int S3)
+{
+    uint8_t *cm = ff_cropTbl + MAX_NEG_CROP;
+    int t, u, diff;
+
+    t = src[0*step] - src[-1*step];
+    if(!t){
+        return;
+    }
+    u = (alpha * FFABS(t)) >> 7;
+    if(u > 3 - (flag0 && flag1)){
+        return;
+    }
+
+    t <<= 2;
+    if(flag0 && flag1)
+        t += src[-2*step] - src[1*step];
+    diff = CLIP_SYMM((t + 4) >> 3, difflim);
+    src[-1*step] = cm[src[-1*step] + diff];
+    src[ 0*step] = cm[src[ 0*step] - diff];
+    if(FFABS(S2) <= beta && flag0){
+        t = (S0 + S2 - diff) >> 1;
+        src[-2*step] = cm[src[-2*step] - CLIP_SYMM(t, lim1)];
+    }
+    if(FFABS(S3) <= beta && flag1){
+        t = (S1 + S3 + diff) >> 1;
+        src[ 1*step] = cm[src[ 1*step] - CLIP_SYMM(t, lim0)];
+    }
+}
+
+/**
+ * This macro is used for calculating 25*x0+26*x1+26*x2+26*x3+25*x4
+ * or 25*x0+26*x1+51*x2+26*x3
+ * @param  sub - index of the value with coefficient = 25
+ * @param last - index of the value with coefficient 25 or 51
+ */
+#define RV40_STRONG_FILTER(src, step, start, last, sub) \
+     26*(src[start    *step] + src[(start+1)*step]  + src[(start+2)*step] \
+       + src[(start+3)*step] + src[last     *step]) - src[last     *step] \
+       - src[sub      *step]
+
+/**
+ * Deblocking filter, the altered version from JVT-A003r1 H.26L draft.
+ */
+static inline void rv40_adaptive_loop_filter(uint8_t *src, const int step,
+                                             const int stride, const int dmode,
+                                             const int lim0, const int lim1,
+                                             const int alpha,
+                                             const int beta, const int beta2,
+                                             const int chroma, const int edge)
+{
+    int diffs[4][4];
+    int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
+    uint8_t *ptr;
+    int flag0 = 1, flag1 = 1;
+    int strength0 = 3, strength1 = 3;
+    int i;
+    int lims;
+
+    for(i = 0, ptr = src; i < 4; i++, ptr += stride){
+        diffs[i][0] = ptr[-2*step] - ptr[-1*step];
+        diffs[i][1] = ptr[ 1*step] - ptr[ 0*step];
+        s0 += diffs[i][0];
+        s1 += diffs[i][1];
+    }
+    if(FFABS(s0) >= (beta<<2)){
+        strength0 = 1;
+    }
+    if(FFABS(s1) >= (beta<<2)){
+        strength1 = 1;
+    }
+    if(strength0 + strength1 <= 2){
+        return;
+    }
+
+    for(i = 0, ptr = src; i < 4; i++, ptr += stride){
+        diffs[i][2] = ptr[-2*step] - ptr[-3*step];
+        diffs[i][3] = ptr[ 1*step] - ptr[ 2*step];
+        s2 += diffs[i][2];
+        s3 += diffs[i][3];
+    }
+
+    if(!edge)
+        flag0 = flag1 = 0;
+    else{
+        flag0 = (strength0 == 3) && (FFABS(s2) < beta2);
+        flag1 = (strength1 == 3) && (FFABS(s3) < beta2);
+    }
+
+    lims = (lim0 + lim1 + strength0 + strength1) >> 1;
+    if(flag0 && flag1){ /* strong filtering */
+        for(i = 0; i < 4; i++, src += stride){
+            int diff[2], sflag, p0, p1;
+            int t = src[0*step] - src[-1*step];
+
+            if(!t) continue;
+            sflag = (alpha * FFABS(t)) >> 7;
+            if(sflag > 1) continue;
+
+            p0 = (RV40_STRONG_FILTER(src, step, -3, 1, -3) + rv40_dither_l[dmode + i]) >> 7;
+            p1 = (RV40_STRONG_FILTER(src, step, -2, 2, -2) + rv40_dither_r[dmode + i]) >> 7;
+            diff[0] = src[-1*step];
+            diff[1] = src[ 0*step];
+            src[-1*step] = sflag ? av_clip(p0, src[-1*step] - lims, src[-1*step] + lims) : p0;
+            src[ 0*step] = sflag ? av_clip(p1, src[ 0*step] - lims, src[ 0*step] + lims) : p1;
+            diff[0] -= src[-1*step];
+            diff[1] -= src[ 0*step];
+            p0 = (RV40_STRONG_FILTER(src, step, -4, 0, -4) + rv40_dither_l[dmode + i] + diff[1]*25) >> 7;
+            p1 = (RV40_STRONG_FILTER(src, step, -1, 3, -1) + rv40_dither_r[dmode + i] + diff[0]*25) >> 7;
+            src[-2*step] = sflag ? av_clip(p0, src[-2*step] - lims, src[-2*step] + lims) : p0;
+            src[ 1*step] = sflag ? av_clip(p1, src[ 1*step] - lims, src[ 1*step] + lims) : p1;
+            if(!chroma){
+                src[-3*step] = (RV40_STRONG_FILTER(src, step, -4, -3, -1) + 64) >> 7;
+                src[ 2*step] = (RV40_STRONG_FILTER(src, step,  0,  2,  0) + 64) >> 7;
+            }
+        }
+    }else if(strength0 == 3 && strength1 == 3){
+        for(i = 0; i < 4; i++, src += stride)
+            rv40_weak_loop_filter(src, step, 1, 1, alpha, lim0, lim1, lims, beta,
+                                  diffs[i][0], diffs[i][1], diffs[i][2], diffs[i][3]);
+    }else{
+        for(i = 0; i < 4; i++, src += stride)
+            rv40_weak_loop_filter(src, step, strength0==3, strength1==3,
+                                  alpha, lim0>>1, lim1>>1, lims>>1, beta,
+                                  diffs[i][0], diffs[i][1], diffs[i][2], diffs[i][3]);
+    }
+}
+
+static void rv40_v_loop_filter(uint8_t *src, int stride, int dmode, int lim0, int lim1,
+                               int alpha, int beta, int beta2, int chroma, int edge){
+    rv40_adaptive_loop_filter(src, 1, stride, dmode, lim0, lim1, alpha, beta, beta2, chroma, edge);
+}
+static void rv40_h_loop_filter(uint8_t *src, int stride, int dmode, int lim0, int lim1,
+                               int alpha, int beta, int beta2, int chroma, int edge){
+    rv40_adaptive_loop_filter(src, stride, 1, dmode, lim0, lim1, alpha, beta, beta2, chroma, edge);
+}
+
+static int check_mv(int16_t (*motion_val)[2], int step)
+{
+    int d;
+    d = motion_val[0][0] - motion_val[-step][0];
+    if(d < -3 || d > 3)
+        return 1;
+    d = motion_val[0][1] - motion_val[-step][1];
+    if(d < -3 || d > 3)
+        return 1;
+    return 0;
+}
+
+static int rv40_set_deblock_coef(RV34DecContext *r)
+{
+    MpegEncContext *s = &r->s;
+    int mvmask = 0, i, j;
+    int midx = s->mb_x * 2 + s->mb_y * 2 * s->b8_stride;
+    int16_t (*motion_val)[2] = &s->current_picture_ptr->motion_val[0][midx];
+    if(s->pict_type == FF_I_TYPE)
+        return 0;
+    for(j = 0; j < 2; j++){
+        for(i = 0; i < 2; i++){
+            if(i || s->mb_x){
+                if(check_mv(motion_val + i, 1)){
+                    mvmask |= 0x11 << (i*2 + j*8);
+                }
+            }
+            if(j || !s->first_slice_line){
+                if(check_mv(motion_val + i, s->b8_stride)){
+                    mvmask |= 0x03 << (i*2 + j*8);
+                }
+            }
+        }
+        motion_val += s->b8_stride;
+    }
+    return mvmask;
+}
+
+/** This structure holds conditions on applying loop filter to some edge */
+typedef struct RV40LoopFilterCond{
+    int x;              ///< x coordinate of edge start
+    int y;              ///< y coordinate of edge start
+    int dir;            ///< edge filtering direction (horizontal or vertical)
+    int filt_mask;      ///< mask specifying what deblock pattern bit should be tested for filtering
+    int edge_mbtype;    ///< edge condition testing - number of neighbouring mbtype or -1
+    int nonedge_mbtype; ///< not at edge condition testing - number of neighbouring mbtype or -1
+    int next_clip_mask; ///< mask specifying bit to test to select neighbour block clip value
+    int dither;         ///< dither parameter for the current loop filtering
+}RV40LoopFilterCond;
+
+#define RV40_LUMA_LOOP_FIRST 13
+static const RV40LoopFilterCond rv40_loop_cond_luma_first_row[RV40_LUMA_LOOP_FIRST] = {
+    {  0,  4, 0, 0x0010, -1, -1, 0x0001,  0 }, // subblock 0
+    {  0,  0, 1, 0x0001, -1,  2, 0x0008,  0 },
+    {  0,  0, 0, 0x0001,  1, -1, 0x1000,  0 },
+    {  0,  0, 1, 0x0001,  2, -1, 0x0008,  0 },
+    {  4,  4, 0, 0x0020, -1, -1, 0x0002,  4 }, // subblocks 1-3
+    {  4,  0, 1, 0x0002, -1, -1, 0x0001,  4 },
+    {  4,  0, 0, 0x0002,  1, -1, 0x2000,  4 },
+    {  8,  4, 0, 0x0040, -1, -1, 0x0004,  8 },
+    {  8,  0, 1, 0x0004, -1, -1, 0x0002,  8 },
+    {  8,  0, 0, 0x0004,  1, -1, 0x4000,  8 },
+    { 12,  4, 0, 0x0080, -1, -1, 0x0008, 12 },
+    { 12,  0, 1, 0x0008, -1, -1, 0x0004, 12 },
+    { 12,  0, 0, 0x0008,  1, -1, 0x8000, 12 }
+};
+
+#define RV40_LUMA_LOOP_NEXT 9
+static const RV40LoopFilterCond rv40_loop_cond_luma_next_rows[RV40_LUMA_LOOP_NEXT] = {
+    {  0,  4, 0, 0x0010, -1, -1, 0x0001, 0 }, // first subblock of the row
+    {  0,  0, 1, 0x0001,  2, -1, 0x0008, 0 },
+    {  0,  0, 1, 0x0001, -1,  2, 0x0008, 0 },
+    {  4,  4, 0, 0x0020, -1, -1, 0x0002, 1 }, // the rest of subblocks
+    {  4,  0, 1, 0x0002, -1, -1, 0x0001, 1 },
+    {  8,  4, 0, 0x0040, -1, -1, 0x0004, 2 },
+    {  8,  0, 1, 0x0004, -1, -1, 0x0002, 2 },
+    { 12,  4, 0, 0x0080, -1, -1, 0x0008, 3 },
+    { 12,  0, 1, 0x0008, -1, -1, 0x0004, 3 }
+};
+
+#define RV40_CHROMA_LOOP 12
+static const RV40LoopFilterCond rv40_loop_cond_chroma[RV40_CHROMA_LOOP] = {
+    { 0, 4, 0, 0x04, -1, -1, 0x01, 0 }, // subblock 0
+    { 0, 0, 1, 0x01, -1,  2, 0x02, 0 },
+    { 0, 0, 0, 0x01,  1, -1, 0x04, 0 },
+    { 0, 0, 1, 0x01,  2, -1, 0x02, 0 },
+    { 4, 4, 0, 0x08, -1, -1, 0x02, 8 }, // subblock 1
+    { 4, 4, 1, 0x02, -1, -1, 0x01, 0 },
+    { 4, 4, 0, 0x02,  1, -1, 0x08, 8 },
+    { 0, 8, 0, 0x10, -1, -1, 0x04, 0 }, // subblock 2
+    { 0, 4, 1, 0x04, -1,  2, 0x08, 8 },
+    { 0, 4, 1, 0x04,  2, -1, 0x08, 8 },
+    { 4, 8, 0, 0x20, -1, -1, 0x08, 8 }, // subblock 3
+    { 4, 4, 1, 0x08, -1, -1, 0x04, 8 },
+};
+
+static void rv40_loop_filter(RV34DecContext *r)
+{
+    MpegEncContext *s = &r->s;
+    int mb_pos;
+    int i, j, k;
+    uint8_t *Y, *C;
+    int alpha, beta, betaY, betaC;
+    int q;
+    // 0 - cur block, 1 - top, 2 - left, 3 - bottom
+    int mbtype[4], clip[4], mvmasks[4], cbp[4], uvcbp[4][2];
+
+    if(s->pict_type == FF_B_TYPE)
+        return;
+
+    for(s->mb_y = 0; s->mb_y < s->mb_height; s->mb_y++){
+        mb_pos = s->mb_y * s->mb_stride;
+        for(s->mb_x = 0; s->mb_x < s->mb_width; s->mb_x++, mb_pos++){
+            int btype = s->current_picture_ptr->mb_type[mb_pos];
+            if(IS_INTRA(btype) || IS_SEPARATE_DC(btype)){
+                r->cbp_luma  [mb_pos] = 0xFFFF;
+            }
+            if(IS_INTRA(btype)){
+                r->cbp_chroma[mb_pos] = 0xFF;
+            }
+        }
+    }
+    for(s->mb_y = 0; s->mb_y < s->mb_height; s->mb_y++){
+        mb_pos = s->mb_y * s->mb_stride;
+        for(s->mb_x = 0; s->mb_x < s->mb_width; s->mb_x++, mb_pos++){
+            int y_h_deblock, y_v_deblock;
+            int c_v_deblock[2], c_h_deblock[2];
+
+            ff_init_block_index(s);
+            ff_update_block_index(s);
+            Y = s->dest[0];
+            q = s->current_picture_ptr->qscale_table[mb_pos];
+            alpha = rv40_alpha_tab[q];
+            beta  = rv40_beta_tab [q];
+            betaY = betaC = beta * 3;
+            if(s->width * s->height <= 0x6300){
+                betaY += beta;
+            }
+
+            mvmasks[0] = r->deblock_coefs[mb_pos];
+            mbtype [0] = s->current_picture_ptr->mb_type[mb_pos];
+            cbp    [0] = r->cbp_luma[mb_pos];
+            uvcbp[0][0] = r->cbp_chroma[mb_pos] & 0xF;
+            uvcbp[0][1] = r->cbp_chroma[mb_pos] >> 4;
+            for(i = 1; i < 4; i++){
+                mvmasks[i] = 0;
+                mbtype [i] = mbtype[0];
+                cbp    [i] = 0;
+                uvcbp[i][0] = uvcbp[i][1] = 0;
+            }
+            if(s->mb_y){
+                mvmasks[1] = r->deblock_coefs[mb_pos - s->mb_stride] & 0xF000;
+                mbtype [1] = s->current_picture_ptr->mb_type[mb_pos - s->mb_stride];
+                cbp    [1] = r->cbp_luma[mb_pos - s->mb_stride] & 0xF000;
+                uvcbp[1][0] =  r->cbp_chroma[mb_pos - s->mb_stride]       & 0xC;
+                uvcbp[1][1] = (r->cbp_chroma[mb_pos - s->mb_stride] >> 4) & 0xC;
+            }
+            if(s->mb_x){
+                mvmasks[2] = r->deblock_coefs[mb_pos - 1] & 0x8888;
+                mbtype [2] = s->current_picture_ptr->mb_type[mb_pos - 1];
+                cbp    [2] = r->cbp_luma[mb_pos - 1] & 0x8888;
+                uvcbp[2][0] =  r->cbp_chroma[mb_pos - 1]       & 0xA;
+                uvcbp[2][1] = (r->cbp_chroma[mb_pos - 1] >> 4) & 0xA;
+            }
+            if(s->mb_y < s->mb_height - 1){
+                mvmasks[3] = r->deblock_coefs[mb_pos + s->mb_stride] & 0x000F;
+                mbtype [3] = s->current_picture_ptr->mb_type[mb_pos + s->mb_stride];
+                cbp    [3] = r->cbp_luma[mb_pos + s->mb_stride] & 0x000F;
+                uvcbp[3][0] =  r->cbp_chroma[mb_pos + s->mb_stride]       & 0x3;
+                uvcbp[3][1] = (r->cbp_chroma[mb_pos + s->mb_stride] >> 4) & 0x3;
+            }
+            for(i = 0; i < 4; i++){
+                mbtype[i] = (IS_INTRA(mbtype[i]) || IS_SEPARATE_DC(mbtype[i])) ? 2 : 1;
+                clip[i] = rv40_filter_clip_tbl[mbtype[i]][q];
+            }
+            y_h_deblock = cbp[0] | ((cbp[0] << 4) & ~0x000F) | (cbp[1] >> 12)
+                        | ((cbp[3] << 20) & ~0x000F) | (cbp[3] << 16)
+                        | mvmasks[0] | (mvmasks[3] << 16);
+            y_v_deblock = ((cbp[0] << 1) & ~0x1111) | (cbp[2] >> 3)
+                        | cbp[0] | (cbp[3] << 16)
+                        | mvmasks[0] | (mvmasks[3] << 16);
+            if(!s->mb_x){
+                y_v_deblock &= ~0x1111;
+            }
+            if(!s->mb_y){
+                y_h_deblock &= ~0x000F;
+            }
+            if(s->mb_y == s->mb_height - 1 || (mbtype[0] == 2 || mbtype[3] == 2)){
+                y_h_deblock &= ~0xF0000;
+            }
+            cbp[0] = cbp[0] | (cbp[3] << 16)
+                   | mvmasks[0] | (mvmasks[3] << 16);
+            for(i = 0; i < 2; i++){
+                c_v_deblock[i] = ((uvcbp[0][i] << 1) & ~0x5) | (uvcbp[2][i] >> 1)
+                               | (uvcbp[3][i] << 4) | uvcbp[0][i];
+                c_h_deblock[i] = (uvcbp[3][i] << 4) | uvcbp[0][i] | (uvcbp[1][i] >> 2)
+                               | (uvcbp[3][i] << 6) | (uvcbp[0][i] << 2);
+                uvcbp[0][i] = (uvcbp[3][i] << 4) | uvcbp[0][i];
+                if(!s->mb_x){
+                    c_v_deblock[i] &= ~0x5;
+                }
+                if(!s->mb_y){
+                    c_h_deblock[i] &= ~0x3;
+                }
+                if(s->mb_y == s->mb_height - 1 || mbtype[0] == 2 || mbtype[3] == 2){
+                    c_h_deblock[i] &= ~0x30;
+                }
+            }
+
+            for(j = 0; j < RV40_LUMA_LOOP_FIRST; j++){
+                const RV40LoopFilterCond *loop = rv40_loop_cond_luma_first_row + j;
+                int cond, edgecond = 1, nonedgecond = 1, clip_cur, clip_next;
+                Y = s->dest[0] + loop->x + loop->y * s->linesize;
+                cond = (loop->dir ? y_v_deblock : y_h_deblock) & loop->filt_mask;
+                if(loop->edge_mbtype != -1){
+                    edgecond = (mbtype[0] == 2 || mbtype[loop->edge_mbtype] == 2);
+                }
+                if(loop->nonedge_mbtype != -1){
+                    nonedgecond = !(mbtype[0] == 2 || mbtype[loop->nonedge_mbtype] == 2);
+                }
+                clip_cur = cbp[0] & loop->filt_mask ? clip[0] : 0;
+                if(!loop->x && loop->dir){
+                    clip_next = (cbp[2] | mvmasks[2]) & loop->next_clip_mask ? clip[2] : 0;
+                }else if(!loop->y && !loop->dir){
+                    clip_next = (cbp[1] | mvmasks[1]) & loop->next_clip_mask ? clip[1] : 0;
+                }else{
+                    clip_next = cbp[0] & loop->next_clip_mask ? clip[0] : 0;
+                }
+                if(cond && edgecond && nonedgecond){
+                    if(loop->dir){
+                        rv40_v_loop_filter(Y, s->linesize, loop->dither,
+                                           clip_cur, clip_next,
+                                           alpha, beta, betaY, 0, loop->edge_mbtype != -1);
+                    }else{
+                        rv40_h_loop_filter(Y, s->linesize, loop->dither,
+                                           clip_cur, clip_next,
+                                           alpha, beta, betaY, 0, loop->edge_mbtype != -1);
+                    }
+                }
+            }
+            for(j = 4; j < 16; j += 4){
+                for(k = 0; k < RV40_LUMA_LOOP_NEXT; k++){
+                    const RV40LoopFilterCond *loop = rv40_loop_cond_luma_next_rows + k;
+                    int cond, edgecond = 1, nonedgecond = 1, clip_cur, clip_next;
+                    Y = s->dest[0] + loop->x + (loop->y + j) * s->linesize;
+                    cond = (loop->dir ? y_v_deblock : y_h_deblock) & (loop->filt_mask << j);
+                    if(loop->edge_mbtype != -1){
+                        edgecond = (mbtype[0] == 2 || mbtype[loop->edge_mbtype] == 2);
+                    }
+                    if(loop->nonedge_mbtype != -1){
+                        nonedgecond = !(mbtype[0] == 2 || mbtype[loop->nonedge_mbtype] == 2);
+                    }
+                    clip_cur = cbp[0] & (loop->filt_mask << j) ? clip[0] : 0;
+                    if(!loop->x && loop->dir){
+                        clip_next = (cbp[2] | mvmasks[2]) & (loop->next_clip_mask << j) ? clip[2] : 0;
+                    }else{
+                        clip_next = cbp[0] & (loop->next_clip_mask << j) ? clip[0] : 0;
+                    }
+                    if(cond && edgecond && nonedgecond){
+                        if(loop->dir){
+                            rv40_v_loop_filter(Y, s->linesize, loop->dither + j,
+                                               clip_cur, clip_next,
+                                               alpha, beta, betaY, 0, loop->edge_mbtype != -1);
+                        }else{
+                            rv40_h_loop_filter(Y, s->linesize, loop->dither + j,
+                                               clip_cur, clip_next,
+                                               alpha, beta, betaY, 0, loop->edge_mbtype != -1);
+                        }
+                    }
+                }
+            }
+            for(i = 0; i < 2; i++){
+                for(j = 0; j < RV40_CHROMA_LOOP; j++){
+                    const RV40LoopFilterCond *loop = rv40_loop_cond_chroma + j;
+                    int cond, edgecond = 1, nonedgecond = 1, clip_cur, clip_next;
+                    C = s->dest[i+1] + loop->x + loop->y * s->uvlinesize;
+                    cond = (loop->dir ? c_v_deblock[i] : c_h_deblock[i]) & loop->filt_mask;
+                    if(loop->edge_mbtype != -1){
+                        edgecond = (mbtype[0] == 2 || mbtype[loop->edge_mbtype] == 2);
+                    }
+                    if(loop->nonedge_mbtype != -1){
+                        nonedgecond = !(mbtype[0] == 2 || mbtype[loop->nonedge_mbtype] == 2);
+                    }
+                    clip_cur = uvcbp[0][i] & loop->filt_mask ? clip[0] : 0;
+                    if(!loop->x && loop->dir){
+                        clip_next = uvcbp[2][i] & loop->next_clip_mask ? clip[2] : 0;
+                    }else if(!loop->y && !loop->dir){
+                        clip_next = uvcbp[1][i] & loop->next_clip_mask ? clip[1] : 0;
+                    }else{
+                        clip_next = uvcbp[0][i] & loop->next_clip_mask ? clip[0] : 0;
+                    }
+                    if(cond && edgecond && nonedgecond){
+                        if(loop->dir){
+                            rv40_v_loop_filter(C, s->uvlinesize, loop->dither,
+                                               clip_cur, clip_next,
+                                               alpha, beta, betaC, 1, loop->edge_mbtype != -1);
+                        }else{
+                            rv40_h_loop_filter(C, s->uvlinesize, loop->dither,
+                                               clip_cur, clip_next,
+                                               alpha, beta, betaC, 1, loop->edge_mbtype != -1);
+                        }
+                    }
+                }
+            }
+        }
+    }
+}
+
+/**
  * Initialize decoder.
  */
 static av_cold int rv40_decode_init(AVCodecContext *avctx)
@@ -261,6 +716,8 @@
     r->parse_slice_header = rv40_parse_slice_header;
     r->decode_intra_types = rv40_decode_intra_types;
     r->decode_mb_info     = rv40_decode_mb_info;
+    r->loop_filter        = rv40_loop_filter;
+    r->set_deblock_coef   = rv40_set_deblock_coef;
     r->luma_dc_quant_i = rv40_luma_dc_quant[0];
     r->luma_dc_quant_p = rv40_luma_dc_quant[1];
     return 0;
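
A side note on RV40_STRONG_FILTER in the patch above, since the macro is
terse. The sketch below restates the same arithmetic as a plain function
(the function name is made up, it is not in the patch). In both forms listed
in the macro comment the weights add up to 128, which is why the result is
shifted right by 7 after the rounding or dither term is added:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same arithmetic as RV40_STRONG_FILTER, written as a function.
 * Four consecutive samples from 'start' plus the sample at 'last' get
 * weight 26, then one unit is taken back from 'last' and from 'sub'. */
static int strong_filter_sum(const uint8_t *src, int step,
                             int start, int last, int sub)
{
    assert(src != NULL);
    return 26 * (src[start       * step] + src[(start + 1) * step]
               + src[(start + 2) * step] + src[(start + 3) * step]
               + src[last        * step])
           - src[last * step] - src[sub * step];
}
```

On a flat area where every sample equals v, both the five-tap call
(-3, 1, -3) and the four-tap call (-4, -3, -1) return 128*v, so
(sum + 64) >> 7 gives v back unchanged.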