[FFmpeg-devel] [PATCHv4] VP4 video decoder

Peter Ross pross at xvid.org
Thu May 23 16:38:09 EEST 2019


On Tue, May 21, 2019 at 09:30:54PM +0200, Reimar Döffinger wrote:
> On Tue, May 21, 2019 at 05:44:20PM +1000, Peter Ross wrote:
> > +    if (bits < 0x100) {
> > +        skip_bits(gb, 1);
> > +    } else if (bits < 0x180) {
> > +        skip_bits(gb, 2);
> > +        v += 1;
> > +    }
> > +#define body(n) { \
> > +    skip_bits(gb, 2 + n); \
> > +    v += (1 << n) + get_bits(gb, n); }
> > +#define else_if(thresh, n) else if (bits < thresh) body(n)
> 
> Not sure I think the defines are great for readability,
> but if you want to fully encode the logic, you could go for
> e.g.
> #define else_if(n) else if (bits < (0x200 - (0x80 >> n))) body(n)
> Also as to the earlier discussed early bailout for the +257 case:
> it seems sensible values can't really be larger than yuv_macroblock_count
> and I think FFmpeg has defines for maximum frame width/height that
> you could thus use to have a non-arbitrary bailout value?

discussed in other email.

> > +    has_partial = 0;
> > +    bit         = get_bits1(gb);
> > +    current_run = vp4_read_mb_value(gb);
> > +
> > +    for (i = 0; i < s->yuv_macroblock_count; i++) {
> > +        if (!current_run) {
> > +            bit ^= 1;
> > +            current_run = vp4_read_mb_value(gb);
> > +        }
> > +        s->superblock_coding[i] = 2 * bit;
> > +        has_partial |= bit == 0;
> > +        current_run--;
> > +    }
> 
> This code doesn't quite look right to me.
> For both of the vp4_read_mb_value weird stuff
> seems to happen when it returns 0,
i in the first case it directly flips bit
> and reads a new value, which is stupid wasteful
> encoding, in the second case current_run underflows
> on current_run--, which is undefined behaviour
> - or at least weird?

not quite that wasteful. vp4_read_mb_value() always returns >0.
the ==0 test occurs only when the current_run counter is exahusted, and needs to be
reloaded with vp4_read_mb_value().
with your optimisation below the ==0 test is not needed.

> Except for that, isn't that the same as
>    bit         = get_bits1(gb);
>    for (i = 0; i < s->yuv_macroblock_count; ) {
>        current_run = vp4_read_mb_value(gb);
>        if (current_run > s->yuv_macroblock_count - i) -> report error?
>        if (current_run == 0) current_run = s->yuv_macroblock_count - i; // maybe??
>        memset(s->superblock_coding + i, 2*bit, current_run);
>        has_partial |= bit == 0;
>        i += current_run;
>        bit ^= 1;
>    }

this is far nicer. i will squish the i += part into the for loop.

> Hm, admittedly this doesn't really make much
> sense as you can't apply this trick to the has_partial
> case.
> But still maybe the current_run too large and 0
> cases could be clarified at least.

done.

> > +                    int mb_y = 2 * sb_y + (((j >> 1) + j) & 1);
> 
> Is ^ potentially clearer than + here?

yes:  int mb_y = 2 * sb_y + (j >> 1) ^ (j & 1);


> > +                    for (k = 0; k < 4; k++) {
> > +                        if (BLOCK_X >= fragment_width || BLOCK_Y >= fragment_height)
> > +                            continue;
> > +                        fragment = s->fragment_start[plane] + BLOCK_Y * fragment_width + BLOCK_X;
> > +
> > +                        coded = pattern & (1 << (3 - k));
> 
> coded = pattern & (8 >> k);
> maybe?
> 
> > +    if (last_motion < 0)
> > +        v = -v;
> > +    return v;
> 
> I'd probably be partial to using ?: here, but your decision.

agree. leftover from when i had #ifdef trace line everywhere.

> > +                    if (coding_mode == 2) { /* VP4 */
> > +                        motion_x[0] = vp4_get_mv(s, gb, 0, last_gold_motion_x);
> > +                        motion_y[0] = vp4_get_mv(s, gb, 1, last_gold_motion_y);
> > +                        last_gold_motion_x = motion_x[0];
> > +                        last_gold_motion_y = motion_y[0];
> 
> Could write as
> last_gold_motion_x =
> motion_x[0] = vp4_get_mv(s, gb, 0, last_gold_motion_x);
> I think?
> But no strong opinion either.

agree.

> > @@ -1012,8 +1214,8 @@ static int unpack_vlcs(Vp3DecodeContext *s, GetBitContext *gb,
> >              bits_to_get = coeff_get_bits[token];
> >              if (bits_to_get)
> >                  bits_to_get = get_bits(gb, bits_to_get);
> > -            coeff = coeff_tables[token][bits_to_get];
> >
> > +            coeff = coeff_tables[token][bits_to_get];
> 
> cosmetic?

yes.

> > +            eob_run = eob_run_base[token];
> > +            if (eob_run_get_bits[token])
> [...]
> > +            zero_run = zero_run_base[token];
> > +            if (zero_run_get_bits[token])
> 
> If _run_base and _run_get_bits are always used together like this,
> wouldn't it for readability and cache locality be better to
> make it an array of structs so they are next to each other in memory?

readability, yes. performance, maybe as those arrays are only seven bytes each.
this change also touches VP3/Theora. i want to benchmark it.

> > +            vp4_dc_predictor_reset(&dc_pred[j * 6 + i + 7]);
> > +        s->dc_pred_row[sb_x * 4 + i] = dc_pred[25 + i];
> > +        dc_pred[6 * i] = dc_pred[6 * i + 4];
> 
> If there's an easy way to make those constants like 6, 7 and 25
> more obvious that might be a good idea.

i have rewritten it as dc_pred[6][6]. this avoids all those special numbers.

> > +    if (dc_pred[idx - 6].type == type) {
> > +        dc += dc_pred[idx - 6].dc;
> > +        count++;
> > +    }
> > +
> > +    if (dc_pred[idx + 6].type == type) {
> > +        dc += dc_pred[idx + 6].dc;
> > +        count++;
> > +    }
> > +
> > +    if (count != 2 && dc_pred[idx - 1].type == type) {
> > +        dc += dc_pred[idx - 1].dc;
> > +        count++;
> > +    }
> > +
> > +    if (count != 2 && dc_pred[idx + 1].type == type) {
> > +        dc += dc_pred[idx + 1].dc;
> > +        count++;
> > +    }
> 
> Maybe do dc_pred += idx at the start and then
> only dc_pred[-6], dc_pred[6] etc?

we can go one step further and remove idx, by using a pointer to the current block:

VP4Predictor *this_dc_pred = &dc_pred[hy + 1][hx + 1];
...
vp4_dc_pred(s, this_dc_pred, last_dc, dc_block_type, plane);


remiar greatly appreciate yours (and others) time taken to review this stuff.
in a few days i will post the revised patch.

cheers,

-- Peter
(A907 E02F A6E5 0CD2 34CD 20D2 6760 79C5 AC40 DD6B)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20190523/05012ff5/attachment.sig>


More information about the ffmpeg-devel mailing list