[FFmpeg-devel] [PATCH]Lagarith decoder.

Sat Oct 24 02:38:49 CEST 2009

On Thu, Oct 15, 2009 at 08:42:54PM -0600, Nathan Caldwell wrote:
> On Tue, Oct 13, 2009 at 3:28 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Tue, Sep 22, 2009 at 11:54:03PM -0600, Nathan Caldwell wrote:
> >> Here are the latest Lagarith patches. The only changes to the
> >> rangecoder from the previous patch are cosmetic (K&R formatting). The
> >> decoder includes Loren's floating point emulation. Thanks to everyone,
> >> and especially Loren and Reimar, for all their help.
> >
> >> +    int prob[258];              /*!< Table of cumulative probability for each symbol. */
> >
> > this is filled with uint32_t giving nice negative probs
> 
> Fixed.
> 
> >> +/* compute the 52bit mantissa of 1/(double)denom */
> >> +static uint64_t softfloat_reciprocal(uint32_t denom)
> >
> > comment not doxygen compatible
> > and this really needs some explanation of why it is used instead of
> > 1.0/denom in the doxy
> 
> Updated. It would be nice if Loren could comment on these descriptions.
> 
> >> +static uint32_t lag_decode_prob(GetBitContext *gb)
> >> +{
> >> +    static const uint8_t series[] = { 1, 2, 3, 5, 8, 13, 21, 34 };
> >
> > where is series[7] accessed?
> 
> It isn't. The table was copied from reference. Removed.
> 
> >> +    if (bits <= 0)
> >> +        return 0;
> >> +    if (bits > 31) {
> >> +        /* This is most likely an error, just use the first 32 bits */
> >> +        bits = 31;
> >> +    }
> >
> > are these not error conditions that warrent a error message and failure?
> 
> In the case of bits==0, no. Any other case (bits < 0 or bits >31),
> yes. This should be fixed now.
> 

> >> +
> >> +    value  = get_bits_long(gb, bits);
> >> +    value |= 1 << bits;
> >> +
> >> +    return value - 1;
> >> +}
> >> +
> >> +static int lag_read_prob_header(lag_rac *rac, GetBitContext *gb)
> >> +{
> >> +    int i, j;
> >> +    int prob = 0;
> >> +    int scale_factor = 0;
> >> +    unsigned cumul_prob = 0;
> >> +    unsigned cumulative_target = 1;
> >> +    unsigned scaled_cumul_prob = 0;
> >> +
> >> +    rac->prob[0] = 0;
> >> +    rac->prob[257] = UINT_MAX;
> >> +    /* Read probabilities from bitstream */
> >> +    for (i = 1; i < 257; i++) {
> >> +        rac->prob[i] = lag_decode_prob(gb);
> >
> >> +        cumul_prob += rac->prob[i];
> >
> > can overflow
> 
> In this case the reference decoder would also overflow. Would you
> prefer to match the reference, or not overflow?

I dont mind either way as long as its no exploitable, cant crash and doesnt
end in an infinite loop.

> 
> >> +        if (!rac->prob[i]) {
> >
> >> +            prob = lag_decode_prob(gb);
> >> +            if (i + prob >= 258)
> >> +                prob = 257 - i;
> >
> > integer overflow, segfault below
> 
> Fixed.
> 
> > and isnt this check supposed to be an error condition instead of silent
> > cliping?
> 
> In a perfect world, yes. Again, just matching the behavior of the
> reference decoder.
> 

[...]

> > also getting rid of *step  and doing +=step might be faster
> > and then variables like zeros could be tried to be moved onto the
> > stack
> 
> This is what I originally had, however it introduces too many
> divisions in the calculation of the zero runs.

i is used without *step only for a comparission against width, width
surely can be multiplied by step to avoid a division there.
Am i missing something?

[...]

> +// decodes a byte
> +static inline int lag_get_rac(lag_rac *l)

// is not picked up by doxygen yet i think ...

> +{
> +    unsigned range_scaled, low_scaled, div;
> +    int val;
> +    uint8_t shift;
> +
> +    lag_rac_refill(l);
> +
> +    range_scaled = l->range >> l->scale;
> +
> +    if (l->low >= range_scaled * l->prob[255]) {
> +        val = 255;
> +        l->range -= range_scaled * l->prob[255];
> +    } else {
> +        /* val = 0 is frequent enough to deserve a shortcut */
> +        if (l->low < range_scaled * l->prob[1]) {
> +            val = 0;

isnt 0 more common than 255? and should thus be checked before on a
shorter code path?

> +        } else {
> +            /* FIXME __builtin_clz is ~20% faster here, but not allowed in generic code. */

> +            shift = 30 - av_log2(range_scaled);

table[ l->range>>24 ]
could be used and would need just 128 entries or so

> +            div = ((range_scaled << shift) + (1 << 23) - 1) >> 23;
> +            /* low>>24 ensures that any cases too big for exact FASTDIV are
> +             * under- rather than over-estimated
> +             */
> +            low_scaled = FASTDIV(l->low - (l->low >> 24), div);
> +            shift -= 23 + l->hash_shift;

23 + l->hash_shift can be factored out of the function as hash_shift is a
"constant"

> +            shift &= 31;
> +            low_scaled = (low_scaled << shift) | (low_scaled >> (32 - shift));
> +            /* low_scaled is now a lower bound of low/range_scaled */
> +            val = l->range_hash[(uint8_t) low_scaled];

> +            while (l->low >= range_scaled * l->prob[val + 1])
> +                val++;

is there some (small) bound on the number of iterations of this?
if so it could be replaced by a if() or 2 ...

[...]
> +/**
> + * Decode a frame.
> + * @param avctx codec context
> + * @param data output AVFrame
> + * @param data_size size of output data or 0 if no picture is returned
> + * @param avpkt input packet
> + * @return number of consumed bytes on success or negative if decode fails
> + */
> +static int lag_decode_frame(AVCodecContext *avctx,
> +                            void *data, int *data_size, AVPacket *avpkt)
> +{
> +    const uint8_t *buf = avpkt->data;
> +    int buf_size = avpkt->size;
> +    LagarithContext *l = avctx->priv_data;
> +    AVFrame *const p = &l->picture;
> +    uint8_t frametype = 0;
> +    uint32_t offset_gu = 0, offset_bv = 0, offset_ry = 9;
> +
> +    AVFrame *picture = data;
> +

> +    if (!l->avctx)
> +        l->avctx = avctx;

that can be done in lag_decode_init()

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Rewriting code that is poorly written but fully understood is good.
Rewriting code that one doesnt understand is a sign that one is less smart
then the original author, trying to rewrite it will not make it better.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20091024/5829cdb2/attachment.pgp>