[Ffmpeg-devel] lzw compression in tiff encoder (qualification task for GSoC)

Sat Apr 7 01:50:20 CEST 2007

Hi

On Fri, Apr 06, 2007 at 11:43:50PM +0200, Bartlomiej Wolowiec wrote:
> Hi
> On Friday 06 April 2007 18:17, Michael Niedermayer wrote:
> > Hi
> >
> > On Fri, Apr 06, 2007 at 03:28:32PM +0200, Bartlomiej Wolowiec wrote:
> > > Hi,
> > > As a supplement to my qualification task for GSoC I implemented an LZW
> > > compressor. I think that my code is fast, universal and can be easily
> > > used in other encoders. My implementation uses a hash table with a simple
> > > hash function (I've used the LZW prefix code and xor to calculate the new
> > > hash value). So, I'm sending two files: one with the lzw encoder
> > > (lzw.patch) and one with the patch to tiffenc.c (tifflzw.patch).
> >
> > [...]
> >
> > > +/** LZW encode state */
> > > +typedef struct LZWEncodeState {
> > > +    int clear_code;         ///< Value of clear code
> > > +    int end_code;           ///< Value of end code
> > > +    Code *tab;              ///< Hash table
> > > +    int tabsize;            ///< Number of values in hash table
> > > +    int bits;               ///< Actual bits code
> > > +    int bufsize;            ///< Size of output buffer
> > > +    PutBitContext pb;       ///< Put bit context for output
> > >
> > > +    int maxbits;            ///< Max bits code
> >
> > isn't this always 12? and does the code support larger values at all?
> 
> Currently it is used only with 12; larger values are hardly ever used (if
> there is such a need, just the #define should be enlarged). But I think
> that 10- and 11-bit codes are used.

ok

[...]
> > [...]
> >
> > > +/**
> > > + * Write one code to stream
> > > + * @param s LZW state
> > > + * @param c code to write
> > > + */
> > > +static inline void writeCode(LZWEncodeState * s, int c)
> > > +{
> > > +    assert(0 <= c && c < 1 << s->bits);
> > > +    put_bits(&s->pb, s->bits, c);
> > > +}
> >
> > useless wrapper function around put_bits()
> 
> Now it's just a wrapper for put_bits, but various formats use lzw
> differently (e.g. gif).

right, i forgot
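
To illustrate why the wrapper can pay off: TIFF packs LZW codes MSB-first, while GIF packs them LSB-first, so the same code lands in different bytes. This is only a sketch; BitWriterSketch and both put_code_* functions are made-up names for illustration and are not part of the patch or of put_bits.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit writer, used only for this illustration. */
typedef struct BitWriterSketch {
    uint8_t *buf;     /* output buffer, must be zero-initialised */
    size_t   size;    /* capacity of buf in bytes */
    size_t   bitpos;  /* number of bits written so far */
} BitWriterSketch;

/* MSB-first packing (TIFF style): the top bit of the code is written
 * first and goes into the highest free bit of the current byte. */
static int put_code_msb(BitWriterSketch *w, int nbits, unsigned code)
{
    assert(nbits > 0 && nbits <= 16 && code < (1u << nbits));
    if (w->bitpos + (size_t)nbits > w->size * 8)
        return -1;                      /* would overflow the buffer */
    for (int i = nbits - 1; i >= 0; i--, w->bitpos++)
        if ((code >> i) & 1)
            w->buf[w->bitpos >> 3] |= 0x80 >> (w->bitpos & 7);
    return 0;
}

/* LSB-first packing (GIF style): the lowest bit of the code is written
 * first and goes into the lowest free bit of the current byte. */
static int put_code_lsb(BitWriterSketch *w, int nbits, unsigned code)
{
    assert(nbits > 0 && nbits <= 16 && code < (1u << nbits));
    if (w->bitpos + (size_t)nbits > w->size * 8)
        return -1;                      /* would overflow the buffer */
    for (int i = 0; i < nbits; i++, w->bitpos++)
        if ((code >> i) & 1)
            w->buf[w->bitpos >> 3] |= 1 << (w->bitpos & 7);
    return 0;
}
```

Writing the 9-bit clear code 256 gives the bytes 0x80 0x00 with the MSB-first variant and 0x00 0x01 with the LSB-first one.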

[...]
> @@ -57,6 +58,7 @@
>      uint8_t **buf;                      ///< actual position in buffer
>      uint8_t *buf_start;                 ///< pointer to first byte in buffer
>      int buf_size;                       ///< buffer size
> +    struct LZWEncodeState *lzws;               ///< LZW Encode state

the ///< isn't aligned with the other comments

[...]
> +/**
> + * Clear LZW code table
> + * @param s LZW state
> + */
> +static inline void clearTable(LZWEncodeState * s)
> +{
> +    int i, h;
> +
> +    writeCode(s, s->clear_code);
> +    s->bits = 9;
> +    for (i = 0; i < LZW_HASH_SIZE; i++) {
> +        s->tab[i].hash_prefix = LZW_PREFIX_FREE;
> +    }
> +    for (i = 0; i < 256; i++) {
> +        h = hash(0, i);
> +        s->tab[h].code = i;
> +        s->tab[h].suffix = i;
> +        s->tab[h].hash_prefix = LZW_PREFIX_EMPTY;
> +    }
> +    s->tabsize = 258;
> +}

this shouldn't be inline as it's not called often

> +/**
> + * Calculate number of bytes written
> + * @param s LZW encode state
> + * @return Number of bytes written or -1 on error
> + */
> +static inline int writtenBytes(LZWEncodeState *s){
> +    int ret = (put_bits_count(&s->pb)) >> 3;
> +    if (ret < s->bufsize) {
> +        ret -= s->output_bytes;
> +        s->output_bytes += ret;
> +        return ret;
> +    } else {
> +        return -1;
> +    }
> +}

this one shouldn't be inline either, as it's not called often enough for speed to matter

[...]
> +/**
> + * Init LZW encoder (allocate memory)
> + * @param s LZW state
> + */
> +void ff_lzw_encode_open(LZWEncodeState ** s){
> +    *s = av_malloc(sizeof(LZWEncodeState));
> +    (*s)->tab = av_malloc(LZW_HASH_SIZE * sizeof(Code));
> +}

why _init and _open? it seems one would be enough
also why use malloc for tab instead of a
Code tab[LZW_HASH_SIZE]
in the struct?

and it would be easier to
return s;
instead of using a pointer to pointer ...
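
A rough sketch of what i mean, with the table embedded in the struct and the state returned directly. The field types of Code, the value of LZW_HASH_SIZE and the name lzw_encode_open_sketch() are assumptions made for the example (the quoted hunks do not show them), and plain calloc() stands in for av_malloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LZW_HASH_SIZE 16411      /* assumed value, not taken from the patch */

/* Assumed layout; only the field names appear in the quoted code. */
typedef struct Code {
    int     hash_prefix;
    int     code;
    uint8_t suffix;
} Code;

typedef struct LZWEncodeState {
    int  bits;                   /* current code width */
    int  tabsize;                /* number of entries in use */
    Code tab[LZW_HASH_SIZE];     /* embedded table: no second allocation */
} LZWEncodeState;

/* One allocation covers the state and its table; the caller gets the
 * pointer back (NULL on failure) instead of passing a pointer to pointer. */
static LZWEncodeState *lzw_encode_open_sketch(void)
{
    /* the hash table must have room for every 12-bit code */
    assert(LZW_HASH_SIZE > 4096);
    return calloc(1, sizeof(LZWEncodeState));
}
```

The caller then writes s = lzw_encode_open_sketch(); checks for NULL, and a single free of s releases everything.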

> +
> +/**
> + * End LZW encoder (free memory)
> + * @param s LZW state
> + */
> +void ff_lzw_encode_close(LZWEncodeState ** s)
> +{
> +    av_free((*s)->tab);
> +    av_freep(s);
> +}

hmm why not LZWEncodeState * s? it seems there's little gain
from a **, just the =NULL ...

also flush and close can be merged
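
For comparison, a small sketch of the two close signatures. StateSketch and both function names are invented for the example, and free() stands in for av_free(); the second variant mimics what av_freep() does, which is to free the memory and then set the caller's pointer to NULL.

```c
#include <assert.h>
#include <stdlib.h>

/* Placeholder state type, only for this illustration. */
typedef struct StateSketch {
    int dummy;
} StateSketch;

/* Single pointer: simplest signature, but the caller's copy of the
 * pointer keeps its old (now invalid) value after the call. */
static void close_single_sketch(StateSketch *s)
{
    free(s);
}

/* Pointer to pointer: the only extra thing gained is that the caller's
 * pointer is reset to NULL, so calling it twice is harmless. */
static void close_double_sketch(StateSketch **ps)
{
    assert(ps != NULL);
    free(*ps);
    *ps = NULL;
}
```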

> +
> +/**
> + * LZW main compress function
> + * @param s LZW state
> + * @param inbuf Input buffer
> + * @param insize Size of input buffer
> + * @return Number of bytes written or -1 on error
> + */
> +int ff_lzw_encode(LZWEncodeState * s, const uint8_t * inbuf, int insize)
> +{
> +    int i;
> +    int ret;
> +    int code_prefix = s->last_code;
> +
> +    if (code_prefix == LZW_PREFIX_EMPTY)
> +        clearTable(s);
> +
> +    for (i = 0; i < insize; i++) {
> +        uint8_t c = *inbuf++;
> +        int code = findCode(s, c, code_prefix);
> +        if (s->tab[code].hash_prefix != LZW_PREFIX_FREE) {
> +            code_prefix = s->tab[code].code;
> +        } else {
> +            writeCode(s, code_prefix);
> +            addCode(s, c, code_prefix, code);
> +            code_prefix = s->tab[hash(0, c)].code;
> +        }
> +        if (s->tabsize >= s->maxcode - 1) {
> +            clearTable(s);
> +        }
> +    }
> +    s->last_code = code_prefix;
> +
> +    return writtenBytes(s);
> +}

it seems there is no check for the output buffer (put_bits() does not check
this automatically)
a simple check like "if 1.5 x insize > outsize then fail" at the
top of ff_lzw_encode() should do
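
Something along these lines; lzw_output_fits_sketch() and the outsize parameter are hypothetical names for the example. The 1.5 factor corresponds to at most one 12-bit code per input byte; the sketch ignores the clear/end codes and any bits already buffered, so a real check would want a few bytes of extra margin.

```c
#include <assert.h>

/* Returns 1 if a worst-case expansion of insize input bytes fits into
 * outsize output bytes, 0 otherwise. */
static int lzw_output_fits_sketch(int insize, int outsize)
{
    long long worst;

    assert(insize >= 0 && outsize >= 0);
    /* ceil(1.5 * insize), computed in 64 bits so insize * 3 cannot overflow */
    worst = ((long long)insize * 3 + 1) / 2;
    return worst <= outsize;
}
```

ff_lzw_encode() would then return -1 right at the top when the check fails, before any code is written.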

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

When you are offended at any man's fault, turn to yourself and study your
own failings. Then you will forget your anger. -- Epictetus