[Ffmpeg-devel] lzw compression in tiff encoder (qualification task for GSoC)

Fri Apr 6 18:17:09 CEST 2007

Hi

On Fri, Apr 06, 2007 at 03:28:32PM +0200, Bartlomiej Wolowiec wrote:
> Hi,
> As a supplement to my qualification task for GSoC I implemented LZW 
> compressor. I think that my code is fast, universal and it can be easily used 
> in other encoders. My implementation use hash table with simple hash function 
> (I've used LZW prefix code and xor to calculate new hash value). So, I'm 
> sending two files: one with lzw encoder (lzw.patch) and one with patch to 
> tiffenc.c (tifflzw.patch). 

[...]
> +/** LZW encode state */
> +typedef struct LZWEncodeState {
> +    int clear_code;         ///< Value of clear code
> +    int end_code;           ///< Value of end code
> +    Code *tab;              ///< Hash table
> +    int tabsize;            ///< Number of values in hash table
> +    int bits;               ///< Actual bits code
> +    int bufsize;            ///< Size of output buffer
> +    PutBitContext pb;       ///< Put bit context for output

> +    int maxbits;            ///< Max bits code

isnt this always 12? and does the code support larger values at all?

> +    int maxcode;            ///< Max value of code
> +    int output_bytes;       ///< Number of written bytes
> +    int last_code;          ///< Value of last output code or -1
> +}LZWEncodeState;

i would prefer if this struct would not be vissible to the user so that
we can change it without breaking compatibilitym that is either
malloc() it in ff_lzw_encode_init() or like the libavutil/md5.c code
have a global constant size of the struct

> +
> +void ff_lzw_encode_init(LZWEncodeState * s, uint8_t * outbuf, int outsize);
> +void ff_lzw_encode_end(LZWEncodeState * s);
> +int ff_lzw_encode(LZWEncodeState * s, const uint8_t * inbuf, int insize);
> +int ff_lzw_encode_flush(LZWEncodeState * s);
> +
>  #endif
[...]
> +/**
> + * Hash function adding character
> + * @param head LZW code for prefix 
> + * @param add Character to add
> + * @return New hash value
> + */
> +static inline int hash(const int head, const int add)
> +{
> +    int ret;
> +    ret = (add << LZW_HASH_SHIFT) ^ (head);

ret is useless
head^= add << LZW_HASH_SHIFT;
will do too

> +    if (ret >= LZW_HASH_SIZE)
> +        ret -= LZW_HASH_SIZE;
> +    assert(ret >= 0 && ret < LZW_HASH_SIZE);
> +    return ret;
> +}
> +
> +/**
> + * Hash function calculate next hash value 
> + * @param head Actual hash code
> + * @param offset Offset calculated by hashOffset
> + * @return New hash value
> + */
> +static inline int hashNext(const int head, const int offset)
> +{
> +    return head >= offset ? head - offset : head - offset + LZW_HASH_SIZE;

head -=offset;
if(head<0)
    head+= LZW_HASH_SIZE;
return head;

its longer but it contains fewer operations and its more readable IMHO

[...]
> +/**
> + * Write one code to stream
> + * @param s LZW state
> + * @param c code to write
> + */
> +static inline void writeCode(LZWEncodeState * s, int c)
> +{
> +    assert(0 <= c && c < 1 << s->bits);
> +    put_bits(&s->pb, s->bits, c);
> +}

useless wraper function around put_bits()

[...]
> +
> +/**
> + * Add block to LZW code table
> + * @param s LZW state 
> + * @param c Last character in block
> + * @param hash_prefix LZW code for prefix
> + */
> +static inline void addCode(LZWEncodeState * s, uint8_t c, int hash_prefix)
> +{
> +    int hash_code = hash(hash_prefix, c);
> +    int hash_offset = hashOffset(hash_code);
> +
> +    while (s->tab[hash_code].hash_prefix != -2) {
> +        hash_code = hashNext(hash_code, hash_offset);
> +    }

hmm it seems to me that addCode() is only reached if findCode() failed to
find a match but then findCode() already found the next free spot in the
table so the above seems redundant?

> +
> +    s->tab[hash_code].code = s->tabsize;
> +    s->tab[hash_code].suffix = c;
> +    s->tab[hash_code].hash_prefix = hash_prefix;
> +
> +    s->tabsize++;
> +

> +    checkSpecialCode(s);

is there any case where this is needed here?

> +    if (s->tabsize >= 1 << s->bits)
> +        s->bits++;
> +}
> +
> +/**
> + * Clear LZW code table
> + * @param s LZW state
> + */
> +static inline void clearTable(LZWEncodeState * s)
> +{
> +    int i, h;
> +
> +    s->bits = 9;
> +    for (i = 0; i < LZW_HASH_SIZE; i++) {
> +        s->tab[i].hash_prefix = -2;

please use #defines instead of -1 / -2

> +    }
> +    for (i = 0; i < 256; i++) {
> +        h = hash(0, i);
> +        s->tab[h].code = i;
> +        s->tab[h].suffix = i;
> +        s->tab[h].hash_prefix = -1;
> +    }
> +    s->tabsize = 256;
> +    checkSpecialCode(s);

hmm is there any case here s->tabsize = 258; is false (exluding unsupported
cases where 8 initial bits are wrong to begin with)

> +}

all calls to clearTable, also call writeCode() before so this call could be
added into clearTable()

> +
> +/**
> + * Initialize LZW encoder. Please set s->clear_code, s->end_code and s->maxbits before run. 
> + * @param s LZW state
> + * @param outbuf Output buffer
> + * @param outsize Size of output buffer
> + */
> +void ff_lzw_encode_init(LZWEncodeState * s, uint8_t * outbuf, int outsize)
> +{
> +    init_put_bits(&s->pb, outbuf, outsize);
> +    s->bufsize = outsize;

> +    s->tab = av_malloc(LZW_HASH_SIZE * sizeof(Code));

this can be part of the struct avoiding the malloc/free

[...]
> +/**
> + * LZW main compress function
> + * @param s LZW state
> + * @param inbuf Input buffer
> + * @param insize Size of input buffer
> + * @return Number of bytes written or -1 on error
> + */
> +int ff_lzw_encode(LZWEncodeState * s, const uint8_t * inbuf, int insize)
> +{
> +    int i;
> +    uint8_t c;
> +    int ret;
> +    int code = 0;
> +    int code_prefix = s->last_code;
> +
> +    if (code_prefix == -1) {
> +        writeCode(s, s->clear_code);
> +        clearTable(s);
> +    }
> +
> +    for (i = 0; i < insize; i++) {
> +        c = *inbuf++;
> +        code = findCode(s, c, code_prefix);

uint8_t c= *inbuf++;
int code= ...
would be slightly simpler

> +        if (code >= 0) {
> +            code_prefix = s->tab[code].code;
> +        } else {
> +            writeCode(s, code_prefix);
> +            addCode(s, c, code_prefix);
> +            code_prefix = s->tab[hash(0, c)].code;
> +        }
> +        if (s->tabsize >= s->maxcode - 1) {
> +            writeCode(s, s->clear_code);
> +            clearTable(s);
> +        }
> +    }
> +    s->last_code = code_prefix;

> +    ret = (put_bits_count(&s->pb)) >> 3;

superfluous ()

> +    if (ret < s->bufsize) {
> +        ret = ret - s->output_bytes;

ret -=

> +        s->output_bytes += ret;
> +        return ret;
> +    } else {
> +        return -1;
> +    }
> +}
> +
> +/**
> + * Write end code and flush bitstream
> + * @param s LZW state
> + * @return Number of bytes written or -1 on error
> + */
> +int ff_lzw_encode_flush(LZWEncodeState * s)
> +{
> +    int ret;
> +    if (s->last_code != -1)
> +        writeCode(s, s->last_code);
> +    writeCode(s, s->end_code);
> +    flush_put_bits(&s->pb);
> +    ret = (put_bits_count(&s->pb)) >> 3;
> +
> +    s->last_code = -1;
> +    if (ret < s->bufsize) {
> +        ret = ret - s->output_bytes;
> +        s->output_bytes += ret;
> +        return ret;
> +    } else {
> +        return -1;
> +    }
> +}

the second half of this functions is a duplicate of the end of ff_lzw_encode()

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Why not whip the teacher when the pupil misbehaves? -- Diogenes of Sinope
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070406/d820bede/attachment.pgp>