[Ffmpeg-devel] lzw compression in tiff encoder (qualification task for GSoC)
Michael Niedermayer
michaelni
Fri Apr 6 18:17:09 CEST 2007
Hi
On Fri, Apr 06, 2007 at 03:28:32PM +0200, Bartlomiej Wolowiec wrote:
> Hi,
> As a supplement to my qualification task for GSoC I implemented LZW
> compressor. I think that my code is fast, universal and it can be easily used
> in other encoders. My implementation use hash table with simple hash function
> (I've used LZW prefix code and xor to calculate new hash value). So, I'm
> sending two files: one with lzw encoder (lzw.patch) and one with patch to
> tiffenc.c (tifflzw.patch).
[...]
> +/** LZW encode state */
> +typedef struct LZWEncodeState {
> + int clear_code; ///< Value of clear code
> + int end_code; ///< Value of end code
> + Code *tab; ///< Hash table
> + int tabsize; ///< Number of values in hash table
> + int bits; ///< Actual bits code
> + int bufsize; ///< Size of output buffer
> + PutBitContext pb; ///< Put bit context for output
> + int maxbits; ///< Max bits code
isnt this always 12? and does the code support larger values at all?
> + int maxcode; ///< Max value of code
> + int output_bytes; ///< Number of written bytes
> + int last_code; ///< Value of last output code or -1
> +}LZWEncodeState;
i would prefer if this struct would not be vissible to the user so that
we can change it without breaking compatibilitym that is either
malloc() it in ff_lzw_encode_init() or like the libavutil/md5.c code
have a global constant size of the struct
> +
> +void ff_lzw_encode_init(LZWEncodeState * s, uint8_t * outbuf, int outsize);
> +void ff_lzw_encode_end(LZWEncodeState * s);
> +int ff_lzw_encode(LZWEncodeState * s, const uint8_t * inbuf, int insize);
> +int ff_lzw_encode_flush(LZWEncodeState * s);
> +
> #endif
[...]
> +/**
> + * Hash function adding character
> + * @param head LZW code for prefix
> + * @param add Character to add
> + * @return New hash value
> + */
> +static inline int hash(const int head, const int add)
> +{
> + int ret;
> + ret = (add << LZW_HASH_SHIFT) ^ (head);
ret is useless
head^= add << LZW_HASH_SHIFT;
will do too
> + if (ret >= LZW_HASH_SIZE)
> + ret -= LZW_HASH_SIZE;
> + assert(ret >= 0 && ret < LZW_HASH_SIZE);
> + return ret;
> +}
> +
> +/**
> + * Hash function calculate next hash value
> + * @param head Actual hash code
> + * @param offset Offset calculated by hashOffset
> + * @return New hash value
> + */
> +static inline int hashNext(const int head, const int offset)
> +{
> + return head >= offset ? head - offset : head - offset + LZW_HASH_SIZE;
head -=offset;
if(head<0)
head+= LZW_HASH_SIZE;
return head;
its longer but it contains fewer operations and its more readable IMHO
[...]
> +/**
> + * Write one code to stream
> + * @param s LZW state
> + * @param c code to write
> + */
> +static inline void writeCode(LZWEncodeState * s, int c)
> +{
> + assert(0 <= c && c < 1 << s->bits);
> + put_bits(&s->pb, s->bits, c);
> +}
useless wraper function around put_bits()
[...]
> +
> +/**
> + * Add block to LZW code table
> + * @param s LZW state
> + * @param c Last character in block
> + * @param hash_prefix LZW code for prefix
> + */
> +static inline void addCode(LZWEncodeState * s, uint8_t c, int hash_prefix)
> +{
> + int hash_code = hash(hash_prefix, c);
> + int hash_offset = hashOffset(hash_code);
> +
> + while (s->tab[hash_code].hash_prefix != -2) {
> + hash_code = hashNext(hash_code, hash_offset);
> + }
hmm it seems to me that addCode() is only reached if findCode() failed to
find a match but then findCode() already found the next free spot in the
table so the above seems redundant?
> +
> + s->tab[hash_code].code = s->tabsize;
> + s->tab[hash_code].suffix = c;
> + s->tab[hash_code].hash_prefix = hash_prefix;
> +
> + s->tabsize++;
> +
> + checkSpecialCode(s);
is there any case where this is needed here?
> + if (s->tabsize >= 1 << s->bits)
> + s->bits++;
> +}
> +
> +/**
> + * Clear LZW code table
> + * @param s LZW state
> + */
> +static inline void clearTable(LZWEncodeState * s)
> +{
> + int i, h;
> +
> + s->bits = 9;
> + for (i = 0; i < LZW_HASH_SIZE; i++) {
> + s->tab[i].hash_prefix = -2;
please use #defines instead of -1 / -2
> + }
> + for (i = 0; i < 256; i++) {
> + h = hash(0, i);
> + s->tab[h].code = i;
> + s->tab[h].suffix = i;
> + s->tab[h].hash_prefix = -1;
> + }
> + s->tabsize = 256;
> + checkSpecialCode(s);
hmm is there any case here s->tabsize = 258; is false (exluding unsupported
cases where 8 initial bits are wrong to begin with)
> +}
all calls to clearTable, also call writeCode() before so this call could be
added into clearTable()
> +
> +/**
> + * Initialize LZW encoder. Please set s->clear_code, s->end_code and s->maxbits before run.
> + * @param s LZW state
> + * @param outbuf Output buffer
> + * @param outsize Size of output buffer
> + */
> +void ff_lzw_encode_init(LZWEncodeState * s, uint8_t * outbuf, int outsize)
> +{
> + init_put_bits(&s->pb, outbuf, outsize);
> + s->bufsize = outsize;
> + s->tab = av_malloc(LZW_HASH_SIZE * sizeof(Code));
this can be part of the struct avoiding the malloc/free
[...]
> +/**
> + * LZW main compress function
> + * @param s LZW state
> + * @param inbuf Input buffer
> + * @param insize Size of input buffer
> + * @return Number of bytes written or -1 on error
> + */
> +int ff_lzw_encode(LZWEncodeState * s, const uint8_t * inbuf, int insize)
> +{
> + int i;
> + uint8_t c;
> + int ret;
> + int code = 0;
> + int code_prefix = s->last_code;
> +
> + if (code_prefix == -1) {
> + writeCode(s, s->clear_code);
> + clearTable(s);
> + }
> +
> + for (i = 0; i < insize; i++) {
> + c = *inbuf++;
> + code = findCode(s, c, code_prefix);
uint8_t c= *inbuf++;
int code= ...
would be slightly simpler
> + if (code >= 0) {
> + code_prefix = s->tab[code].code;
> + } else {
> + writeCode(s, code_prefix);
> + addCode(s, c, code_prefix);
> + code_prefix = s->tab[hash(0, c)].code;
> + }
> + if (s->tabsize >= s->maxcode - 1) {
> + writeCode(s, s->clear_code);
> + clearTable(s);
> + }
> + }
> + s->last_code = code_prefix;
> + ret = (put_bits_count(&s->pb)) >> 3;
superfluous ()
> + if (ret < s->bufsize) {
> + ret = ret - s->output_bytes;
ret -=
> + s->output_bytes += ret;
> + return ret;
> + } else {
> + return -1;
> + }
> +}
> +
> +/**
> + * Write end code and flush bitstream
> + * @param s LZW state
> + * @return Number of bytes written or -1 on error
> + */
> +int ff_lzw_encode_flush(LZWEncodeState * s)
> +{
> + int ret;
> + if (s->last_code != -1)
> + writeCode(s, s->last_code);
> + writeCode(s, s->end_code);
> + flush_put_bits(&s->pb);
> + ret = (put_bits_count(&s->pb)) >> 3;
> +
> + s->last_code = -1;
> + if (ret < s->bufsize) {
> + ret = ret - s->output_bytes;
> + s->output_bytes += ret;
> + return ret;
> + } else {
> + return -1;
> + }
> +}
the second half of this functions is a duplicate of the end of ff_lzw_encode()
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Why not whip the teacher when the pupil misbehaves? -- Diogenes of Sinope
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070406/d820bede/attachment.pgp>
More information about the ffmpeg-devel
mailing list