[Ffmpeg-devel] lzw compression in tiff encoder (qualification task for GSoC)
Michael Niedermayer
michaelni
Sat Apr 7 01:50:20 CEST 2007
Hi
On Fri, Apr 06, 2007 at 11:43:50PM +0200, Bartlomiej Wolowiec wrote:
> Hi
> On Friday 06 April 2007 18:17, Michael Niedermayer wrote:
> > Hi
> >
> > On Fri, Apr 06, 2007 at 03:28:32PM +0200, Bartlomiej Wolowiec wrote:
> > > Hi,
> > > As a supplement to my qualification task for GSoC I implemented LZW
> > > compressor. I think that my code is fast, universal and it can be easily
> > > used in other encoders. My implementation use hash table with simple hash
> > > function (I've used LZW prefix code and xor to calculate new hash value).
> > > So, I'm sending two files: one with lzw encoder (lzw.patch) and one with
> > > patch to tiffenc.c (tifflzw.patch).
> >
> > [...]
> >
> > > +/** LZW encode state */
> > > +typedef struct LZWEncodeState {
> > > + int clear_code; ///< Value of clear code
> > > + int end_code; ///< Value of end code
> > > + Code *tab; ///< Hash table
> > > + int tabsize; ///< Number of values in hash table
> > > + int bits; ///< Actual bits code
> > > + int bufsize; ///< Size of output buffer
> > > + PutBitContext pb; ///< Put bit context for output
> > >
> > > + int maxbits; ///< Max bits code
> >
> > isnt this always 12? and does the code support larger values at all?
>
> Currently is is used only with 12, larger values hardly are used (if there is
> such a need, just the #define should be enlarged). But, I think, that 10 and
> 11 bits codes are used.
ok
[...]
> > [...]
> >
> > > +/**
> > > + * Write one code to stream
> > > + * @param s LZW state
> > > + * @param c code to write
> > > + */
> > > +static inline void writeCode(LZWEncodeState * s, int c)
> > > +{
> > > + assert(0 <= c && c < 1 << s->bits);
> > > + put_bits(&s->pb, s->bits, c);
> > > +}
> >
> > useless wraper function around put_bits()
>
> Now it's just wraper for put_bits, but various formats differently use lzw
> (e.g. gif).
right, i forgot
[...]
> @@ -57,6 +58,7 @@
> uint8_t **buf; ///< actual position in buffer
> uint8_t *buf_start; ///< pointer to first byte in buffer
> int buf_size; ///< buffer size
> + struct LZWEncodeState *lzws; ///< LZW Encode state
///< isnt aligned
[...]
> +/**
> + * Clear LZW code table
> + * @param s LZW state
> + */
> +static inline void clearTable(LZWEncodeState * s)
> +{
> + int i, h;
> +
> + writeCode(s, s->clear_code);
> + s->bits = 9;
> + for (i = 0; i < LZW_HASH_SIZE; i++) {
> + s->tab[i].hash_prefix = LZW_PREFIX_FREE;
> + }
> + for (i = 0; i < 256; i++) {
> + h = hash(0, i);
> + s->tab[h].code = i;
> + s->tab[h].suffix = i;
> + s->tab[h].hash_prefix = LZW_PREFIX_EMPTY;
> + }
> + s->tabsize = 258;
> +}
this shouldnt be inline as its not called often
> +/**
> + * Calculate number of bytes written
> + * @param s LZW encode state
> + * @return Number of bytes written or -1 on error
> + */
> +static inline int writtenBytes(LZWEncodeState *s){
> + int ret = (put_bits_count(&s->pb)) >> 3;
> + if (ret < s->bufsize) {
> + ret -= s->output_bytes;
> + s->output_bytes += ret;
> + return ret;
> + } else {
> + return -1;
> + }
> +}
this one shouldnt be inline as its not called that often for speed to matter
[...]
> +/**
> + * Init LZW encoder (allocate memory)
> + * @param s LZW state
> + */
> +void ff_lzw_encode_open(LZWEncodeState ** s){
> + *s = av_malloc(sizeof(LZWEncodeState));
> + (*s)->tab = av_malloc(LZW_HASH_SIZE * sizeof(Code));
> +}
why _init and _open ? it seems one would be enough
also why use malloc for tab instead of a
Code tab[LZW_HASH_SIZE]
in the struct?
and it would be easier to
return s;
instead of using an pointer to pointer ...
> +
> +/**
> + * End LZW encoder (free memory)
> + * @param s LZW state
> + */
> +void ff_lzw_encode_close(LZWEncodeState ** s)
> +{
> + av_free((*s)->tab);
> + av_freep(s);
> +}
hmm why not LZWEncodeState * s ? it seems theres little gain
from a ** just the =NULL ...
also flush and close can be merged
> +
> +/**
> + * LZW main compress function
> + * @param s LZW state
> + * @param inbuf Input buffer
> + * @param insize Size of input buffer
> + * @return Number of bytes written or -1 on error
> + */
> +int ff_lzw_encode(LZWEncodeState * s, const uint8_t * inbuf, int insize)
> +{
> + int i;
> + int ret;
> + int code_prefix = s->last_code;
> +
> + if (code_prefix == LZW_PREFIX_EMPTY)
> + clearTable(s);
> +
> + for (i = 0; i < insize; i++) {
> + uint8_t c = *inbuf++;
> + int code = findCode(s, c, code_prefix);
> + if (s->tab[code].hash_prefix != LZW_PREFIX_FREE) {
> + code_prefix = s->tab[code].code;
> + } else {
> + writeCode(s, code_prefix);
> + addCode(s, c, code_prefix, code);
> + code_prefix = s->tab[hash(0, c)].code;
> + }
> + if (s->tabsize >= s->maxcode - 1) {
> + clearTable(s);
> + }
> + }
> + s->last_code = code_prefix;
> +
> + return writtenBytes(s);
> +}
it seems there is no check for the output buffer (put_bits() does not check
this automically)
a simple check like 1.5 x insize > outsize then fail at the
top of ff_lzw_encode() should do
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
When you are offended at any man's fault, turn to yourself and study your
own failings. Then you will forget your anger. -- Epictetus
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070407/ad3ee1e4/attachment.pgp>
More information about the ffmpeg-devel
mailing list