[FFmpeg-devel] [PATCH] WIP/lavu: add string interpolation helper.

Nicolas George nicolas.george at normalesup.org
Wed Feb 1 19:35:53 CET 2012


Le tridi 13 pluviôse, an CCXX, Clément Bœsch a écrit :
> From: Clément Bœsch <clement.boesch at smartjog.com>
> 
> ---
> Hi,
> 
> Just a PoC to do some string interpolation easily. This can have various
> applications, for instance in segmenter, the -report option or drawtext filter.
> 
> Right now, the hash function is just a simple additive one (feel free to
> propose anything better), with a fixed size hashtable + linked list in case of
> collisions.

If the hash table does not allow deletion, then double hashing is probably a
better solution: try cell H; if it is already used, try (H+H2)%SIZE, the
(H+2*H2)%SIZE, (H+3*H2)%SIZE, etc., with H2 a second hash function. Double
the table size (and re-insert all items) if it becomes more than 3/4 full.

> Please comment,
> 
> PS: the test doesn't do much for collisions, just reduce the hashtable to 2
> entries and you'll get some.
> ---
>  libavutil/Makefile       |    2 +-
>  libavutil/stringinterp.c |  175 ++++++++++++++++++++++++++++++++++++++++++++++
>  libavutil/stringinterp.h |   36 ++++++++++
>  3 files changed, 212 insertions(+), 1 deletions(-)
>  create mode 100644 libavutil/stringinterp.c
>  create mode 100644 libavutil/stringinterp.h
> 
> diff --git a/libavutil/Makefile b/libavutil/Makefile
> index 2cd763c..da3fe7b 100644
> --- a/libavutil/Makefile
> +++ b/libavutil/Makefile
> @@ -80,7 +80,7 @@ OBJS-$(ARCH_X86) += x86/cpu.o
>  
>  
>  TESTPROGS = adler32 aes avstring base64 cpu crc des eval file fifo lfg lls \
> -            md5 opt pca parseutils rational sha tree
> +            md5 opt pca parseutils rational sha tree stringinterp
>  TESTPROGS-$(HAVE_LZO1X_999_COMPRESS) += lzo
>  
>  TOOLS = ffeval
> diff --git a/libavutil/stringinterp.c b/libavutil/stringinterp.c
> new file mode 100644
> index 0000000..dfd0020
> --- /dev/null
> +++ b/libavutil/stringinterp.c
> @@ -0,0 +1,175 @@
> +/*
> + * Copyright (c) 2012 Clément Bœsch
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +/**
> + * @file
> + * string interpolation
> + */
> +
> +#include "avutil.h"
> +#include "avstring.h"
> +#include "stringinterp.h"
> +
> +struct node {
> +    const char *key;
> +    const char *value;
> +    struct node *next;
> +};
> +
> +typedef struct AVStringInter {
> +    struct node *hashtable[128];
> +} AVStringInter;
> +
> +AVStringInter *av_stringinterp_alloc(void)
> +{
> +    AVStringInter *ctx = av_mallocz(sizeof(*ctx));
> +    return ctx;
> +}
> +
> +void av_stringinterp_free(AVStringInter **ctx)
> +{
> +    int i;
> +    AVStringInter *s = *ctx;
> +    for (i = 0; i < FF_ARRAY_ELEMS(s->hashtable); i++) {
> +        struct node *tmp, *node = s->hashtable[i];
> +        while (node) {
> +            tmp = node;
> +            node = node->next;
> +            av_freep(&tmp);
> +        }
> +    }
> +    av_freep(ctx);
> +}
> +
> +static uint32_t get_hash(const char *key, size_t len, int tbl_len)
> +{
> +    size_t i;
> +    uint32_t hash = (uint32_t)len;
> +    for (i = 0; i < len; i++)
> +        hash += key[i];

Some random multiplication would probably yield better results: with that
function "xy" and "yx" will always collide.

> +    return hash % tbl_len;
> +}
> +
> +static struct node *new_node(const char *key, const char *value)
> +{
> +    struct node *node = av_mallocz(sizeof(*node));

Unchecked allocation.

And I really do not like those kind of small allocations: the amount of
overhead is tremendous. Better alloc a bunch of them at a time and consume
them as needed.

(But we do not need the allocation with double hashing.)

> +    node->key   = key;
> +    node->value = value;
> +    return node;
> +}
> +
> +int av_stringinterp_ref(AVStringInter *ctx, const char *key, const char *value)
> +{
> +    int id = get_hash(key, strlen(key), FF_ARRAY_ELEMS(ctx->hashtable));
> +    struct node *node = ctx->hashtable[id];
> +    struct node *insert_after = NULL;

	struct node **insert_at = &ctx->hashtable[id];

> +
> +    while (node) {
> +        if (!strcmp(key, node->key)) {
> +            node->value = value;
> +            return 0;
> +        }
> +        insert_after = node;

	  insert_at = &node->next;

> +        node = node->next;
> +    }
> +    if (insert_after)
> +        insert_after->next = new_node(key, value);
> +    else
> +        ctx->hashtable[id] = new_node(key, value);

      *insert_at = new_node(key, value);

Classic trick, often much simpler when writing linked lists.

> +    return 0;
> +}
> +
> +static const char *get_value(AVStringInter *ctx, const char *key)
> +{
> +    int id;
> +    size_t keylen;
> +    const char *p = strchr(key, ')');

I do not find this part very elegant: I would like it better if the string
-> string mapping API were clearly separated.

OTOH, using (pointer, length) pairs (or (start_pointer, end_pointer)) rather
than NUL-terminated strings is the way to go to avoid unnecessary copies.

> +    struct node *node;
> +
> +    if (!p)
> +        return NULL;
> +    keylen = p - key;
> +    id = get_hash(key, keylen, FF_ARRAY_ELEMS(ctx->hashtable));
> +    for (node = ctx->hashtable[id]; node; node = node->next)
> +        if (!strncmp(key, node->key, keylen))
> +            return node->value;
> +    return NULL;
> +}
> +
> +char *av_stringinterp_interpolate(AVStringInter *ctx, const char *s)

The name is self-redundant.

> +{
> +    char res[256]; // FIXME
> +    char *p = res;
> +    char *pend = p + sizeof(res);
> +
> +    while (*s) {
> +        if (*s == '%' && *(s+1) == '(') {

Do you have a reason to favor %(foo) like... no language I know of, instead
of maybe ${foo} like shell and perl?

> +            const char *value = get_value(ctx, s + 2);
> +            if (value) {
> +                p += snprintf(p, pend-p, "%s", value);

If p goes beyond pend, (size_t)(pend-p) becomes huge. pend - FFMIN(pend, p)
should do the trick.

> +                s  = strchr(s, ')') + 1;

Segfault if the user gives a string without the matching ')'.

> +                continue;
> +            }
> +        }
> +        p += snprintf(p, pend-p, "%c", *s);

Maybe we can dispense with the printf for just a char.

> +        s++;
> +    }
> +    return av_strdup(res);

For this kind of situations, I like to have two runs of exactly the same
code: once computing the required space but not writing anything, and a
second time actually writing in the buffer with exactly the necessary size.

But whatever the solution, I believe we could do with a set of functions
av_buf_printf(buf, fmt, ...).

> +}
> +
> +#ifdef TEST
> +#undef printf
> +static void dump_table(AVStringInter *ctx)
> +{
> +    int i;
> +    printf("Hashtable dump:\n");
> +    for (i = 0; i < FF_ARRAY_ELEMS(ctx->hashtable); i++) {
> +        struct node *node = ctx->hashtable[i];
> +        if (!node)
> +            continue;
> +        printf("  [%02d/%02d]:", i, FF_ARRAY_ELEMS(ctx->hashtable));
> +        while (node) {
> +            printf(" \"%s\"=>\"%s\"", node->key, node->value);
> +            node = node->next;
> +        }
> +        printf("\n");
> +    }
> +}
> +
> +int main(int ac, char **av)
> +{
> +    AVStringInter *si = av_stringinterp_alloc();
> +    char *res;
> +
> +    av_stringinterp_ref(si, "ts1",      "0.4123");
> +    av_stringinterp_ref(si, "ts2",      "54.153");
> +    av_stringinterp_ref(si, "filename", "FooBar");
> +    av_stringinterp_ref(si, "ext",      ".mpg");
> +    av_stringinterp_ref(si, "ext",      ".mp4");
> +
> +    dump_table(si);
> +
> +    res = av_stringinterp_interpolate(si, "%(filename)_%(ts1)-%(ts2)_%(nokey)%(ext)");
> +    printf("%s\n", res);
> +    av_free(res);
> +    av_stringinterp_free(&si);
> +    return 0;
> +}
> +#endif
> diff --git a/libavutil/stringinterp.h b/libavutil/stringinterp.h
> new file mode 100644
> index 0000000..9f75ae9
> --- /dev/null
> +++ b/libavutil/stringinterp.h
> @@ -0,0 +1,36 @@
> +/*
> + * Copyright (c) 2012 Clément Bœsch
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +/**
> + * @file
> + * string interpolation header
> + */
> +
> +#ifndef AVUTIL_STRINGINTERP_H
> +#define AVUTIL_STRINGINTERP_H
> +
> +typedef struct AVStringInter AVStringInter;
> +
> +AVStringInter *av_stringinterp_alloc(void);
> +void av_stringinterp_free(AVStringInter **ctx);
> +int av_stringinterp_ref(AVStringInter *ctx, const char *key, const char *value);
> +char *av_stringinterp_interpolate(AVStringInter *ctx, const char *s);

Some doxy about what it is about would be nice.

> +
> +#endif/* AVUTIL_STRINGINTERP_H */
> -- 
> 1.7.8.3

On the whole, I believe you are trying to design a solution without a
problem, and that is usually not a good sign.

The credo around here is to only add functions to lavu when there is an use
case in the rest of the code. This is not a very efficient solution, and it
tends to force the code in small local optima because no one wants to anneal
(as in "simulated annealing") ffmpeg, but at least this is a consistent
policy and it avoids having useless code.

The other possible policy is the one for glib (which is for Gtk+ the
equivalent of lavu for lavc/lavf): aim for very a very generic API, covering
any foreseeable possible use.

The practical problem I see with your proposal is that it seems to be too
specific, and will not be easily usable for a lot of situations. For
example, let us say we want to use it to process the string for the drawtext
filter: how do we expand %(pts)? raw integer? float with six digits?
rational? hh:mm:ss.mmm? And do we want to have to stringify each possible
variable for each frame, even if the user only wants to add their initials?


May I suggest to redesign the API that way: ?

- A set of av_bsprintf functions: like asprintf, but into a buffer that is
  grown as necessary.

- Instead of a hash table, av_stringinterp_interpolate accepts a function:

  void get_value(opaque_pointer, target_buffer, var_name, var_flags);

  Of course, someone who wants to interpolate from a hash table can just
  write a get_value function that reads the hash table; we can provide a
  stock one.

- Accept to interpolate ${var:flags}, for example ${pts:%M:%S.%3}.


Regards,

-- 
  Nicolas George
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20120201/2444d6b9/attachment.asc>


More information about the ffmpeg-devel mailing list