[FFmpeg-devel] [PATCH] WIP/lavu: add string interpolation helper.
Nicolas George
nicolas.george at normalesup.org
Wed Feb 1 19:35:53 CET 2012
Le tridi 13 pluviôse, an CCXX, Clément Bœsch a écrit :
> From: Clément Bœsch <clement.boesch at smartjog.com>
>
> ---
> Hi,
>
> Just a PoC to do some string interpolation easily. This can have various
> applications, for instance in segmenter, the -report option or drawtext filter.
>
> Right now, the hash function is just a simple additive one (feel free to
> propose anything better), with a fixed size hashtable + linked list in case of
> collisions.
If the hash table does not allow deletion, then double hashing is probably a
better solution: try cell H; if it is already used, try (H+H2)%SIZE, the
(H+2*H2)%SIZE, (H+3*H2)%SIZE, etc., with H2 a second hash function. Double
the table size (and re-insert all items) if it becomes more than 3/4 full.
> Please comment,
>
> PS: the test doesn't do much for collisions, just reduce the hashtable to 2
> entries and you'll get some.
> ---
> libavutil/Makefile | 2 +-
> libavutil/stringinterp.c | 175 ++++++++++++++++++++++++++++++++++++++++++++++
> libavutil/stringinterp.h | 36 ++++++++++
> 3 files changed, 212 insertions(+), 1 deletions(-)
> create mode 100644 libavutil/stringinterp.c
> create mode 100644 libavutil/stringinterp.h
>
> diff --git a/libavutil/Makefile b/libavutil/Makefile
> index 2cd763c..da3fe7b 100644
> --- a/libavutil/Makefile
> +++ b/libavutil/Makefile
> @@ -80,7 +80,7 @@ OBJS-$(ARCH_X86) += x86/cpu.o
>
>
> TESTPROGS = adler32 aes avstring base64 cpu crc des eval file fifo lfg lls \
> - md5 opt pca parseutils rational sha tree
> + md5 opt pca parseutils rational sha tree stringinterp
> TESTPROGS-$(HAVE_LZO1X_999_COMPRESS) += lzo
>
> TOOLS = ffeval
> diff --git a/libavutil/stringinterp.c b/libavutil/stringinterp.c
> new file mode 100644
> index 0000000..dfd0020
> --- /dev/null
> +++ b/libavutil/stringinterp.c
> @@ -0,0 +1,175 @@
> +/*
> + * Copyright (c) 2012 Clément Bœsch
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +/**
> + * @file
> + * string interpolation
> + */
> +
> +#include "avutil.h"
> +#include "avstring.h"
> +#include "stringinterp.h"
> +
> +struct node {
> + const char *key;
> + const char *value;
> + struct node *next;
> +};
> +
> +typedef struct AVStringInter {
> + struct node *hashtable[128];
> +} AVStringInter;
> +
> +AVStringInter *av_stringinterp_alloc(void)
> +{
> + AVStringInter *ctx = av_mallocz(sizeof(*ctx));
> + return ctx;
> +}
> +
> +void av_stringinterp_free(AVStringInter **ctx)
> +{
> + int i;
> + AVStringInter *s = *ctx;
> + for (i = 0; i < FF_ARRAY_ELEMS(s->hashtable); i++) {
> + struct node *tmp, *node = s->hashtable[i];
> + while (node) {
> + tmp = node;
> + node = node->next;
> + av_freep(&tmp);
> + }
> + }
> + av_freep(ctx);
> +}
> +
> +static uint32_t get_hash(const char *key, size_t len, int tbl_len)
> +{
> + size_t i;
> + uint32_t hash = (uint32_t)len;
> + for (i = 0; i < len; i++)
> + hash += key[i];
Some random multiplication would probably yield better results: with that
function "xy" and "yx" will always collide.
> + return hash % tbl_len;
> +}
> +
> +static struct node *new_node(const char *key, const char *value)
> +{
> + struct node *node = av_mallocz(sizeof(*node));
Unchecked allocation.
And I really do not like those kind of small allocations: the amount of
overhead is tremendous. Better alloc a bunch of them at a time and consume
them as needed.
(But we do not need the allocation with double hashing.)
> + node->key = key;
> + node->value = value;
> + return node;
> +}
> +
> +int av_stringinterp_ref(AVStringInter *ctx, const char *key, const char *value)
> +{
> + int id = get_hash(key, strlen(key), FF_ARRAY_ELEMS(ctx->hashtable));
> + struct node *node = ctx->hashtable[id];
> + struct node *insert_after = NULL;
struct node **insert_at = &ctx->hashtable[id];
> +
> + while (node) {
> + if (!strcmp(key, node->key)) {
> + node->value = value;
> + return 0;
> + }
> + insert_after = node;
insert_at = &node->next;
> + node = node->next;
> + }
> + if (insert_after)
> + insert_after->next = new_node(key, value);
> + else
> + ctx->hashtable[id] = new_node(key, value);
*insert_at = new_node(key, value);
Classic trick, often much simpler when writing linked lists.
> + return 0;
> +}
> +
> +static const char *get_value(AVStringInter *ctx, const char *key)
> +{
> + int id;
> + size_t keylen;
> + const char *p = strchr(key, ')');
I do not find this part very elegant: I would like it better if the string
-> string mapping API were clearly separated.
OTOH, using (pointer, length) pairs (or (start_pointer, end_pointer)) rather
than NUL-terminated strings is the way to go to avoid unnecessary copies.
> + struct node *node;
> +
> + if (!p)
> + return NULL;
> + keylen = p - key;
> + id = get_hash(key, keylen, FF_ARRAY_ELEMS(ctx->hashtable));
> + for (node = ctx->hashtable[id]; node; node = node->next)
> + if (!strncmp(key, node->key, keylen))
> + return node->value;
> + return NULL;
> +}
> +
> +char *av_stringinterp_interpolate(AVStringInter *ctx, const char *s)
The name is self-redundant.
> +{
> + char res[256]; // FIXME
> + char *p = res;
> + char *pend = p + sizeof(res);
> +
> + while (*s) {
> + if (*s == '%' && *(s+1) == '(') {
Do you have a reason to favor %(foo) like... no language I know of, instead
of maybe ${foo} like shell and perl?
> + const char *value = get_value(ctx, s + 2);
> + if (value) {
> + p += snprintf(p, pend-p, "%s", value);
If p goes beyond pend, (size_t)(pend-p) becomes huge. pend - FFMIN(pend, p)
should do the trick.
> + s = strchr(s, ')') + 1;
Segfault if the user gives a string without the matching ')'.
> + continue;
> + }
> + }
> + p += snprintf(p, pend-p, "%c", *s);
Maybe we can dispense with the printf for just a char.
> + s++;
> + }
> + return av_strdup(res);
For this kind of situations, I like to have two runs of exactly the same
code: once computing the required space but not writing anything, and a
second time actually writing in the buffer with exactly the necessary size.
But whatever the solution, I believe we could do with a set of functions
av_buf_printf(buf, fmt, ...).
> +}
> +
> +#ifdef TEST
> +#undef printf
> +static void dump_table(AVStringInter *ctx)
> +{
> + int i;
> + printf("Hashtable dump:\n");
> + for (i = 0; i < FF_ARRAY_ELEMS(ctx->hashtable); i++) {
> + struct node *node = ctx->hashtable[i];
> + if (!node)
> + continue;
> + printf(" [%02d/%02d]:", i, FF_ARRAY_ELEMS(ctx->hashtable));
> + while (node) {
> + printf(" \"%s\"=>\"%s\"", node->key, node->value);
> + node = node->next;
> + }
> + printf("\n");
> + }
> +}
> +
> +int main(int ac, char **av)
> +{
> + AVStringInter *si = av_stringinterp_alloc();
> + char *res;
> +
> + av_stringinterp_ref(si, "ts1", "0.4123");
> + av_stringinterp_ref(si, "ts2", "54.153");
> + av_stringinterp_ref(si, "filename", "FooBar");
> + av_stringinterp_ref(si, "ext", ".mpg");
> + av_stringinterp_ref(si, "ext", ".mp4");
> +
> + dump_table(si);
> +
> + res = av_stringinterp_interpolate(si, "%(filename)_%(ts1)-%(ts2)_%(nokey)%(ext)");
> + printf("%s\n", res);
> + av_free(res);
> + av_stringinterp_free(&si);
> + return 0;
> +}
> +#endif
> diff --git a/libavutil/stringinterp.h b/libavutil/stringinterp.h
> new file mode 100644
> index 0000000..9f75ae9
> --- /dev/null
> +++ b/libavutil/stringinterp.h
> @@ -0,0 +1,36 @@
> +/*
> + * Copyright (c) 2012 Clément Bœsch
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +/**
> + * @file
> + * string interpolation header
> + */
> +
> +#ifndef AVUTIL_STRINGINTERP_H
> +#define AVUTIL_STRINGINTERP_H
> +
> +typedef struct AVStringInter AVStringInter;
> +
> +AVStringInter *av_stringinterp_alloc(void);
> +void av_stringinterp_free(AVStringInter **ctx);
> +int av_stringinterp_ref(AVStringInter *ctx, const char *key, const char *value);
> +char *av_stringinterp_interpolate(AVStringInter *ctx, const char *s);
Some doxy about what it is about would be nice.
> +
> +#endif/* AVUTIL_STRINGINTERP_H */
> --
> 1.7.8.3
On the whole, I believe you are trying to design a solution without a
problem, and that is usually not a good sign.
The credo around here is to only add functions to lavu when there is an use
case in the rest of the code. This is not a very efficient solution, and it
tends to force the code in small local optima because no one wants to anneal
(as in "simulated annealing") ffmpeg, but at least this is a consistent
policy and it avoids having useless code.
The other possible policy is the one for glib (which is for Gtk+ the
equivalent of lavu for lavc/lavf): aim for very a very generic API, covering
any foreseeable possible use.
The practical problem I see with your proposal is that it seems to be too
specific, and will not be easily usable for a lot of situations. For
example, let us say we want to use it to process the string for the drawtext
filter: how do we expand %(pts)? raw integer? float with six digits?
rational? hh:mm:ss.mmm? And do we want to have to stringify each possible
variable for each frame, even if the user only wants to add their initials?
May I suggest to redesign the API that way: ?
- A set of av_bsprintf functions: like asprintf, but into a buffer that is
grown as necessary.
- Instead of a hash table, av_stringinterp_interpolate accepts a function:
void get_value(opaque_pointer, target_buffer, var_name, var_flags);
Of course, someone who wants to interpolate from a hash table can just
write a get_value function that reads the hash table; we can provide a
stock one.
- Accept to interpolate ${var:flags}, for example ${pts:%M:%S.%3}.
Regards,
--
Nicolas George
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20120201/2444d6b9/attachment.asc>
More information about the ffmpeg-devel
mailing list