[FFmpeg-devel] [RFC] function to check for valid UTF-8 string

Reimar Döffinger Reimar.Doeffinger
Sun Dec 9 11:18:32 CET 2007


Hello,
since Rich seems to have given up on it, here is a proposed patch
that adds an av_check_utf8 function that could be used to validate
input strings.
Since I hacked it up very quickly, please forgive any bugs or other
stupidity.
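
As a rough standalone illustration of what the function is meant to
check, here is a minimal sketch. It is not the patch itself: the name
check_utf8_sketch is made up for this example, and it decodes the bytes
by hand instead of using GET_UTF8. Like the patch, it only rejects
malformed sequences, sequences longer than 4 bytes and overlong
encodings; it does not reject surrogates or values above U+10FFFF.

```c
#include <stdint.h>

/* Hypothetical helper, not part of libavutil: returns 1 if str is
 * well-formed UTF-8 (at most 4 bytes per sequence, no overlong
 * encodings), 0 otherwise. */
int check_utf8_sketch(const char *str)
{
    /* Smallest code point that needs an n-byte sequence, indexed by n. */
    static const uint32_t minvals[5] = {0, 0, 1 << 7, 1 << 11, 1 << 16};
    /* Read bytes as unsigned so values >= 0x80 are not sign-extended. */
    const unsigned char *s = (const unsigned char *)str;

    while (*s) {
        uint32_t v = *s++;
        int len, i;

        if      (v < 0x80)             len = 1;
        else if ((v & 0xE0) == 0xC0) { len = 2; v &= 0x1F; }
        else if ((v & 0xF0) == 0xE0) { len = 3; v &= 0x0F; }
        else if ((v & 0xF8) == 0xF0) { len = 4; v &= 0x07; }
        else return 0; /* stray continuation byte or sequence > 4 bytes */

        for (i = 1; i < len; i++) {
            /* The NUL terminator fails this test, so we never read past it. */
            if ((*s & 0xC0) != 0x80) return 0;
            v = (v << 6) | (*s++ & 0x3F);
        }
        if (v < minvals[len]) return 0; /* overlong encoding */
    }
    return 1;
}
```

With this sketch, check_utf8_sketch("D\xC3\xB6" "ffinger") returns 1,
while the overlong "\xC0\xAF" and the truncated "\xE2\x82" both return 0.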

Greetings,
Reimar Döffinger
-------------- next part --------------
Index: libavutil/string.c
===================================================================
--- libavutil/string.c	(revision 11160)
+++ libavutil/string.c	(working copy)
@@ -23,8 +23,23 @@
 #include <stdio.h>
 #include <string.h>
 #include <ctype.h>
+#include <inttypes.h>
+#include "common.h"
 #include "avstring.h"
 
+static const uint32_t utf8_minvals[5] = {0, 0, 1 << 7, 1 << 11, 1 << 16};
+
+int av_check_utf8(const char *str) {
+    while (*str) {
+        const char *last = str;
+        uint32_t v;
+        GET_UTF8(v, (uint8_t)*str++, return 0;)
+        if (str - last > 4) return 0;
+        if (v < utf8_minvals[str - last]) return 0;
+    }
+    return 1;
+}
+
 int av_strstart(const char *str, const char *pfx, const char **ptr)
 {
     while (*pfx && *pfx == *str) {
Index: libavutil/avstring.h
===================================================================
--- libavutil/avstring.h	(revision 11160)
+++ libavutil/avstring.h	(working copy)
@@ -24,6 +24,11 @@
 #include <stddef.h>
 
 /**
+ * Return non-zero if str is a valid UTF-8 string.
+ */
+int av_check_utf8(const char *str);
+
+/**
  * Return non-zero if pfx is a prefix of str. If it is, *ptr is set to
  * the address of the first character in str after the prefix.
  *


