[FFmpeg-devel] [PATCH v4 1/4] avutil/{avstring, bprint}: add XML escaping from ffprobe to avutil

Jan Ekström jeebjp at gmail.com
Mon Jan 25 13:44:59 EET 2021


On Mon, Jan 25, 2021 at 11:58 AM Nicolas George <george at nsup.org> wrote:
>
> Jan Ekström (12021-01-22):
> > From: Stefano Sabatini <stefasab at gmail.com>
> >
> > ---
> >  libavutil/avstring.h |  7 ++++---
> >  libavutil/bprint.c   | 15 +++++++++++++++
> >  tools/ffescape.c     |  7 ++++---
> >  3 files changed, 23 insertions(+), 6 deletions(-)
>
> Thanks for this new version.
>
> I think this patch and the next would be better merged.

For now I kept them separate since one is just moving Stefano's code,
and another adds new functionality that you requested.

> >
> > diff --git a/libavutil/avstring.h b/libavutil/avstring.h
> > index ee225585b3..189b4726a5 100644
> > --- a/libavutil/avstring.h
> > +++ b/libavutil/avstring.h
> > @@ -321,9 +321,10 @@ int av_match_name(const char *name, const char *names);
> >  char *av_append_path_component(const char *path, const char *component);
> >
> >  enum AVEscapeMode {
> > -    AV_ESCAPE_MODE_AUTO,      ///< Use auto-selected escaping mode.
> > -    AV_ESCAPE_MODE_BACKSLASH, ///< Use backslash escaping.
> > -    AV_ESCAPE_MODE_QUOTE,     ///< Use single-quote escaping.
> > +    AV_ESCAPE_MODE_AUTO,          ///< Use auto-selected escaping mode.
> > +    AV_ESCAPE_MODE_BACKSLASH,     ///< Use backslash escaping.
> > +    AV_ESCAPE_MODE_QUOTE,         ///< Use single-quote escaping.
>
> > +    AV_ESCAPE_MODE_XML_CHAR_DATA, ///< Use XML non-markup character data escaping.
>
> It could be shorter: XML_TEXT, XML_ATTR_QUOT and XML_ATTR_APOS are clear
> enough IMHO, and more convenient.
>

I really have no bigger bone with either naming, I just utilized:
1. How the XML spec calls the thing ("AttValue" thus becoming
ATT_VALUE, "CharData" becoming CHAR_DATA). I slightly would prefer to
keep this due to this making it easy to point towards what is actually
meant in the XML spec.
2. Single and double quotes are the wording that I am more acquainted
with, but I have no hard opinion so if you want _QUOT/_APOS, sure.

> >  };
> >
> >  /**
> > diff --git a/libavutil/bprint.c b/libavutil/bprint.c
> > index 2f059c5ba6..7cdbb75095 100644
> > --- a/libavutil/bprint.c
> > +++ b/libavutil/bprint.c
> > @@ -283,6 +283,21 @@ void av_bprint_escape(AVBPrint *dstbuf, const char *src, const char *special_cha
> >          av_bprint_chars(dstbuf, '\'', 1);
> >          break;
> >
> > +    case AV_ESCAPE_MODE_XML_CHAR_DATA:
> > +        /* escape XML non-markup character data as per 2.4 */
> > +        /*  [^<&]* - ([^<&]* ']]>' [^<&]*) */
> > +        for (; *src; src++) {
> > +            switch (*src) {
> > +            case '&' : av_bprintf(dstbuf, "%s", "&");  break;
> > +            case '<' : av_bprintf(dstbuf, "%s", "<");   break;
> > +            case '>' : av_bprintf(dstbuf, "%s", ">");   break;
>
> > +            case '"' : av_bprintf(dstbuf, "%s", """); break;
> > +            case '\'': av_bprintf(dstbuf, "%s", "'"); break;
>
> Outside attributes, " and ' do not need to be quoted.
>

I think this is where I'm slightly going out of my comfort zone. This
is already existing code, and I just moved it out of ffprobe (where it
was already successfully merged and existed for a while). Additionally
the expression for char_data does at least reference single quotes. I
do see the paragraph at the end of 2.4 that notes

> To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as "'", and the double-quote character (") as """."

Of course, which seems specific to attribute values.

But at this point I'm 50/50 if it makes even sense for me to continue
this struggle or if this is an unclear message from you to me that I
should just utilize libxml2, which is already utilized in the DASH
demuxer. Mostly because there I at least can relatively surely know
that (probably) not only escaping, but also removal of invalid UTF-8
might be handled by it (which we currently do not do either with JSON
or XML output from ffprobe as far as I know). That way also I don't
have to attempt to be a nice person and attempt to re-utilize existing
code from ffprobe, nor do I have to touch ffprobe.

Jan


More information about the ffmpeg-devel mailing list