[FFmpeg-devel] [PATCH v17 1/5] libavutil: Add wchartoutf8(), wchartoansi(), utf8toansi() and getenv_utf8()

Soft Works softworkz at hotmail.com
Sun Jun 19 10:24:29 EEST 2022



> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of
> Andreas Rheinhardt
> Sent: Sunday, June 19, 2022 8:28 AM
> To: ffmpeg-devel at ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH v17 1/5] libavutil: Add
> wchartoutf8(), wchartoansi(), utf8toansi() and getenv_utf8()
> 
> Soft Works:
> >
> >
> >> -----Original Message-----
> >> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of
> >> Andreas Rheinhardt
> >> Sent: Sunday, June 19, 2022 6:59 AM
> >> To: ffmpeg-devel at ffmpeg.org
> >> Subject: Re: [FFmpeg-devel] [PATCH v17 1/5] libavutil: Add
> >> wchartoutf8(), wchartoansi(), utf8toansi() and getenv_utf8()
> >>
> >> Nil Admirari:
> >>> wchartoutf8() converts strings returned by WinAPI into UTF-8,
> >>> which is FFmpeg's preferred encoding.
> >>>
> >>> Some external dependencies, such as AviSynth, are still
> >>> not Unicode-enabled. utf8toansi() converts UTF-8 strings
> >>> into ANSI in two steps: UTF-8 -> wchar_t -> ANSI.
> >>> wchartoansi() is responsible for the second step of the
> >>> conversion. Conversion in just one step is not supported by
> >>> WinAPI.
> >>>
> >>> Since these character-converting functions allocate a buffer
> >>> of the necessary size, they also facilitate the removal of the
> >>> MAX_PATH limit in places where fixed-size ANSI/WCHAR strings
> >>> were used as filename buffers.
> >>>
> >>> getenv_utf8() wraps _wgetenv() converting its input from
> >>> and its output to UTF-8. Compared to plain getenv(),
> >>> getenv_utf8() requires a cleanup.
> >>>
> >>> Because of that, in places that only test the existence of
> >>> an environment variable or compare its value with a string
> >>> consisting entirely of ASCII characters, the use of plain
> >>> getenv() is still preferred. (libavutil/log.c
> >>> check_color_terminal() is an example of such a place.)
> >>>
> >>> Plain getenv() is also preferred in UNIX-only code,
> >>> such as bktr.c, fbdev_common.c, oss.c in libavdevice
> >>> or af_ladspa.c in libavfilter.
> >>> ---
> >>>  configure                  |  1 +
> >>>  libavutil/getenv_utf8.h    | 71 ++++++++++++++++++++++++++++++++++++++
> >>>  libavutil/wchar_filename.h | 51 +++++++++++++++++++++++++++
> >>>  3 files changed, 123 insertions(+)
> >>>  create mode 100644 libavutil/getenv_utf8.h
> >>>
> >>> diff --git a/configure b/configure
> >>> index 3dca1c4bd3..fa37a74531 100755
> >>> --- a/configure
> >>> +++ b/configure
> >>> @@ -2272,6 +2272,7 @@ SYSTEM_FUNCS="
> >>>      fcntl
> >>>      getaddrinfo
> >>>      getauxval
> >>> +    getenv
> >>>      gethrtime
> >>>      getopt
> >>>      GetModuleHandle
> >>> diff --git a/libavutil/getenv_utf8.h b/libavutil/getenv_utf8.h
> >>> new file mode 100644
> >>> index 0000000000..161e3e6202
> >>> --- /dev/null
> >>> +++ b/libavutil/getenv_utf8.h
> >>> @@ -0,0 +1,71 @@
> >>> +/*
> >>> + * This file is part of FFmpeg.
> >>> + *
> >>> + * FFmpeg is free software; you can redistribute it and/or
> >>> + * modify it under the terms of the GNU Lesser General Public
> >>> + * License as published by the Free Software Foundation; either
> >>> + * version 2.1 of the License, or (at your option) any later version.
> >>> + *
> >>> + * FFmpeg is distributed in the hope that it will be useful,
> >>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> >>> + * Lesser General Public License for more details.
> >>> + *
> >>> + * You should have received a copy of the GNU Lesser General Public
> >>> + * License along with FFmpeg; if not, write to the Free Software
> >>> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> >>> + */
> >>> +
> >>> +#ifndef AVUTIL_GETENV_UTF8_H
> >>> +#define AVUTIL_GETENV_UTF8_H
> >>> +
> >>> +#include <stdlib.h>
> >>> +
> >>> +#include "mem.h"
> >>> +
> >>> +#ifdef HAVE_GETENV
> >>> +
> >>> +#ifdef _WIN32
> >>> +
> >>> +#include "libavutil/wchar_filename.h"
> >>> +
> >>> +static inline char *getenv_utf8(const char *varname)
> >>> +{
> >>> +    wchar_t *varname_w, *var_w;
> >>> +    char *var;
> >>> +
> >>> +    if (utf8towchar(varname, &varname_w))
> >>> +        return NULL;
> >>> +    if (!varname_w)
> >>> +        return NULL;
> >>> +
> >>> +    var_w = _wgetenv(varname_w);
> >>> +    av_free(varname_w);
> >>> +
> >>> +    if (!var_w)
> >>> +        return NULL;
> >>> +    if (wchartoutf8(var_w, &var))
> >>> +        return NULL;
> >>> +
> >>> +    return var;
> >>> +
> >>> +    // No CP_ACP fallback compared to other *_utf8() functions:
> >>> +    // non UTF-8 strings must not be returned.
> >>> +}
> >>> +
> >>> +#else
> >>> +
> >>> +static inline char *getenv_utf8(const char *varname)
> >>> +{
> >>> +    return av_strdup(getenv(varname));
> >>
> >> This forces allocations and frees in scenarios where this is
> >> wholly unnecessary.
> >
> > Why do you think this is unnecessary? At least on Windows, there is
> > no guarantee regarding the lifetime of strings returned from
> > getenv(). If some other code calls _putenv to set the env
> > variable, the previously returned string can become invalid
> > without the caller being able to know.
> >
> >
> >> This can be avoided by adding a custom deallocator for
> >> strings returned via getenv_utf8: namely a define/wrapper around
> >> av_free in the _WIN32 case and a no-op otherwise.
> >
> > I don't think I really understand what you mean, by the above 😉
> >
> 
> The "scenarios where this is wholly unnecessary" are the scenarios in
> which one is in the above #else branch, i.e. not on Windows (I
> thought this was clear from the placement of my comment). In this
> case, the above av_strdup() can be avoided by adding a custom
> deallocator that boils down to av_free on Windows and a no-op in all
> other cases.

Right. I saw that your comment was placed in the non-Windows branch, yet
I talked about Windows. I had read some time ago that the getenv() return
value on Linux is subject to the same lack of lifetime guarantees as on
Windows, but I was too lazy to reconfirm.
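
For reference, here is how I understand the custom deallocator idea quoted
above, as a rough sketch only. The names env_get() and env_free() are
hypothetical and not part of the patch, and malloc()/free() stand in for
av_malloc()/av_free() so that the snippet compiles on its own. The
Windows branch only imitates the allocation that the real wchar_t
conversion would perform.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical names: env_get() stands in for getenv_utf8() and
 * env_free() for the proposed custom deallocator. Neither is part of
 * the patch; malloc()/free() stand in for av_malloc()/av_free().
 */
#ifdef _WIN32

/* Windows branch: the real code converts from wchar_t and therefore
 * always returns a fresh allocation. A plain copy imitates that here. */
static char *env_get(const char *name)
{
    const char *val = getenv(name);
    char *copy;

    if (!val)
        return NULL;
    copy = malloc(strlen(val) + 1);
    if (copy)
        strcpy(copy, val);
    return copy;
}

/* The caller owns the string, so the deallocator must release it. */
static void env_free(char *val)
{
    free(val);
}

#else

/* Non-Windows branch: hand out the pointer owned by the C library. */
static char *env_get(const char *name)
{
    return getenv(name);
}

/* Nothing was allocated, so the deallocator is a no-op. */
static void env_free(char *val)
{
    (void)val;
}

#endif
```

A caller would pair every env_get() with an env_free() on all platforms.
On non-Windows the returned pointer then has exactly the lifetime of a
plain getenv() result, which is the trade-off discussed below.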

> Yes, if you would keep the return value from getenv for too long,
> while something else changes the environment in the same process,
> you'd have such an issue. But that hasn't been a concern so far -
> right?

Has this ever been a valid argument on this ML? :-)

> And it isn't what we try to fix here.

Two points regarding that line of argumentation:

First of all: getenv() is a horrible API, it's neither guaranteed to be
thread-safe nor is it safe to make any assumption about the lifetime of
the returned value.

1. For getenv(), a caller needs to know about its awful specifics, but
   getenv_utf8() is our own API, and I'm not sure whether the fact that
   it's replacing getenv() justifies that we introduce such behavior
   into "our" API as well.

2. For Windows, the code change already has the effect that the caller
   does not need to worry about the lifetime/validity of the result.
   So why not expose the same behavior on non-Win platforms as well?
   (via strdup)


One might argue that even with the use of strdup it won't be 100% safe,
which is true. But when comparing the time windows in which a failure
can occur, the use of strdup would still catch the vast majority of
race conditions.
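
To make the time-window argument concrete, here is a minimal POSIX-only
sketch. env_snapshot() and snapshot_survives_overwrite() are hypothetical
helpers, not FFmpeg API, and the variable name GR_SKETCH_VAR is made up.
The copy is taken right after getenv(), so the only remaining window is
between those two calls; after that, a later setenv() can no longer
invalidate what the caller holds.

```c
#define _POSIX_C_SOURCE 200809L /* for setenv() and strdup() */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a private copy of an environment value,
 * or NULL if the variable is unset or the allocation fails.
 * The caller releases the copy with free(). */
static char *env_snapshot(const char *name)
{
    const char *val = getenv(name);

    return val ? strdup(val) : NULL;
}

/* Returns 1 if a snapshot taken before an overwrite still holds the
 * old value afterwards, 0 if it does not, -1 on setup failure. */
static int snapshot_survives_overwrite(void)
{
    char *copy;
    int ok;

    if (setenv("GR_SKETCH_VAR", "old", 1))
        return -1;
    copy = env_snapshot("GR_SKETCH_VAR");
    if (!copy)
        return -1;

    /* Another part of the process changes the variable. A raw getenv()
     * pointer obtained earlier may be stale now; the copy is not. */
    if (setenv("GR_SKETCH_VAR", "new", 1)) {
        free(copy);
        return -1;
    }

    ok = strcmp(copy, "old") == 0;
    free(copy);
    return ok;
}
```

This does not make getenv() itself thread-safe. It only shrinks the
window in which a concurrent change to the environment can hurt the
caller.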

I think that Nil's version is a better choice. Anyway I don't understand 
the motivation: ffmpeg is allocating huge amounts of memory for video 
frames and here we want to save what - the string return value of an
environment variable?

Maybe I'm missing something and I just can't see it.. :-)

Thanks,
softworkz








