[FFmpeg-devel] [PATCH] Support for UTF8 filenames on Windows

Karl Blomster thefluff
Fri Jun 26 17:37:42 CEST 2009


Ramiro Polla wrote:
> On Fri, Jun 26, 2009 at 11:07 AM, Karl Blomster<thefluff at uppcon.com> wrote:
>> Måns Rullgård wrote:
>>> Karl Blomster <thefluff at uppcon.com> writes:
>>>> Ramiro Polla wrote:
>>>>> On Thu, Jun 25, 2009 at 8:59 AM, Michael
>>>>> Niedermayer<michaelni at gmx.at> wrote:
>>>>>> On Sat, Jun 20, 2009 at 11:56:37PM +0200, Kalle Blomster wrote:
>>>>>>> Currently, ffmpeg on Windows does not support opening files whose
>>>>>>> names contain characters that cannot be expressed in the current
>>>>>>> locale, because on Windows you can't pass UTF8 in a char* to _open()
>>>>>>> and have it work. You have to convert the filename to UTF16 and use
>>>>>>> _wopen(), which takes a wchar_t* instead.
>>>>>>>
>>>>>>> I have attached a patch that attempts to solve the problem with a
>>>>>>> rather ugly hack. It Works For Me(tm) under mingw at least. Comments
>>>>>>> are appreciated.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Karl Blomster
>>>>>>>  os_support.c |   17 +++++++++++++++++
>>>>>>>  os_support.h |    5 +++++
>>>>>>>  2 files changed, 22 insertions(+)
>>>>>>> 9afa6887f1f6998c37d75efaae5d589918dc752b  ffmpeg_win_utf8_paths.patch
>>>>>>> Index: libavformat/os_support.c
>>>>>>> ===================================================================
>>>>>>> --- libavformat/os_support.c  (revision 19242)
>>>>>>> +++ libavformat/os_support.c  (working copy)
>>>>>>> @@ -30,6 +30,23 @@
>>>>>>>  #include <sys/time.h>
>>>>>>>  #include "os_support.h"
>>>>>>>
>>>>>>> +#ifdef HAVE_WIN_UTF8_PATHS
>>>>>>> +#define WIN32_LEAN_AND_MEAN
>>>>>>> +#include <windows.h>
>>>>>>> +#endif
>>>>>>> +
>>>>>>> +#ifdef HAVE_WIN_UTF8_PATHS
>>>>> Where is HAVE_WIN_UTF8_PATHS defined?
>>>> Nowhere, right now. My thought is to let configure set it with some
>>>> --enable parameter, or you just pass -DHAVE_WIN_UTF8_PATHS in your
>>>> CFLAGS. The point was that I thought it might be a good idea to let
>>>> the user compile with it disabled if he wanted to, e.g. if someone
>>>> wanted to build for Win9x (heh) or something else where Unicode support
>>>> might not be available.
>>> Can we simply test for the existence of _wopen()?  Is there any reason
>>> to disable this if the function exists?
>> That may be dangerous. It will always exist in the MinGW includes/libraries,
>> but that doesn't mean it's implemented and works in the runtime libraries
>> you end up using. See also below.
> 
> Is this something from msvcrt or from the MinGW runtime libraries?
> FFmpeg already expects minimum mingw-rt and w32api versions.
> 
> If it's because of Win9x users, we already have a couple of places
> that need higher versions of Windows (like a call in getutime in
> ffmpeg.c and inside vfwcap IIRC). I haven't heard of anyone seriously
> using FFmpeg on Win9x, and until that happens I don't think we should
> worry about them =)
> 
>>>>>>> +int winutf8_open(const char *filename, int oflag, int pmode)
>>>>>>> +{
>>>>>>> +     wchar_t wfilename[MAX_PATH * 2];
>>>>>>> +
>>>>>>> +     if (MultiByteToWideChar(CP_UTF8,MB_ERR_INVALID_CHARS,filename,-1,wfilename,MAX_PATH) > 0)
>>>>>>> +             return _wopen(wfilename, oflag, pmode);
>>>>>>> +     else
>>>>>>> +             return open(filename, oflag, pmode);
>>>>>>> +}
>>>>>>> +#endif
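For readers without a Windows toolchain at hand, the conversion step the patch relies on can be approximated portably. This is only a sketch: utf8_to_utf16() is a hypothetical helper, not a Win32 or FFmpeg function, and it roughly imitates MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, filename, -1, buf, cap). It returns 0 on malformed UTF-8 or when the buffer is too small, and otherwise the number of UTF-16 units written, terminator included. It uses uint16_t rather than wchar_t because wchar_t is 16 bits on Windows but usually 32 bits elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for
 *   MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, src, -1, dst, cap)
 * Decodes the NUL-terminated UTF-8 string src into UTF-16 code units.
 * Returns the number of units written (terminator included), or 0 if src
 * is not well-formed UTF-8 or dst (cap units) is too small. */
static size_t utf8_to_utf16(const char *src, uint16_t *dst, size_t cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    for (;;) {
        uint32_t cp;
        int extra;

        if (s[0] == 0) {                  /* terminator: store it and stop */
            if (n + 1 > cap)
                return 0;
            dst[n++] = 0;
            return n;
        }
        if (s[0] < 0x80) {
            cp = s[0];
            extra = 0;
        } else if (s[0] >= 0xC2 && s[0] <= 0xDF) {
            cp = s[0] & 0x1F;
            extra = 1;
        } else if (s[0] >= 0xE0 && s[0] <= 0xEF) {
            cp = s[0] & 0x0F;
            extra = 2;
        } else if (s[0] >= 0xF0 && s[0] <= 0xF4) {
            cp = s[0] & 0x07;
            extra = 3;
        } else {
            return 0;                     /* stray continuation or bad lead byte */
        }
        for (int i = 1; i <= extra; i++) {
            if ((s[i] & 0xC0) != 0x80)
                return 0;                 /* truncated or malformed sequence */
            cp = (cp << 6) | (s[i] & 0x3F);
        }
        /* Reject overlong forms, UTF-16 surrogates and values past U+10FFFF. */
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return 0;

        if (cp >= 0x10000) {              /* encode as a surrogate pair */
            uint32_t v = cp - 0x10000;
            if (n + 2 > cap)
                return 0;
            dst[n++] = (uint16_t)(0xD800 | (v >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (v & 0x3FF));
        } else {
            if (n + 1 > cap)
                return 0;
            dst[n++] = (uint16_t)cp;
        }
        s += extra + 1;
    }
}
```

A zero return corresponds to the case where the patch falls back to plain open(); a non-zero return is the situation in which _wopen() would receive the converted name.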
>>> What might cause MultiByteToWideChar() to fail?  What will plain
>>> open() do with such input?  Also, what is the value of MAX_PATH?
>>> It is probably a bad idea to silently truncate the filename at
>>> MAX_PATH characters.  This could turn an invalid name into the name of
>>> an existing file.
>> MultiByteToWideChar() will fail in this case if the input string has
>> characters that cannot be translated as valid UTF8 (since
>> MB_ERR_INVALID_CHARS is specified). This might happen if you have a
>> multi-byte string that isn't UTF8, like for example in the system's local
>> code page (if it's multi-byte). It can also fail if the buffer length is
>> insufficient, or if you lack CP_UTF8, but neither should be a concern here.
>>
>> open() should, as far as I am aware, deal gracefully with multi-byte strings
>> in the system locale. However, a multi-byte string in the local code page
>> could conceivably also be valid UTF-8 even though it isn't meant as UTF-8,
>> and the MSVCRT sometimes behaves really weirdly with character
>> translations, so the only truly safe option here is to pass only UTF-8 or
>> Latin-1; other character sets are not guaranteed to work. Hence my
>> preference for leaving it optional, so people who want UTF-8 filenames on
>> Windows can get them and everyone else can go about their business as usual.
> 
> If it's optional it should be documented and the consequences made clear.
> 
>> MAX_PATH is defined as 260 in WinDef.h, and that is actually the maximum
>> allowed path length in the Win32 API unless you want to jump through some
>> hoops. Paths of up to approximately 32,767 characters are allowed, but only
>> if they are absolute and start with the magical \\?\ prefix. I guess I
>> could detect relative paths, make them absolute and add said magical prefix
>> manually if so desired, but the static allocation seems safe enough, and the
>> 260 character limit is indeed what a vast majority of Windows programs use.
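The prefixing described above could look roughly like the following sketch. to_extended_path() is hypothetical and works on wchar_t strings. It assumes its input has already been made absolute and normalized (on Windows that would be the job of something like GetFullPathNameW(), not shown), because the \\?\ prefix switches off the usual normalization, so forward slashes and ".." components are no longer resolved. Following Microsoft's path-naming rules, drive paths take \\?\ directly, while UNC paths become \\?\UNC\server\share.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper, not a Win32 API: turn an ABSOLUTE, already
 * normalized path into its extended-length form.
 *   C:\dir\file      ->  \\?\C:\dir\file
 *   \\server\share\f ->  \\?\UNC\server\share\f
 * Returns 0 on success, -1 if the path is relative (or in no form above)
 * or if out, which holds cap wide chars, is too small. */
static int to_extended_path(const wchar_t *abs, wchar_t *out, size_t cap)
{
    const wchar_t *prefix, *rest;

    assert(abs != NULL && out != NULL);
    if (wcsncmp(abs, L"\\\\?\\", 4) == 0) {            /* already prefixed */
        prefix = L"";
        rest = abs;
    } else if (abs[0] == L'\\' && abs[1] == L'\\') {   /* UNC share */
        prefix = L"\\\\?\\UNC\\";
        rest = abs + 2;
    } else if (((abs[0] >= L'A' && abs[0] <= L'Z') ||
                (abs[0] >= L'a' && abs[0] <= L'z')) &&
               abs[1] == L':' && abs[2] == L'\\') {     /* drive-letter path */
        prefix = L"\\\\?\\";
        rest = abs;
    } else {
        return -1;              /* relative: must be made absolute first */
    }
    if (wcslen(prefix) + wcslen(rest) + 1 > cap)
        return -1;
    wcscpy(out, prefix);
    wcscat(out, rest);
    return 0;
}
```

The resulting name would still have to go through _wopen(), and the buffer would need to be sized for the long form rather than MAX_PATH, so this trades the static allocation for some extra bookkeeping.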
> 
> Indeed, FFmpeg fails with long names. But if you truncate the long
> name, it might turn into a valid name (like Mans said).

Actually, I made a braino in my last mail. Somebody else just pointed out that I
myself said above that MultiByteToWideChar will fail if the buffer isn't large
enough, so if the converted name (including its terminating null) doesn't fit
in MAX_PATH wide characters, the original string will just get passed straight
to open() anyway, and then who knows what will happen (but it will be the same
as what happened before this patch, anyway).
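The overflow case can also be detected up front instead of relying on the conversion failing. When called with a zero output size, MultiByteToWideChar() returns the buffer size it would need; the portable sketch below counts the same quantity, assuming the input is already valid UTF-8. utf16_units_needed(), fits_max_path() and SKETCH_MAX_PATH are hypothetical names; 260 is the MAX_PATH value quoted above.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_PATH 260   /* the MAX_PATH value from WinDef.h */

/* Hypothetical helper: UTF-16 code units (terminator included) needed for
 * the NUL-terminated string s.  For valid UTF-8 this matches what
 * MultiByteToWideChar(CP_UTF8, 0, s, -1, NULL, 0) reports. */
static size_t utf16_units_needed(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t units = 1;                      /* the terminating 0 */

    assert(s != NULL);
    for (; *p; p++) {
        if ((*p & 0xC0) == 0x80)
            continue;                      /* continuation byte: no new unit */
        units += (*p >= 0xF0) ? 2 : 1;     /* 4-byte sequence -> surrogate pair */
    }
    return units;
}

/* Would the patch's wfilename buffer, limited to MAX_PATH units, hold s? */
static int fits_max_path(const char *s)
{
    return utf16_units_needed(s) <= SKETCH_MAX_PATH;
}
```

With the patch as posted, a name for which fits_max_path() is false ends up in open(); a stricter variant might instead fail with ENAMETOOLONG or switch to the \\?\ form, but either would be a behavior change beyond what the patch does today.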

Regards,
Karl Blomster


