[FFmpeg-devel] [PATCH] Support for UTF8 filenames on Windows

Sat Jul 18 05:59:27 CEST 2009

Hi,

On Thu, Jul 16, 2009 at 2:55 PM, Karl Blomster<thefluff at uppcon.com> wrote:
> Ramiro Polla wrote:
>> On Thu, Jul 16, 2009 at 11:20 AM, Karl Blomster<thefluff at uppcon.com>
>> wrote:
>>> Unless I am severely missing something in your updated patch (thanks for
>>> the
>>> nice work, by the way!) it will not work with the FFmpeg commandline
>>> program. If you want an Unicode commandline in Windows you need to use
>>> wmain() or _tmain() instead of plain old main(), AFAIK. As I said earlier
>>> my
>>> original patch was only intended to let the API support Unicode. Working
>>> it
>>> into ffmpeg.c would be a lot more work, I think.
>>
>> How do you test UNICODE support?
>>
>> I used attached shell file with msys (sh test_unicode.sh) and it works
>> as expected (only the unicode filename without FF_WINUTF8 fails). I
>> also tested with an app that used Find(First,Next)FileA() and passed
>> the unicode filenames as ascii string to ff_winutf8_open() and it also
>> worked as expected.
>
> Plain old cmd.exe (both with and without the chcp 65001 trick). I can do
> stuff like notepad.exe <unicode filename> and it'll work fine, but with
> ffmpeg it just says file not found (and prints a bogus filename). It works
> fine with mingw's sh; MinGW probably does some kind of black magic there to
> get Unix apps to work without having to patch in the Windows mess. The API
> works fine, of course.

Do you know of any real example where a codepage->UTF-8 conversion
fails? I have only found theoretical concerns scattered around the
web, never an actual failing case.

I'm tempted to do the following:
- Always expect filenames in Windows to be passed in UTF-8.
- Always get the Unicode command line and convert it to UTF-8.

That way we always convert from UTF-8 to UTF-16 and use _wopen() when
opening files, and since the input is always UTF-8,
MultiByteToWideChar() shouldn't fail.
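
A minimal sketch of the file-opening half, to make the idea concrete.
open_utf8() is a hypothetical name used only for illustration (it is not
the patch's ff_winutf8_open()), and the non-Windows branch exists only so
the snippet compiles elsewhere. The command-line half would presumably go
the other way, with GetCommandLineW(), CommandLineToArgvW() and
WideCharToMultiByte(CP_UTF8, ...); that part is not shown here.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>
#include <io.h>
#else
#include <unistd.h>
#endif

/* Hypothetical helper: open a file whose name is given in UTF-8.
 * Returns a file descriptor, or -1 on failure. */
static int open_utf8(const char *utf8_name, int flags, int mode)
{
#ifdef _WIN32
    wchar_t *wname;
    int fd, n;

    assert(utf8_name != NULL);

    /* First call: ask how many UTF-16 code units are needed,
     * including the terminating NUL (because the length is -1). */
    n = MultiByteToWideChar(CP_UTF8, 0, utf8_name, -1, NULL, 0);
    if (n <= 0)
        return -1;

    wname = malloc(n * sizeof(*wname));
    if (!wname)
        return -1;

    /* Second call: do the actual UTF-8 -> UTF-16 conversion. */
    if (MultiByteToWideChar(CP_UTF8, 0, utf8_name, -1, wname, n) <= 0) {
        free(wname);
        return -1;
    }

    /* The wide-character open bypasses the ANSI codepage entirely. */
    fd = _wopen(wname, flags, mode);
    free(wname);
    return fd;
#else
    /* Assumption: on other systems the name is passed through as-is. */
    assert(utf8_name != NULL);
    return open(utf8_name, flags, mode);
#endif
}
```

A caller would use it like open(), e.g. open_utf8(name, O_RDONLY, 0).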

All of this assumes we drop support for Win9x/WinME.

Here is what I've gathered from comments, suggestions, and asking
some Win32 RE guys. None of it is backed by hard facts or MSDN
documentation:
- Windows file system APIs use UTF-16 internally, so any codepage that
can't be converted to UTF-16 will be a problem anyway and we
shouldn't worry about it.
- UTF-16->UTF-8 conversion might be lossless (some suggest the extra
characters in codepages that can't be represented in Unicode are being
assigned invalid Unicode values).
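
The second point can be checked independently of Windows. Below is a
small portable sketch (my own helper names, not Win32 or FFmpeg API) that
converts UTF-16 to UTF-8 and back. For well-formed UTF-16 the round trip
returns the original code units, since both encodings cover the same set
of code points. The case it deliberately rejects is an unpaired
surrogate, which is the one place where a filename could fail to survive
the trip. The decoder is not a full validator (it does not reject
overlong forms, for instance); it only needs to undo the encoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode n UTF-16 code units as UTF-8.
 * Returns the number of bytes written, or -1 on an unpaired
 * surrogate or if the output buffer is too small. */
static int utf16_to_utf8(const uint16_t *in, size_t n, uint8_t *out, size_t cap)
{
    size_t i = 0, o = 0;

    assert(in && out);
    while (i < n) {
        uint32_t cp = in[i++];

        if (cp >= 0xD800 && cp <= 0xDBFF) {        /* high surrogate */
            if (i >= n || in[i] < 0xDC00 || in[i] > 0xDFFF)
                return -1;                         /* not followed by a low one */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i++] - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1;                             /* stray low surrogate */
        }

        if (cp < 0x80) {
            if (o + 1 > cap) return -1;
            out[o++] = cp;
        } else if (cp < 0x800) {
            if (o + 2 > cap) return -1;
            out[o++] = 0xC0 | (cp >> 6);
            out[o++] = 0x80 | (cp & 0x3F);
        } else if (cp < 0x10000) {
            if (o + 3 > cap) return -1;
            out[o++] = 0xE0 | (cp >> 12);
            out[o++] = 0x80 | ((cp >> 6) & 0x3F);
            out[o++] = 0x80 | (cp & 0x3F);
        } else {
            if (o + 4 > cap) return -1;
            out[o++] = 0xF0 | (cp >> 18);
            out[o++] = 0x80 | ((cp >> 12) & 0x3F);
            out[o++] = 0x80 | ((cp >> 6) & 0x3F);
            out[o++] = 0x80 | (cp & 0x3F);
        }
    }
    return (int)o;
}

/* Decode UTF-8 back to UTF-16 code units.
 * Returns the number of code units written, or -1 on error. */
static int utf8_to_utf16(const uint8_t *in, size_t n, uint16_t *out, size_t cap)
{
    size_t i = 0, o = 0;

    assert(in && out);
    while (i < n) {
        uint32_t cp = in[i];
        int extra;

        /* The lead byte tells us how many continuation bytes follow. */
        if (cp < 0x80)                { extra = 0; }
        else if ((cp & 0xE0) == 0xC0) { cp &= 0x1F; extra = 1; }
        else if ((cp & 0xF0) == 0xE0) { cp &= 0x0F; extra = 2; }
        else if ((cp & 0xF8) == 0xF0) { cp &= 0x07; extra = 3; }
        else return -1;
        i++;

        while (extra-- > 0) {
            if (i >= n || (in[i] & 0xC0) != 0x80)
                return -1;
            cp = (cp << 6) | (in[i++] & 0x3F);
        }

        if (cp >= 0x10000) {                       /* needs a surrogate pair */
            if (cp > 0x10FFFF || o + 2 > cap) return -1;
            cp -= 0x10000;
            out[o++] = 0xD800 | (cp >> 10);
            out[o++] = 0xDC00 | (cp & 0x3FF);
        } else {
            if (o + 1 > cap) return -1;
            out[o++] = cp;
        }
    }
    return (int)o;
}
```

Feeding a name through utf16_to_utf8() and then utf8_to_utf16() should
give back the same code units, including characters outside the BMP.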

VLC chose this path and they haven't had reports about codepage issues.

Ramiro Polla