[MPlayer-dev-eng] Moving towards UTF-8

Zuxy Meng zuxy.meng at gmail.com
Mon Oct 23 08:10:12 CEST 2006


Hi,

I guess we've agreed that all internal strings inside mplayer are to
be encoded in utf-8. Absolutely good news, especially for CJK users
like me. But several things must be done, or CJK users will most
probably see messed-up strings:

1. Filenames must be passed to fopen() as is, so maybe they shouldn't
be stored as utf-8, and a mp_msg_noconv() should be introduced?

2. ASF files' metadata is stored in UTF-16LE; it should be properly
converted to UTF-8 instead of simply being shrunk. As mentioned in
another thread, I'll attack this.

3. The most challenging thing will be metadata stored in a legacy
encoding, like ID3 tags. The current situation is quite absurd: if the
user doesn't bother to set mp_msg_charset, like most people on Windows,
s/he will probably see the correct string if it happens to be encoded
in her/his locale, because it's printed unconverted; but if s/he or
mplayer sets mp_msg_charset correctly, s/he will surely see a mess.
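
To make point 2 concrete, here is a rough sketch of what a proper
UTF-16LE to UTF-8 conversion could look like. The function name
utf16le_to_utf8() and its signature are made up for illustration and
are not existing MPlayer code; a real patch might just as well go
through iconv(). It stops at a NUL code unit and replaces unpaired
surrogates with U+FFFD, both of which are my assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an existing MPlayer function.
 * Converts in_len bytes of UTF-16LE into NUL-terminated UTF-8 in out.
 * Returns the number of bytes written, not counting the NUL. */
size_t utf16le_to_utf8(const unsigned char *in, size_t in_len,
                       char *out, size_t out_size)
{
    size_t i = 0, o = 0;

    if (out_size == 0)
        return 0;

    while (i + 1 < in_len) {
        /* Read one little-endian 16-bit code unit. */
        uint32_t cp = in[i] | ((uint32_t)in[i + 1] << 8);
        size_t need;
        i += 2;

        if (cp == 0)
            break;                      /* embedded terminator */

        /* Combine a high surrogate with a following low surrogate. */
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in_len) {
            uint32_t lo = in[i] | ((uint32_t)in[i + 1] << 8);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            }
        }
        if (cp >= 0xD800 && cp <= 0xDFFF)
            cp = 0xFFFD;                /* unpaired surrogate */

        need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need >= out_size)
            break;                      /* keep room for the NUL */

        if (need == 1) {
            out[o++] = (char)cp;
        } else if (need == 2) {
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (char)(0xF0 | (cp >> 18));
            out[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}
```

A UTF-8 character can take up to 3 bytes per 16-bit code unit, so an
output buffer of in_len / 2 * 3 + 1 bytes is always large enough.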

For 3, my proposal was to treat such metadata as encoded in
mp_msg_charset (I assumed that people tend to listen to more songs in
their own language than in foreign languages :-)). Rich disagreed and
argued that the process should be more intelligent. A temporary
solution, then, would be: in mp_msg, when iconv() fails, instead of
skipping to the next char, it should bail out and print the rest of
the string as is. Then for GBK-encoded Chinese, in more than 80% of
cases, the string won't be a legal UTF-8 sequence, and hence the user
will see the correct, unconverted string.
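
A rough sketch of that temporary solution, assuming the string is
converted from UTF-8 to the output charset with iconv(). The helper
name convert_or_passthrough() and its signature are made up for
illustration; this is not the actual mp_msg code. It also ignores the
final flush needed for stateful encodings.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not MPlayer's actual mp_msg code.
 * Converts the NUL-terminated string `in` from UTF-8 to charset `to`.
 * If iconv() hits an invalid or incomplete sequence, the remaining
 * bytes are copied unchanged instead of being skipped one by one.
 * Returns the number of bytes written, not counting the NUL. */
size_t convert_or_passthrough(const char *to, const char *in,
                              char *out, size_t out_size)
{
    char *src = (char *)in;     /* iconv() takes a non-const pointer */
    char *dst = out;
    size_t in_left = strlen(in);
    size_t out_left;
    iconv_t cd;

    if (out_size == 0)
        return 0;
    out_left = out_size - 1;    /* keep room for the NUL */

    cd = iconv_open(to, "UTF-8");
    if (cd != (iconv_t)-1) {
        if (iconv(cd, &src, &in_left, &dst, &out_left) == (size_t)-1
            && errno == E2BIG)
            in_left = 0;        /* output full: truncate, don't mix */
        iconv_close(cd);
    }

    /* Bail out: whatever iconv() did not consume is printed as is.
     * If iconv_open() failed, that is the whole string. */
    if (in_left > out_left)
        in_left = out_left;
    memcpy(dst, src, in_left);
    dst += in_left;
    *dst = '\0';
    return (size_t)(dst - out);
}
```

With this, a GBK string such as the bytes D6 D0 fails UTF-8 validation
at the first byte and comes out untouched, while a genuine UTF-8
string is converted as usual.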

Comments?
-- 
Zuxy
Beauty is truth,
While truth is beauty.
PGP KeyID: E8555ED6


