[MPlayer-dev-eng] [PATCH] Convert UCS-2LE encoded asf tag

Rich Felker dalias at aerifal.cx
Fri Aug 25 06:51:12 CEST 2006


On Fri, Aug 25, 2006 at 12:01:30PM +0800, Zuxy Meng wrote:
> Hi,
> 
> 2006/8/25, Reimar Döffinger <Reimar.Doeffinger at stud.uni-karlsruhe.de>:
> >Hello,
> >
> >Well, yes. But I (and AFAICT others) consider the case MSG_CHARSET !=
> >UTF8 a legacy setup (the only reason for having this is lack of
> >iconv/libiconv) and not worth adding bloated/ugly or complicated code.
> 
> Consistency is cool. However, non-unicode compatible encodings are
> still widely used. For example the v1 id3tag doesn't impose any
> encoding rules, and is usually encoded in locale-dependent encodings
> (mp_msg_charset instead of MSG_CHARSET).

You mean legacy encodings. Obviously they cannot be locale dependent
since the file was most likely not even created by the user playing
it, and there's no reason to believe the file author and the user
would have the same locale.

> How do we handle this? First
> convert from mp_msg_charset to MSG_CHARSET in demuxer and then convert
> back in mp_msg(), or just print mojibake out and tell the user to

Yes, the demuxer should convert to unicode, but unfortunately it
doesn't have enough information to know the original encoding. My
recommendation would be to have options to control the assumed
encoding, with a default fallback order of:
1. try as UTF-8
2. try as legacy CJK encodings (only the popular Windows ones)
3. if there are long runs of characters with the high bit (0x80) set or
   characters in the 'C1' control range, assume koi8-r
4. assume latin-1

Of course a more intelligent detection algorithm could be used too.
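
The fallback order above could be sketched roughly as follows. This is
only an illustration in Python, not MPlayer code: the function name
decode_tag, the list of CJK code pages, and the run length of 4 used for
"long runs" are all assumptions made for the example.

```python
import re

# Assumed set of "popular Windows" CJK code pages; the order is arbitrary.
CJK_ENCODINGS = ("cp932", "gbk", "big5", "cp949")

# Assumed threshold: 4 or more consecutive bytes with the high bit set.
_HIGH_RUN = re.compile(rb"[\x80-\xff]{4,}")
# Bytes that would be C1 control characters if read as latin-1.
_C1 = re.compile(rb"[\x80-\x9f]")


def decode_tag(raw, cjk_encodings=CJK_ENCODINGS):
    """Return (text, encoding_name) for a tag of unknown encoding."""
    # 1. try as UTF-8 (strict, so invalid sequences are rejected)
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    # 2. try the legacy CJK encodings in the configured order
    for enc in cjk_encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # 3. long runs of high-bit bytes, or any C1-range byte: assume koi8-r
    if _HIGH_RUN.search(raw) or _C1.search(raw):
        return raw.decode("koi8-r"), "koi8-r"
    # 4. assume latin-1 (every byte value is valid, so this cannot fail)
    return raw.decode("latin-1"), "latin-1"
```

Note that a strict decode in step 2 only proves the bytes are *valid* in
that code page, not that the guess is right; some KOI8-R or latin-1
strings may also decode cleanly as a CJK encoding. That is one reason to
make the candidate list an option, e.g. decode_tag(raw, cjk_encodings=())
to skip step 2 entirely.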

> update to v2 id3tags that must use UTF-8?

id3v2 is an abomination that's intentionally ignored by MPlayer's
demuxer. Using UTF-8 with id3v1 is a much better idea.

Rich



