[MPlayer-dev-eng] [PATCH] Convert UCS-2LE encoded asf tag

Zuxy Meng zuxy.meng at gmail.com
Fri Aug 25 07:35:06 CEST 2006


Hi,

2006/8/25, Rich Felker <dalias at aerifal.cx>:
>
> You mean legacy encodings. Obviously they cannot be locale dependent
> since the file was most likely not even created by the user playing
> it, and there's no reason to believe the file author and the user
> would have the same locale.

Assuming the same locale is simplest for developers :-) In practice it works
most of the time, because:
1. Western pop songs with 7-bit encoded tags cause little trouble.
2. Most CJK users tend to listen to Western or native pop music.

> > How do we handle this? First
> > convert from mp_msg_charset to MSG_CHARSET in demuxer and then convert
> > back in mp_msg(), or just print mojibake out and tell the user to
>
> Yes, the demuxer should convert to unicode, but unfortunately it
> doesn't have enough information to know the original encoding. My
> recommendation would be to have options to control the assumed
> encoding, with a default fallback order of:
> 1. try as UTF-8
> 2. try as legacy CJK encodings (only popular windows ones)

Well, this step isn't so straightforward. Chinese has 2 popular encodings
(GBK & Big5), Japanese has 3 (EUC, JIS & SJIS). GBK's code space overlaps
with Big5's, and without statistics a program can't tell whether a
character is encoded in GBK or Big5. That kind of statistical guessing is
what enca does. Is it too complicated a task for a media player?
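
To make the overlap concrete, here is a minimal Python sketch (not MPlayer
code; the helper name decodes_as is made up for illustration). It takes a
two-character tag encoded in GBK and shows that the same bytes also decode
without error as Big5, only to different text, so a "try to decode"
test alone cannot pick between the two.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` is a valid byte sequence in `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# "\u4e2d\u6587" is the word "Chinese (language)" written in Chinese.
tag = "\u4e2d\u6587".encode("gbk")

for enc in ("gbk", "big5"):
    # Both encodings accept the bytes; only the decoded text differs.
    print(enc, decodes_as(tag, enc), repr(tag.decode(enc)))
```

Both lines report a successful decode, which is why some statistical
knowledge about likely characters is needed to choose.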

> 3. if there are long runs of characters with bit 8 set or characters
>   in the 'C1' control range, assume koi8-r
> 4. assume latin-1
>
> Of course a more intelligent detection algo could be used too..
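
For reference, the fallback order quoted above could be sketched roughly as
follows in Python. This is only an illustration under my own assumptions,
not a proposed patch: the function names, the default list of CJK
candidates and the run-length threshold (min_run=3) are all hypothetical,
and "try as" is taken to mean "decodes without error".

```python
def longest_high_run(data: bytes) -> int:
    """Length of the longest run of bytes with the top bit (0x80) set."""
    best = cur = 0
    for b in data:
        cur = cur + 1 if b >= 0x80 else 0
        best = max(best, cur)
    return best


def guess_tag_encoding(data: bytes,
                       cjk_candidates=("gbk", "big5", "shift_jis"),
                       min_run=3) -> str:
    """Guess an encoding name using the quoted 4-step fallback order."""
    # 1. try as UTF-8 (plain ASCII also passes here)
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # 2. try legacy CJK encodings, in the (assumed) order given
    for enc in cjk_candidates:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            pass
    # 3. long runs of high bytes, or bytes in the C1 range 0x80-0x9F
    has_c1 = any(0x80 <= b <= 0x9F for b in data)
    if has_c1 or longest_high_run(data) >= min_run:
        return "koi8_r"
    # 4. otherwise assume latin-1
    return "latin-1"


print(guess_tag_encoding(b"hello"))
print(guess_tag_encoding("\u4e2d\u6587".encode("gbk")))
print(guess_tag_encoding(b"caf\xe9"))
```

Note that step 2 returns the first candidate that decodes cleanly, so the
result depends on the order of cjk_candidates. That is exactly the GBK vs
Big5 problem: without statistics the first match wins whether or not it is
right.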

-- 
Zuxy
Beauty is truth,
While truth is beauty.
PGP KeyID: E8555ED6


