[MPlayer-dev-eng] Detecting icy info charset

Nicolas George nicolas.george at normalesup.org
Wed Aug 17 14:48:35 CEST 2011


On decadi 30 Thermidor, year CCXIX, Timur Aydin wrote:
> screen. The problem is, with stations from different countries, it isn't
> certain what charset is being used in the icy info strings. Most of the
> time it seems to be plain ASCII. But with some stations, UTF-8 is used.
> Yet other stations use ISO-8859-1.
> 
> How can I detect the encoding being used?

I do not know icy info, but I guess the following procedure should do the
trick:

Walk the string; if it is valid UTF-8 until the end, then treat it as UTF-8
(this also takes care of the ASCII case). If you encounter a byte sequence
that is not valid in UTF-8, consider the string as being in the user's
locale, as defined by the LC_CTYPE category. If that fails, fall back to
ISO-8859-1.
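
The procedure above could be sketched roughly as follows. This is only an
illustration, not MPlayer code: the names is_valid_utf8, guess_icy_charset
and decodes_fn are hypothetical, and the check that a string actually decodes
in the locale charset is left to a caller-supplied callback, since it depends
on the conversion library in use (iconv, for example).

```c
#include <stddef.h>

/* Return 1 if the NUL-terminated string s is well-formed UTF-8, else 0.
 * Pure ASCII is a subset of UTF-8, so it is accepted as well. */
int is_valid_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p) {
        unsigned char c = *p++;
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest value allowed for this length */
        int n;              /* number of continuation bytes expected */

        if (c < 0x80)
            continue;                                   /* ASCII byte */
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; min = 0x10000; }
        else
            return 0;                                   /* invalid lead byte */

        while (n--) {
            /* A continuation byte must look like 10xxxxxx. The terminating
             * NUL fails this test, so we never read past the end. */
            if ((*p & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (*p++ & 0x3F);
        }

        /* Reject overlong forms, surrogates and out-of-range values. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
    }
    return 1;
}

/* Hypothetical callback: returns nonzero if s decodes cleanly in charset. */
typedef int (*decodes_fn)(const char *s, const char *charset);

/* Pick a charset name for s: UTF-8 if valid, else the locale charset if the
 * callback accepts it, else ISO-8859-1 as the last resort. */
const char *guess_icy_charset(const char *s, const char *locale_charset,
                              decodes_fn decodes)
{
    if (is_valid_utf8(s))
        return "UTF-8";
    if (locale_charset && decodes && decodes(s, locale_charset))
        return locale_charset;
    return "ISO-8859-1";
}
```

On a POSIX system, the locale charset name would typically come from
nl_langinfo(CODESET) after calling setlocale(LC_CTYPE, ""). ISO-8859-1 works
as the final fallback because every byte value maps to some character in it,
so the conversion cannot fail.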

The rationale for this is:

- UTF-8 is quite recognizable: a string in a legacy 8-bit encoding is
  unlikely to happen to be valid UTF-8.

- If someone has their locale set to a Russian encoding, they are more likely
  to listen to Russian radio stations than Greek ones.

Regards,

-- 
  Nicolas George