[MPlayer-dev-eng] Detecting icy info charset
Timur Aydin
ta at taydin.org
Wed Aug 17 16:10:34 CEST 2011
On 08/17/11 15:48, Nicolas George wrote:
> Walk the string; if it is valid UTF-8 until the end, then treat it as UTF-8
> (this also takes care of the ASCII case). If you encounter a byte sequence
> that is not valid UTF-8, consider the string as being in the charset of the
> user's locale, as defined by the LC_CTYPE category. If that fails, fall back
> to ISO-8859-1.
>
> The rationale for this is:
>
> - UTF-8 is quite recognizable: a string in a legacy 8-bit encoding is
> unlikely to be valid UTF-8.
>
> - If someone has their locale set to a Russian encoding, they are more likely
> to listen to Russian radio stations than Greek ones.
>
Hmm, I guess statistically, this would work most of the time. But as you
mentioned, there are byte sequences that are both valid UTF-8 and valid
text in other charsets.

Right now I have assembled a list of radio stations that use particular
charsets. For each one of them, I will use Wireshark to see if the HTTP
headers give a hint as to what encoding is in effect...
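
To make the quoted heuristic concrete, here is a rough sketch of it. It is in
Python only for brevity, it is not MPlayer code, and the function name
detect_icy_charset and its locale_charset parameter are made up for this
illustration. It tries UTF-8 first, then the locale's charset, then falls
back to ISO-8859-1.

```python
import locale
from typing import Optional


def detect_icy_charset(raw: bytes, locale_charset: Optional[str] = None) -> str:
    """Guess which charset to use for decoding an ICY metadata string.

    Hypothetical helper: tries UTF-8, then the locale charset, and
    finally falls back to ISO-8859-1.
    """
    if locale_charset is None:
        # Charset of the current locale (LC_CTYPE on POSIX systems).
        locale_charset = locale.getpreferredencoding(False)

    for name in ("utf-8", locale_charset):
        try:
            # A strict decode walks the whole string and raises on the
            # first byte sequence that is invalid in this charset.
            raw.decode(name)
            return name
        except (UnicodeDecodeError, LookupError):
            # Invalid bytes, or a charset name Python does not know.
            continue

    # ISO-8859-1 maps every byte value, so it can always be decoded.
    return "iso-8859-1"
```

With locale_charset="koi8-r", a title encoded in KOI8-R is not valid UTF-8
and is therefore attributed to the locale charset, while plain ASCII and
real UTF-8 are reported as UTF-8. The ambiguity remains, though: the bytes
0xC3 0xA9 are "é" in UTF-8 but also valid ISO-8859-1, and the sketch will
always pick UTF-8 for them.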
--
Timur