[MPlayer-dev-eng] Detecting icy info charset

Wed Aug 17 16:10:34 CEST 2011

On 08/17/11 15:48, Nicolas George wrote:
> Walk the string; if it is valid UTF-8 until the end, then treat it as UTF-8
> (this also takes care of the ASCII case). If you encounter a byte sequence
> that is not valid in UTF-8, consider the string as being in the user's
> locale, as defined by the LC_CTYPE category. If that fails, fall back to
> ISO-8859-1.
> 
> The rationale for this is:
> 
> - UTF-8 is quite recognizable: a string in a legacy 8-bit encoding is
>   unlikely to also be valid UTF-8.
> 
> - If someone has their locale set to a Russian encoding, they are more
>   likely to listen to Russian radio stations than Greek ones.
> 

Hmm, I guess statistically this would work most of the time. But as you
mentioned, there are byte sequences that are valid UTF-8 and at the same
time valid text in other charsets.
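
For reference, the quoted heuristic could be sketched in C roughly like
this. It is only an illustration, not MPlayer code: is_valid_utf8() and
guess_icy_charset() are hypothetical names, and the locale charset is
passed in as a parameter (on POSIX it would presumably come from
nl_langinfo(CODESET) after setlocale(LC_CTYPE, "")). Skipping a locale
charset named exactly "UTF-8" is my shortcut for the "if that fails" step.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the n bytes at s form well-formed UTF-8, else 0.
 * Rejects overlong forms, surrogates and code points above U+10FFFF. */
static int is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char c = s[i];
        unsigned long cp, min;
        size_t len, k;

        if (c < 0x80) { i++; continue; }            /* plain ASCII */
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; min = 0x10000; }
        else return 0;   /* stray continuation byte or invalid lead byte */

        if (n - i < len) return 0;                  /* truncated sequence */
        for (k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        i += len;
    }
    return 1;
}

/* Hypothetical helper: name of the charset to convert the string from. */
static const char *guess_icy_charset(const char *s, const char *locale_charset)
{
    if (is_valid_utf8((const unsigned char *)s, strlen(s)))
        return "UTF-8";                  /* also covers pure ASCII */
    if (locale_charset && *locale_charset &&
        strcmp(locale_charset, "UTF-8") != 0)
        return locale_charset;           /* user's LC_CTYPE charset */
    return "ISO-8859-1";                 /* last resort */
}
```

The ambiguity shows up with the two bytes 0xC3 0xA9: the sketch reports
UTF-8 (an e with acute accent), although the same bytes are also a
perfectly legal two-character ISO-8859-1 string.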

Right now I have assembled a list of radio stations that are known to use
particular charsets. For each one of them, I will use Wireshark to see
whether the HTTP headers give a hint as to which encoding is in effect...
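
If it turns out that some stations do send a charset parameter in their
Content-Type header (an assumption; whether any of them do is exactly
what still has to be checked), picking it out could look roughly like
the sketch below. content_type_charset() is a hypothetical helper, and
the plain substring match on "charset=" is a simplification.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the value of a "charset=" parameter from a
 * Content-Type header value into out (NUL-terminated, truncated to fit).
 * Returns 1 if a non-empty value was found, 0 otherwise. */
static int content_type_charset(const char *value, char *out, size_t outsz)
{
    static const char key[] = "charset=";
    const size_t klen = sizeof(key) - 1;
    const char *p;
    size_t n = 0;

    if (outsz == 0)
        return 0;

    /* Case-insensitive search for "charset=" anywhere in the value. */
    for (p = value; *p; p++) {
        size_t k = 0;
        while (k < klen && p[k] &&
               tolower((unsigned char)p[k]) == key[k])
            k++;
        if (k == klen)
            break;
    }
    if (!*p)
        return 0;                        /* no charset parameter */

    p += klen;
    if (*p == '"')
        p++;                             /* skip optional opening quote */
    while (*p && *p != '"' && *p != ';' &&
           !isspace((unsigned char)*p) && n + 1 < outsz)
        out[n++] = *p++;
    out[n] = '\0';
    return n > 0;
}
```

With a value such as "audio/mpeg; charset=ISO-8859-1" this yields
"ISO-8859-1"; with a bare "audio/mpeg" it reports that nothing was found,
and the guessing heuristic would still be needed.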

--
Timur