[MPlayer-dev-eng] [PATCH] Use _NET_WM_NAME for UTF-8 title in X11 if supported

Fri Jun 23 20:01:04 CEST 2006

On Fri, Jun 23, 2006 at 07:24:29PM +0200, Matthias Hopf wrote:
> On Jun 23, 06 11:58:53 -0400, Rich Felker wrote:
> > > I don't think locale stuff can be done lightweight; there is too much to
> > > be considered, especially for Arabic (right-to-left) and Eastern (large
> > > charsets, combining characters, multi-width characters) languages.
> > 
> > None of this requires significant special support by ordinary
> > applications. If you don't believe it can be done lightweight, then try
> > out my 640k DOS UTF-8 terminal emulator with support for all scripts
> > once it's released. (Yes, the 640k includes all glyphs. :)
> 
> Including all >68000 glyphs?

Many of them (most of the Latin and Greek characters) are precomposed.
Performing canonical decomposition reduces the number of glyphs needed,
but of course it doesn't help with CJK.

> That makes fewer than 10 bytes per glyph,
> which would be impressive for a 14-point font ;)

A 7:5 compression ratio is not that impressive.

> BTW, where did you get all those glyphs from? We are constantly
> struggling to get all glyphs in a decent-quality font...

I don't have them all, but unifont (IIRC that's the name) has a lot in
8x16 format, and I've created some myself.

If you want to see it all, have a little patience, since almost nothing
is done yet. The main work so far has been theoretical estimates
(based on code of similar complexity that I've already written), which
show the program easily fitting in 640k.

> > > - not being able to easily(!) switch between POSIX/C style and locale
> > 
> > Even if you could do this, it's still broken, because all it can do is
> > make _localized_ applications, not _internationalized_ applications.
> > We do not live in a world of isolation anymore. Just because I'm in
> > country X does not mean I don't want to be able to process data from
> > country Y.
> 
> That's why I need something like this: to store data in a format that
> can be easily read in another locale. Is this misunderstandable?

Well, you may also want to be able to display data formatted
according to several different cultural conventions in one
application/document. This especially highlights the brokenness
of LC_CTYPE case mappings: just because I live in country X does not
mean all the text I process will be in the official language of
country X. Surely Turkish users will process a lot of English text as
well as Turkish text, and the case mappings will be horribly broken
(under Turkish rules, 'i' uppercases to dotted 'İ', so "file" becomes
"FİLE").
:)

> > This is what my C library implementation provides, and nothing else.
> > UTF-8 will also mostly work with the 'ordinary' C locale, except that
> > character widths and such will be messed up.
> 
> ... which is irrelevant if the user can only display ASCII anyway.

I'm thinking of any program that can display text through an
opaque mechanism without having to know how it works: either plain
text output to stdout, or an interface library that hides all the
display issues in its widgets, etc. Basic command-line programs only
need to care about multibyte characters or character widths if they
do interactive line editing or tabular/column output (like ls). And
for GUI apps, the calling program hardly needs to know anything about
the text, since the widgets handle everything.

> > > maybe with current characterset encoding.
> > 
> > The idea of a current charset is deprecated. Hopefully it will be gone
> > in 5 years or so.
> 
> In a perfect world you would be right.
> I don't think we will have e.g. the majority of Japanese people
> converted in 5 years :-(
> 
> Apple was bold enough to nuke everything non-UTF-8, but I just heard
> that in Japan Macintoshes are hardly used for text processing...

It's still a smart move for the overall good, if not a smart business
move for Apple. Hopefully things like this will speed the abandonment
of legacy character sets. In any case, I'm thinking about the *nix
world, not Windows, and I think *nix users have a lot more interest in
using UTF-8, especially in locales that already required legacy
multibyte encodings. To my knowledge, the legacy Japanese encodings
are (almost?) all problematic for inclusion in text files with ASCII
semantics, e.g. config files.

Rich