[MPlayer-dev-eng] [PATCH]breakline properly with subtitles using Chinese

The Wanderer inverseparadox at comcast.net
Thu Nov 24 21:20:35 CET 2005


Rich Felker wrote:

> On Thu, Nov 24, 2005 at 11:10:36PM +0800, ?????? wrote:
> 
>> Hi,
>> 
>> MPlayer treats a run of characters without whitespace as a single
>> word and tries to render it on one line; if the word is too long, it
>> is truncated. However, Chinese and many other Asian languages do not
>> use whitespace to separate words, so MPlayer usually treats a whole
>> sentence as one word and truncates it if it cannot be rendered on
>> one line. This patch detects whether a character is an Asian
>> character (I assume any char > 0x800 is one) and tries to break the
>> sentence when it cannot be rendered as a single word. Please test
>> it, thx.
> 
> definitely not correct; not all asian languages or all languages
> using characters past 0x800 are splittable at any point! you'll have
> to special-case it much more if you want a patch like this to be 
> accepted. i don't know the correct splitting algorithms so you'll
> have to do it yourself but i imagine it involves a database of all
> words for chinese and japanese.

This is not feasible. There is no available (or, possibly, even
*existing*) database of "all valid Japanese words"; edict isn't a bad
start, but from what I can tell it is decidedly far from complete, and
has considerable warts (judging by how many are reported and fixed on a
regular basis). For that matter, as far as I'm aware there is not
actually the concept of "word" in Japanese; certainly it is possible to
insert a line break in between any two characters in a Japanese sentence
without problems (I see it all the time in, say, video games).

Yes, something better than the simple "char > 0x800" assumption above is
needed for determining whether a given character is "Asian" (that is,
can have a line break after it regardless of context) - but a "database
of all words" is not an available solution.
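
As a rough illustration of what a narrower test might look like, here is
a minimal sketch in C. The function name can_break_after() is
hypothetical and is not taken from the patch or from MPlayer; it assumes
the subtitle text has already been decoded to Unicode code points. It
only allows a break after kana and CJK ideographs, so scripts such as
Thai (U+0E00..U+0E7F), which lie above 0x800 but cannot be split at
arbitrary points, are left alone. It does not handle the usual
prohibitions on starting a line with closing punctuation or small kana.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of MPlayer): returns true if a line
 * break may plausibly be placed after the Unicode code point cp without
 * consulting any dictionary. The ranges are Unicode block boundaries.
 */
static bool can_break_after(uint32_t cp)
{
    return (cp >= 0x3040 && cp <= 0x309F)     /* Hiragana */
        || (cp >= 0x30A0 && cp <= 0x30FF)     /* Katakana */
        || (cp >= 0x3400 && cp <= 0x4DBF)     /* CJK Unified Ideographs Ext. A */
        || (cp >= 0x4E00 && cp <= 0x9FFF)     /* CJK Unified Ideographs */
        || (cp >= 0xF900 && cp <= 0xFAFF)     /* CJK Compatibility Ideographs */
        || (cp >= 0x20000 && cp <= 0x2A6DF);  /* CJK Unified Ideographs Ext. B */
}
```

Whether this set of ranges is the right one is a judgment call; the
point is only that an explicit list of blocks is feasible where a word
database is not.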

-- 
       The Wanderer

Warning: Simply because I argue an issue does not mean I agree with any
side of it.

Secrecy is the beginning of tyranny.



