[MPlayer-dev-eng] New translation system

Torinthiel torinthiel at megapolis.pl
Thu Nov 30 11:11:14 CET 2006


On Wed, Nov 29, 2006 at 08:51:49PM -0500, Rich Felker wrote:
> 
> On Thu, Nov 30, 2006 at 03:05:47AM +0200, Uoti Urpala wrote:
> > The most important reason to replace the current translation system is
> > that the language must be chosen at compile time. This makes the
> > translations very impractical for binary distributions, eliminating much
> > of the potential audience for translated versions.
> > 
> > Some features I consider important:
> > 1 Must allow selecting language at runtime (as above).
> 
> Agree.

Without having to 'make install' anything.

> > 2 Should leave the English versions visible in the source and find the
> >   translation based on those (plus optional context argument for non-unique
> >   strings that might need to be translated differently in different places).
> 
> I disagree strongly with this condition. They're not in the current
> source, and adding them only bloats things up. If we've lived this
> long without English getting the special treatment of being in the .c
> files we can live with it in the future.

Right now ONE language resides in the binary, and something quite
similar to English text resides in the sources. IMHO the sources should
stay the same, and the English messages should reside in the binary (so
that it's usable without installing).

> > 3 Should allow loading translations from separate files and replacing them
> >   without recompiling, both to make updating them easier and because including
> >   all possible languages in the main binary would make it noticeably bigger.
> 
> s/Should/Must/.

Seconded.

> > 4 Should work without extra work with formats that might differ between
> >   machines such as PRId64.
> 
> This can be a preprocessing step when compiling the translation files
> to binary format.
> 
> > 5 Outdated, broken or intentionally harmful translation files should not
> >   cause crashes or security holes.
> 
> Unreasonable requirement. The translation system cannot be responsible
> for knowing what a random function that processes format strings will
> do with the string. Special-casing printf type would be possible but
> would require nearly a complete printf reimplementation in the
> translation system which is idiotic ridiculous bloat. Translation
> files should be treated as part of the source/binary; as a user you're
> responsible for ensuring that you don't download and use malicious
> ones just like you're responsible for making sure you don't download
> MPlayer from randomwarezsite.biz...

But then there has to be some protection against accidentally using the
wrong files. Say I change a string to use two %s instead of one (for
whatever reason); then mplayer has to stay safe while translators update
their strings.
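
A minimal sketch of such a guard, assuming the check is done on
printf-style formats just before a translated string is used. The names
format_signature, formats_compatible and pick_format are hypothetical,
and positional arguments such as %1$s are not handled (they simply fail
the check and the original string is used).

```c
#include <string.h>

/* Collect what a printf-style format reads from its argument list,
 * e.g. "%s: %5ld%%\n" -> "sld". Flags, width and precision are skipped.
 * Returns 0 on success, -1 if sig is too small or the format ends
 * in the middle of a conversion. size must be at least 1. */
static int format_signature(const char *fmt, char *sig, size_t size)
{
    size_t n = 0;
    while (*fmt) {
        if (*fmt++ != '%')
            continue;
        if (*fmt == '%') {              /* "%%" consumes no argument */
            fmt++;
            continue;
        }
        /* Skip flags, width and precision; keep '*' and length modifiers,
         * since they change what is read from the argument list. */
        while (*fmt && strchr("-+ #0123456789.*hlLjzt", *fmt)) {
            if (strchr("*hlLjzt", *fmt)) {
                if (n + 1 >= size)
                    return -1;
                sig[n++] = *fmt;
            }
            fmt++;
        }
        if (!*fmt || n + 1 >= size)
            return -1;
        sig[n++] = *fmt++;              /* the conversion character */
    }
    sig[n] = '\0';
    return 0;
}

/* Nonzero if both formats consume the same arguments in the same order. */
static int formats_compatible(const char *orig, const char *trans)
{
    char a[64], b[64];
    return format_signature(orig, a, sizeof a) == 0 &&
           format_signature(trans, b, sizeof b) == 0 &&
           strcmp(a, b) == 0;
}

/* Use the translation only if it is safe; otherwise keep the original. */
static const char *pick_format(const char *orig, const char *trans)
{
    return formats_compatible(orig, trans) ? trans : orig;
}
```

With this, an outdated translation that still has one %s where the
original now has two is ignored, and the English text is printed.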

> > It might be possible to wrap gettext in a way that achieves these (at
> > least it seems setting the locale should be avoidable by using
> > bind_textdomain_codeset() and dcgettext() instead of plain gettext()).
> 
> I absolutely and unconditionally object to anything involving gettext.

And catgets? From what I know about it (random ranting in gettext's
manual), it should work, and creating unique IDs won't be a problem
since we already use a bunch of .h files.

> > Writing basic text lookup functions fulfilling the above conditions from
> > scratch would be relatively easy too, and would avoid a dependency on
> > gettext and any problems with its API. OTOH if a similar code-side
> > interface can be implemented by wrapping gettext then all the details
> > and tools it implements would probably make life easier for translators.
> 
> Using strings as keys for looking up strings is idiotic and plain
> wrong, both semantically and in terms of implementation sanity.
> Indices are numbers not strings. We code in C not PHP.

There's a serious problem with using numbers as indices: you need two
separate mp_msg-like functions. One for translated strings (that passes
numbers as indices), another for strings not meant for translation
(because we pass a string then).
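
To illustrate the split, a rough sketch of the two entry points. All
names here (msg_text, msg_tr, the MSG_* IDs and english_table) are made
up for the example, and the functions format into a buffer instead of
printing so that the result is easy to inspect.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical message IDs and the built-in English table they index. */
enum { MSG_OPENING_FILE, MSG_EXITING, MSG_COUNT };
static const char *const english_table[MSG_COUNT] = {
    "Opening %s\n",
    "Exiting... (%s)\n",
};

/* For strings not meant for translation: the caller passes the format. */
static int msg_text(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}

/* For translatable strings: the caller passes an ID. A real version
 * would ask the translation catalog first; this one only knows English.
 * Returns -1 for an ID outside the table. */
static int msg_tr(char *buf, size_t size, int id, ...)
{
    if (id < 0 || id >= MSG_COUNT)
        return -1;
    va_list ap;
    va_start(ap, id);
    int n = vsnprintf(buf, size, english_table[id], ap);
    va_end(ap);
    return n;
}
```

The first parameter after the buffer has a different type in each
function, which is why a single mp_msg-like function cannot serve both
kinds of callers.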



OK, here's a rough draft of my proposal using catgets. Could someone
who knows the catgets interface better tell me whether it's reasonable?

We have our current .h files. Some (custom) processing tools do several
things with them.

First, they create a single help_mp.h, as now, only with unique
identifiers (more on that below).

Second, they create a .c file with a table of all English strings.

Third, for each translation they create a catalog with the translated
text and the unique IDs.

Then, at runtime we open the catalog and keep the handle in a global
(or better yet, static global) variable[1]. On each call to mp_msg we
call catgets with the stored handle, a fixed set number (after all, why
would we need more than one?), the ID from the .h file (via #define,
just like now), and the original string from the table created in the
second step.
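
The runtime side could look roughly like this, using the POSIX
catopen/catgets/catclose calls from <nl_types.h>. The wrapper names
tr_init, tr_lookup and tr_close and the MSG_SET constant are my own
invention for this sketch.

```c
#include <nl_types.h>

#define MSG_SET 1               /* the single, fixed set number */

static nl_catd catalog;         /* handle kept between lookups */
static int catalog_open;        /* nonzero once catopen() succeeded */

/* Try to open the catalog; on failure we silently stay with English. */
static void tr_init(const char *name)
{
    catalog = catopen(name, 0);
    catalog_open = (catalog != (nl_catd)-1);
}

/* Return the translation for id, or the English string if there is no
 * catalog or the catalog has no such message. */
static const char *tr_lookup(int id, const char *english)
{
    if (!catalog_open)
        return english;
    return catgets(catalog, MSG_SET, id, english);
}

static void tr_close(void)
{
    if (catalog_open) {
        catclose(catalog);
        catalog_open = 0;
    }
}
```

Since catgets returns its last argument when the lookup fails, a missing
or incomplete catalog degrades to the English text instead of an error.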

Now, there are two problems I see left. The first: we need two
functions, one for strings that can be translated (as those calls pass
an ID, not a string) and one for strings that cannot. The second: the
message IDs. An ID should be constructed from the position in the .h
file (the first string gets 1, the second 2 and so on) and some checksum
of the original string. Why the checksum? So that when the original
string changes, the translation becomes invalid. That guards us against
accidentally leftover compiled catalogs. The catalog-generating tools
should also take care to check that the number of \n's and %'s is
correct.
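
One possible way to build such an ID, purely as an illustration: the
position goes into the upper bits and a 16-bit checksum of the English
string into the lower 16 bits. The packing, the hash and the names
checksum16 and make_msg_id are all assumptions of this sketch. The
resulting numbers can be large, so they would have to be checked against
the system's NL_MSGMAX limit before being used with catgets.

```c
#include <stdint.h>

/* Simple 16-bit checksum of a string (not cryptographic, and collisions
 * are possible; it only has to catch ordinary edits to the string). */
static uint16_t checksum16(const char *s)
{
    uint32_t h = 0;
    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return (uint16_t)(h ^ (h >> 16));
}

/* Combine the 1-based position in the .h file (1..32767) with the
 * checksum of the original string into one message ID. */
static int make_msg_id(int position, const char *english)
{
    return (int)(((uint32_t)position << 16) | checksum16(english));
}
```

If the English string at a given position is edited, its ID almost
always changes, so a stale catalog no longer has an entry for it and the
lookup falls back to the original text.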

[1] I assume the interface is: first init, with catd=catopen("catalog");
then translation=catgets(catd, set_no, msg_id, "original"); on each
lookup.

Torinthiel

-- 
 Waclaw "Torinthiel" Schiller       GG#: 3073512
   torinthiel(at)megapolis(dot)pl
   gpg: 0906A2CE fpr: EE3E DFB4 C4D6 E22E 8999  D714 7CEB CDDC 0906 A2CE
 "No classmates may be used during this examination"