First of all, i18n is shorthand for internationalisation: the "i", the 18 letters in between, then the "n". The same reasoning is behind l10n (localisation).

The standard translation support on Linux is "gettext". It consists of a translations database stored in the filesystem, utilities to manage the database, and an API (which comes with glibc) to access it.

Database:

The translations database is stored in separate files like:

    $dirname/$locale/$category/$domain.mo

An example of the variables being:

    dirname=/usr/share/locale  #This is the usual location
    locale=en_IE               #language_COUNTRY
    category=LC_MESSAGES       #strings in your app
    domain=fslint              #your app

API (to set variables above in your program):

C:

    #include <libintl.h>
    #include <locale.h>

    bindtextdomain("fslint", "/usr/share/locale");
    setlocale(LC_ALL, ""); /* set all locale categories from the environment
                              (LC_ALL, the individual LC_* variables, or LANG) */
                           /* note gettext uses the LC_MESSAGES category */
    textdomain("fslint");

Python:

    import gettext, locale
    gettext.bindtextdomain("fslint", "/usr/share/locale") #sys default used if localedir=None
    locale.setlocale(locale.LC_ALL, '')
    gettext.textdomain("fslint")

    #Note if you initially do the following, it is much faster, as the
    #lookup for the mo file is not done for each translation
    #(the C version automatically caches the translations, so it's
    #not needed there).
    gettext.install("fslint", localedir=None, unicode=1) #None is sys default locale dir

    #Note also that before python 2.3 you need the following if you
    #need translations from non python code (glibc, libglade etc.).
    #The second argument is the locale directory, held here in a
    #variable named textdomain.
    gtk.glade.bindtextdomain("fslint", textdomain) #there are other access points to this function

    #Since python 2.3 one still needs to call the following, as the
    #gettext equivalent doesn't do it, in case the message catalogs
    #are in different formats for libc and the python app.
    locale.bindtextdomain("fslint", textdomain)

    #Note python parses the translations itself, instead of letting
    #glibc do it.
    #This is presumably for platform independence, but it does allow
    #you to use python to display existing message catalogs:

    $ LANG=es python
    >>> import gettext
    >>> gettext.install("libc")
    >>> for item in gettext._translations['/usr/share/locale/es/LC_MESSAGES/libc.mo']._catalog.keys():
    ...     print item, ":", gettext._translations['/usr/share/locale/es/LC_MESSAGES/libc.mo']._catalog[item]

To actually call the gettext translation functions, just replace your strings "string" with gettext("string"). The following shortcuts are usually used:

Python:

    _ = gettext.gettext #Don't do this if you used gettext.install above (more inefficient)
    print _("translated string")

C:

    #define _(x) gettext(x)
    printf(_("translated string"));

Utilities:

The next thing to do is extract the marked strings from your source files for translation and insertion into the database. Python used to have its own utility (pygettext.py) to do this, but the best way now is to use the standard xgettext utility, which now supports python. The output from this stage is a pot file.

The last thing left to do is the actual translation. Translators create a "po" file from the pot file above by entering the translated text for each source string in the pot file. The developer then compiles these to binary mo files for use by the application. msgfmt and msgmerge are the main utilities for manipulating po, pot and mo files.

The quickest way to learn about the external utilities (xgettext, msgmerge, msgfmt) is to look at existing examples, which are usually in po/Makefile in various projects, including FSlint.

Charsets:

Translators can represent your strings in various ways. For example, the Euro symbol (€) can be encoded like:

    A4     in iso-8859-15
    20AC   in unicode
    E282AC in utf-8

All in all, utf-8 is the best one to use if you can, as it involves the least conversion and is very efficient for primarily ascii text. Note gtk2 only takes utf-8. Note also that pygtk will automatically convert from unicode to utf-8.
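The three encodings of the Euro symbol listed above can be checked with a short script. This is only a sketch, and it assumes Python 3 syntax (where str is already unicode) rather than the Python 2 used in the other snippets; the variable names are illustrative.

```python
# Sketch only: Python 3 syntax, illustrative names.
euro = "\u20ac"  # the Euro symbol as a unicode string

# Hex form of the code point and of the two byte encodings.
code_point = "%04X" % ord(euro)
latin9 = euro.encode("iso-8859-15").hex().upper()
utf8 = euro.encode("utf-8").hex().upper()

print(code_point, latin9, utf8)

# Whatever charset a translation arrives in, decoding it yields the
# same unicode string, which can then be encoded once as utf-8.
for raw, charset in ((b"\xa4", "iso-8859-15"), (b"\xe2\x82\xac", "utf-8")):
    assert raw.decode(charset) == euro
```

The single byte in iso-8859-15 versus three bytes in utf-8 is the trade-off for this symbol; plain ascii text is one byte per character in both.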
Python will convert translations to unicode if you specify unicode=1 to gettext.install(). So, for example, if you got translations in each of the 3 encodings above, the charset translation process for pygtk would be:

    iso-8859-15 \
    unicode      - unicode - utf-8
    utf-8       /

Misc

It's not just strings that need to be translated in an application. For example, number and date representations differ between locales. To handle these you need to use variants of the standard functions for representing numbers to users:

C:

    #include <locale.h>
    #include <stdio.h>

    setlocale(LC_ALL, "");
    printf("%'d", 1234); /* notice the ' */

Python:

    import locale
    locale.setlocale(locale.LC_ALL, "")
    locale.format("%d", 1234, 1) #this is a little limited as of 2.2.3

More info

    info gettext
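The catalog layout described under "Database" can also be explored from Python with gettext.find(), which only checks that a matching file exists and does not parse it. The following is a minimal sketch, assuming Python 3 syntax and using a throwaway directory with an empty placeholder file in place of a real fslint.mo; the directory and variable names are hypothetical.

```python
import gettext
import os
import tempfile

# Hypothetical scratch directory standing in for /usr/share/locale.
dirname = tempfile.mkdtemp()
locale_name = "en_IE"      # language_COUNTRY
category = "LC_MESSAGES"   # the category gettext uses
domain = "fslint"          # the application

# Build $dirname/$locale/$category/$domain.mo by hand.
mo_path = os.path.join(dirname, locale_name, category, domain + ".mo")
os.makedirs(os.path.dirname(mo_path))
open(mo_path, "wb").close()  # empty placeholder, not a valid catalog

# gettext.find() searches the same layout and returns the first match.
found = gettext.find(domain, localedir=dirname, languages=[locale_name])
print(found)

# A locale with no catalog yields None rather than an error.
missing = gettext.find(domain, localedir=dirname, languages=["es"])
print(missing)
```

Because the placeholder file is empty, it would fail to load as a real catalog; the sketch only illustrates how the path is assembled and searched.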
© Nov 22 2004