Internationalization and localization are means of adapting software for
non-native environments, especially for other nations and cultures.
The distinction between internationalization and localization is subtle but
important. Internationalization is the adaptation of products for potential use
virtually everywhere, while localization is the addition of special features
for use in a specific locale.
For example, in terms of language used in software, internationalization is the
process of marking up all strings that might need to be translated whilst
localization is the process of producing translations for a particular locale.
Pylons provides built-in support to enable you to internationalize language but
leaves you to handle any other aspects of internationalization which might be
appropriate to your application.
In order to represent characters from multiple languages, you will
need to utilize Unicode. This document assumes you have read the
Unicode document.
By now you should have a good idea of what Unicode is, how to use it in Python
and which areas of your application need to pay specific attention to decoding and
encoding Unicode data.
This final section will look at the issue of making your application work with
multiple languages.
Pylons uses the Python gettext module for internationalization.
It is based on the GNU gettext API.
Everywhere in your code where you want strings to be available in different
languages you wrap them in the _() function. There are also a number of
other translation functions which are documented in the API reference at
http://pylonshq.com/docs/module-pylons.i18n.translation.html
Note
The _() function is a reference to the ugettext() function.
_() is a convention for marking text to be translated and saves on keystrokes.
ugettext() is the Unicode version of gettext(); it returns unicode strings.
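The aliasing convention described in the note can be tried outside Pylons with
the standard library. The following is a minimal sketch, assuming Python 3 and
the stdlib gettext module rather than pylons.i18n; NullTranslations is a
catalog with no translations, so every message comes back unchanged:

```python
import gettext

# Assumption: the stdlib gettext module stands in for pylons.i18n here.
translations = gettext.NullTranslations()

# Bind the conventional short name to the catalog's lookup method.
_ = translations.gettext

# No catalog is loaded, so the msgid itself is returned.
greeting = _('Hello')
print(greeting)
```

In a Pylons controller the name _ is already provided, so none of this setup
is needed there.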
In our example we want the string 'Hello' to appear in three different
languages: English, French and Spanish. We also want to display the word
'Hello' in the default language. We'll then go on to use some plural words
too.
Let's call our project translate_demo:
    $ paster create -t pylons translate_demo
Now let's add a friendly controller that says hello:
    $ cd translate_demo
    $ paster controller hello
Edit controllers/hello.py to make use of the _() function everywhere
where the string Hello appears:
    import logging

    from pylons.i18n import get_lang, set_lang
    from translate_demo.lib.base import *

    log = logging.getLogger(__name__)

    class HelloController(BaseController):
        def index(self):
            response.write('Default: %s<br />' % _('Hello'))
            for lang in ['fr','en','es']:
                set_lang(lang)
                response.write("%s: %s<br />" % (get_lang(), _('Hello')))
When wrapping strings in the gettext functions, it is important not to
piece sentences together manually; certain languages might need to invert the
grammar. Don't do this:
    # BAD!
    msg = _("He told her ")
    msg += _("not to go outside.")
but this is perfectly acceptable:
    # GOOD
    msg = _("He told her not to go outside")
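When a sentence contains variable parts, the same rule applies: keep the
sentence whole and use named placeholders so that a translator can reorder
them. A minimal sketch, assuming an identity _() in place of the real Pylons
function; warning, subject and target are hypothetical names used only for
illustration:

```python
def _(message):
    # Stand-in for the real translation function (assumption): returns the
    # message unchanged, as gettext does when no translation exists.
    return message

def warning(subject, target):
    # One complete msgid; the named placeholders may appear in any order
    # in a translated msgstr.
    return _("%(subject)s told %(target)s not to go outside.") % {
        'subject': subject,
        'target': target,
    }

print(warning('He', 'her'))
```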
The controller has now been internationalized, but it will raise a
LanguageError until we have set up the alternative language catalogs.
GNU gettext uses three types of files in the translation framework.
POT (Portable Object Template) files
The first step in the localization process. A program is used to search
through your project's source code and pick out every string passed to one
of the translation functions, such as _(). This list is put together in
a specially-formatted template file that will form the basis of all
translations. This is the .pot file.
PO (Portable Object) files
The second step in the localization process. Using the POT file as a
template, the list of messages is translated and saved as a .po file.
MO (Machine Object) files
The final step in the localization process. The PO file is run through a
program that turns it into an optimized machine-readable binary file, which
is the .mo file. Compiling the translations to this binary format makes the
localized program much faster in retrieving the translations while it is
running.
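To make the last step less abstract, the sketch below writes a tiny catalog in
the MO binary layout by hand and loads it with the standard library. It assumes
Python 3 and the stdlib gettext module; build_mo is a hypothetical helper
written for this illustration, not part of Pylons, Babel or GNU gettext, and
real projects should use the compile tools described later:

```python
import gettext
import io
import struct

def build_mo(messages):
    """Serialize {msgid: msgstr} into the GNU MO binary layout (sketch)."""
    # The empty msgid carries catalog metadata; gettext reads the charset
    # from it when the file is loaded.
    catalog = {'': 'Content-Type: text/plain; charset=UTF-8\n'}
    catalog.update(messages)
    ids = b''
    strs = b''
    spans = []
    for key in sorted(catalog):  # MO files store msgids in sorted order
        msgid = key.encode('utf-8')
        msgstr = catalog[key].encode('utf-8')
        spans.append((len(ids), len(msgid), len(strs), len(msgstr)))
        ids += msgid + b'\0'
        strs += msgstr + b'\0'
    count = len(spans)
    # Header is 7 uint32 values; each of the two index tables uses 8 bytes
    # (length, offset) per message.
    id_start = 7 * 4 + 16 * count
    str_start = id_start + len(ids)
    id_index = []
    str_index = []
    for id_off, id_len, str_off, str_len in spans:
        id_index += [id_len, id_start + id_off]
        str_index += [str_len, str_start + str_off]
    header = struct.pack('<7I', 0x950412de, 0, count,
                         7 * 4, 7 * 4 + 8 * count, 0, 0)
    index = struct.pack('<%dI' % (4 * count), *(id_index + str_index))
    return header + index + ids + strs

# Load the compiled bytes exactly as gettext would load a .mo file.
spanish = gettext.GNUTranslations(io.BytesIO(build_mo({'Hello': '¡Hola!'})))
print(spanish.gettext('Hello'))
```

The lookup at runtime only has to read the index tables and slice the byte
string, which is why the compiled form is quicker to use than parsing the .po
text on every request.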
GNU gettext provides a suite of command line programs for extracting messages
from source code and working with the associated gettext catalogs. The Babel
project provides pure Python alternative versions of these tools. Unlike the
GNU gettext tool xgettext, Babel supports extracting translatable strings from
Python templating languages (currently Mako and Genshi).
To use Babel, you must first install it via easy_install. Run the command:
    $ easy_install Babel
Pylons (as of 0.9.6) includes some sane defaults for Babel's distutils commands
in the setup.cfg file.
It also includes an extraction method mapping in the setup.py file. It is
commented out by default, to avoid distutils warning about it being an
unrecognized option when Babel is not installed. These lines should be
uncommented before proceeding with the rest of this walk through:
    message_extractors = {'translate_demo': [
            ('**.py', 'python', None),
            ('templates/**.mako', 'mako', None),
            ('public/**', 'ignore', None)]},
We'll use Babel to extract messages to a .pot file in your project's
i18n directory. First, the directory needs to be created. Don't forget to
add it to your revision control system if one is in use:
    $ cd translate_demo
    $ mkdir translate_demo/i18n
    $ svn add translate_demo/i18n
Next we can extract all messages from the project with the following command:
    $ python setup.py extract_messages
    running extract_messages
    extracting messages from translate_demo/__init__.py
    extracting messages from translate_demo/websetup.py
    ...
    extracting messages from translate_demo/tests/functional/test_hello.py
    writing PO template file to translate_demo/i18n/translate_demo.pot
This will create a .pot file in the i18n directory that looks something
like this:
    # Translations template for translate_demo.
    # Copyright (C) 2007 ORGANIZATION
    # This file is distributed under the same license as the translate_demo project.
    # FIRST AUTHOR <EMAIL@ADDRESS>, 2007.
    #
    #, fuzzy
    msgid ""
    msgstr ""
    "Project-Id-Version: translate_demo 0.0.0\n"
    "Report-Msgid-Bugs-To: EMAIL@ADDRESS\n"
    "POT-Creation-Date: 2007-08-02 18:01-0700\n"
    "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
    "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
    "Language-Team: LANGUAGE <LL@li.org>\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "Generated-By: Babel 0.9dev-r215\n"

    #: translate_demo/controllers/hello.py:10 translate_demo/controllers/hello.py:13
    msgid "Hello"
    msgstr ""
The .pot details that appear here can be customized via the
extract_messages configuration in your project's setup.cfg (See the
Babel Command-Line Interface Documentation for all
configuration options).
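To illustrate what the extraction step does conceptually, here is a small
sketch that finds _('...') calls in Python source and records the line each
one appears on, much like the "#:" location comments in the template above. It
assumes Python 3.8 or later and uses the stdlib ast module; it is not how
Babel is implemented, and find_messages is a hypothetical name:

```python
import ast

def find_messages(source):
    """Return sorted (line number, msgid) pairs for _('literal') calls."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == '_'
                and node.args
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            found.append((node.lineno, node.args[0].value))
    return sorted(found)

SAMPLE = "msg = _('Hello')\nprint(_('Goodbye'))\nname = str(42)\n"
for lineno, msgid in find_messages(SAMPLE):
    print('%d: %s' % (lineno, msgid))
```

Only string literals passed directly to _() are found, which is one more
reason to avoid building messages from fragments or variables.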
Next, we'll initialize a catalog (.po file) for the Spanish language:
    $ python setup.py init_catalog -l es
    running init_catalog
    creating catalog 'translate_demo/i18n/es/LC_MESSAGES/translate_demo.po' based on
    'translate_demo/i18n/translate_demo.pot'
Then we can edit the last line of the new Spanish .po file to add a
translation of "Hello":
    msgid "Hello"
    msgstr "¡Hola!"
Finally, to utilize these translations in our application, we need to compile
the .po file to a .mo file:
    $ python setup.py compile_catalog
    running compile_catalog
    1 of 1 messages (100%) translated in 'translate_demo/i18n/es/LC_MESSAGES/translate_demo.po'
    compiling catalog 'translate_demo/i18n/es/LC_MESSAGES/translate_demo.po' to
    'translate_demo/i18n/es/LC_MESSAGES/translate_demo.mo'
We can also use the update_catalog command to merge new messages from the
.pot to the .po files. For example, if we later added the following
line of code to the end of HelloController's index method:
    response.write('Goodbye: %s' % _('Goodbye'))
We'd then need to re-extract the messages from the project, then run the
update_catalog command:
    $ python setup.py extract_messages
    running extract_messages
    extracting messages from translate_demo/__init__.py
    extracting messages from translate_demo/websetup.py
    ...
    extracting messages from translate_demo/tests/functional/test_hello.py
    writing PO template file to translate_demo/i18n/translate_demo.pot
    $ python setup.py update_catalog
    running update_catalog
    updating catalog 'translate_demo/i18n/es/LC_MESSAGES/translate_demo.po' based on
    'translate_demo/i18n/translate_demo.pot'
We'd then edit our catalog to add a translation for "Goodbye", and recompile
the .po file as we did above.
For more information, see the Babel documentation and the GNU
Gettext Manual.
Next we'll need to repeat the process of creating a .mo file for the en
and fr locales:
    $ python setup.py init_catalog -l en
    running init_catalog
    creating catalog 'translate_demo/i18n/en/LC_MESSAGES/translate_demo.po' based on
    'translate_demo/i18n/translate_demo.pot'
    $ python setup.py init_catalog -l fr
    running init_catalog
    creating catalog 'translate_demo/i18n/fr/LC_MESSAGES/translate_demo.po' based on
    'translate_demo/i18n/translate_demo.pot'
Modify the last line of the fr catalog to look like this:
    #: translate_demo/controllers/hello.py:10 translate_demo/controllers/hello.py:13
    msgid "Hello"
    msgstr "Bonjour"
Since our original messages are already in English, the en catalog can stay
blank; gettext will fall back to the original.
Once you've edited these new .po files and compiled them to .mo files,
you'll end up with an i18n directory containing:
    i18n/translate_demo.pot
    i18n/en/LC_MESSAGES/translate_demo.po
    i18n/en/LC_MESSAGES/translate_demo.mo
    i18n/es/LC_MESSAGES/translate_demo.po
    i18n/es/LC_MESSAGES/translate_demo.mo
    i18n/fr/LC_MESSAGES/translate_demo.po
    i18n/fr/LC_MESSAGES/translate_demo.mo
Start the server with the following command:
    $ paster serve --reload development.ini
Test your controller by visiting http://localhost:5000/hello. You should see
the following output:
    Default: Hello
    fr: Bonjour
    en: Hello
    es: ¡Hola!
You can now set the language used in a controller on the fly.
For example this could be used to allow a user to set which language they
wanted your application to work in. You could save the value to the session
object:
    session['lang'] = 'en'
    session.save()
then on each controller call the language to be used could be read from the
session and set in your controller's __before__() method so that the pages
remained in the same language that was previously set:
    def __before__(self):
        if 'lang' in session:
            set_lang(session['lang'])
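A value stored in the session ultimately comes from user input, so it can be
worth checking it against the catalogs you actually ship before passing it to
set_lang(). A sketch with a plain dict standing in for the Pylons session;
SUPPORTED_LANGUAGES and language_for are hypothetical names, not part of
Pylons:

```python
# Hypothetical list of the locales this demo has catalogs for.
SUPPORTED_LANGUAGES = ('en', 'fr', 'es')

def language_for(session, default='en'):
    """Return the stored language if a catalog exists for it, else default."""
    lang = session.get('lang')
    if lang in SUPPORTED_LANGUAGES:
        return lang
    return default

print(language_for({'lang': 'fr'}))
print(language_for({'lang': 'xx'}))
```

Inside __before__() the result of such a helper could then be handed to
set_lang() instead of the raw session value.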
Pylons also supports defining the default language to be used in the
configuration file. Set a lang variable to the desired default language in
your development.ini file, and Pylons will automatically call set_lang
with that language at the beginning of every request.
E.g. to set the default language to Spanish, you would add lang = es to
your development.ini:
    [app:main]
    use = egg:translate_demo
    lang = es
If you are running the server with the --reload option, it will
automatically restart when you change the development.ini file. Otherwise,
restart the server manually. This time the output will be as follows:
    Default: ¡Hola!
    fr: Bonjour
    en: Hello
    es: ¡Hola!
If your code calls _() with a string that doesn't exist at all in your
language catalog, the string passed to _() is returned instead.
Modify the last line of the hello controller to look like this:
    response.write("%s: %s %s<br />" % (get_lang(), _('Hello'), _('World!')))
Warning
Of course, in real life breaking up sentences in this way is very dangerous
because some grammars might require the order of the words to be different.
If you run the example again the output will be:
    Default: ¡Hola!
    fr: Bonjour World!
    en: Hello World!
    es: ¡Hola! World!
This is because we never provided a translation for the string 'World!' so
the string itself is used.
Pylons also provides a mechanism for fallback languages, so that you can
specify other languages to be used if the word is omitted from the main
language's catalog.
In this example we choose fr as the main language but es as a fallback:
    import logging

    from pylons.i18n import add_fallback, set_lang
    from translate_demo.lib.base import *

    log = logging.getLogger(__name__)

    class HelloController(BaseController):

        def index(self):
            set_lang('fr')
            add_fallback('es')
            return "%s %s, %s" % (_('Hello'), _('World'), _('Hi!'))
If Hello is in the fr .mo file as Bonjour, World is only in
es as Mundo and none of the catalogs contain Hi!, you'll get the
multilingual message: Bonjour Mundo, Hi!. This is a combination of the
French, Spanish and original (English in this case, as defined in our source
code) words.
You can add as many fallback languages with the add_fallback() function as
you like and they will be tested in the order you add them.
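The lookup order described above can be reproduced with the standard library,
whose translation classes also have an add_fallback() method. This is a
sketch under stated assumptions: Python 3, the stdlib gettext module rather
than pylons.i18n, and a hypothetical DictTranslations class backed by a plain
dict instead of a compiled .mo file:

```python
import gettext

class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog used only for this illustration."""

    def __init__(self, messages):
        super().__init__()
        self._messages = messages

    def gettext(self, message):
        if message in self._messages:
            return self._messages[message]
        # Not found here: defer to the fallback chain, which finally
        # returns the original string if no catalog knows the message.
        return super().gettext(message)

french = DictTranslations({'Hello': 'Bonjour'})
spanish = DictTranslations({'Hello': '¡Hola!', 'World': 'Mundo'})
french.add_fallback(spanish)

_ = french.gettext
message = "%s %s, %s" % (_('Hello'), _('World'), _('Hi!'))
print(message)
```

French wins for 'Hello' even though Spanish also has it, because fallbacks are
consulted only when the earlier catalog has no entry.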
One case where using fallbacks in this way is particularly useful is when you
wish to display content based on the languages requested by the browser in the
HTTP_ACCEPT_LANGUAGE header. Typically the browser may submit a number of
languages, so it is useful to add fallbacks in the order specified by the
browser so that you always try to display words in the language of preference
and search the other languages in order if a translation cannot be found. The
languages defined in the HTTP_ACCEPT_LANGUAGE header are available in
Pylons as request.languages and can be used like this:
    for lang in request.languages:
        add_fallback(lang)
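For readers curious how a preference order can be derived from the raw header,
the following sketch sorts language tags by their q weights. It is an
illustration only: parse_accept_language is a hypothetical helper, it handles
just the common "tag;q=value" form, and it makes no claim about how Pylons
builds request.languages:

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language value, best first."""
    weighted = []
    for position, item in enumerate(header.split(',')):
        parts = item.strip().split(';')
        tag = parts[0].strip()
        if not tag:
            continue
        quality = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _sep, value = param.strip().partition('=')
            if name.strip() == 'q':
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unreadable weight as "not wanted"
        if quality > 0:
            # Sort by descending quality, then by original position.
            weighted.append((-quality, position, tag))
    return [tag for _quality, _position, tag in sorted(weighted)]

print(parse_accept_language('es;q=0.5, fr, en;q=0.8'))
```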
You can also use the _() function within templates in exactly the same way
you do in code. For example, in a Mako template, the expression ${_('Hello')}
would produce the string 'Hello' in the language you had set.
Babel currently supports extracting gettext messages from Mako and Genshi
templates. The Mako extractor also provides support for translator comments.
Babel can be extended to extract messages from other sources via a custom
extraction method plugin.
Pylons (as of 0.9.6) automatically configures a Babel extraction mapping for
your Python source code and Mako templates. This is defined in your project's
setup.py file:
    message_extractors = {'translate_demo': [
            ('public/**', 'ignore', None),
            ('**.py', 'python', None),
            ('templates/**.mako', 'mako', None),
            ]},
For a project using Genshi instead of Mako, the Mako line might be replaced with:
    ('templates/**.html', 'genshi', None),
See Babel's documentation on Message Extraction
for more information.
Occasionally you might come across a situation when you need to translate a
string when it is accessed, not when the _() or other functions are called.
Consider this example:
    import logging

    from pylons.i18n import get_lang, set_lang
    from translate_demo.lib.base import *

    log = logging.getLogger(__name__)

    text = _('Hello')

    class HelloController(BaseController):

        def index(self):
            response.write('Default: %s<br />' % _('Hello'))
            for lang in ['fr','en','es']:
                set_lang(lang)
                response.write("%s: %s<br />" % (get_lang(), _('Hello')))
            response.write('Text: %s<br />' % text)
If we run this we get the following output:
    Default: Hello
    ['fr']: Bonjour
    ['en']: Good morning
    ['es']: Hola
    Text: Hello
This is because the call to _('Hello') just after the imports is executed
only once, when the module is loaded and the default language is en. The
variable text therefore holds the English translation, even though the current
language was Spanish by the time the string was used.
The rule of thumb in these situations is to try to avoid using the translation
functions in situations where they are not executed on each request. For
situations where this isn't possible, perhaps because you are working with
legacy code or with a library which doesn't support internationalization, you
need to use lazy translations.
If we modify the above example so that the import statements and assignment to
text look like this:
    from pylons.i18n import get_lang, lazy_gettext, set_lang
    from translate_demo.lib.base import *

    log = logging.getLogger(__name__)

    text = lazy_gettext('Hello')
then we get the output we expected:
    Default: Hello
    ['fr']: Bonjour
    ['en']: Good morning
    ['es']: Hola
    Text: Hola
There are lazy versions of all the standard Pylons translation functions.
There is one drawback to be aware of when using the lazy translation functions:
they are not actually strings. This means that if our example had used the
following code it would have failed with the error "cannot concatenate 'str'
and 'LazyString' objects":
    response.write('Text: ' + text + '<br />')
For this reason you should only use the lazy translations where absolutely
necessary and should always ensure they are converted to strings by calling
str() or repr() before they are used in operations with real strings.
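The difference between eager and lazy translation can be shown without Pylons.
In this sketch CATALOGS, current_language, translate and LazyText are all
hypothetical stand-ins written for illustration; they are not the Pylons
implementation of lazy_gettext, only a model of the idea that the lookup is
postponed until the object is converted to a string:

```python
# Hypothetical in-memory catalogs and "current language" state.
CATALOGS = {'fr': {'Hello': 'Bonjour'}, 'es': {'Hello': 'Hola'}}
current_language = ['en']

def translate(message):
    """Look the message up in the catalog for the current language."""
    return CATALOGS.get(current_language[0], {}).get(message, message)

class LazyText(object):
    """Remember the msgid and translate only when a string is needed."""

    def __init__(self, message):
        self.message = message

    def __str__(self):
        return translate(self.message)

eager = translate('Hello')   # looked up now, while the language is 'en'
lazy = LazyText('Hello')     # lookup postponed

current_language[0] = 'es'   # e.g. what set_lang('es') would do per request

print('Eager: %s' % eager)
print('Lazy: %s' % lazy)
```

Note that lazy is a LazyText instance rather than a string, which is exactly
why it has to be converted before being concatenated with real strings.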
Finally you can produce an egg of your project which includes the translation
files like this:
    $ python setup.py bdist_egg
The setup.py automatically includes the .mo language catalogs your
application needs so that your application can be distributed as an egg. This
is done with the following line in your setup.py file:
    package_data={'translate_demo': ['i18n/*/LC_MESSAGES/*.mo']},
Pylons also provides the ungettext() function. It's designed for
internationalizing plural words, and can be used as follows:
    ungettext('There is %(num)d file here', 'There are %(num)d files here',
              n) % {'num': n}
Plural forms have a different type of entry in .pot/.po files, as
described in The Format of PO Files
in GNU Gettext's Manual:
    #: translate_demo/controllers/hello.py:12
    #, python-format
    msgid "There is %(num)d file here"
    msgid_plural "There are %(num)d files here"
    msgstr[0] ""
    msgstr[1] ""
One thing to keep in mind is that other languages don't have the same plural
forms as English. While English only has 2 plural forms, singular and plural,
Slovenian has 4! That means that you must use ungettext for proper
pluralization. Specifically, the following will not work:
    # BAD!
    if n == 1:
        msg = _("There was no dog.")
    else:
        msg = _("There were no dogs.")
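By contrast, a single plural-aware call lets the catalog decide which form
applies. The sketch below assumes Python 3 and the stdlib gettext module, where
the equivalent of ungettext() is the ngettext() method; with no catalog loaded
it applies the English rule, and a compiled catalog for another language would
supply its own rule. file_count is a hypothetical helper:

```python
import gettext

# Assumption: stdlib gettext with no catalog loaded, which uses the
# singular form only when n == 1.
translations = gettext.NullTranslations()

def file_count(n):
    """Pick the plural form for n, then fill in the number."""
    template = translations.ngettext('There is %(num)d file here',
                                     'There are %(num)d files here', n)
    return template % {'num': n}

for n in (0, 1, 3):
    print(file_count(n))
```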