An update on localisation in MediaWiki (2009)

Siebrand Mazeland
On 31 December 2007 and 1 January 2008 I sent an e-mail to which this is a
follow-up[1,2].

First things first, because not everyone reads e-mails completely:
* MediaWiki localisation (that is, the translation of English source messages
to other languages) depends on you! If you speak a language other than
English, care about your language in MediaWiki and Wikimedia and like
translating, go to, register a user and start
contributing translations for MediaWiki and MediaWiki extensions. When your
localisation is complete, keep coming back regularly to re-complete it and do
quality control. Thank you in advance for all your contributions and effort.
* The i18n and L10n area of MediaWiki requires continuous efforts. If this
area of FOSS has your interest: we need your help. Please offer your
development skills to further MediaWiki's i18n, L10n and translation

All statistics are based on MediaWiki 1.16 alpha, SVN version r60527 (31
December 2009). Comparisons are to MediaWiki 1.14 alpha, SVN version r45277
(1 January 2009).

See for a wiki version of this

* Localisation or L10n - the process of adapting the software to be as
familiar as possible to a specific locale (topic of this message)
* Internationalisation or i18n - the process of ensuring that an application
is capable of adapting to local requirements (out of scope of this message)

MediaWiki has a user interface definition for 362 languages (up from 348). Of
those languages at least 39 language codes are duplicates and/or serve a
purpose for usability[5]. Reporting on them, however, is not relevant. So
MediaWiki in its current state supports 323 languages (up from 322). MediaWiki
has 346 core language files (up from 326), of which 38 are redirects from the
duplicates/usability group or just empty[6]. So MediaWiki has an active
in-product localisation for 308 languages (up from 299).
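
The two subtractions above can be reproduced from the footnotes. The following is a minimal sketch, assuming the code lists in footnotes [5] and [6] are complete; the variable names are illustrative and not part of MediaWiki.

```python
# Language codes copied from footnote [5] (duplicates and usability codes).
DUPLICATE_OR_USABILITY = """
als be-x-old ckb crh de-at de-ch de-formal dk en-gb fiu-vro gan got hif kk
kk-cn iu kk-kz kk-tr ko-kp ku ku-arab nb ruq simple sr tg tp tt ug zh
zh-classical zh-cn zh-sg zh-hk zh-min-nan zh-mo zh-my zh-tw zh-yue
""".split()

# Language codes copied from footnote [6] (redirect or empty language files).
REDIRECT_OR_EMPTY = """
als be-x-old bh ckb ckb-latn crh de-at dk en-rtl fiu-vro gan hif hif-deva ii
iu kk kk-cn kk-kz kk-tr ko-kp ks ku nb pi ruq simple st tg tp tt ug
zh-classical zh-cn zh-min-nan zh-mo zh-my zh-sg zh-yue
""".split()

UI_DEFINITIONS = 362       # languages with a user interface definition
CORE_LANGUAGE_FILES = 346  # core language files

# Supported languages: all definitions minus the duplicate/usability codes.
supported_languages = UI_DEFINITIONS - len(DUPLICATE_OR_USABILITY)

# Active localisations: all core files minus redirects and empty files.
active_localisations = CORE_LANGUAGE_FILES - len(REDIRECT_OR_EMPTY)

print(supported_languages, active_localisations)
```

Counting the codes from the footnotes, rather than typing the totals by hand, keeps the derived figures in step with the lists they are based on.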

The MediaWiki core product has several areas that can be localised:
* regular messages that can and should be localised (2,369 - up 9% from 2,168)
* optional messages that can be localised, which is mostly used for languages
not using a Latin script (187 - up 8% from 173)
* ignored messages that should not be localised (152 - up 2% from 149)
* namespace names and namespace aliases (17 - no change)
* magic words (142 - up 8% from 132)
* special page names (88 - up 2% from 86)
* other (directionality, date formats, separators, book store lists, link
trail, and others)

Localisation of MediaWiki revolves around all of the above. Reporting is done
on the regular messages only.
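
The growth percentages in the list above follow from the old and new counts. A small sketch, assuming they are rounded to whole percents (the helper name `percent_increase` is made up for this illustration):

```python
def percent_increase(new: int, old: int) -> int:
    """Return the increase from old to new as a whole-number percentage."""
    return round(100 * (new - old) / old)


# (current count, count one year earlier), taken from the list above.
COUNTS = {
    "regular messages": (2369, 2168),
    "optional messages": (187, 173),
    "ignored messages": (152, 149),
    "magic words": (142, 132),
    "special page names": (88, 86),
}

for name, (new, old) in COUNTS.items():
    print(f"{name}: {new} (up {percent_increase(new, old)}% from {old})")
```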

MediaWiki is more than just the core product. On 1500 extensions (up 25%
from 1200) have some kind of documentation. This analysis only takes the code
currently present in into account.
The source code repository contains give or take 445 extensions (up 20% from
370). Most extensions in the MediaWiki Subversion repository now use the
reference implementation for i18n. Currently 8,200 messages for MediaWiki
extensions can be localised in a consistent way (up 37% from 6,000).

==MediaWiki localisation in practice==
MediaWiki localisation has moved further to a centralised collaborative
process in in the past year. Where in 2008 some wikis were
still translating in their own MediaWiki: namespace, the introduction of the
LocalisationUpdate extension[7], especially in the Wikimedia Foundation wikis,
has removed the last advantage of local translation over centralised
translation: instant gratification. Translations that are committed to
Subversion can be added to wikis without requiring software updates, as often
as desirable.

Few to no translations are submitted through the Bugzilla ticketing system
or directly by SVN committers. Exceptions are the localisations of Hebrew,
Cantonese, Simplified Chinese, Traditional Chinese, Classical Chinese and
Persian, which are still actively maintained in SVN, alongside the work of
regular contributors in the centralised system.

==The past, the present and the future==
MediaWiki localisation has always been a volunteer effort, and I expect that it
will remain so. 2009 brought a successful Google Summer of Code project,
executed by Niklas Laxstrom [8,9] and the Wikimedia Foundation is supporting
the localisation that takes place at[10]. Not only
MediaWiki, but all Open Source projects that are supported there[11] benefit
from these developments. We want to keep using the Translate extension
technology and expand on it, as well as nourish our translator base of nearly
2,000 translators by providing them with better tooling and more projects in
2010. Vereniging Wikimedia Nederland[12], the Dutch Wikimedia Chapter, has
granted 2,000 Euro to Stichting Open Progress[13] for the
Translation Rallies, which motivated its translators to make more than 60,000
new translations for MediaWiki and its extensions in August and December 2009.

New opportunities lie in better support of Translation Memory technology and
more supported projects to grow the community and allow the translators to
spend their time as productively as possible, while still allowing all the
socialising and collaboration features of MediaWiki. At the Google Summer of
Code Mentor Summit there was interest from the KDE Documentation Project[14],
the PHP Documentation Project, Pidgin, wxWidgets, and other projects. For
translatewiki staff this was a confirmation that our approach works. The
Translate extension, however, needs more development. If you want to work on an
exciting extension that makes a difference in multi-language support for Open
Source software and MediaWiki content pages that require structured
translation, check out the Translate extension and help us make it better.
Your help *is* needed and most welcome!

The Wikimedia Strategic Planning process that is currently taking place also
allows for a broader perspective on the localisation of MediaWiki in a
Wikimedia context[15]. Support for several dozen MediaWiki extensions in the
Wikia code repository is expected within the next few weeks. Wikimedia is, or
will soon be, including a localisation score for language projects in their
statistics, so that in a year we expect to be able to analyse if localisation
is a requirement for a rise in usage or if it is a consequence[16].

==MediaWiki localisation statistics==
Daily statistics for MediaWiki and extension localisation have been available
for the past two years[17]. For the past two years (arbitrary) milestones have
been set for four collections of MediaWiki related messages. For the usability
of MediaWiki in a particular language, the group 'core most used' is the most
important. For MediaWiki to have 'minimal support' for a language, that
language must reach the milestone for the first group. Reaching further
milestones indicates the maturity of a localisation:
* core most used (469 messages): 98%
* core (2,369 messages): 90%
* Wikimedia extensions (2,700 messages): 90%
* extensions (8,200 messages): 65%
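
As an illustration of how these thresholds work, here is a minimal sketch that checks which milestones a language has reached. The group sizes and percentages come from the list above (the two extension totals are the rounded figures given there). The function name and the example counts are hypothetical; this is not how the statistics on the wiki are actually computed.

```python
# (group name, number of messages, required percentage translated)
MILESTONES = [
    ("core most used", 469, 98),
    ("core", 2369, 90),
    ("Wikimedia extensions", 2700, 90),
    ("extensions", 8200, 65),
]


def milestones_passed(translated: dict[str, int]) -> list[str]:
    """Return the groups whose milestone is reached for one language.

    `translated` maps a group name to the number of translated messages.
    A milestone counts as reached at the required percentage or more.
    """
    passed = []
    for group, total, required_pct in MILESTONES:
        done = translated.get(group, 0)
        # Integer comparison avoids floating-point rounding at the boundary.
        if done * 100 >= required_pct * total:
            passed.append(group)
    return passed


# Hypothetical translation counts for one language (not real statistics).
example = {
    "core most used": 460,
    "core": 2000,
    "Wikimedia extensions": 2500,
    "extensions": 4000,
}
print(milestones_passed(example))
```

With these made-up counts, the language reaches the 'core most used' and 'Wikimedia extensions' milestones but not the other two.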

Currently the following numbers of languages have passed the above milestones:
* core most used: 147 (45.5% of supported languages - up 35% from 109 - goal
of 130 passed)
* core: 82 (25.4% of supported languages - up 21% from 68 - goal of 90 missed
by 203 translations)
* Wikimedia extensions: 44 (13.6% of supported languages - up 22% from 36 -
goal of 50 missed by 1,500 translations)
* extensions: 39 (12.1% of supported languages - up 86% from 21 - goal of 30
passed)

I think the changes in the past year are very satisfying. MediaWiki
localisation has again improved enormously in the past year. Two of the four
goals I set in last year's e-mail have not been reached (only one of four
goals was reached for 2008). We nearly got there, though. Currently MediaWiki
core contains 377,394 messages (up 24% from 303,863 at the end of 2008).

So... Is MediaWiki doing well on localisation? Just like the past two years,
my personal opinion is that we do a proper job, but can still do a lot better.
After all, MediaWiki is the engine that runs a top 5 site in the world
committed to creating "a world in which every single human being can freely
share in the sum of all knowledge." Observing that there are also an estimated
hundred thousand MediaWiki installations out there, more than 250 Wikipedias
that all use the Wikimedia Commons media repository, and that 147 languages
out of 323 have a minimal localisation, there is a lot of room for
improvement; more realistically: the work will never be done, but the least we
can do is try to get there :).

Last year I mentioned languages from Africa performing way below average. I am
sad to conclude that this has not changed considerably. In an overview with a
weighted score for the localisation level of MediaWiki in a Wikimedia
context[19], the largest African languages have the lowest score (52 out of
100). Large languages spoken on multiple continents and large languages from
Europe are doing best (100 and 99 out of 100 respectively). Languages like
Oriya, Zulu, Burmese and Urdu are the large languages with the worst
localisation score. It is my personal aim to work towards an average L10n
score of 83 for the 50 largest languages in the world by the end of September

We have all the tools to successfully localise MediaWiki into any of the 7,000
or so languages that have been classified in ISO 639-3. We only need one
person per language to make an effort and make it happen. Reaching the first
milestone (core most used) takes about six hours of work. Using or the Gettext file, little to no technical knowledge is
required. Knowledge of MediaWiki is a plus.

This was the pitch, basically the same as in 2007 and 2008, with even more
experience and data. Goals for MediaWiki localisation for the end of 2010 are
ambitious, but still realistic with the right effort:
* core most used: 170 languages with 98% or more localised
* core: 105 languages with 90% or more localised
* Wikimedia extensions: 65 languages with 90% or more localised
* extensions: 50 languages with 65% or more localised
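
To put these goals next to the current numbers of languages reported earlier in this message, the gap per group can be worked out as below. This is only a sketch; the helper name `gap` is made up for the illustration.

```python
# (languages currently past the milestone, goal for the end of 2010)
CURRENT_AND_GOAL = {
    "core most used": (147, 170),
    "core": (82, 105),
    "Wikimedia extensions": (44, 65),
    "extensions": (39, 50),
}


def gap(current: int, goal: int) -> tuple[int, int]:
    """Return (additional languages needed, growth as a whole percentage)."""
    extra = goal - current
    return extra, round(100 * extra / current)


for group, (current, goal) in CURRENT_AND_GOAL.items():
    extra, pct = gap(current, goal)
    print(f"{group}: {extra} more languages needed (+{pct}%)")
```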

I would like to wish everyone involved in any aspect of MediaWiki a wonderful


Siebrand Mazeland

[5] als, be-x-old, ckb, crh, de-at, de-ch, de-formal, dk, en-gb, fiu-vro, gan,
got, hif, kk, kk-cn, iu, kk-kz, kk-tr, ko-kp, ku, ku-arab, nb, ruq, simple,
sr, tg, tp, tt, ug, zh, zh-classical, zh-cn, zh-sg, zh-hk, zh-min-nan, zh-mo,
zh-my, zh-tw, zh-yue
[6] als, be-x-old, bh, ckb, ckb-latn, crh, de-at, dk, en-rtl, fiu-vro, gan,
hif, hif-deva, ii, iu, kk, kk-cn, kk-kz, kk-tr, ko-kp, ks, ku, nb, pi, ruq,
simple, st, tg, tp, tt, ug, zh-classical, zh-cn, zh-min-nan, zh-mo, zh-my,
zh-sg, zh-yue

Wikimediaindia-l mailing list