An update on localisation in MediaWiki


Siebrand Mazeland
I have not seen a comprehensive overview of MediaWiki localisation discussed on the lists I am posting this message to, so I thought I might give it a try. All statistics are based on MediaWiki 1.12 alpha, SVN version r29106.

==Introduction==
*Localisation or L10n - the process of adapting the software to be as familiar as possible to a specific locale (in scope)
*Internationalisation or i18n - the process of ensuring that an application is capable of adapting to local requirements (out of scope)

MediaWiki has a user interface (UI) definition for 319 languages. Of those, at least 17 language codes are duplicates and/or exist for usability purposes[1]. Reporting on them is not relevant, so MediaWiki in its current state supports 302 languages. To generate localisation statistics for a language, a MessagesXx.php file must be present in languages/messages. There are currently 262 such files, of which 16 are redirects from the duplicates/usability group[2]. So MediaWiki has an active in-product localisation for 236 languages. The remaining 66 languages have an interface, but simply fall back to English.

The MediaWiki core product recognises several collections of localisable content (three of which are defined in messageTypes.inc):
* 'normal' messages that can be localised (1726)
* optional messages that can be localised, which usually only happens for languages not using a Latin script (161)
* ignored messages that should not be localised (100)
* namespace names and namespace aliases (17)
* skin names (7)
* magic words (120)
* special page names (76)
* other (directionality, date formats, separators, book store lists, link trail, and others)

Localisation of MediaWiki revolves around all of the above. Reporting is done on the normal messages only.

MediaWiki is more than just the core product. On http://www.mediawiki.org/wiki/Category:All_extensions some 750 extensions have some kind of documentation. This analysis is scoped only to the code currently present in svn.wikimedia.org/svnroot/mediawiki/trunk. The source code repository contains give or take 230 extensions. Of those 230 extensions, about 140 contain messages that can be visible in the UI in some use case (debugging excluded). Out of those 140, about 10 extensions have an exotic implementation for localisation or no localisation support at all (just English text in the code). 10 extensions appear to be outdated. I have seen about 5 different 'standard' implementations of i18n in extensions. Since MediaWiki 1.11 there is wfLoadExtensionMessages, but not many extensions use it yet for message handling. If you can help add more standard i18n support for extensions (an overview can be found at http://translatewiki.net/wiki/User:Siebrand/tobeadded) or help in standardising L10n for extensions, please do not hesitate.

==MediaWiki localisation in practice==
Localisation of MediaWiki is currently done in the following ways I am aware of:
* in local wikis: Sysops on local wikis shape and translate messages to fit their needs. This is being done in wikis that are part of Wikimedia, Wikia, Wikitravel, corporate wikis, etc. This type of localisation has the fewest benefits for the core product and extensions because it happens completely out of the scope of svn committers. I have heard Wikia supports languages that are not supported in the svn version. I would like to get some help in identifying and contacting these communities to try and get their localisations in the core product. Together with SPQRobin, I am trying to get what has been localised in local Wikipedias into the core product and recruit users that worked on the localisation to work on a more centralised way of localisation (see Betawiki)
* through bugzilla/svn: A user of MediaWiki submits patches for core messages and/or extensions. These users are mostly part of a wiki community that is part of Wikimedia. The patches are usually taken care of by committers (raymond, rotemliss, and sometimes others). Some users maintain a language directly on SVN. At the moment, 10-15 languages are maintained this way: Danish, German, Persian, Hebrew, Indonesian, Kazakh (3 scripts), Chinese (3 variants), and some more less frequently.
* through Betawiki: Betawiki was founded in mid 2005 by Niklas Laxström. In the years to follow, Betawiki has grown to be a MediaWiki localisation community with over 200 users that has contributed to the localisation of 120 languages each month in the past few months. Users that are only familiar with MediaWiki as a tool can localise almost every aspect of MediaWiki (except for the group 'other' mentioned earlier) in a wiki interface. The work of the translators is regularly committed to svn by nikerabbit and myself. Betawiki also offers a .po export that enables users to use more advanced translation tools to make their translations. This option was added recently and no translations in this format have been submitted yet. Betawiki also supports translation of 122 extensions, aiming to support everything that can be supported.

==MediaWiki localisation statistics==
MediaWiki localisation statistics have been around since June 2005 at http://www.mediawiki.org/wiki/Localisation_statistics[3]. Traditionally reports have focused on the complete set of core messages. Recently a small study of message usage was done, which resulted in a set of almost 500 'most often used messages in MediaWiki', based on usage of messages on the cluster of Wikimedia (http://translatewiki.net/wiki/Most_often_used_messages_in_MediaWiki).

Until recently there were no statistics available on the localisation of extensions. Through groupStatistics.php in the extension Translate, these statistics can now be created. Aside from reporting on 'most often used MediaWiki messages', 'MediaWiki messages', and 'all extension messages supported by extension Translate' (short: extension messages), a meta extension group of 34 extensions used in the projects of Wikimedia has been created (short: Wikimedia messages). A regularly updated table of these statistics can be found at http://translatewiki.net/wiki/Translating:Group_statistics.

Some (arbitrary) milestones have been set for the four collections of messages mentioned above. For the usability of MediaWiki in a particular language, the group 'core-mostused' is the most important. A language must reach that milestone for MediaWiki to have minimal support for that language. Reaching the milestones for the first two groups is something the Wikimedia language committee is considering using as a requirement for new Wikimedia wikis:
* core-mostused (496 messages): 98%
* wikimedia extensions (354 messages): 90%
* core (1726 messages): 90%
* extensions (1785 messages): 65%
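
To make the milestones concrete, the percentages can be turned into message counts. The following is a minimal sketch in Python, assuming that a milestone counts as reached when the share of translated messages is at least the stated percentage. That reading, and the names MILESTONES and messages_needed, are assumptions for illustration and not part of any MediaWiki tool.

```python
# Message totals and milestone percentages as listed above.
MILESTONES = {
    "core-mostused": (496, 98),
    "wikimedia extensions": (354, 90),
    "core": (1726, 90),
    "extensions": (1785, 65),
}


def messages_needed(total, percent):
    """Smallest number of translated messages that reaches `percent` of `total`.

    Integer ceiling division is used to avoid floating-point rounding.
    """
    return (total * percent + 99) // 100


if __name__ == "__main__":
    for group, (total, percent) in MILESTONES.items():
        print(f"{group}: {messages_needed(total, percent)} of {total} messages")
```

Under that assumption the first two milestones together come to 487 + 319 = 806 messages.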

Currently the following numbers of languages have passed the above milestones:
* core-mostused: 47 (15.5% of supported languages)
* wikimedia extensions: 10 (3.3% of supported languages)
* core: 49 (16.2% of supported languages)
* extensions: 7 (2.3% of supported languages)

==Conclusion==
So... Are we doing well on localisation or do we suck? My personal opinion is that we do something in between. Observing that there are some 250 Wikipedias that all use the Wikimedia Commons media repository, and that only 47 languages have a minimal localisation, we could do better. With Single User Login around the corner (isn't it), we must do better. On the other hand, new language projects within Wikimedia all have excellent localisation of the core product. These languages include Asturian, Bikol Central, Lower Sorbian, Extremaduran, and Galician. But where is Hindi, for example, with currently only 7% of core messages translated?

With the Wikimedia Foundation aiming to put MediaWiki to good use in developing countries and products like NGO-in-a-box that include MediaWiki, the potential of MediaWiki as a tool in creating and preserving knowledge in the languages of the world is huge. We have to tap into that potential and *you* (yes, I am glad you read this far and are now reading my appeal) can help. If you know people that are proficient in a language and like contributing to localisation, please point them in the right direction. If you know of organisations that can help localising MediaWiki: please approach them and ask them to help.

We now have all the tools to successfully localise MediaWiki into any of the 7000 or so languages that have been classified in ISO 639-3. We only need one person per language to make it happen. Reaching the first two milestones (core-mostused and wikimedia extensions) takes about 16 hours of work. Using Betawiki or the .po export, little to no technical knowledge is required.

This was the pitch. How about we aim to at least double the numbers by the end of 2008 to:
* core-mostused: 120
* wikimedia extensions: 50
* core: 90
* extensions: 20

I would like to wish everyone involved in any aspect of MediaWiki a wonderful 2008.

Cheers!

Siebrand Mazeland

[1] als,crh,iu,kk,kk-cn,kk-kz,kk-tr,ku,sr,sr-jc,sr-jl,zh,zh-cn,zh-sg,zh-hk,zh-min-nan,zh-yue
[2] crh,iu,kk,kk-cn,kk-kz,kk-tr,ku,sr,sr-jc,sr-jl,zh,zh-cn,zh-sg,zh-hk,zh-min-nan,zh-yue
[3] older locations are http://www.mediawiki.org/wiki/Localisation_statistics/stats and
    http://meta.wikimedia.org/wiki/Localization_statistics


_______________________________________________
Translators-l mailing list
[hidden email]
http://lists.wikimedia.org/mailman/listinfo/translators-l

Re: [Wikitech-l] An update on localisation in MediaWiki

Siebrand Mazeland
Hi Anders Wegge Jakobsen.

First of all the best wishes for 2008 to you.

Please become a part of the Betawiki community so that we can address your concerns. At the moment you appear to have some reservations that are too vague to address at this point in time.

Cheers! Siebrand

-----Original message-----
From: [hidden email] [mailto:[hidden email]] On behalf of Anders Wegge Jakobsen
Sent: Monday 31 December 2007 19:18
To: Wikimedia developers
CC: 'MediaWiki internationalisation'; 'Wikimedia Translators'
Subject: Re: [Wikitech-l] An update on localisation in MediaWiki

"Siebrand Mazeland" <[hidden email]> writes:

> * through Betawiki: Betawiki was founded in mid 2005 by Niklas

 ...

> the translators is regularly committed to svn by nikerabbit, and
> myself. Betawiki also offers a .po export that enables users to use

 Considering the quality of your commits to MessagesDa.php, I consider Betawiki a poor substitute for a maintainer that actually speaks the language in question.

--
// Wegge
<http://geowiki.wegge.dk/wiki/Forside> - All about geocaching. If you use the free Spamfighter, I will see your posts only *ONCE*.

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.


_______________________________________________
Wikitech-l mailing list
[hidden email]
http://lists.wikimedia.org/mailman/listinfo/wikitech-l




Re: [Wikitech-l] An update on localisation in MediaWiki

Siebrand Mazeland
Hi Wegge.

Too bad you appear to be unwilling to be more specific (my interpretation) and enter a dialogue. If you would come to *any* platform to address your specific concerns, I am certain that you would find that we are *very* open to any concerns.

As I already explained: i18n is for developers, L10n is for translators. If you choose to be a developer, why do you choose to translate!?

Repeating myself:
*Localisation or L10n - the process of adapting the software to be as familiar as possible to a specific locale
*Internationalisation or i18n - the process of ensuring that an application is capable of adapting to local requirements

Please acknowledge that I do not wish to depreciate your efforts in any way. On the contrary: I value *and* appreciate *any* contribution to MediaWiki localisation.

Kind regards,

Siebrand

-----Original message-----
From: [hidden email] [mailto:[hidden email]] On behalf of Anders Wegge Jakobsen
Sent: Tuesday 1 January 2008 1:55
To: Wikimedia developers
CC: 'MediaWiki internationalisation'; 'Wikimedia Translators'
Subject: Re: [Wikitech-l] An update on localisation in MediaWiki

<snip>

 Plain and simple NO!

 I'm a software developer, and I'm not going to confine myself to a web interface. I fully accept that this is where the bee's knees are, so I'm not going to stand in the way of progress.
<snip>



Re: [Wikitech-l] An update on localisation in MediaWiki

Anders Wegge Keller
In reply to this post by Siebrand Mazeland
Anders Wegge Jakobsen <[hidden email]> writes:

 Six weeks ago I became involved in an argument about translation of
the MediaWiki software. I made the comment below.

>  Yes, the obvious solution is that I maintain a private translation;
> at the same time one or more of the admins at dawiki plays catchup,
> whenever the interface suddenly starts sprouting English words, and the
> rest of the world gets to see the worst of OSS. Everyone is happy.

 Unfortunately, it has proven to be true. No substantial changes to
the Danish localization have happened since then. Since I have cooled
off a bit since then, and my prediction has proven true, I'll try to
summarize the problems with localization on translatewiki as I see
them:

* With the current setup, translators will need to access the code to
  actually see what cryptic strings like 'You have not specified
  target revision or revisions to perform this function on.' actually
  mean.

* That a web interface exists does not mean that a large horde of
  skilled translators will be attracted.

* No one likes to see others credited with their work.

 This is not an attempt to renew a heated argument. The idea of
providing a relatively easy-to-use interface for translation work is
better than having no one translating the interface into any
particular language. But in my opinion, it is not at the present time a
substitute for having someone with at least rudimentary PHP coding
skills doing the translation and submitting patches or direct commits
to svn.

 And yes, the issue of crediting work was what angered me most. It
still is, and unless I'm the one individual in the world with the
thinnest skin on this matter, this issue will arise again.


Re: [Wikitech-l] An update on localisation in MediaWiki

Siebrand Mazeland
Hi Wegge,

Contrary to your belief, I think you are still quite angry at something that happened or something that someone did or wrote. I have a problem identifying what it is exactly, which makes it hard to address. I find few of the statements you made below to be true.

Can you let us know what you would need to be satisfied, or in your eyes be properly credited for everything you have done for the Danish translation? Your current choice of debate does not strike me as solution-driven, which is something I personally very much prefer.

Kind regards,

Siebrand Mazeland

-----Original message-----
From: [hidden email] [mailto:[hidden email]] On behalf of Anders Wegge Jakobsen
Sent: Monday 18 February 2008 11:25
To: Wikimedia developers; Wikimedia Translators; MediaWiki internationalisation
Subject: Re: [Wikitech-l] An update on localisation in MediaWiki


Re: [Wikitech-l] An update on localisation in MediaWiki

Anders Wegge Keller
"Siebrand Mazeland" <[hidden email]> writes:

> Contrary to your belief I think you are still quite angry at
> something that happened or something that someone did or wrote. I
> have a problem identifying what it is exactly, which makes it hard to
> address. I find few of the statements you made below to be true.

 Yes, I'm still angry about some of the comments. I will be for
eternity. But since that won't change, just forget it. Neither you,
nor Niklas is to be blamed for someone else making flippant remarks.

> Can you let us know what you would need to be satisfied, or in your
> eyes be properly credited for everything you have done for the
> Danish translation? Your current choice of debate does not strike me
> as solution driven, which is something I personally very much
> prefer.

 The reason I may seem unconstructive is the simple fact that while
I'm able to point out the problems as I see them, I have no idea how
to solve them. If we leave my ego out of the equation for the moment,
the main issue is that some of the interface messages will be
completely unknown and opaque to the translators. The obvious solution
to that is instrumenting the code, so that any message can be seen in
its context. That is going to take quite a lot of time, and is not
something I think will happen right away.

 More realistic would be crafting a set of more or less static pages
that display all of the messages in the contexts they are used. That
will be a game of constant catchup, but at least it will be easier
than changing the entire codebase to include a demo feature of sorts.



Re: [Wikitech-l] An update on localisation in MediaWiki

Siebrand Mazeland
Hi Wegge,

Thank you for your clarification. No system is perfect. I am of the opinion though that Translate can give more insight and information which leads to a more effective and efficient translation process than keeping an eye on SVN commits. Our opinions differ, obviously, so no need to elaborate on that. Instead, I chose to inform you. Please read on.

Extension Translate currently offers a way to add translation help in the language 'qqq'. Those hints are displayed in the Betawiki UI and are also exported to the .po files for offline translation. If the translation help is written correctly, translators would have all the context they need. All this is being done 'wiki-style', so improvements must be made and are being made.
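
As a rough illustration of the idea (this is not the actual code of extension Translate), message documentation can be thought of as one more 'language' stored next to the real translations. The sketch below is in Python; the message keys, the texts and the helper translator_view are all made up for the example. Only the pseudo-language code 'qqq' and the fallback to English are taken from this thread.

```python
# Hypothetical in-memory message store. Keys and texts are made up for
# illustration; only the pseudo-language code 'qqq' comes from the text above.
MESSAGES = {
    "en": {"example-save": "Save page", "example-undo": "undo"},
    "qqq": {"example-undo": "Link text in the page history; keep it short."},
    "nl": {"example-save": "Pagina opslaan"},
}


def translator_view(key, language):
    """Collect what a translator would want to see for one message."""
    return {
        # The English source text is always the starting point.
        "source": MESSAGES["en"][key],
        # Documentation lives under 'qqq'; None means nobody has written a hint yet.
        "hint": MESSAGES["qqq"].get(key),
        # None means not translated yet, so the UI would fall back to English.
        "translation": MESSAGES.get(language, {}).get(key),
    }


if __name__ == "__main__":
    print(translator_view("example-undo", "nl"))
```

Keeping the hints in the same structure as the translations is what would let them be edited 'wiki-style' and exported alongside the messages.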

An example of a translation hint is http://translatewiki.net/wiki/MediaWiki:Undeletelink/qqq for a message that was added recently.

Currently 588 of 1766 core messages have some form of documentation[1]. For extensions 99 messages have been documented. Making the sets complete is a lot of work. Currently about 10-15 messages are documented every week[2]. I would love more developer types like you to contribute on localisation by adding message documentation (basically it is a part of i18n). Please see this as an invitation. Anyone with the translator role can add such messages in Betawiki.
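
For a sense of scale, the figures above can be worked through. This is a back-of-the-envelope sketch in Python that assumes the documentation rate stays at 10-15 messages per week and that the number of core messages stays at 1766; neither is guaranteed, and the variable names are only for illustration.

```python
import math

# Figures quoted in the paragraph above.
documented = 588
total = 1766
remaining = total - documented

coverage = documented / total
# Fastest and slowest case for the quoted 10-15 messages per week.
weeks_fast = math.ceil(remaining / 15)
weeks_slow = math.ceil(remaining / 10)

print(f"documented so far: {coverage:.1%}")
print(f"remaining: {remaining} messages, about {weeks_fast}-{weeks_slow} weeks")
```

At the current pace, documenting the remaining core messages would take somewhere between one and a half and a bit over two years, which is why more contributors would make a real difference.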

An example of a translation hint in Translate context can be seen at http://translatewiki.net/wiki/Image:Translation_hint_example.png. Additional fallback languages while translating are a second instrument we use to make life easier for translators[3].

I hope the above has given you and others some additional insight into the workings of Betawiki.

Cheers! Siebrand

[1] http://translatewiki.net/w/i.php?title=Special%3ATranslate&task=reviewall&group=core&language=qqq&limit=100
[2] http://translatewiki.net/w/i.php?days=14&limit=250&title=Special%3ARecentchanges&namespace=8&trailer=%2Fqqq
[3] http://translatewiki.net/wiki/Image:Translation_fallback_example.png

