[Wikimedia-l] Human-assisted machine translation (it was: "The case for supporting open source machine translation")
I have been giving some thought to Erik's proposal and, while I find it
fascinating as it stands, I would like to put it in different terms.
Instead of asking "Could open source MT be such a strategic investment?", I
would ask "Is there a way for Wikimedia's technology and the people
involved to collaborate with MT systems?" The first can be seen as entering
areas well outside our reach; the second would be more about paving the way
for other actors that are already in the field. Our strength has always been
based on human collaboration empowered by technology, and if MT is wanted,
then we should consider approaching it from our areas of expertise.
One of the biggest problems in MT is word-sense disambiguation. Wikidata's
item properties could be a way of setting the general context for article
translation, and if that turns out not to be reliable enough, users should
have the opportunity to specify in the source text the intended meaning of
a certain word. While that could be less than ideal for literary works,
where double meanings and other subtleties must be taken into account, it
might be quite useful for Wikipedia, giving MT software fertile soil in
which to grow. I do not know the standards for specifying word meanings for
MT software, but they might be worth exploring.
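
To make the idea a bit more concrete, here is a minimal sketch in Python of
how article-level context could pick a word sense, with a manual annotation
by an editor taking precedence. Everything in it is an assumption made for
illustration: the sense identifiers, the topic labels and the
`disambiguate` function are hypothetical, and the toy inventory stands in
for whatever Wikidata properties or MT sense standards would really be used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    sense_id: str      # hypothetical identifier, not a real Wikidata/OmegaWiki ID
    gloss: str         # short human-readable definition
    topics: frozenset  # topic labels that hint at this sense


# Toy sense inventory, invented for this example.
SENSES = {
    "bank": [
        Sense("bank-1", "financial institution",
              frozenset({"finance", "economy", "company"})),
        Sense("bank-2", "edge of a river",
              frozenset({"river", "geography", "hydrology"})),
    ],
}


def disambiguate(word, article_topics, manual=None):
    """Pick a sense for `word`.

    A manual annotation always wins. Otherwise the sense sharing the most
    topics with the article is chosen. Returns None when the context gives
    no evidence or the best score is tied, so a human can be asked.
    """
    candidates = SENSES.get(word, [])
    if manual is not None:
        for sense in candidates:
            if sense.sense_id == manual:
                return sense
        raise ValueError(f"unknown sense {manual!r} for word {word!r}")

    scores = [(len(s.topics & article_topics), s) for s in candidates]
    if not scores:
        return None
    best = max(score for score, _ in scores)
    winners = [s for score, s in scores if score == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]
```

With this toy data, an article tagged with "river" and "geography" resolves
"bank" to the riverside sense, while an article with unrelated topics
returns None, which is the point where a user would mark the intended
meaning in the source text.
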
Another interesting hurdle for MT is dictionary building. OmegaWiki seems
like a system that could be used for bridging the gap between pairs of
languages, in such a way that if we know the exact use of the word in the
source language, a user could seamlessly fill in the missing word and
definition in the target language. That could be a unique form of
collaboration between source-language speakers providing precision about
the meaning being used, and target-language speakers filling the gaps.
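
The gap-filling workflow could look roughly like the following Python
sketch. It is only an illustration under my own assumptions: the `Lexicon`
class, its methods and the concept identifiers are hypothetical and do not
reflect OmegaWiki's actual data model or API. The idea is simply that
entries are keyed by meaning rather than by word, so a missing translation
shows up as a gap that a target-language speaker can fill.

```python
from collections import defaultdict


class Lexicon:
    """Toy multilingual dictionary keyed by language-neutral concept IDs."""

    def __init__(self):
        # concept_id -> {language code -> (word, definition)}
        self._entries = defaultdict(dict)

    def add(self, concept_id, lang, word, definition):
        """Record the word and definition for a concept in one language."""
        self._entries[concept_id][lang] = (word, definition)

    def lookup(self, concept_id, lang):
        """Return (word, definition), or None if the entry is missing."""
        return self._entries.get(concept_id, {}).get(lang)

    def gaps(self, concept_ids, target_lang):
        """List concepts used in a text that lack a target-language entry.

        Each concept is reported once, in order of first use.
        """
        seen = set()
        missing = []
        for cid in concept_ids:
            if cid not in seen and target_lang not in self._entries.get(cid, {}):
                missing.append(cid)
            seen.add(cid)
        return missing


# Source-language speakers have already pinned down the meanings used.
lex = Lexicon()
lex.add("bank-2", "en", "bank", "edge of a river")
lex.add("river-1", "en", "river", "large natural stream of water")
lex.add("river-1", "de", "Fluss", "grosses natuerliches Fliessgewaesser")

# Concepts used in an article, in reading order.
article = ["bank-2", "river-1", "bank-2"]
todo_before = lex.gaps(article, "de")

# A German speaker fills in the missing entry.
lex.add("bank-2", "de", "Ufer", "Rand eines Flusses")
todo_after = lex.gaps(article, "de")
```

Keying on the concept instead of the source word is what lets the two
groups work independently: one side fixes the meaning, the other supplies
the word and definition for it.
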
Dictionaries alone are not enough. Grammar rules would need to be wikified.
All in all, OmegaWiki/Wiktionary could become the front-end and repository
for external MT systems, either to be used in Wikipedia or with other pages.
There would be no need to create a new MT system, because rule-based MT
programs that could make use of such infrastructure already exist. Some of
them are open source too. If you are interested, I could ask for opinions
about the feasibility on the Apertium lists. In my opinion, they also fit
into the "smartest, well-intentioned group of people" category that Erik
was asking about.