Do we have any data in wikidata / wiktionary that could be used for mechanic translations?

Petr Bena
I was looking for a free (possibly open source) provider of automatic
translations for an open source application I am working on, and I had
quite some trouble finding one. Then I realized we have a project called
"Wiktionary" which could possibly help me here (I was assuming it's an
open dictionary), but I was quite disappointed, as I couldn't find any
simple way to perform simple queries like:

translate "banana" from english to czech

I think that we could (and maybe should, in the spirit of openness and
wikiness) have some wiki-based web application that would serve this
purpose: let people query and translate simple words, and maybe even
whole phrases. If anyone could edit it, it might grow into a huge
dictionary of all possible or frequent phrases that could easily be
translated into any language in the world.

Do we already have anything like this?

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: Do we have any data in wikidata / wiktionary that could be used for mechanic translations?

Lydia Pintscher
On Thu, May 22, 2014 at 5:41 PM, Petr Bena <[hidden email]> wrote:

> I was looking for a free (possibly open source) provider of automatic
> translations for an open source application I am working on, and I had
> quite some trouble finding one. Then I realized we have a project called
> "Wiktionary" which could possibly help me here (I was assuming it's an
> open dictionary), but I was quite disappointed, as I couldn't find any
> simple way to perform simple queries like:
>
> translate "banana" from english to czech
>
> I think that we could (and maybe should, in the spirit of openness and
> wikiness) have some wiki-based web application that would serve this
> purpose: let people query and translate simple words, and maybe even
> whole phrases. If anyone could edit it, it might grow into a huge
> dictionary of all possible or frequent phrases that could easily be
> translated into any language in the world.
>
> Do we already have anything like this?

It doesn't exist yet but it is on the longer-term (aka 2015 earliest)
plan for the Wikidata team. The current proposal is at
https://www.wikidata.org/wiki/Wikidata:Wiktionary


Cheers
Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.

Re: Do we have any data in wikidata / wiktionary that could be used for mechanic translations?

Petr Bena
In reply to this post by Petr Bena
Just to extend the idea a little bit, so that it's easier to answer "do
we have this?" (I am pretty sure we don't):

This service should be able to do things like this:

TRANSLATE hello there, how are you FROM english TO chinese
(the pseudo-query language is just for this example, so that it's clear
what I want it to do)

and it /should/:

1. Look up the whole sentence "hello there, how are you" in the database.
If there is no translation for the whole sentence, it should:
2. Split the sentence (by comma) and look up only "hello there" and
"how are you". If there is no translation for these, it should:
3. Split it into words and return a "mechanic" translation for every word
(which is the least wanted option, but better than nothing).
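
The three fallback steps above can be sketched roughly as follows. This is only a minimal illustration in Python, assuming an in-memory phrase table: the `PHRASES` data, the `translate` function, and the bracket convention for unknown words are all made up for the example, and Czech is used instead of Chinese purely as sample data. A real service would query a shared, editable database instead.

```python
# Hypothetical phrase table keyed by (source, target) language pair.
# A real service would look these up in a community-edited database.
PHRASES = {
    ("english", "czech"): {
        "how are you": "jak se máš",
        "hello": "ahoj",
        "there": "tam",
    }
}


def translate(text, source, target, table=PHRASES):
    """Use the longest unit available: whole sentence, then clause, then word."""
    entries = table.get((source, target), {})
    key = text.strip().lower()

    # Step 1: try the whole sentence.
    if key in entries:
        return entries[key]

    translated_clauses = []
    # Step 2: split on commas and try each clause.
    for clause in (c.strip() for c in key.split(",")):
        if not clause:
            continue
        if clause in entries:
            translated_clauses.append(entries[clause])
            continue
        # Step 3: fall back to word-by-word ("mechanic") translation.
        # Unknown words are passed through in brackets (an arbitrary choice).
        words = [entries.get(word, f"[{word}]") for word in clause.split()]
        translated_clauses.append(" ".join(words))

    return ", ".join(translated_clauses)


print(translate("hello there, how are you", "english", "czech"))
```

Here "how are you" is found as a whole clause, while "hello there" falls through to the word-by-word step, which is exactly the case where the result is likely to be the least natural.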

If people had the possibility to insert and translate words, phrases, and
sentences, I think this would be an awesome application, as a lot of people
would probably insert an incredible amount of data and translations.

I don't really know if this is something the Wikimedia movement
should provide or support, but anyway, it would be nice to have as an open
source project :) I know it would be kind of reinventing Google
Translate, but that, no matter how nice it is, isn't free for
developers (the APIs are paid) and isn't very open (the source code is
closed, and users' ability to edit the database is nowhere near what
people can do on real wikis, like Wikipedia).

Re: Do we have any data in wikidata / wiktionary that could be used for mechanic translations?

Petr Bena
In reply to this post by Lydia Pintscher
I am happy to know that we are doing at least "something" on this :)
Hopefully it's a first step towards some more complex solution? I ask
because, from the proposal you linked, I can't see how I would easily
translate "apple" into a different language. I know I can perform a number
of lookups and queries to accomplish that, but IMHO it should be easier.

Re: Do we have any data in wikidata / wiktionary that could be used for mechanic translations?

Lydia Pintscher
On Thu, May 22, 2014 at 5:59 PM, Petr Bena <[hidden email]> wrote:
> I am happy to know that we are doing at least "something" on this :)
> Hopefully it's a first step towards some more complex solution? I ask
> because, from the proposal you linked, I can't see how I would easily
> translate "apple" into a different language. I know I can perform a number
> of lookups and queries to accomplish that, but IMHO it should be easier.

Yes, this is the groundwork for potentially more complex translation
systems later. If/when/how that'll be done I have no idea. But this is
the next step on the way ;-)


Cheers
Lydia

Re: Do we have any data in wikidata / wiktionary that could be used for mechanic translations?

Lars Aronsson
In reply to this post by Petr Bena
On 05/22/2014 05:41 PM, Petr Bena wrote:
> I was looking for a free (possibly open source) provider of automatic
> translations for an open source application I am working on, and I had
> quite some trouble finding one. Then I realized we have a project called
> "Wiktionary" which could possibly help me here (I was assuming it's an
> open dictionary), but I was quite disappointed, as I couldn't find any
> simple way to perform simple queries like:

There are several open-source machine translation projects.
They are either rule-based or statistics-based. One of the
rule-based projects is Apertium.

When you start from zero, building a rule-based system
gives you a useful system quite fast, especially if the
two languages are similar. A statistics-based system (such
as Google Translate) requires enormous amounts of
data to become useful.

It's not something that you can start as a subproject
within Wiktionary, not even as a separate WMF project.
It's a very large task.

One naive approach is to base a statistics-based
machine translator (SMT) on the European Union's
freely available parallel text corpus. When you try
to translate Finnish "terve" (which means: hello!)
into English in such a system, it will say "health",
since the same word also means health, and EU
texts only talk about healthcare, never "hello".
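
The "terve" effect can be reproduced with a toy model. The sketch below is not how a real SMT system works internally (those use alignment and language models); it only assumes the simplest possible rule, "pick the translation seen most often in the training pairs". The counts are invented for illustration and are not taken from any real corpus, and `train` and `best_translation` are hypothetical helper names.

```python
from collections import Counter

# Invented word-alignment pairs standing in for a healthcare-heavy corpus.
# The numbers are made up; they are not measured from the EU texts.
HEALTHCARE_PAIRS = [("terve", "health")] * 50


def train(pairs):
    """Count how often each source word was aligned with each target word."""
    model = {}
    for source_word, target_word in pairs:
        model.setdefault(source_word, Counter())[target_word] += 1
    return model


def best_translation(model, word):
    """Return the most frequently seen translation, or None if unseen."""
    counts = model.get(word)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


eu_only = train(HEALTHCARE_PAIRS)
print(best_translation(eu_only, "terve"))

# Adding (equally invented) conversational data changes the answer.
mixed = train(HEALTHCARE_PAIRS + [("terve", "hello")] * 80)
print(best_translation(mixed, "terve"))
```

The model trained only on the healthcare-style pairs has never seen the greeting sense, so it cannot produce it; the output depends entirely on what the corpus happens to talk about.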


--
   Lars Aronsson ([hidden email])
   Aronsson Datateknik - http://aronsson.se



Re: Do we have any data in wikidata / wiktionary that could be used for mechanic translations?

Scott MacLeod
Petr, if you were going to take a rule-based approach to what you've
outlined above, and use the already existing Wikidata interlinguality,
which I think is based around the 'item with a label' (think of a Wikipedia
encyclopedia article; is this correct?), and build on Wiktionary, could
one 'reduce' Wikidata's interlinguality from an 'item' to a 'word' (and also
anticipate voice, smartphones, and extensibility / scalability to all
7,106+ languages, for example)? What else would be needed, and what would
be some of the initial challenges in beginning this way?
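
To make the 'item with a label' idea concrete, here is a rough Python sketch of reading per-language labels for one Wikidata item. It assumes the `wbgetentities` API module with `props=labels`; the item ID `Q_EXAMPLE` and the sample response are placeholders written by hand for illustration, not real data, and `labels_url` and `label_for` are hypothetical helper names.

```python
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"


def labels_url(item_id, languages):
    """Build a wbgetentities request URL for the labels of one item."""
    params = {
        "action": "wbgetentities",
        "ids": item_id,
        "props": "labels",
        "languages": "|".join(languages),
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


def label_for(response, item_id, language):
    """Pick one language's label out of a decoded response, or None."""
    entity = response.get("entities", {}).get(item_id, {})
    label = entity.get("labels", {}).get(language)
    return label["value"] if label else None


# Hand-written sample in the assumed response shape; "Q_EXAMPLE" is a placeholder.
SAMPLE_RESPONSE = {
    "entities": {
        "Q_EXAMPLE": {
            "labels": {
                "en": {"language": "en", "value": "banana"},
                "cs": {"language": "cs", "value": "banán"},
            }
        }
    }
}

print(labels_url("Q_EXAMPLE", ["en", "cs"]))
print(label_for(SAMPLE_RESPONSE, "Q_EXAMPLE", "cs"))
```

Note that a label names a concept, not a word: there is no part of speech, inflection, or sense distinction attached to it, which is one of the gaps a word-level model would have to fill.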

Cheers,
Scott

(I write the above in the context of developing the wiki-based, CC MIT
OCW-centric WUaS for free online university degrees, which plans to be in
all 7,106+ languages as schools, see
http://worlduniversity.wikia.com/wiki/Languages, and to develop a
universal translator as well, see
http://worlduniversity.wikia.com/wiki/WUaS_Universal_Translator).




--
http://scottmacleod.com/worlduniversityandschool.htm

This email is intended only for the use of the individual or entity to
which it is addressed and may contain information that is privileged and
confidential. If the reader of this email message is not the intended
recipient, you are hereby notified that any dissemination, distribution, or
copying of this communication is prohibited. If you have received this
email in error, please notify the sender and destroy/delete all copies of
the transmittal. Thank you.

Re: Do we have any data in wikidata / wiktionary that could be used for mechanic translations?

Gabriel Wicke-3
In reply to this post by Petr Bena
On 05/22/2014 08:41 AM, Petr Bena wrote:

> I was looking for a free (possibly open source) provider of automatic
> translations for an open source application I am working on, and I had
> quite some trouble finding one. Then I realized we have a project called
> "Wiktionary" which could possibly help me here (I was assuming it's an
> open dictionary), but I was quite disappointed, as I couldn't find any
> simple way to perform simple queries like:
>
> translate "banana" from english to czech
>
> I think that we could (and maybe should, in the spirit of openness and
> wikiness) have some wiki-based web application that would serve this
> purpose: let people query and translate simple words, and maybe even
> whole phrases. If anyone could edit it, it might grow into a huge
> dictionary of all possible or frequent phrases that could easily be
> translated into any language in the world.
>
> Do we already have anything like this?

This is currently being developed:

https://www.mediawiki.org/wiki/Content_translation

It will provide all the tools needed to translate wiki articles, including
dictionary lookup. The back-end service interfaces will be fairly generic &
will use open source tools like dictd and apertium, so might be useful for
non-wiki projects.

You can also use existing commercial APIs of course.

Gabriel

Re: Do we have any data in wikidata / wiktionary that could be used for mechanic translations?

Petr Bena
In reply to this post by Lars Aronsson
This isn't about translating the content of current Wikimedia projects,
but about creating a generic tool that anyone could use to
translate anything, so it's not really what [[Content translation]]
describes.

On Thu, May 22, 2014 at 6:39 PM, Gabriel Wicke <[hidden email]> wrote:
> This is currently being developed:
>
> https://www.mediawiki.org/wiki/Content_translation
>
> It will provide all the tools needed to translate wiki articles, including
> dictionary lookup. The back-end service interfaces will be fairly generic &
> will use open source tools like dictd and apertium, so might be useful for
> non-wiki projects.

Yes, this statistics-based system would be more like what I meant, but
keep in mind that if it were open, so that anyone could contribute to
the database, just like on Wikipedia, it would probably collect an
enormous amount of data pretty quickly, just as Wikipedia did.

On Thu, May 22, 2014 at 6:03 PM, Lars Aronsson <[hidden email]> wrote:
> A statistics-based system (such
> as Google Translate) requires enormous amounts of
> data to become useful.
>
> It's not something that you can start as a subproject
> within Wiktionary, not even as a separate WMF project.
> It's a very large task.

Re: Do we have any data in wikidata / wiktionary that could be used for mechanic translations?

Scott MacLeod
Great ... looks like MediaWiki Content translation and Wiktionary may
provide another important approach to a possible Universal Translator ... :)

Scott





--
http://scottmacleod.com/worlduniversityandschool.htm

Re: Do we have any data in wikidata / wiktionary that could be used for mechanic translations?

Kristian Kankainen
In reply to this post by Lars Aronsson
There exist more free and openly accessible parallel texts besides the
EU ones. One bigger project is OPUS[1], which contains free software
translations and subtitles, for example.

Another kind of text that is suitable for statistical machine
translation is comparable texts. These are texts written about the same
thing, but not necessarily translations of each other. This kind of text
is harder to align into a translation dictionary model, but it might be
easier to find. From one point of view, the whole Wikipedia with its
language links can be seen as a huge corpus of comparable texts. There
exist free tools for aligning comparable texts; one that comes to mind
right now is Yalign[2], [3]. Another source of comparable texts is news
articles about the same event.
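
As a rough illustration of treating language links as a source of comparable text pairs, the Python sketch below builds a request for the MediaWiki `langlinks` query module and pairs up article titles from a decoded response. It assumes the default JSON format, where the linked title sits under the `"*"` key; the sample response is hand-written for the example, and `langlinks_url` and `comparable_titles` are hypothetical helper names. Aligning the article texts themselves is the hard part and is left to tools like Yalign.

```python
from urllib.parse import urlencode


def langlinks_url(title, source_lang, target_lang):
    """Build a langlinks query against the source-language Wikipedia."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllang": target_lang,
        "format": "json",
    }
    return f"https://{source_lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def comparable_titles(response):
    """Return (source title, target language, target title) tuples."""
    pairs = []
    for page in response.get("query", {}).get("pages", {}).values():
        for link in page.get("langlinks", []):
            pairs.append((page["title"], link["lang"], link["*"]))
    return pairs


# Hand-written sample in the assumed response shape (not fetched data).
SAMPLE_RESPONSE = {
    "query": {
        "pages": {
            "123": {
                "pageid": 123,
                "ns": 0,
                "title": "Banana",
                "langlinks": [{"lang": "cs", "*": "Banán"}],
            }
        }
    }
}

print(langlinks_url("Banana", "en", "cs"))
print(comparable_titles(SAMPLE_RESPONSE))
```

Each pair only says that two articles cover the same topic; the articles can differ widely in length and content, which is why they count as comparable rather than parallel texts.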

Best wishes!
Kristian

[1] http://opus.lingfil.uu.se/
[2] http://yalign.machinalis.com/
[3] https://github.com/machinalis/yalign
