Hi All,
I created a tool to extract translations from different editions of Wiktionary. Right now it supports 39 different Wiktionaries. It only extracts translations and ignores the rest.

Supported Wiktionaries: Azerbaijani, Bulgarian, Catalan, Czech, Danish, Greek, English, Esperanto, Spanish, Estonian, Basque, Finnish, French, Galician, Hebrew, Croatian, Hungarian, Indonesian, Italian, Georgian, Latin, Lithuanian, Malagasy, Dutch, Norwegian, Occitan, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Serbian, Swedish, Swahili, Turkish, Ukrainian, Vietnamese and Chinese.

Adding a new Wiktionary is done via a configuration file.

Right now the beta version is available for download at:
https://github.com/juditacs/wikt2dict

Documentation is in progress, until then the README should be enough to get started.

Please test it and send me your feedback and bug reports.

Thanks,
Judit Ács
_______________________________________________
Wiktionary-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wiktionary-l
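To make "extracts translations and ignores the rest" concrete, here is a minimal sketch of what template-based extraction can look like. It is not wikt2dict's actual code. It assumes the English Wiktionary convention of marking translations with `{{t|lang|word}}` and `{{t+|lang|word}}` templates; other editions use different markup, which is presumably what the per-Wiktionary configuration file covers. The function name, the tuple layout and the sample entry are hypothetical.

```python
import re

# Matches English-Wiktionary-style translation templates such as
# {{t|hu|kutya}} or {{t+|de|Hund|m}}. Other editions use other templates.
T_TEMPLATE = re.compile(r"\{\{t[+-]?\|([^|{}]+)\|([^|{}]+)")


def extract_translations(title, wikitext, source_lang="en"):
    """Return (source_lang, title, target_lang, translation) tuples.

    Hypothetical helper: everything in the wikitext except translation
    templates is ignored.
    """
    pairs = []
    for match in T_TEMPLATE.finditer(wikitext):
        target_lang = match.group(1).strip()
        word = match.group(2).strip()
        pairs.append((source_lang, title, target_lang, word))
    return pairs


# Made-up fragment of an entry, in the style of an English Wiktionary page.
sample = """
====Translations====
* French: {{t+|fr|chien|m}}
* German: {{t+|de|Hund|m}}
* Hungarian: {{t|hu|kutya}}
"""

for row in extract_translations("dog", sample):
    print("\t".join(row))
```

A real dump parser would also have to walk the XML dump page by page and cope with nested templates, which this sketch leaves out.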
Great,
Do you plan to add more functions, like generating miscellaneous output (ebook versions, "printable" PDF, etc.) from a dump? The main problem is probably converting all the templates…

On 2013-07-12 13:19, Judit wrote:
> Hi All,
>
> I created a tool to extract translations from different editions of
> Wiktionary. Right now it supports 39 different Wiktionaries. It only
> extracts translations and ignores the rest.
> [...]

--
Association Culture-Libre
http://www.culture-libre.org/
Hi,
I don't plan to generate different output formats, as the dictionaries are better suited to automated use than to being read as a normal dictionary. It sounds interesting, though, and I may do it in the future.

Since the first version I have added a triangulation function that tries to build new translation pairs from the ones extracted from the Wiktionaries. It works reasonably well (85%+ correct in manual tests on a few language pairs) and yields many results. I plan to improve these methods further.

BTW, the data is available on request (just send me an email).

Judit

2013/7/12 Mathieu Stumpf <[hidden email]>
> Great,
>
> Do you plan to add more functions, like generating miscellaneous output
> (ebook versions, "printable" PDF, etc.) from a dump? The main problem is
> probably converting all the templates…
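The triangulation idea mentioned above can be sketched roughly as follows. This is an illustration under stated assumptions, not the function shipped with wikt2dict: it treats the extracted translations as an undirected graph and proposes a new pair whenever two words in different languages share a pivot word they both translate to. Counting the supporting pivots as a rough confidence signal is an assumption made here, and the 85%+ figure quoted above refers to the author's method, not to this sketch.

```python
from collections import defaultdict
from itertools import combinations


def triangulate(pairs):
    """Propose new translation pairs that share a pivot word.

    pairs: iterable of ((lang, word), (lang, word)) known translations.
    Returns a dict mapping each proposed pair to the number of distinct
    pivot words that support it.
    """
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    support = defaultdict(set)
    for pivot, linked in list(neighbours.items()):
        for a, b in combinations(sorted(linked), 2):
            if a[0] == b[0]:        # same language: not a translation pair
                continue
            if b in neighbours[a]:  # pair is already known
                continue
            support[(a, b)].add(pivot)
    return {pair: len(pivots) for pair, pivots in support.items()}


# Made-up input: four known pairs around the concept "dog".
known = [
    (("en", "dog"), ("hu", "kutya")),
    (("en", "dog"), ("de", "Hund")),
    (("fr", "chien"), ("hu", "kutya")),
    (("fr", "chien"), ("de", "Hund")),
]
candidates = triangulate(known)
for (a, b), votes in sorted(candidates.items()):
    print(a, b, votes)
```

Pairs supported by only one pivot are the risky ones, since a polysemous pivot can link unrelated words; requiring two or more pivots trades coverage for precision.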
For those who are not aware of DBpedia Wiktionary [1]
it also supports translations (among much other lexical information), e.g.
http://wiktionary.dbpedia.org/page/german-English-Adjective-2en\

It's a little harder to fully configure a new language, but you can get a lot more out of it.

For now we support en, de, el, fr & ru, and we will happily accept contributions for other languages.

Best,
Dimitris

[1] http://wiktionary.dbpedia.org/

On Fri, Jul 12, 2013 at 3:22 PM, Judit, Ács <[hidden email]> wrote:
> Hi,
>
> I don't plan to generate different output formats as the dictionaries by
> themselves are more suitable for automated usage than as a normal
> dictionary but it sounds interesting, I may do it in the future.
> [...]

--
Dimitris Kontokostas
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org
Homepage: http://aksw.org/DimitrisKontokostas
In reply to this post by Judit, Ács
Hi,
I added support for the German Wiktionary; it is available in the newest version. There is a quick test script that should get you 300k+ translations from the German Wiktionary in less than 15 minutes.

The dictionaries in 50 languages built using wikt2dict and other resources (parallel and comparable corpora) are available here:
http://hlt.sztaki.hu/resources/index.html

Please let me know if you find parsing errors.

I understand that DBpedia Wiktionary does a lot more than wikt2dict, and I do not plan to compete with it. However, adding 35+ Wiktionaries to it would have been near impossible for me. This is a quick (and dirty) way to extract the translations.

Cheers,
Judit