Wiktionary parsing ; multiple languages

Moutupsi Paul
Hi All,

Greetings,

I am a CS grad student from the Data Science Lab at Stony Brook <https://sites.google.com/site/datascienceslab/>, and I am writing to request information about parsing multilingual Wiktionary data. Our lab has been using Wikipedia data for quite a while now, but we are very interested in taking advantage of the massive Wiktionary content, which we feel can become a rich multilingual corpus after proper parsing.

The big hurdle, however, is a parsing tool. We have tried a few Wiktionary parsers:

1. https://github.com/clbecker/perl-wiktionary-parser/
2. https://code.google.com/p/wikokit/wiki/GettingStartedWiktionaryParser
3. https://github.com/benreynwar/wiktionary-parser/tree/master/wiktionary_parser
4. http://www.ukp.tu-darmstadt.de/software/jwktl/

but none of them is ready to use, or easy to extend, in a multiple-language mode. (I am currently trying to work with wikokit, parser 2 above.)

I would appreciate any advice, suggestions, or pointers towards the best available Wiktionary parser. We are mainly looking to extract meanings, parts of speech (POS), examples, translations, etc. (more can never hurt).

Any help is appreciated. Please let me know if further information is needed.

Regards,

Moutupsi

_______________________________________________
Wiktionary-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wiktionary-l

Re: Wiktionary parsing ; multiple languages

Sebastian Hellmann
Hi Moutupsi,
there are actually some problems that a community can solve better
than software alone. It took considerable effort and three years, but
we are now very close to really getting started.

As of two days ago, we have a working minimal example for the Wiktionary2RDF
subproject of DBpedia, so the community can now really pick it up.
The main documentation is here: http://dbpedia.org/Wiktionary

Now that the software, the Linked Data and the SPARQL hosting are
working, we will try to find maintainers for each language. DBpedia
already has a vast network for this:
http://wiki.dbpedia.org/Internationalization

I expect configs and data for these languages quite soon: ko,
sr, el, es, with many more to follow. You are welcome to join in, try to
produce the data you need, and give your results back to the community.

There are two views on the software: one for people who just want to use
it and create configs: https://github.com/dbpedia/dbpedia-wiktionary

and one for Scala/Java developers:
https://github.com/dbpedia/extraction-framework/tree/master/wiktionary

Data can be found here: http://downloads.dbpedia.org/wiktionary/dumps/
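
For anyone who prefers the dumps to the SPARQL endpoint, here is a minimal Python sketch of streaming them line by line. It assumes the dump files are N-Triples (one `<subject> <predicate> object .` statement per line), which has not been checked against the directory above. The regular expression is deliberately simplified (no blank nodes, no escaped quotes inside literals), so a real RDF library such as rdflib is the safer choice. The names `parse_ntriples_line` and `iter_triples` are hypothetical.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# One N-Triples statement: <subject> <predicate> object .
# Simplified: subject and predicate must be IRIs; the object is an IRI or a
# literal without escaped quotes (optionally @lang or ^^<datatype>).
_TRIPLE = re.compile(
    r'^<(?P<s>[^>]*)>\s+<(?P<p>[^>]*)>\s+'
    r'(?:<(?P<o_iri>[^>]*)>|"(?P<o_lit>[^"]*)"(?:@[\w-]+|\^\^<[^>]*>)?)'
    r'\s*\.\s*$'
)


def parse_ntriples_line(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (subject, predicate, object), or None for comments, blanks and unsupported lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    m = _TRIPLE.match(line)
    if m is None:
        return None
    obj = m.group("o_iri") if m.group("o_iri") is not None else m.group("o_lit")
    return m.group("s"), m.group("p"), obj


def iter_triples(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield parsed triples, skipping lines the simplified pattern cannot handle."""
    for line in lines:
        triple = parse_ntriples_line(line)
        if triple is not None:
            yield triple
```

If a file turns out to be compressed, open it with `bz2.open(path, "rt", encoding="utf-8")` or `gzip.open(...)` as appropriate and pass the file object to `iter_triples`.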

I will write a blog post announcing this soon.

All the best,
Sebastian


On 04.04.2013 03:21, Moutupsi Paul wrote:

> I request for some advice, suggestion or redirection towards best available Wiktionary parser. We are mainly looking to extract meanings, POS, examples, translations etc. (more can never hurt).


--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
http://dbpedia.org/Wiktionary , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org


Re: Wiktionary parsing ; multiple languages

Dimitris Kontokostas
In reply to this post by Moutupsi Paul
Hi Moutupsi,

You should definitely take a look at DBpedia Wiktionary
(http://dbpedia.org/Wiktionary).
It supports everything you want and can easily be configured for other
languages.

Best,
Dimitris


On Thu, Apr 4, 2013 at 4:21 AM, Moutupsi Paul <[hidden email]> wrote:

> I request for some advice, suggestion or redirection towards best
> available Wiktionary parser. We are mainly looking to extract meanings,
> POS, examples, translations etc. (more can never hurt).

--
Dimitris Kontokostas
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org
Homepage: http://aksw.org/DimitrisKontokostas

Re: Wiktionary parsing ; multiple languages

Andrew Krizhanovsky
DBpedia Wiktionary is a very interesting project!

Is it possible to get a list of synonyms for the first meaning of the
noun "dog" now?
http://en.wiktionary.org/wiki/dog#Synonyms

Best regards,
Andrew Krizhanovsky.

On Fri, Apr 5, 2013 at 11:05 AM, Dimitris Kontokostas
<[hidden email]> wrote:

> Hi Moutupsi,
>
> You should definitely take look at DBpedia Wiktionary (
> http://dbpedia.org/Wiktionary).
> It supports everything you want and can be easily configured for other
> languages.


Re: Wiktionary parsing ; multiple languages

mathieu lovato stumpf guntz
In reply to this post by Sebastian Hellmann
On 2013-04-05 08:50, Sebastian Hellmann wrote:
> Hi Moutupsi,
> there are actually some problems, that can be better solved by a
> community than by software alone. It took quite some efforts and
> three
> years, but we are very close to really start now.

I added the DBpedia Wiktionary entry on [1]. I wasn't aware of your
effort, despite being really interested in the future of Wiktionary.
Could you please read [1] and update it with your vision as a DBpedia
contributor?

[1] https://meta.wikimedia.org/wiki/Wiktionary_future

--
Association Culture-Libre
http://www.culture-libre.org/


Re: Wiktionary parsing ; multiple languages

Sebastian Hellmann
In reply to this post by Andrew Krizhanovsky
Hi Andrew,
actually, the tools to solve this problem are in place:
http://en.wiktionary.org/wiki/house#English-abode
links to a sense, and the highlighting is there. Also, if you go to
Editing Gadgets, you can enable "Enable definition editing options." to
add glosses. This gadget was created by Yair_rand, and it lets you
connect senses with the help of glosses such as "abode".

However, this has not received any uptake by the Wiktionary community.

The idea is to have something like this (on
http://en.wiktionary.org/wiki/house#English-establishment):
# {{senseid|en|establishment}}An [[establishment]], whether actual, as a pub, or virtual, as a website. Particularly restaurant, casino, or financial or trading company.
...
*  {{sense|establishment}} [[shop]]
...
{{trans-top|an establishment}}
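
This markup is what makes sense linking mechanical: a `{{sense|...}}` gloss can be matched against a `{{senseid|...}}` on a definition line. A minimal Python sketch that collects both, plus the `{{trans-top|...}}` glosses, might look like the following. It handles only the exact template shapes shown in this message; the regular expressions and the function name `link_senses` are illustrative assumptions, not the DBpedia Wiktionary extractor.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

SENSEID = re.compile(r"\{\{senseid\|[^|}]+\|([^}]+)\}\}")       # {{senseid|en|establishment}}
SENSE_LINE = re.compile(r"^\*\s*\{\{sense\|([^}]+)\}\}(.*)$")    # * {{sense|establishment}} [[shop]]
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")         # [[target]] or [[target|label]]
TRANS_TOP = re.compile(r"\{\{trans-top\|([^}]+)\}\}")            # {{trans-top|an establishment}}


def link_senses(wikitext: str) -> Tuple[Set[str], Dict[str, List[str]], List[str]]:
    """Return (declared sense ids, linked terms per sense gloss, translation-table glosses)."""
    sense_ids: Set[str] = set()
    related: Dict[str, List[str]] = defaultdict(list)
    trans_glosses: List[str] = []
    for line in wikitext.splitlines():
        # Sense ids declared on definition lines.
        sense_ids.update(s.strip() for s in SENSEID.findall(line))
        # List items tagged with a sense gloss, e.g. in a Synonyms section.
        m = SENSE_LINE.match(line.strip())
        if m:
            gloss = m.group(1).strip()
            related[gloss].extend(t.strip() for t in WIKILINK.findall(m.group(2)))
        # Glosses that head translation tables.
        trans_glosses.extend(g.strip() for g in TRANS_TOP.findall(line))
    return sense_ids, dict(related), trans_glosses
```

A gloss that also appears among the declared sense IDs can be attached to that definition directly; glosses without a matching `senseid` (the common case, given that these templates are rare) would need fuzzy matching against the definition text.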

But these do not occur frequently. For senses, however, these seem to be
available:
http://wiktionary.dbpedia.org/resource/as_soon_as_possible-English-Adverb-1en
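
The sense URI above seems to encode the word, the language, the part of speech, a sense number and a trailing language code (presumably that of the source Wiktionary). Assuming that naming pattern holds generally, which is inferred from this single example rather than from documentation, a Python sketch for splitting such names, and for answering a question like "synonyms of the first sense of the noun dog" from (sense URI, synonym) pairs, could look like this. `Sense`, `parse_sense_name` and `first_sense_synonyms` are hypothetical names.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Tuple

# Assumed layout, inferred from one example URI:
#   <word_with_underscores>-<Language>-<PartOfSpeech>-<sense number><wiki code>
SENSE_NAME = re.compile(
    r"^(?P<word>.+)-(?P<language>[^-]+)-(?P<pos>[^-]+)-(?P<number>\d+)(?P<wiki>[a-z]+)$"
)


class Sense(NamedTuple):
    word: str
    language: str
    pos: str
    number: int
    wiki: str


def parse_sense_name(uri: str) -> Optional[Sense]:
    """Split the last path segment of a sense URI; None if it does not fit the assumed pattern."""
    name = uri.rstrip("/").rsplit("/", 1)[-1]
    m = SENSE_NAME.match(name)
    if m is None:
        return None
    return Sense(m.group("word").replace("_", " "), m.group("language"),
                 m.group("pos"), int(m.group("number")), m.group("wiki"))


def first_sense_synonyms(pairs: Iterable[Tuple[str, str]], word: str,
                         language: str = "English", pos: str = "Noun") -> List[str]:
    """Return the synonyms whose subject is sense 1 of (word, language, pos)."""
    result = []
    for subject, synonym in pairs:
        sense = parse_sense_name(subject)
        if sense is not None and (sense.word, sense.language, sense.pos, sense.number) == (word, language, pos, 1):
            result.append(synonym)
    return result
```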

Query (SPARQL endpoint: http://wiktionary.dbpedia.org/sparql):
SELECT * WHERE { GRAPH ?g { ?s
<http://wiktionary.dbpedia.org/terms/hasSynonym> ?o } } LIMIT 100
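
A minimal Python sketch of running this query from a script with only the standard library. It sends the query as the `query` parameter of an HTTP GET request, as the SPARQL 1.1 Protocol allows, asks for the standard SPARQL JSON results format via the `Accept` header, and turns the result `bindings` into (subject, synonym) pairs. Whether the endpoint is still reachable and honors that content type is an assumption; `build_request`, `synonym_pairs` and `fetch` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List, Tuple

ENDPOINT = "http://wiktionary.dbpedia.org/sparql"  # endpoint named in this thread
QUERY = (
    "SELECT * WHERE { GRAPH ?g { ?s "
    "<http://wiktionary.dbpedia.org/terms/hasSynonym> ?o } } LIMIT 100"
)


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """SPARQL Protocol query via GET, asking for SPARQL 1.1 JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def synonym_pairs(results: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Extract (?s, ?o) values from a SPARQL JSON results document."""
    return [(b["s"]["value"], b["o"]["value"])
            for b in results["results"]["bindings"]
            if "s" in b and "o" in b]


def fetch(endpoint: str = ENDPOINT, query: str = QUERY) -> List[Tuple[str, str]]:
    """Network call: only works if the endpoint is reachable."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=60) as resp:
        return synonym_pairs(json.load(resp))
```

Keeping request construction and result parsing separate from the network call makes both testable offline.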

All the best,
Sebastian

On 05.04.2013 11:23, Andrew Krizhanovsky wrote:

> DBpedia Wiktionary - is very interesting project!
>
> Is it possible to get list of synonyms for the first meaning of the
> noun "dog" now?
> http://en.wiktionary.org/wiki/dog#Synonyms
>
> Best regards,
> Andrew Krizhanovsky.
>
> On Fri, Apr 5, 2013 at 11:05 AM, Dimitris Kontokostas
> <[hidden email]> wrote:
>> Hi Moutupsi,
>>
>> You should definitely take look at DBpedia Wiktionary (
>> http://dbpedia.org/Wiktionary).
>> It supports everything you want and can be easily configured for other
>> languages.
>>
>> Best,
>> Dimitris
>>
>>
>> On Thu, Apr 4, 2013 at 4:21 AM, Moutupsi Paul <[hidden email]>wrote:
>>
>>> Hi All,
>>>
>>>
>>>
>>> Greeting,
>>>
>>>
>>>
>>> I am a CS grad student from Data Science Lab Stony Brook<
>>> https://sites.google.com/site/datascienceslab/> and I am dropping this
>>> mail to request information about parsing multi-lingual Wiktionary data.
>>> Our lab has been using Wikipedia data for quite a while now but we are
>>> really interested in taking advantage of the massive Wiktionary content
>>> which we feel , after proper parsing, can become an rich muti-language
>>> corpus.
>>>
>>>
>>>
>>> But the big hurdle is a parsing tool. We have tried a few Wiktionary
>>> parsing tools
>>>
>>>
>>>
>>> 1.       https://github.com/clbecker/perl-wiktionary-parser/
>>>
>>> 2.
>>> https://code.google.com/p/wikokit/wiki/GettingStartedWiktionaryParser
>>>
>>> 3.
>>> https://github.com/benreynwar/wiktionary-parser/tree/master/wiktionary_parser
>>>
>>> 4.       http://www.ukp.tu-darmstadt.de/software/jwktl/
>>>
>>>
>>>
>>> but none of them are available in a ready-to-use or easy-to-extend in
>>> multiple language mode. (I am currently trying to work with wikokit (parser
>>> 2 above)  )
>>>
>>>
>>>
>>> I request for some advice, suggestion or redirection towards best
>>> available Wiktionary parser. We are mainly looking to extract meanings,
>>> POS, examples, translations etc. (more can never hurt).
>>>
>>>
>>>
>>> Any help is appreciated. Kindly let know if further information is needed.
>>>
>>>
>>>
>>> Regards,
>>>
>>> Moutupsi
>>>
>>> _______________________________________________
>>> Wiktionary-l mailing list
>>> [hidden email]
>>> https://lists.wikimedia.org/mailman/listinfo/wiktionary-l
>>>
>>> --
>>> Dimitris Kontokostas
>>> Department of Computer Science, University of Leipzig
>>> Research Group: http://aksw.org
>>> Homepage:http://aksw.org/DimitrisKontokostas
>>> <https://lists.wikimedia.org/mailman/listinfo/wiktionary-l>
>> _______________________________________________
>> Wiktionary-l mailing list
>> [hidden email]
>> https://lists.wikimedia.org/mailman/listinfo/wiktionary-l
> _______________________________________________
> Wiktionary-l mailing list
> [hidden email]
> https://lists.wikimedia.org/mailman/listinfo/wiktionary-l
>



Re: Wiktionary parsing ; multiple languages

Andrew Krizhanovsky
Thanks, Sebastian, for the quick reply.

>> But these do not occur frequently. For senses these seem to be available however...

Could you count how many senses and synonyms were successfully
extracted from the English Wiktionary and the Russian Wiktionary,
i.e., how many senses and synonyms are now available in DBpedia Wiktionary?

It would be interesting to compare this with the number of senses and
synonyms extracted from the Wiktionaries by the wikokit parser,
see http://code.google.com/p/wikokit/#Statistics

Best regards,
Andrew.

On Fri, Apr 5, 2013 at 5:57 PM, Sebastian Hellmann
<[hidden email]> wrote:

> But these do not occur frequently. For senses these seem to be available
> however:
> http://wiktionary.dbpedia.org/resource/as_soon_as_possible-English-Adverb-1en


Re: Wiktionary parsing ; multiple languages

Sebastian Hellmann
In reply to this post by mathieu lovato stumpf guntz
Hi Mathieu,

On 05.04.2013 11:56, Mathieu Stumpf wrote:
> I added the dbpedia wiktionary entry on [1]. I wasn't aware of your
> effort, despite being really interesting in the wiktionary future.
> Could you please  read [1] and update it with your vision as a dbpedia
> contributor?
>
> [1] https://meta.wikimedia.org/wiki/Wiktionary_future
>

This page is interesting, but it seems very idealistic. I am not sure
that every language community will agree to use a common model. I also
wonder whether this is possible at all, and whether there is any
overlap. Do you think it makes sense to edit that page? Normally, there
is a lot of talk and planning, and nothing comes of it in the end.

Note that the good thing about Wiktionary is that you can add
information freely without adhering to a preset structure.

DBpedia is already implementing adapters to load data from WikiData. So
once WikiData is working for Wiktionary, we will have data from there
and from the remaining wiki syntax, and we will merge them. DBpedia and
WikiData have a loose cooperation for a joint task in a Google Summer of
Code proposal.

All the best,
Sebastian





Re: Wiktionary parsing ; multiple languages

Amgine-3
There are several Wiktionary proposals for GSoC. I'm aware of another one
for recording pronunciations from within Wiktionary pages, and one to
create a DICT-like API, either as an extension to the MediaWiki API or as
a special page extension.

Amgine


On 05/04/13 10:44 AM, Sebastian Hellmann wrote:
> [...] DBpedia and WikiData have a loose cooperation for a joint task
> in a Google Summer of Code proposal.

Re: Wiktionary parsing ; multiple languages

Sebastian Hellmann
Hi Amgine,
I think these are just ideas for now, and the students still have to
turn them into proposals, right?
Do you have links to these ideas?

Our Wiktionary related ideas for GSoC are here:
http://wiki.dbpedia.org/gsoc2013/ideas#h254-12
http://wiki.dbpedia.org/gsoc2013/ideas#h254-13

-- Sebastian


On 05.04.2013 20:48, Amgine wrote:

> There are several Wiktionary proposals for GSOC. I'm aware of another
> for pronunciation recording from wiktionary pages, and one to create a
> DICT-like api either as an extension to MW api or as a special page
> extension.
>
> Amgine



Re: Wiktionary parsing ; multiple languages

Amgine-3
One is being written up by the student. The other is a project idea
still looking for a student.

Amgine


On 05/04/13 03:25 PM, Sebastian Hellmann wrote:
> Hi Amgine, I think these are just ideas for now and the students
> still have to break it down into proposals, right? Do you have
> links to these ideas?

Re: Wiktionary parsing ; multiple languages

Sebastian Hellmann
In reply to this post by Andrew Krizhanovsky
Hi Andrew,
some statistics are in here:
http://svn.aksw.org/papers/2012/JIST_Wiktionary/public.pdf

I executed a SPARQL query on the store to compute these statistics:
http://downloads.dbpedia.org/wiktionary/stats_2013_04_06.csv

We tried to follow ELE [1] during extraction, so if a Wiktionary page
deviates from ELE, the results for that page are most likely worse.


I assume you are familiar with SPARQL, given your D2R mapping for
wikokit. Here is the query:

  SELECT ?g ?p (COUNT(?p) AS ?count)
  WHERE { GRAPH ?g { ?s ?p ?o } }
  GROUP BY ?p ?g
  ORDER BY DESC(?g) DESC(?count)

It takes too long to run over HTTP. If you are interested in more
complex statistics and calculations, I can also give you better access
to our service (maybe even SSH access).
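Since the query above is too slow over HTTP, the same per-graph, per-predicate counts could in principle be computed offline from a dump. A minimal Python sketch, assuming the dump has already been parsed into (subject, predicate, object, graph) tuples (parsing N-Quads is left out); `predicate_counts` and the `ex:` identifiers are hypothetical, and Python string ordering only approximates how a SPARQL store orders graph IRIs:

```python
from collections import Counter
from typing import Iterable, List, Tuple

# (subject, predicate, object, graph); assumes the dump is already parsed.
Quad = Tuple[str, str, str, str]


def predicate_counts(quads: Iterable[Quad]) -> List[Tuple[str, str, int]]:
    """Count triples per (graph, predicate), like GROUP BY ?p ?g in the query."""
    counts = Counter((g, p) for _s, p, _o, g in quads)
    rows = [(g, p, n) for (g, p), n in counts.items()]
    # ORDER BY DESC(?g) DESC(?count): Python's sort is stable (also with
    # reverse=True), so sort on the secondary key (count) first, then on
    # the primary key (graph).
    rows.sort(key=lambda row: row[2], reverse=True)
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows


if __name__ == "__main__":
    sample = [
        ("ex:a", "ex:hasSynonym", "ex:b", "ex:en"),
        ("ex:a", "ex:hasSynonym", "ex:c", "ex:en"),
        ("ex:a", "ex:hasPoS", "ex:Noun", "ex:en"),
        ("ex:d", "ex:hasPoS", "ex:Verb", "ex:ru"),
    ]
    for graph, predicate, count in predicate_counts(sample):
        print(graph, predicate, count)
```

Doing the two stable sorts avoids having to negate string keys for the descending graph order.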

All the best,
Sebastian

[1] https://en.wiktionary.org/wiki/Wiktionary:Entry_layout_explained

On 05.04.2013 18:13, Andrew Krizhanovsky wrote:

> Thanks, Sebastian, for the quick reply.
>
>>> But these do not occur frequently. For senses these seem to be available however...
> Can you count how many senses and synonyms were successfully
> extracted from the English and Russian Wiktionaries,
> i.e. how many senses and synonyms are now available in DBpedia Wiktionary?
>
> It would be interesting to compare these with the number of senses and synonyms
> extracted from the Wiktionaries by the wikokit parser,
> see http://code.google.com/p/wikokit/#Statistics
>
> Best regards,
> Andrew.
>
> On Fri, Apr 5, 2013 at 5:57 PM, Sebastian Hellmann
> <[hidden email]>  wrote:
>> Hi Andrew,
>> actually, the tools to solve this problem are already in place:
>> http://en.wiktionary.org/wiki/house#English-abode
>> links to a sense and the highlighting works. Also, if you go to Editing
>> Gadgets, you can enable "Enable definition editing options." to add glosses.
>> This was created by Yair_rand and allows you to connect senses with the
>> help of glosses such as "abode".
>>
>> However, it has not been taken up by the Wiktionary community.
>>
>> The idea is to have something like this (on
>> http://en.wiktionary.org/wiki/house#English-establishment)
>> # {{senseid|en|establishment}}An [[establishment]], whether actual, as a
>> pub, or virtual, as a website. Particularly restaurant, casino, or financial
>> or trading company.
>> ...
>> *  {{sense|establishment}} [[shop]]
>> ...
>> {{trans-top|an establishment}}
>>
>> But these do not occur frequently. For senses these seem to be available
>> however:
>> http://wiktionary.dbpedia.org/resource/as_soon_as_possible-English-Adverb-1en
>>
>> Query:
>> http://wiktionary.dbpedia.org/sparql
>> select * where {Graph ?g {?s
>> <http://wiktionary.dbpedia.org/terms/hasSynonym>  ?o } } limit 100
>>
>> All the best,
>> Sebastian
>>
>> On 05.04.2013 11:23, Andrew Krizhanovsky wrote:
>>
>>> DBpedia Wiktionary is a very interesting project!
>>>
>>> Is it possible to get a list of synonyms for the first meaning of the
>>> noun "dog" now?
>>> http://en.wiktionary.org/wiki/dog#Synonyms
>>>
>>> Best regards,
>>> Andrew Krizhanovsky.
>>>
>>> On Fri, Apr 5, 2013 at 11:05 AM, Dimitris Kontokostas
>>> <[hidden email]>  wrote:
>>>> Hi Moutupsi,
>>>>
>>>> You should definitely take a look at DBpedia Wiktionary (
>>>> http://dbpedia.org/Wiktionary).
>>>> It supports everything you want and can be easily configured for other
>>>> languages.
>>>>
>>>> Best,
>>>> Dimitris
>>>>
>>>>
>>>>> --
>>>>> Dimitris Kontokostas
>>>>> Department of Computer Science, University of Leipzig
>>>>> Research Group: http://aksw.org
>>>>> Homepage: http://aksw.org/DimitrisKontokostas
>>>
>> --
>> Dipl. Inf. Sebastian Hellmann
>>
>> Department of Computer Science, University of Leipzig
>> Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
>> http://dbpedia.org/Wiktionary , http://dbpedia.org
>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>> Research Group: http://aksw.org
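The sense-linking markup quoted above ({{senseid}}, {{sense}} and {{trans-top}}) is regular enough that, where editors do use it, a parser can pick it up with a few regular expressions. A minimal Python sketch, assuming only the simple one-line forms shown in the example; `link_senses` and `translation_glosses` are hypothetical helpers, and real pages may use extra parameters or nested templates that these patterns ignore:

```python
import re
from typing import Dict, List

# Simple one-line forms of the templates in the example; extra parameters
# or nested templates are not handled.
SENSEID_RE = re.compile(r"\{\{senseid\|([^|}]+)\|([^|}]+)\}\}")
SENSE_RE = re.compile(r"^\*\s*\{\{sense\|([^|}]+)\}\}(.*)$")
TRANS_TOP_RE = re.compile(r"\{\{trans-top\|([^|}]+)\}\}")
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def link_senses(wikitext: str) -> Dict[str, List[str]]:
    """Map each sense id to the linked words on lines tagged {{sense|<id>}}."""
    linked: Dict[str, List[str]] = {}
    for raw in wikitext.splitlines():
        line = raw.strip()
        senseid = SENSEID_RE.search(line)
        if senseid:
            # Register the sense even if no synonym line refers to it yet.
            linked.setdefault(senseid.group(2), [])
            continue
        sense = SENSE_RE.match(line)
        if sense:
            # Keep the link target, dropping any piped display text.
            linked.setdefault(sense.group(1), []).extend(LINK_RE.findall(sense.group(2)))
    return linked


def translation_glosses(wikitext: str) -> List[str]:
    """Return the glosses that head translation tables."""
    return TRANS_TOP_RE.findall(wikitext)
```

Note that in the example the translation gloss ("an establishment") is not identical to the sense id ("establishment"), so attaching translation tables to senses would still need some normalisation, which this sketch leaves out.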


--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
http://dbpedia.org/Wiktionary , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org


Re: Wiktionary parsing ; multiple languages

Andrew Krizhanovsky
Thank you for the paper. I like the overview in this paper and the
clear description of Wiktionary parsing difficulties.

At the beginning of wikokit development I considered using a
finite-state machine to extract data, but it was too complex for me,
and Wiktionary formatting varies too much in kind and quality :)
So I chose ordinary procedural programming with short regular
expressions.

But your project proves that finite-state machines can be used in
non-trivial situations. Great!
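To make the contrast concrete, a line-oriented finite-state machine over the ELE layout might look like the following minimal Python sketch. It is not how DBpedia Wiktionary or wikokit actually work; it assumes language sections are level-2 headings, uses a small hypothetical set of part-of-speech headings, and treats `#` lines as definitions and `#:`/`#*` lines as examples or quotations. `extract_definitions` is a hypothetical name:

```python
import re
from typing import List, Tuple

HEADING_RE = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")
# Hypothetical, deliberately small set of part-of-speech headings.
POS_HEADINGS = {"Noun", "Verb", "Adjective", "Adverb"}


def extract_definitions(wikitext: str) -> List[Tuple[str, str, str]]:
    """Return (language, part of speech, definition) triples.

    States: OUTSIDE (before any language section), LANGUAGE (inside a
    language section, outside a definition list), POS (after a POS heading).
    """
    state, language, pos = "OUTSIDE", "", ""
    results: List[Tuple[str, str, str]] = []
    for line in wikitext.splitlines():
        heading = HEADING_RE.match(line)
        if heading:
            level, title = len(heading.group(1)), heading.group(2)
            if level == 2:                               # ==English==
                state, language, pos = "LANGUAGE", title, ""
            elif state != "OUTSIDE" and title in POS_HEADINGS:
                state, pos = "POS", title                # ===Noun=== or ====Noun====
            elif state != "OUTSIDE":
                state = "LANGUAGE"                       # Etymology, Synonyms, ...
        elif state == "POS" and line.startswith("#"):
            body = line.lstrip("#")
            if not body.startswith((":", "*")):          # skip examples and quotations
                results.append((language, pos, body.strip()))
    return results
```

Every transition is driven by the heading structure alone, which is exactly why pages that deviate from ELE would confuse such a machine.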

-- Andrew Krizhanovsky.



Re: Wiktionary parsing ; multiple languages

Lars Aronsson
On 04/07/2013 09:46 AM, Andrew Krizhanovsky wrote:
> Thank you for the paper. I like the overview in this paper and the
> clear description of Wiktionary parsing difficulties.

An issue that is related to Wiktionary parsing is the
automatic creation of Wiktionary entries by bots.

I have used a bot to create inflection entries, but
only for Swedish words in the English Wiktionary,
and not for main entries with definitions. What
attempts of that kind have been made, and what
software or data structures have they used?
Could that work be generalized and coordinated?


--
   Lars Aronsson ([hidden email])
   Aronsson Datateknik - http://aronsson.se




Re: Wiktionary parsing ; multiple languages

mathieu lovato stumpf guntz
In reply to this post by Sebastian Hellmann
On 2013-04-05 19:44, Sebastian Hellmann wrote:

> Hi Mathieu,
>
> On 05.04.2013 11:56, Mathieu Stumpf wrote:
>> I added the DBpedia Wiktionary entry to [1]. I wasn't aware of your
>> effort, despite being really interested in the future of Wiktionary.
>> Could you please read [1] and update it with your vision as a DBpedia
>> contributor?
>>
>> [1] https://meta.wikimedia.org/wiki/Wiktionary_future
>>
>
> this page is interesting, but seems very idealistic. I am not
> sure every language community would agree to use a common model. I also
> wonder whether this is possible at all and whether there is an overlap. Do
> you think it makes sense to edit that page? Usually, there is a lot
> of talk and planning and nothing comes of it in the end.

I would put it another way: this page tries to address the problem
with long-term perspectives, but with real, concrete goals. Sure, you
can't reach the one solution that will make everybody happy, but getting
people to talk together about their specific issues and expectations
of Wiktionaries is a path I think is worth exploring. To my mind, this
should help us get a better overview of the linguistic knowledge people
expect to find in Wiktionaries, and of how to improve the transmission
of this knowledge between chapters.

As the page says, this is not a trivial problem, because it requires
gathering a lot of linguistic expertise, as well as thinking about the
UX we want to provide to end users and what we want to facilitate for
third parties.

> Note that the good thing about Wiktionary is that you can add
> information freely without adhering to a preset structure.

Yes and no. Sure, apart from the wikisyntax, no specific structure is
imposed on Wiktionary chapters. But in practice, as you know, they did
adopt a more or less rigid structure, because that was relevant. Now,
however, we are in a situation where each chapter has its own idiom of
templates, which not only makes cross-chapter information transmission
harder to automate, but can also scare newcomers away. This is a really
serious issue: I know that at least the French chapter is losing
would-be contributors because of the heavy use we make of templates.
Don't get me wrong here, I'm not blaming the French Wiktionary
community; to my mind it's an upstream issue.

You know that having more editors is one of our community goals, don't
you? Well, to have more editors, we have to make the learning curve for
participating as gentle as possible. That requires a good UX, and that
in turn requires a well-thought-out end-user interface/API integration.
I have no doubt it will be really difficult to integrate the Visual
Editor into the French Wiktionary, for example, because articles there
rely heavily on templates, and as far as I know, the Visual Editor
doesn't provide (yet?) any tool to structure information beyond
section/bold/italic. But in the French Wiktionary, even sections are
created using templates!
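As an illustration of that last point, assume section headings of the form `== {{langue|fr}} ==` and `=== {{S|nom|fr}} ===` (our understanding of the French Wiktionary convention; treat the template names as an assumption). The section type then lives entirely inside a template call, so even locating sections means parsing templates. A minimal Python sketch with a hypothetical `section_template` helper:

```python
import re
from typing import List, Optional, Tuple

HEADING_RE = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")
# A heading title that consists of exactly one template call, e.g. {{S|nom|fr}}.
TEMPLATE_RE = re.compile(r"^\{\{([^{}|]+)((?:\|[^{}]*)?)\}\}$")


def section_template(line: str) -> Optional[Tuple[int, str, List[str]]]:
    """Return (level, template name, parameters) if the heading is a bare template."""
    heading = HEADING_RE.match(line.strip())
    if not heading:
        return None                      # not a heading at all
    template = TEMPLATE_RE.match(heading.group(2))
    if not template:
        return None                      # plain-text heading such as ===Noun===
    # Empty parameters are dropped, which is a simplification.
    params = [p for p in template.group(2).split("|") if p]
    return len(heading.group(1)), template.group(1).strip(), params
```

A plain-text heading parser would see nothing but an opaque template call here, which is the kind of structure an editor or third-party tool has to understand before it can work with the page.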

>
> DBpedia is already implementing adapters to load data from WikiData.
> So once WikiData is working for Wiktionary, we will take data from
> there and from the remaining wikisyntax and merge them. DBpedia and
> WikiData have a loose cooperation on a joint task in a Google Summer
> of Code proposal.

Well, that's great, we need such work to be done too. Thank you for
doing it.


Re: Wiktionary parsing ; multiple languages

mathieu lovato stumpf guntz
In reply to this post by Amgine-3
On 2013-04-05 20:48, Amgine wrote:

> There are several Wiktionary proposals for GSoC. I'm aware of another
> for pronunciation recording from Wiktionary pages, and one to create
> a DICT-like API, either as an extension to the MediaWiki API or as a
> special page extension.
>
> Amgine

Oh, great! If you have some relevant links, please share them on the
meta page. :)


--
Association Culture-Libre
http://www.culture-libre.org/
