Wikipedia preprocessing tool (wikiprep)

Wikipedia preprocessing tool (wikiprep)

Evgeniy Gabrilovich
Dear Wikipedia researchers,

WikiPrep is a preprocessing script, written in Perl, that takes an XML dump of
Wikipedia and infers information that is only implicitly present there. In
particular, it performs the following tasks:
1) Substitutes templates in article texts
2) Builds a hierarchy of categories (i.e., for each category, it collects the
ids of its immediate descendants)
3) Identifies related articles based on contextual clues
4) Resolves link redirection and dumps additional information that makes it
easy to build a link graph for the entire Wikipedia snapshot
5) Computes statistics about categories and links
6) Collects the anchor text associated with links pointing at each article
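Two of the steps above, redirect resolution (task 4) and anchor-text collection (task 6), can be sketched in a few lines. This is a minimal illustration, not WikiPrep's actual code: the `redirects` and `links` structures are hypothetical stand-ins for what the script extracts from the XML dump, and Python is used here only for brevity (WikiPrep itself is Perl).

```python
# Hedged sketch: follow redirect chains to their final target, then
# build a link graph and per-article anchor-text lists. The input
# structures are hypothetical; WikiPrep's real file formats differ.

def resolve(title, redirects):
    """Follow a redirect chain to its final target, guarding against cycles."""
    seen = set()
    while title in redirects and title not in seen:
        seen.add(title)
        title = redirects[title]
    return title

def build_link_graph(links, redirects):
    """Map each source article to its resolved link targets (task 4),
    and collect the anchor text pointing at each target (task 6)."""
    graph = {}    # source title -> set of resolved target titles
    anchors = {}  # resolved target title -> list of anchor strings
    for src, target, anchor in links:
        target = resolve(target, redirects)
        graph.setdefault(src, set()).add(target)
        anchors.setdefault(target, []).append(anchor)
    return graph, anchors

# Toy example: "UK" redirects to "United Kingdom".
redirects = {"UK": "United Kingdom"}
links = [
    ("London", "UK", "the UK"),
    ("Paris", "United Kingdom", "Britain"),
]
graph, anchors = build_link_graph(links, redirects)
print(graph["London"])            # {'United Kingdom'}
print(anchors["United Kingdom"])  # ['the UK', 'Britain']
```

Resolving redirects before adding an edge is what keeps the link graph clean: both links above end up pointing at the same node, with both anchor strings attached to it.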

WikiPrep is distributed under the terms of the GNU General Public License,
version 2, and is available at
http://www.cs.technion.ac.il/~gabr/resources/code/wikiprep.

Regards,

Evgeniy.

_______________________________________________
Wiki-research-l mailing list
[hidden email]
http://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Re: Wikipedia preprocessing tool (wikiprep)

Gregory Maxwell
On 8/6/07, Evgeniy Gabrilovich <[hidden email]> wrote:
> Dear Wikipedia researchers,
>
> WikiPrep is a preprocessing script written in Perl that takes an XML dump of
> Wikipedia, and

This looks quite useful. Thanks!

Re: Wikipedia preprocessing tool (wikiprep)

alain_desilets
In reply to this post by Evgeniy Gabrilovich
Interesting work!

As a member of the steering committee for the WikiSym conference (www.wikisym.org), I would encourage you (and all wiki researchers on this list) to submit a paper to future editions of the conference.

It's unfortunately too late for the 2007 edition.


----
Alain Désilets, National Research Council of Canada
Chair, WikiSym 2007

2007 International Symposium on Wikis
Wikis at Work in the World:
Open, Organic, Participatory Media for the 21st Century

http://www.wikisym.org/ws2007/



> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On
> Behalf Of Evgeniy Gabrilovich
> Sent: August 6, 2007 6:00 PM
> To: [hidden email]
> Subject: [Wiki-research-l] Wikipedia preprocessing tool (wikiprep)
>
> Dear Wikipedia researchers,
>
> WikiPrep is a preprocessing script written in Perl that takes
> an XML dump of Wikipedia, and infers some information that
> was implicitly present there. [...]
