A bot to create articles about species

Lars Aronsson
User:Lsj has written 4000 lines of C# source code on top
of the DotNetWikiBot framework, to create 10,000 articles
in Swedish about bird species in the spring of 2012 and
recently even more articles in Swedish about fungi species.

Some information about his Lsjbot is found here,
http://sv.wikipedia.org/wiki/Wikipedia:Projekt_DotNetWikiBot_Framework/Lsjbot

The otherwise very reluctant/skeptical/picky Swedish Wikipedia
community has gladly accepted these well-written articles.

I think it would be interesting if a community of wikipedians
in some other language would try to translate this bot.
Some languages might have notability or relevance requirements
that these species don't fulfill, others might think 1700
bytes is too short for an article. But I think the citation of
sources and correctness of fact would be generally accepted.

Here is a blog post in Swedish about the bird articles,
http://wikimediasverige.wordpress.com/2012/03/06/10-000-fagelarter-pa-svenska/

Some 3,600 birds are found in this category for articles
that were bot-created and have not yet been inspected,
http://sv.wikipedia.org/wiki/Kategori:Robotskapade_f%C3%A5gelartiklar
Some 54,000 fungi species are found here,
http://sv.wikipedia.org/wiki/Kategori:Robotskapade_svampartiklar
The birds more often have common names, which are preferred
as article names instead of the Latin/scientific names,
e.g. the blue-and-white swallow,
http://sv.wikipedia.org/wiki/Bl%C3%A5vit_svala
where the Latin name is a bot-created redirect to the
bot-created article,
http://sv.wikipedia.org/wiki/Pygochelidon_cyanoleuca
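
As a rough illustration of the redirect arrangement described above, here is a minimal Python sketch. It is not taken from Lsjbot (which is written in C#); the function names are hypothetical. It only decodes the article title from one of the URLs above and builds the standard MediaWiki redirect wikitext that a page under the Latin name would contain.

```python
from urllib.parse import unquote


def title_from_url(url):
    """Recover the human-readable page title from a /wiki/ URL."""
    path_title = url.rsplit("/wiki/", 1)[1]
    # Percent-decoding restores non-ASCII letters; underscores stand for spaces.
    return unquote(path_title).replace("_", " ")


def redirect_wikitext(target_title):
    """Wikitext for a redirect page pointing at target_title."""
    return f"#REDIRECT [[{target_title}]]"


common_name = title_from_url("http://sv.wikipedia.org/wiki/Bl%C3%A5vit_svala")
print(redirect_wikitext(common_name))
```

The page saved under "Pygochelidon cyanoleuca" would carry that single line of wikitext, so readers who look up the scientific name land on the common-name article.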

At the Swedish Wikipedia village pump there is now a
discussion of whether to continue with species of animals,
plants, bacteria, etc.
http://sv.wikipedia.org/wiki/Wikipedia:Bybrunnen#Botskapande_av_artiklar_f.C3.B6r_alla_v.C3.A4rldens_arter.3F


--
   Lars Aronsson ([hidden email])
   Aronsson Datateknik - http://aronsson.se



_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: A bot to create articles about species

Nikola Smolenski-2
On 18/10/12 03:26, Lars Aronsson wrote:

> User:Lsj has written 4000 lines of C# source code on top
> of the DotNetWikiBot framework, to create 10,000 articles
> in Swedish about bird species in the spring of 2012 and
> recently even more articles in Swedish about fungi species.
>
> Some information about his Lsjbot is found here,
> http://sv.wikipedia.org/wiki/Wikipedia:Projekt_DotNetWikiBot_Framework/Lsjbot
>
> The otherwise very reluctant/skeptical/picky Swedish Wikipedia
> community has gladly accepted these well-written articles.
>
> I think it would be interesting if a community of wikipedians
> in some other language would try to translate this bot.
> Some languages might have notability or relevance requirements
> that these species don't fulfill, others might think 1700
> bytes is too short for an article. But I think the citation of
> sources and correctness of fact would be generally accepted.
The need for such bots should cease after Wikidata is fully deployed. I
suggest to interested programmers that they should direct their effort
there.



Re: A bot to create articles about species

Steven Walling
On Wed, Oct 17, 2012 at 11:46 PM, Nikola Smolenski <[hidden email]>
wrote:
> The need for such bots should cease after Wikidata is fully deployed. I
> suggest to interested programmers that they should direct their effort
> there.

Why is that the case?

I didn't understand the scope of Wikidata to include actually creating
articles that don't exist, only providing data about topics
across projects. Sure, that might be extremely helpful to someone with a
bot to populate species articles, but I'm skeptical that Wikidata would
or should be creating millions of articles about such things. If you
consider something even slightly more controversial than species, such as
schools, many projects would not welcome a third party mass-creating pages
about a topic that is described in Wikidata.

Steven

Re: A bot to create articles about species

Nikola Smolenski-2
On 18/10/12 09:25, Steven Walling wrote:

> On Wed, Oct 17, 2012 at 11:46 PM, Nikola Smolenski<[hidden email]>
> wrote:
>> The need for such bots should cease after Wikidata is fully deployed. I
>> suggest to interested programmers that they should direct their effort
>> there.
>
> Why is that the case?
>
> I didn't understand the scope of Wikidata to include actual creation
> of articles that don't exist. Only to provide data about topics
> across projects. Sure, that might be extremely helpful to someone with a
> bot to populate species articles, but I'm skeptical that Wikidata would
> or should be creating millions of articles about such things. If you
> consider something even slightly more controversial than species, such as
> schools, many projects would not welcome a third party mass-creating pages
> about a topic that is described in Wikidata.
Wikidata won't need to create articles. Rather, if you try to view
a page that has no article, Wikipedia will check whether an item with
an appropriate name exists in Wikidata and generate the article on the fly
if Wikipedia has a local article template for this type of article.
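
The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionaries below are hypothetical stand-ins for Wikidata and for a wiki's local templates, and none of the names correspond to an actual MediaWiki or Wikidata API.

```python
from string import Template

# Hypothetical stand-in for Wikidata: item name -> item type and properties.
ITEMS = {
    "Pygochelidon cyanoleuca": {
        "type": "species",
        "props": {
            "name": "Pygochelidon cyanoleuca",
            "group": "bird",
            "family": "Hirundinidae",
        },
    },
}

# Hypothetical local templates, one per item type, written by the community.
LOCAL_TEMPLATES = {
    "species": Template("$name is a $group in the family $family."),
}


def render_missing_page(title, items=ITEMS, templates=LOCAL_TEMPLATES):
    """Return generated text for a page with no article, or None."""
    item = items.get(title)
    if item is None:
        return None  # no data item with this name
    template = templates.get(item["type"])
    if template is None:
        return None  # this wiki chose not to provide a template
    return template.substitute(item["props"])


print(render_missing_page("Pygochelidon cyanoleuca"))
```

Returning None when no local template exists reflects the point that nothing is shown unless the local community has written a template for that type of item.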



Re: A bot to create articles about species

John Erling Blad
On Thu, Oct 18, 2012 at 10:08 AM, Nikola Smolenski <[hidden email]> wrote:

> On 18/10/12 09:25, Steven Walling wrote:
>>
>> On Wed, Oct 17, 2012 at 11:46 PM, Nikola Smolenski<[hidden email]>
>> wrote:
>>>
>>> The need for such bots should cease after Wikidata is fully deployed. I
>>> suggest to interested programmers that they should direct their effort
>>> there.
>>
>>
>> Why is that the case?

The necessary data to create those articles will be available in
Wikidata, and possibly a lot more than we currently have in our
templates. That could make it possible to create really awesome
articles, if it were not for one thing: it is extremely hard to create
well-formed text automatically. One of the more common problems is
names that use different inflection rules depending on context and how they
are written. Such inflection rules are not part of the Wikidata
project and will probably be a major undertaking in itself.

Note that some languages do not need such inflection rules, and then
it is fairly simple to create articles from templates. In other cases
it might be good enough to simply say "Pygochelidon cyanoleuca is a
bird" and add an automatic template.
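
To make the inflection problem concrete, here is a small Python sketch using English, where the only context-dependent form in the stub sentence is the indefinite article. The function names are hypothetical and the vowel-letter rule is a deliberately naive assumption (it gets words like "hour" or "unicorn" wrong); languages with case or definiteness inflection need far more than this.

```python
VOWELS = set("aeiou")


def indefinite_article(noun):
    """Naive English rule: 'an' before a vowel letter, otherwise 'a'."""
    return "an" if noun[:1].lower() in VOWELS else "a"


def stub_sentence(scientific_name, group):
    """Fill the one-sentence stub, choosing the article from context."""
    return f"{scientific_name} is {indefinite_article(group)} {group}."


print(stub_sentence("Pygochelidon cyanoleuca", "bird"))
print(stub_sentence("Apis mellifera", "insect"))
```

A fixed template with a hard-coded "a" would produce "a insect" for the second call, which is the kind of error that inflection rules are meant to prevent.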


Re: A bot to create articles about species

Nikola Smolenski-2
On 18/10/12 11:06, John Erling Blad wrote:
> well-formed text automatically. One of the more common problems is
> names that use different inflection rules depending on context and how they
> are written. Such inflection rules are not part of the Wikidata
> project and will probably be a major undertaking in itself.

Why do you think that inflection rules will not be a part of Wikidata?
They would be hugely needed on Wiktionary, and there is no reason why
Wikidata could not contain them.



Re: A bot to create articles about species

John Erling Blad
Getting working inflection rules for even a single language is a major
task, and doing so for several hundred languages would be an
overwhelming task. I can't see how this can be implemented as part of
the Wikidata project within a reasonable time frame.

There are a few shortcuts that can be taken, and it is possible to make
some generalized tools. For an open source alternative take a look at
Apertium (http://en.wikipedia.org/wiki/Apertium). Usually it is only
the generation/disambiguation phase that is necessary, and this makes
the task somewhat simpler, but it is still a major undertaking.

Note that some of the basic tools already exist; we only need to
interface them to MediaWiki. But the tools need definition files to
work (that is, inflection rules for Northern Sami, for example,
or Norwegian Bokmål and Nynorsk, or Swedish), and it is those
definitions that are the major task.

John


Re: A bot to create articles about species

Denny Vrandečić
In reply to this post by Nikola Smolenski-2
For now, we have no plans for Wikidata to create articles. This would,
in my opinion, meddle too much with the autonomy of the Wikipedia
language projects.

What will be possible is to facilitate the creation of such bots, as
some data that might be used for the article might be taken from and
maintained in Wikidata, and the creation of templates that use data
from Wikidata.

Wikidata currently has no plans for creating text using natural
language generation techniques. We would love for someone else to do
this kind of awesome on top of Wikidata.

I hope this helps,
Denny







--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.


Re: A bot to create articles about species

John Erling Blad
For those interested in this type of text synthesis, it can be done by
using finite-state automata and transducers (FSTs). The simplest way
to make them is by cross-compiling into Lua from some other known
form.
John
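
For readers unfamiliar with the idea, a toy finite-state transducer might look like the Python sketch below. It is an assumption-laden illustration, not the Lua cross-compilation mentioned above and not how Apertium represents its rules: the state names, the "+PL" tag and the English plural rule (add "es" after s, x or z, otherwise "s") are all invented for the example.

```python
PLURAL = "+PL"  # hypothetical morphological tag appended to the lemma


def step(state, symbol):
    """One transition: (state, input symbol) -> (next state, output)."""
    if symbol == PLURAL:
        # The output for the tag depends only on the current state.
        return "end", ("es" if state == "sibilant" else "s")
    # Ordinary letters are copied through; the state remembers whether
    # the last letter seen was s, x or z.
    next_state = "sibilant" if symbol in ("s", "x", "z") else "other"
    return next_state, symbol


def transduce(symbols):
    """Run the transducer over a sequence of input symbols."""
    state, out = "start", []
    for symbol in symbols:
        state, emitted = step(state, symbol)
        out.append(emitted)
    return "".join(out)


def pluralize(lemma):
    """Generate the surface form for lemma + plural tag."""
    return transduce(list(lemma) + [PLURAL])


print(pluralize("bird"), pluralize("fox"))
```

The machine has a fixed, small set of states and reads its input once from left to right, which is what makes this family of tools fast; the hard part is writing the transition rules for each real language.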

On Thu, Oct 18, 2012 at 2:10 PM, Denny Vrandečić
<[hidden email]> wrote:

> For now, we have no plans for Wikidata to create articles. This would,
> in my opinion, meddle too much with the autonomy of the Wikipedia
> language projects.
>
> What will be possible is to facilitate the creation of such bots, as
> some data that might be used for the article might be taken from and
> maintained in Wikidata, and the creation of templates that use data
> from Wikidata.
>
> Wikidata currently has no plans for creating text using natural
> language generation techniques. We would love for someone else to do
> this kind of awesome on top of Wikidata.
>
> I hope this helps,
> Denny


Re: A bot to create articles about species

Nikola Smolenski-2
In reply to this post by Denny Vrandečić
On 18/10/12 14:10, Denny Vrandečić wrote:
> For now, we have no plans for Wikidata to create articles. This would,
> in my opinion, meddle too much with the autonomy of the Wikipedia
> language projects.

I don't know if I am so bad at explaining things or if this is such a
complex thing to grasp.

No one has ever suggested for Wikidata to create articles. The only
thing suggested was for Wikipedias to display an article-like template
filled with Wikidata data if they have no article on a certain topic,
instead of the "Wikipedia does not have an article with this exact name"
page they display now.

Not only would this not meddle with the Wikipedias' autonomy, it would
require active engagement on the part of the community in order to create
the templates. If a community doesn't want the articles, they simply
won't create the templates. Yet I believe this will reduce community
tension: most communities don't like bot-created articles, so this
seems to be a reasonable compromise.

> What will be possible is to facilitate the creation of such bots, as
> some data that might be used for the article might be taken from and
> maintained in Wikidata, and the creation of templates that use data
> from Wikidata.
>
> Wikidata currently has no plans for creating text using natural
> language generation techniques. We would love for someone else to do
> this kind of awesome on top of Wikidata.

Natural language generation is not necessary for any of this.



Re: A bot to create articles about species

Ole Palnatoke Andersen-2
On Mon, Oct 22, 2012 at 2:17 PM, Nikola Smolenski <[hidden email]> wrote:

> No one has ever suggested for Wikidata to create articles.
>

OK; then I misunderstood "and generate the article on the fly"..

Regards,
Ole



--
http://palnatoke.org * @palnatoke * +4522934588

Re: A bot to create articles about species

Derric Atzrott
>> No one has ever suggested for Wikidata to create articles.
>>
>
>OK; then I misunderstood "and generate the article on the fly"..

So to make sure that I understand this correctly, this is the idea:
* Let's say I search on the lojban Wikipedia for Creagerstown, Maryland
* The article doesn't exist, but Wikidata has information on it.
* I'm told the article doesn't exist, but presented with a template showing when
the town was founded etc. along with search results

Assuming the above is correct:
Would they be able to make use of that template if they wanted to quickly throw
together a stub on the topic?

Thank you,
Derric Atzrott



Re: A bot to create articles about species

Nikola Smolenski-2
On 22/10/12 14:31, Derric Atzrott wrote:
>>> No one has ever suggested for Wikidata to create articles.
>>
>> OK; then I misunderstood "and generate the article on the fly"..
>
> So to make sure that I understand this correctly, this is the idea:
> * Let's say I search on the lojban Wikipedia for Creagerstown, Maryland
> * The article doesn't exist, but Wikidata has information on it.
> * I'm told the article doesn't exist, but presented with a template showing when
> the town was founded etc. along with search results

Yes, exactly. Though depending on what the community wants, you don't
even have to be told that the article doesn't exist. And I assume the
template would be human-readable, like the aforementioned
http://sv.wikipedia.org/wiki/Bl%C3%A5vit_svala

> Would they be able to make use of that template if they wanted to quickly throw
> together a stub on the topic?

Yes, they could use
http://www.mediawiki.org/wiki/Manual:Creating_pages_with_preloaded_text
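
As a sketch of what such a link could look like, the Python snippet below builds an edit URL with MediaWiki's preload parameter. The base URL and the page name "Template:Stub" are placeholders chosen for the example, not pages known to exist on any wiki, and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def preload_edit_url(base, title, preload_page):
    """Edit link that opens `title` prefilled with the text of `preload_page`."""
    query = urlencode({
        "title": title,
        "action": "edit",
        "preload": preload_page,
    })
    return f"{base}?{query}"


# Placeholder wiki address and template name, for illustration only.
print(preload_edit_url("https://example.org/w/index.php",
                       "Creagerstown", "Template:Stub"))
```

A generated "no article yet" page could offer a link like this, so that anyone who wants to write a stub starts from the prepared text rather than an empty edit box.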

