Summary of findings from WMF Summer of Research program now available

Steven Walling-3
Greetings everyone,

Now that the WMF summer research program in the Community Department has
come to a close, I wanted to point interested parties to the body of
findings we've produced.

We covered a lot of territory, so to save you the trouble if you just want to
browse, we collected our most salient results into one wiki page.

   - Relevant blog post here:
   http://blog.wikimedia.org/2011/09/06/summer-research-findings/


   - Summary of findings on Meta, with links to further documentation:
   https://secure.wikimedia.org/wikipedia/meta/wiki/Research:Wikimedia_Summer_of_Research_2011/Summary_of_Findings

Next steps are twofold for this program:

   1. We'll be working with the Global Development team and some volunteers
   from the local community to extend these analyses to cover Portuguese
   Wikipedia, specifically to support Global Dev's work in Brazil.
   2. We're choosing and implementing a platform to release not just our
   code, but the datasets we compiled over the summer. You'll hear more about
   this soon, but we're taking our time in order to decide on a solution that
   will work in the long term for sharing open data beyond the dumps.

Last but not least, if anyone would like to have a more in-depth discussion
about these findings and the research that produced them, I'm definitely
open to hosting IRC office hours with some members of the team. Just let
me know if you're interested (on or offlist) and I'll set something up soon.

--
Steven Walling
Fellow at Wikimedia Foundation
wikimediafoundation.org
_______________________________________________
foundation-l mailing list
[hidden email]
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l

Re: Summary of findings from WMF Summer of Research program now available

John Mark Vandenberg
Thanks Steven, and the Community Department.

I am instantly drawn to the analysis of redlinks.
Can we please have this data!!
Article writers are on standby, ready to kill red links ;-)

The special page for this is dead.

http://en.wikipedia.org/wiki/Special:WantedPages

--
John Vandenberg

Re: [Wiki-research-l] Summary of findings from WMF Summer of Research program now available

R.Stuart Geiger
Thanks for the interest, John!  I put the list of the top 250 up at
http://en.wikipedia.org/wiki/Wikipedia:Most_wanted_articles -- but I
didn't exactly publicize it.  I guess this is my chance to do so now!
Also, a list of the top 1000 redlinked articles is up on a separate
page at http://en.wikipedia.org/wiki/Wikipedia:Most_wanted_articles/July_2011
and the entire dataset is up at
http://toolserver.org/~swalker/redlink_list.csv -- note that it is
42.8 MB!

If you have any other questions about the redlinks/bluelinks dataset,
feel free to ask me.  And you can check out the meta page for more fun
links data, such as how many more links we added between 2009 and
2011, or incoming links to articles about countries / each country's
population: http://meta.wikimedia.org/wiki/Research:One_Link,_Two_Links,_Red_Links,_Blue_Links

Stuart

----
Stuart Geiger
User:Staeiou / @staeiou
Ph.D student, UC-Berkeley School of Information

On Tue, Sep 6, 2011 at 10:19 AM, John Vandenberg <[hidden email]> wrote:

> Thanks Steven, and the Community Department.
>
> I am instantly drawn to the analysis of redlinks.
> Can we please have this data!!
> Article writers are on stand by ready to kill red links ;-)
>
> The special page for this is dead.
>
> http://en.wikipedia.org/wiki/Special:WantedPages
>
> --
> John Vandenberg
>

Re: [Wiki-research-l] Summary of findings from WMF Summer of Research program now available

Yaroslav M. Blanter
On Tue, 6 Sep 2011 10:45:41 -0700, "R.Stuart Geiger" <[hidden email]> wrote:

> Thanks for the interest, John!  I put the list of the top 250 up at
> http://en.wikipedia.org/wiki/Wikipedia:Most_wanted_articles -- but I
> didn't exactly publicize it.  I guess this is my chance to do so now!
> Also, a list of the top 1000 redlinked articles is up on a separate
> page at
> http://en.wikipedia.org/wiki/Wikipedia:Most_wanted_articles/July_2011
> and the entire dataset is up at
> http://toolserver.org/~swalker/redlink_list.csv -- note that it is
> 42.8mb!
>
> If you have any other questions about the redlinks/bluelinks dataset,
> feel free to ask me.  And you can check out the meta page for more fun
> links data, such as how many more links we added between 2009 and
> 2011, or incoming links to articles about countries / each country's
> population:
> http://meta.wikimedia.org/wiki/Research:One_Link,_Two_Links,_Red_Links,_Blue_Links
>
> Stuart
>
> ----
> Stuart Geiger
> User:Staeiou / @staeiou
> Ph.D student, UC-Berkeley School of Information

From what I see, the page http://en.wikipedia.org/wiki/Special:WantedPages
is simply misleading. For instance, one of the highest-ranking missing
articles, [[Alison Campbell]], has all of its 5000+ incoming links coming not
from other articles but from article talk pages, where the link is not
explicitly present in the page text. That means someone put this red link
into one of the heavily used templates for project evaluations (I did not
investigate which one). I actually doubt that the person is even notable,
though there is a short stub on the Dutch Wikipedia. There is no way that
this is really one of the most wanted articles. The others I tried from the
first page share the same problem.

Cheers
Yaroslav

Re: [Wiki-research-l] Summary of findings from WMF Summer of Research program now available

M. Williamson
In reply to this post by R.Stuart Geiger
I would be interested to know what the most wanted pages would be if all
links from templates were excluded. If I introduce a redlink into a template
that's transcluded on 2000 pages, it immediately becomes a most wanted
article. I'd also be very interested in seeing this data for other
Wikipedias, particularly Spanish (es) and Serbo-Croatian (sh).
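
One rough way to approximate "links excluding templates" is to count only the
links that are written out in a page's own wikitext, since a link that arrives
through a transcluded template does not appear there. The sketch below is only
an illustration under stated assumptions: the `pages` mapping and the
`existing_titles` set are hypothetical inputs, not the layout of the published
dataset, and the regex ignores title normalization, namespace prefixes such as
File: or Category:, and nowiki sections.

```python
import re
from collections import Counter

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]];
# group 1 is the target title without any section or label.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def explicit_links(wikitext):
    """Return the set of link targets written directly in the wikitext."""
    return {m.group(1).strip() for m in WIKILINK.finditer(wikitext)}


def count_explicit_redlinks(pages, existing_titles):
    """Count, per missing title, how many pages link to it explicitly.

    pages: dict of page title -> raw wikitext (hypothetical input).
    existing_titles: set of titles that already exist (hypothetical input).
    """
    counts = Counter()
    for wikitext in pages.values():
        for target in explicit_links(wikitext):
            if target not in existing_titles:
                counts[target] += 1
    return counts
```

With this kind of count, a redlink added to a template used on 2000 pages
contributes nothing from those pages, because none of them contains the link
in its own text.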

Re: [Wiki-research-l] Summary of findings from WMF Summer of Research program now available

Emilio J. Rodríguez-Posada
In reply to this post by Steven Walling-3
The interesting thing here is that there were 4.8M unique red links in 2009
and 5.6M unique red links in 2011. *The more articles are created, the more
articles are missing*.

2011/9/6 Steven Walling <[hidden email]>

> Greetings everyone,
>
> Now that the WMF summer research program in the Community Department
> has come to a close, I wanted to point interested parties to the body of
> findings we've produced.

Re: [Wiki-research-l] Summary of findings from WMF Summer of Research program now available

Liam Wyatt
On 10/09/2011, at 23:04, emijrp <[hidden email]> wrote:

> The interesting thing here is, 4.8M unique red links in 2009, and unique
> 5.6M red links in 2011. *The more articles are created, the more articles
> are missing*.
>
Along those lines, I recall seeing (at least three years ago) some research that said the proportion of redlinks was remaining stable even as the number of articles grew. The authors hypothesised that if the proportion decreased, that would imply that we would eventually stop and "finish" the encyclopedia. On the other hand, if the proportion of redlinks increased, it would imply that the project would eventually decay through too much entropy. Instead of either extreme, the research said that, a bit like Goldilocks, the growth was "just right" and could continue indefinitely. Does anyone else remember this research or its name/author?

-Liam

Wittylama.com/blog
Peace, love & metadata

Re: [Wiki-research-l] Summary of findings from WMF Summer of Research program now available

Tilman Bayer
On Sun, Sep 11, 2011 at 3:04 AM, Liam Wyatt <[hidden email]> wrote:
> On 10/09/2011, at 23:04, emijrp <[hidden email]> wrote:
>
>> The interesting thing here is, 4.8M unique red links in 2009, and unique
>> 5.6M red links in 2011. *The more articles are created, the more articles
>> are missing*.
>>
> Along those lines, I recall seeing (at least three years ago) some research that said the proportion of redlinks was remaining stable even as the number of articles grew. The authors hypothesised that if the proportion decreased, that would imply that we would eventually stop and "finish" the encyclopedia. On the other hand, if the proportion of redlinks increased, it would imply that the project would eventually decay through too much entropy. Instead of either extreme, the research said that, a bit like Goldilocks, the growth was "just right" and could continue indefinitely. Does anyone else remember this research or its name/author?

http://dl.acm.org/citation.cfm?id=1378720 Spinellis/Louridas, "The
collaborative organization of knowledge", complemented in
http://www.spinellis.gr/blog/20080808/ and summarized in
https://secure.wikimedia.org/wikipedia/en/wiki/Wikipedia:Wikipedia_Signpost/2008-08-11/Growth_study

--
Tilman Bayer
Movement Communications
Wikimedia Foundation
IRC (Freenode): HaeB

Re: [Wiki-research-l] Summary of findings from WMF Summer of Research program now available

Andrew Gray-3
In reply to this post by Yaroslav M. Blanter
On 8 September 2011 10:58, Yaroslav M. Blanter <[hidden email]> wrote:

> From what I see, the page http://en.wikipedia.org/wiki/Special:WantedPages
> is just misleading: For instance, one of the most ranking missing articles,
> [[Alison Campbell]], has all 5000+ links leading not from other articles,
> but from article talk pages, where it is not explicitly present, which
> means someone put this red link into one of the highly used templates for
> project evaluations (I did not investigate which one). I actually doubt
> that the person is even notable, though there is a short stub in Dutch
> Wikipedia. There is no way that this is really one of the most wanted
> articles. Others I tried from the first page share the same problem.

It's in a project-specific to-do list - for a fairly minor project, as
these things go, but even a smallish project on enwiki has a lot of
articles!

http://en.wikipedia.org/wiki/Template:Northern_Ireland_tasks

(If anyone's wondering, Alison Campbell is the former Miss Northern
Ireland, engaged to marry a prominent sportsman, and thus presumably
something of a minor local celebrity. I make no comment on
notability.)

For future research on redlinks, it would definitely be worth
distinguishing between "links in article text" and "links from
projectspace / inline templates". Technically more difficult to figure
out, of course, but that's why we call them researchers ;-)
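
As a minimal sketch of the namespace half of that distinction, assuming each
link record carries the namespace number of the page it comes from: in the
standard MediaWiki numbering, 0 is the article namespace, 1 is Talk, 4 is
Project and 10 is Template. The `(source_namespace, target_title)` record
layout here is hypothetical and is not the format of the redlink dataset
mentioned above. It also does not separate out links that reach an article
page through an inline template, which would need a look at the page text.

```python
from collections import Counter

ARTICLE_NS = 0  # standard MediaWiki number for the main (article) namespace


def split_redlink_counts(links):
    """Tally redlink targets separately for article and non-article sources.

    links: iterable of (source_namespace, target_title) pairs
           (hypothetical record layout, for illustration only).
    Returns (from_articles, from_elsewhere) as two Counters.
    """
    from_articles = Counter()
    from_elsewhere = Counter()
    for source_namespace, target in links:
        if source_namespace == ARTICLE_NS:
            from_articles[target] += 1
        else:
            from_elsewhere[target] += 1
    return from_articles, from_elsewhere
```

Ranking by the first counter alone would keep a title off the list when all of
its incoming links sit on talk pages or in project space.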

--
- Andrew Gray
  [hidden email]
