Sampling new editors in English Wikipedia

Haifeng Zhang
Hi folks,

My work needs to randomly sample new editors in each month, e.g., 100 editors per month.

Do any of you have good suggestions for how to do this efficiently?

I could use the dump files, but I wonder whether there are other options.


Thanks,

Haifeng Zhang
_______________________________________________
Wiki-research-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Re: Sampling new editors in English Wikipedia

Pine W
Hi, can you expand on what you mean by "sample"? If you're referring to
analyzing users' edit histories then that should be fine. However, if
you're planning to send surveys or messages to them, sending them
barnstars, or otherwise manipulating their on-wiki experience, that would
be problematic.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )


Re: Sampling new editors in English Wikipedia

Leila Zia
Hi Pine,

Haifeng has a simple question about how to sample editors other than
via dumps. It would be great if someone who knows the answer could help
them move forward.

If you are interested in learning more about their research instead of
answering their question, my recommendation would be to start the
conversation with a "can you tell us more about your research?" kind of
question. I find the current way of communication very speculative,
and that is not good for building a vibrant research community that can
help us address some of our big questions.

Best,
Leila

Re: Sampling new editors in English Wikipedia

Stuart A. Yeates
There are a number of new-editor-heavy noticeboards. I would suggest
posting an invite to your survey (or whatever) there. If you ask for
editors' usernames, you can filter out those who don't meet your
definition of 'new'.

I'm thinking of places like:
https://en.wikipedia.org/wiki/Wikipedia:Teahouse and
https://en.wikipedia.org/wiki/Wikipedia:Help_desk

cheers
stuart


--
...let us be heard from red core to black sky

Re: Sampling new editors in English Wikipedia

Haifeng Zhang
Pine and Stuart,

I meant extracting a random sample of new editors (month by month) from Wikipedia edit history.

It is not about surveying new editors, but thanks all the same for your suggestions.
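
One way the month-by-month draw might look, as a minimal Python sketch. It assumes that a mapping from account name to the ISO timestamp of that account's first edit has already been built (for example from the dumps). The names `sample_new_editors`, `per_month`, and `seed` are hypothetical, and treating "new in a month" as "first edit falls in that month" is an assumption for illustration, not a definition agreed in this thread.

```python
import random
from collections import defaultdict


def sample_new_editors(first_edits, per_month=100, seed=0):
    """Draw up to `per_month` accounts for each month of first edit.

    first_edits: dict of account name -> ISO 8601 timestamp of first edit,
                 e.g. {"Example": "2019-03-01T12:00:00Z"} (assumed input).
    Returns a dict of "YYYY-MM" -> list of sampled account names.
    """
    by_month = defaultdict(list)
    for user, timestamp in first_edits.items():
        by_month[timestamp[:7]].append(user)  # "YYYY-MM" prefix of the timestamp

    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    sample = {}
    for month in sorted(by_month):
        users = sorted(by_month[month])  # stable order before sampling
        k = min(per_month, len(users))   # months with few newcomers are kept whole
        sample[month] = rng.sample(users, k)  # uniform, without replacement
    return sample
```

Sampling without replacement within each month gives every account that started in that month the same chance of selection. The sample is of accounts, so one person with several accounts could appear more than once.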


Thanks,
Haifeng Zhang

Postdoctoral Research Fellow
Human-Computer Interaction Institute
Carnegie Mellon University

Re: Sampling new editors in English Wikipedia

Pine W
Hi Haifeng, thanks for the information. I think that your idea of looking
in the dumps makes sense. Am I understanding correctly that you would like
advice regarding how to do that in the most efficient way?

Hi Leila, I believe that I asked for more information regarding Haifeng's
work. There has been discussion on English Wikipedia regarding volunteers
being unhappy with the interventions or proposed interventions of
researchers. I think that asking about the nature of Haifeng's research is
legitimate, and I tried to provide some examples of possible types of
research. I'm trying to protect the community from problematic
interventions, while also welcoming research that is accepted by the
community.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )


Re: Sampling new editors in English Wikipedia

Isaac Johnson
Hey Haifeng,
If you decide to process the dumps, you should be able to easily repurpose
some quick code that I wrote for a similar project:
https://github.com/geohci/miscellaneous-wikimedia/tree/master/editor-turnover

Notably, I'd suggest using the stub history dumps as they are much smaller
because they do not include the actual content. For instance, for March 1st
and English Wikipedia (https://dumps.wikimedia.org/enwiki/20190301/), this
file would be enwiki-20190301-stub-meta-history.xml.gz and is 57.9 GB.
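
For readers who want a starting point, here is a rough Python sketch of streaming a stub history dump to find each account's earliest revision. It is not the code from the repository linked above. The function names are hypothetical, and the element names (`page`, `revision`, `timestamp`, `contributor`, `username`) and the namespace URL in the tiny sample reflect my understanding of the MediaWiki export format, so check them against the actual file.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def first_edit_by_account(stream):
    """Map each registered account name to its earliest revision timestamp."""
    first = {}
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # the enclosing root element
    for event, elem in context:
        if event != "end":
            continue
        name = local(elem.tag)
        if name == "revision":
            ts = user = None
            for child in elem:
                tag = local(child.tag)
                if tag == "timestamp":
                    ts = child.text
                elif tag == "contributor":
                    # Anonymous edits carry <ip> rather than <username>.
                    for sub in child:
                        if local(sub.tag) == "username":
                            user = sub.text
            # ISO 8601 UTC timestamps sort correctly as plain strings.
            if user and ts and (user not in first or ts < first[user]):
                first[user] = ts
            elem.clear()
        elif name == "page":
            root.clear()  # release finished pages so memory stays bounded
    return first


def first_edit_from_dump(path):
    """Read a gzip-compressed dump from `path` without unpacking it to disk."""
    with gzip.open(path, "rb") as stream:
        return first_edit_by_account(stream)


# Tiny hand-written sample in the assumed format, for a quick check.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
<page><title>A</title>
<revision><id>1</id><timestamp>2019-02-03T10:00:00Z</timestamp>
<contributor><username>Alice</username><id>7</id></contributor></revision>
<revision><id>2</id><timestamp>2019-01-05T09:00:00Z</timestamp>
<contributor><ip>127.0.0.1</ip></contributor></revision>
</page>
<page><title>B</title>
<revision><id>3</id><timestamp>2019-01-20T08:00:00Z</timestamp>
<contributor><username>Alice</username><id>7</id></contributor></revision>
<revision><id>4</id><timestamp>2019-02-11T08:00:00Z</timestamp>
<contributor><username>Bob</username><id>8</id></contributor></revision>
</page>
</mediawiki>"""

demo = first_edit_by_account(io.BytesIO(SAMPLE))
print(demo)
```

On the real file one would call `first_edit_from_dump("enwiki-20190301-stub-meta-history.xml.gz")`. The result holds one entry per account, so it should fit in memory, but a full pass over a file of that size will still take a while.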

Best,
Isaac

--
Isaac Johnson -- Research Scientist -- Wikimedia Foundation

Re: Sampling new editors in English Wikipedia

Leila Zia
In reply to this post by Pine W
On Tue, Mar 12, 2019 at 1:56 PM Pine W <[hidden email]> wrote:
>
> Hi Leila, I believe that I asked for more information regarding Haifeng's
> work.

You stated

"However, if you're planning to send surveys or messages to them,
sending them barnstars, or otherwise manipulating their on-wiki
experience, that would be problematic."

and I'm suggesting that you enter from a question angle, please.

> There has been discussion on English Wikipedia regarding volunteers
> being unhappy with the interventions or proposed interventions of
> researchers. I think that asking about the nature of Haifeng's research is
> legitimate, and I tried to provide some examples of possible types of
> research.

Please check your email. There was no question there in the part
related to this discussion. Also, even if a question had been posed,
I highly recommend that you approach these conversations from a
different angle. There are many reasons someone may need sampled
data on newcomers. A few examples: they may want to test the
assumption that arrivals (registrations) to a specific Wikipedia
language follow a Poisson process, they may want to learn about the
distribution of topics that editors in a given language edit in the
first 24 hours after they open their account, they may want to build
a model to predict whether an editor will make the n-th edit given
that they started at time x, they may want to see whether external
events correlate strongly with account registration and Wikipedia
activity, they may want to see whether the change to HTTPS had an
impact on registrations, etc. There are literally millions of
questions people may ask (given that the data is available to them)
with respect to Wikipedia. The answer to some of them may require
interaction with Wikipedia editors, the answer to some may not. So
the safest way to start a fruitful conversation is to ask: can you
tell us more about what you're trying to do?

> I'm trying to protect the community from problematic
> interventions, while also welcoming research that is accepted by the
> community.

I understand and I'm looking forward to having conversations with you
all about how to achieve that.

Best,
Leila

Re: Sampling new editors in English Wikipedia

Stuart A. Yeates
In reply to this post by Isaac Johnson
Note that this code deals with accounts, whereas Haifeng asked about
editors.

There are many reasons, both licit and illicit, for editors to have
more than one account. I know I have more than ten for
policy-compliant reasons.

cheers
stuart


--
...let us be heard from red core to black sky

Re: Sampling new editors in English Wikipedia

Isaac Johnson
Yes, thanks for the clarification Stuart. I don't know of any statistics to
suggest how widespread this is, but it might be worth checking, especially
if you are focusing on editors with higher edit counts (who I suspect are
more likely to have multiple accounts for licit reasons).

On Tue, Mar 12, 2019 at 4:34 PM Stuart A. Yeates <[hidden email]> wrote:

> Note that this code deals with accounts, not editors, which is what
> Haifeng asked for.
>
> There are many reasons, both licit and illicit for editors to have
> more than one account. I know I have more than ten for
> policy-compliant reasons.
>
> cheers
> stuart
>
>
> --
> ...let us be heard from red core to black sky
>


--
Isaac Johnson -- Research Scientist -- Wikimedia Foundation

Re: Sampling new editors in English Wikipedia

Stuart A. Yeates
There are thousands and thousands of editors with multiple accounts.
Those who have bothered to add a category are listed at
https://en.wikipedia.org/wiki/Category:Wikipedians_with_alternative_accounts

Many editors who engage in outreach are advised to create new accounts
for themselves regularly, simply because the experience of new account
creation changes over time, and helping users streamline that
(especially in situations such as editathons) requires thorough
knowledge of account creation and the things that can make it go
wrong. This is pretty much a prerequisite for the old accountcreator
userright https://en.wikipedia.org/wiki/Wikipedia:Account_creator
(which I've had on several occasions) and the new eventcoordinator
userright https://en.wikipedia.org/wiki/Wikipedia:Event_coordinator
(which is too new for me to have had yet).

cheers
stuart
--
...let us be heard from red core to black sky

On Wed, 13 Mar 2019 at 10:40, Isaac Johnson <[hidden email]> wrote:

>
> Yes, thanks for the clarification Stuart. I don't know of any statistics to
> suggest how widespread this is, but it might be worth checking, especially
> if you are focusing on editors with higher edit counts (who I suspect are
> more likely to have multiple accounts for licit reasons).
>
> On Tue, Mar 12, 2019 at 4:34 PM Stuart A. Yeates <[hidden email]> wrote:
>
> > Note that this code deals with accounts, not editors, which is what
> > Haifeng asked for.
> >
> > There are many reasons, both licit and illicit for editors to have
> > more than one account. I know I have more than ten for
> > policy-compliant reasons.
> >
> > cheers
> > stuart
> >
> >
> > --
> > ...let us be heard from red core to black sky
> >

Re: Sampling new editors in English Wikipedia

Pine W
In reply to this post by Leila Zia
Leila, can we discuss this off list?

Thanks,

Pine
( https://meta.wikimedia.org/wiki/User:Pine )


On Tue, Mar 12, 2019 at 9:29 PM Leila Zia <[hidden email]> wrote:

> On Tue, Mar 12, 2019 at 1:56 PM Pine W <[hidden email]> wrote:
> >
> > Hi Leila, I believe that I asked for more information regarding Haifeng's
> > work.
>
> You stated
>
> "However, if you're planning to send surveys or messages to them,
> sending them barnstars, or otherwise manipulating their on-wiki
> experience, that would be problematic."
>
> and I'm suggesting that you enter from a question angle, please.
>

> > There has been discussion on English Wikipedia regarding volunteers
> > being unhappy with the interventions or proposed interventions of
> > researchers. I think that asking about the nature of Haifeng's research
> > is legitimate, and I tried to provide some examples of possible types of
> > research.
>
> Please check your email. There was no question there in the part
> related to this discussion. Also, even if there was a question posed,
> I highly recommend you enter from a different angle to these
> conversations. There are many reasons someone may need the sampled
> data of newcomers. A few examples: they may want to test the
> assumption whether the arrivals (registrations) to a specific
> Wikipedia language follow a Poisson process or not, they may want to
> learn about the distribution of topics editors in a given language
> edit in the first 24 hours after they open the account, they may want
> to build a prediction model to predict whether the editor will make
> the n-th edit or not given that they have started at time x, they may
> want to see whether external events have strong correlations with
> account registration and Wikipedia activity, they may want to see if
> the change to HTTPS had impact on registrations, etc. There are
> literally millions of questions people may ask (given that the data is
> available to them) with respect to Wikipedia. The answer to some of
> them may require interaction with Wikipedia editors, the answer to
> some may not. So the safest bet to start having a fruitful
> conversation is to ask: can you tell us more about what you're trying
> to do?
>
> > I'm trying to protect the community from problematic
> > interventions, while also welcoming research that is accepted by the
> > community.
>
> I understand and I'm looking forward to having conversations with you
> all about how to achieve that.
>
> Best,
> Leila
>

Re: Sampling new editors in English Wikipedia

Leila Zia
Let's do it.


On Tue, Mar 12, 2019 at 3:04 PM Pine W <[hidden email]> wrote:

>
> Leila, can we discuss this off list?
>
> Thanks,
>
> Pine
> ( https://meta.wikimedia.org/wiki/User:Pine )
>
>

Re: Sampling new editors in English Wikipedia

fn
In reply to this post by Haifeng Zhang
Haifeng,


While some suggest the dumps or noticeboards, my immediate thought was
a database query, e.g., through Quarry. It just happens that Jonathan T.
Morgan has created a query there:

https://quarry.wmflabs.org/query/310

SELECT user_id, user_name, user_registration, user_editcount
        FROM enwiki_p.user
        WHERE user_registration > DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 1 DAY), '%Y%m%d%H%i%s')
        AND user_editcount > 10
        AND user_id NOT IN (SELECT ug_user FROM enwiki_p.user_groups
                WHERE ug_group = 'bot')
        AND user_name NOT IN (SELECT REPLACE(log_title, "_", " ")
                FROM enwiki_p.logging
                WHERE log_type = "block" AND log_action = "block"
                AND log_timestamp > DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 2 DAY), '%Y%m%d%H%i%s'));


You may fork that query. As another example, R. Stuart Geiger (Staeiou)
has a fork at https://quarry.wmflabs.org/query/34256 that queries for a
month.
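
To make the "query per month" idea concrete, here is a minimal Python sketch that builds one sampling query per calendar month. It is an illustration only, not the query behind either Quarry link. It assumes that user_registration holds MediaWiki-style YYYYMMDDHHMMSS timestamps (as the DATE_FORMAT call above suggests), that "editor" can be approximated as an account with user_editcount > 0, and that ORDER BY RAND() is acceptable for drawing the sample. The helper names month_bounds and monthly_sample_query are made up for this example. The bot and block filters from the query above could be appended in the same way.

```python
from datetime import date


def month_bounds(year: int, month: int) -> tuple[str, str]:
    """Return the [start, end) bounds of a month as YYYYMMDDHHMMSS strings."""
    start = date(year, month, 1)
    # First day of the following month; December rolls over to next January.
    end = date(year + (1 if month == 12 else 0), month % 12 + 1, 1)
    return start.strftime("%Y%m%d000000"), end.strftime("%Y%m%d000000")


def monthly_sample_query(year: int, month: int, n: int = 100) -> str:
    """Build SQL that draws n random accounts registered in the given month."""
    start, end = month_bounds(year, month)
    return (
        "SELECT user_id, user_name, user_registration, user_editcount\n"
        "FROM enwiki_p.user\n"
        f"WHERE user_registration >= '{start}'\n"
        f"AND user_registration < '{end}'\n"
        "AND user_editcount > 0\n"  # assumption: "editor" means at least one edit
        "ORDER BY RAND()\n"
        f"LIMIT {int(n)};"
    )


print(monthly_sample_query(2019, 2))
```

Note that this samples accounts rather than people, and that the query may still be slow on a table as large as the English Wikipedia user table.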



Finn Årup Nielsen
http://people.compute.dtu.dk/faan/


On 12/03/2019 19:18, Haifeng Zhang wrote:

> Hi folks,
>
> My work needs to randomly sample new editors in each month, e.g., 100 editors per month.
>
> Do any of you have good suggestions for how to do this efficiently?
>
> I could think of using the dump files, but wonder are there other options?
>
>
> Thanks,
>
> Haifeng Zhang
> _______________________________________________
> Wiki-research-l mailing list
> [hidden email]
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>

_______________________________________________
Wiki-research-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Reply | Threaded
Open this post in threaded view
|

Re: Sampling new editors in English Wikipedia

Haifeng Zhang
Thanks for pointing me to Quarry, Finn.

I tried a couple of queries, but I am not sure why they all took forever to return results.

Is it possible to download the relevant MediaWiki database tables (e.g., user, user_groups, logging) and run SQL on my local machine?


Thanks,

Haifeng Zhang

Re: Sampling new editors in English Wikipedia

Haifeng Zhang
In reply to this post by Isaac Johnson
This can be a good option too. Thanks, Isaac.


Haifeng Zhang

Postdoctoral Research Fellow
Human-Computer Interaction Institute
Carnegie Mellon University
________________________________
From: Wiki-research-l <[hidden email]> on behalf of Isaac Johnson <[hidden email]>
Sent: Tuesday, March 12, 2019 5:21:11 PM
To: Research into Wikimedia content and communities
Subject: Re: [Wiki-research-l] Sampling new editors in English Wikipedia

Hey Haifeng,
If you decide to process the dumps, you should be able to easily repurpose
some quick code that I wrote for a similar project:
https://github.com/geohci/miscellaneous-wikimedia/tree/master/editor-turnover

Notably, I'd suggest using the stub history dumps as they are much smaller
because they do not include the actual content. For instance, for March 1st
and English Wikipedia (https://dumps.wikimedia.org/enwiki/20190301/), this
file would be enwiki-20190301-stub-meta-history.xml.gz and is 57.9 GB.
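
For the dump route, the sampling step itself can be sketched in a few lines of Python. This is not the code from the repository linked above; it is a minimal illustration under stated assumptions. It assumes that (user name, timestamp) pairs have already been extracted from the stub history dump, that timestamps are ISO 8601 UTC strings such as "2019-03-01T12:34:56Z", that anonymous (IP) edits were dropped beforehand, and that "new editor in a month" means an account whose first edit falls in that month. The function name sample_new_editors is made up for this example. As far as I know the dump is grouped by page rather than sorted by time, so the sketch makes a full pass before sampling.

```python
import random
from collections import defaultdict


def sample_new_editors(revisions, n=100, seed=None):
    """Sample up to n accounts per month, keyed by the month of first edit.

    revisions: iterable of (user_name, timestamp) pairs in any order, with
    ISO 8601 UTC timestamps such as "2019-03-01T12:34:56Z" (an assumption).
    """
    first_edit = {}
    for user, ts in revisions:
        # Equal-length ISO 8601 UTC strings sort chronologically.
        if user not in first_edit or ts < first_edit[user]:
            first_edit[user] = ts

    by_month = defaultdict(list)
    for user, ts in first_edit.items():
        by_month[ts[:7]].append(user)  # "YYYY-MM"

    rng = random.Random(seed)
    return {
        month: rng.sample(sorted(users), min(n, len(users)))
        for month, users in sorted(by_month.items())
    }
```

Like the linked code, this works on accounts, not people, so editors with several accounts are counted once per account.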

Best,
Isaac

On Tue, Mar 12, 2019 at 3:56 PM Pine W <[hidden email]> wrote:

> Hi Haifeng, thanks for the information. I think that your idea of looking
> in the dumps makes sense. Am I understanding correctly that you would like
> advice regarding how to do that in the most efficient way?
>
> Hi Leila, I believe that I asked for more information regarding Haifeng's
> work. There has been discussion on English Wikipedia regarding volunteers
> being unhappy with the interventions or proposed interventions of
> researchers. I think that asking about the nature of Haifeng's research is
> legitimate, and I tried to provide some examples of possible types of
> research. I'm trying to protect the community from problematic
> interventions, while also welcoming research that is accepted by the
> community.
>
> Pine
> ( https://meta.wikimedia.org/wiki/User:Pine )
>
>
> On Tue, Mar 12, 2019 at 8:00 PM Haifeng Zhang <[hidden email]>
> wrote:
>
> > Pine and Stuart,
> >
> > I meant extracting a random sample of new editors (month by month) from
> > Wikipedia edit history.
> >
> > It is not about survey of new editors, but still thanks for your
> > suggestions.
> >
> >
> > Thanks,
> > Haifeng Zhang
> >
> > Postdoctoral Research Fellow
> > Human-Computer Interaction Institute
> > Carnegie Mellon University
> > ________________________________
> > From: Wiki-research-l <[hidden email]> on
> > behalf of Stuart A. Yeates <[hidden email]>
> > Sent: Tuesday, March 12, 2019 3:46:19 PM
> > To: Research into Wikimedia content and communities
> > Subject: Re: [Wiki-research-l] Sampling new editors in English Wikipedia
> >
> > There are a number of new-editor-heavy noticeboards. I would suggest
> > posting an invite there to your survey (or whatever) If you ask for
> > editor's usernames you can filter out those who don't meet your
> > definition of 'new'
> >
> > I'm thinking of places like:
> > https://en.wikipedia.org/wiki/Wikipedia:Teahouse and
> > https://en.wikipedia.org/wiki/Wikipedia:Help_desk
> >
> > cheers
> > stuart
> >
> >
> > --
> > ...let us be heard from red core to black sky
> >


--
Isaac Johnson -- Research Scientist -- Wikimedia Foundation

Re: Sampling new editors in English Wikipedia

fn
In reply to this post by Haifeng Zhang
Haifeng,


On 13/03/2019 15:56, Haifeng Zhang wrote:
> Thanks for pointing me to Quarry, Finn.
>
> I tried a couple of queries, but I am not sure why they all took forever to return results.

I am not familiar with Quarry. It might have a timeout. The user table
associated with the English Wikipedia is quite large, so any operation
on it may take a long time.

You might be able to stay within the time limit with a simplified SQL query.
For instance, the query below takes 52.35 seconds:

USE enwiki_p;

-- Fetch one page of 1000 accounts, skipping the first 32,000,000 rows.
SELECT user_id, user_name, user_registration, user_editcount
FROM user
LIMIT 1000
OFFSET 32000000;
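
A minimal sketch, in Python, of how the monthly sampling could be scripted. It only builds the SQL text and does not connect to any database. The helper names `month_bounds` and `build_sample_query` are hypothetical, and `ORDER BY RAND()` is just one simple way to randomize in MySQL/MariaDB, probably not the fastest on a table this size. It assumes `user_registration` holds 14-digit `YYYYMMDDHHMMSS` timestamps, as the DATE_FORMAT pattern in the Quarry query suggests.

```python
from datetime import date
from typing import Tuple


def month_bounds(year: int, month: int) -> Tuple[str, str]:
    """Return the [start, end) bounds of a calendar month as
    14-digit timestamps (YYYYMMDDHHMMSS, assumed storage format)."""
    start = date(year, month, 1)
    # First day of the following month, rolling over the year after December.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start.strftime("%Y%m%d") + "000000", end.strftime("%Y%m%d") + "000000"


def build_sample_query(year: int, month: int, n: int = 100) -> str:
    """Build SQL text that draws n random accounts registered in one month."""
    start, end = month_bounds(year, month)
    return (
        "SELECT user_id, user_name, user_registration, user_editcount\n"
        "FROM enwiki_p.user\n"
        f"WHERE user_registration >= '{start}'\n"
        f"  AND user_registration < '{end}'\n"
        "ORDER BY RAND()\n"
        f"LIMIT {int(n)};"
    )


if __name__ == "__main__":
    # Print the query for February 2019; paste it into Quarry or a SQL prompt.
    print(build_sample_query(2019, 2))
```

Restricting the query to one registration month keeps the random ordering limited to that month's accounts rather than the whole user table. Note that this samples accounts, not necessarily accounts that have edited.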



> Is it possible to download the relevant MediaWiki database tables (e.g., user, user_groups, logging) and run SQL on my local machine?

There are SQL files available at
https://dumps.wikimedia.org/enwiki/20190301/ but I do not think the user
table is there; at least I cannot identify it. Perhaps other people
would know.

You might be able to try Toolforge (https://tools.wmflabs.org/). You
should be able to access the tables via mysql at the prompt:

Log in to dev.tools.wmflabs.org
Then run "sql enwiki"

Read more about Toolforge here:
https://wikitech.wikimedia.org/wiki/Help:Toolforge


/Finn

>
> Thanks,
>
> Haifeng Zhang
> ________________________________
> From: Wiki-research-l <[hidden email]> on behalf of [hidden email] <[hidden email]>
> Sent: Tuesday, March 12, 2019 7:25:53 PM
> To: [hidden email]
> Subject: Re: [Wiki-research-l] Sampling new editors in English Wikipedia
>
> Haifeng,
>
>
> While some suggest the dumps or notice boards, my immediate thought was
> a database query, e.g., through Quarry. It just happens that Jonathan T.
> Morgan has created a query there:
>
> https://quarry.wmflabs.org/query/310
>
> SELECT user_id, user_name, user_registration, user_editcount
>          FROM enwiki_p.user
>          WHERE user_registration > DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 1 DAY), '%Y%m%d%H%i%s')
>          AND user_editcount > 10
>          AND user_id NOT IN (SELECT ug_user FROM enwiki_p.user_groups WHERE ug_group = 'bot')
>          AND user_name NOT IN (SELECT REPLACE(log_title, "_", " ") FROM enwiki_p.logging
>                  WHERE log_type = "block" AND log_action = "block"
>                  AND log_timestamp > DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 2 DAY), '%Y%m%d%H%i%s'));
>
>
> You may fork that query. R. Stuart Geiger (Staeiou)'s fork at
> https://quarry.wmflabs.org/query/34256, which queries by month, is
> another example.
>
>
>
> Finn Årup Nielsen
> http://people.compute.dtu.dk/faan/

Re: Sampling new editors in English Wikipedia

Haifeng Zhang
Thanks a lot for the help, Finn. Now my query can draw a sample of newly registered editors.


Best,

Haifeng Zhang

Re: Sampling new editors in English Wikipedia

Stuart A. Yeates
On Thu, 14 Mar 2019 at 09:16, Haifeng Zhang <[hidden email]> wrote:
>
> Thanks a lot for the help, Finn. Now my query can draw a sample of newly registered editors.

To repeat a point I made earlier in the thread: this query deals with
accounts not editors. Many at the coalface consider this to be a very
important difference. You appear not to have shared enough of your
research project for us to tell whether it's going to matter for you.

cheers
stuart


Re: Sampling new editors in English Wikipedia

Haifeng Zhang
Stuart,

I'm building an agent-based simulation of Wikipedia collaboration.

I would like my model to be empirically grounded, so I need to collect data for new editors.

Alternative accounts can be an issue, but is there a way to identify editors who have multiple accounts?


Thanks,

Haifeng Zhang