"Regular contributor"

erikzachte

> Statistics, with "Wikipedians", "active" and "very active users";

> As so often, Zachte's statistics are great, but easily misleading.

 

Also keep in mind that most figures in wikistats still include bot edits.

IMO it is becoming more and more urgent to present separate counts for humans and bots.

 

For instance, on eo (the Esperanto Wikipedia), 54% of all edits ever made were bot edits, but most of these will be from recent years, so the percentage for recent years will be even higher.
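To make the arithmetic behind this concrete, here is a minimal Python sketch with invented monthly counts (not actual eo figures). It assumes bot activity is concentrated in later months and shows that the bot share over those months then exceeds the all-time share. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# One (bot_edits, total_edits) pair per month, oldest month first.
Month = Tuple[int, int]


def bot_share(months: Sequence[Month]) -> float:
    """Bot edits as a fraction of all edits over the given months."""
    bots = sum(b for b, _ in months)
    total = sum(t for _, t in months)
    return bots / total if total else 0.0


def all_time_vs_recent(months: Sequence[Month], recent: int) -> Tuple[float, float]:
    """Return (share over the whole history, share over the last `recent` months); recent >= 1."""
    return bot_share(months), bot_share(months[-recent:])


# Invented numbers: bot activity grows in later months.
history: List[Month] = [(0, 1000), (50, 1200), (400, 900), (900, 1100)]
overall, latest = all_time_vs_recent(history, recent=2)
print(f"all time: {overall:.0%}, last 2 months: {latest:.0%}")
```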

 

http://stats.wikimedia.org/EN/BotActivityMatrix.htm

 

Erik Zachte

 


_______________________________________________
Wiki-research-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Re: "Regular contributor"

fn


Dear Erik,


On Wed, 22 Oct 2008, Erik Zachte wrote:

> [...]
>
> For instance in eo: 54% of total edits for all time were bot edits, but most
> of these will be from recent years, so the percentage will be even higher
> for recent years.
>
> http://stats.wikimedia.org/EN/BotActivityMatrix.htm

Interesting!

I wonder why there is a discrepancy in the summary totals: "Sigma total edits"
is 119M, but "Sigma manual edits" is higher, at 193M. As far as I skimmed, the
figures are OK for the individual languages.


best regards
Finn

___________________________________________________________________

          Finn Aarup Nielsen, DTU Informatics, Denmark
  Lundbeck Foundation Center for Integrated Molecular Brain Imaging
    http://www.imm.dtu.dk/~fn/      http://nru.dk/staff/fnielsen/
___________________________________________________________________



Re: "Regular contributor"

erikzachte
Finn, thanks for your attentiveness.

The figure 'Sigma total edits' (top-left cell) was copied from an earlier
calculation, unlike the other totals, which were calculated while building
this table. Unlike this table, however, the earlier table did not calculate
monthly totals for months in which a major language (in this case English)
had not yet been processed.
See http://stats.wikimedia.org/EN/TablesWikipediaZZ.htm and you will see
what I mean.

So to be precise: 'Sigma total edits' is actually 'Sigma total edits for all
languages for which counts are available'.
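A minimal Python sketch of how two such totals can diverge, assuming (hypothetically) that the older calculation skipped every month in which English had not yet been processed, while the new table sums whatever counts exist. The data and function names are invented for illustration; this is not the wikistats code.

```python
from typing import Dict, Iterable, Optional

# counts[month][language] = edits that month, or None if that language
# had not been processed yet for that month.
Counts = Dict[str, Dict[str, Optional[int]]]


def sigma_available(counts: Counts) -> int:
    """Sum every count that exists, whether or not the month is complete."""
    return sum(n for langs in counts.values() for n in langs.values() if n is not None)


def sigma_complete_months(counts: Counts, required: Iterable[str] = ("en",)) -> int:
    """Sum only the months in which every required language has a count."""
    required = tuple(required)
    total = 0
    for langs in counts.values():
        if all(langs.get(lang) is not None for lang in required):
            total += sum(n for n in langs.values() if n is not None)
    return total


# Invented data: English is missing for the first month.
counts: Counts = {
    "2008-07": {"en": None, "de": 100, "fr": 80},
    "2008-08": {"en": 500, "de": 110, "fr": 90},
}
print(sigma_available(counts), sigma_complete_months(counts))
```

The month without English drops out of the second total entirely, which is why it comes out lower than the first.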

The fixed report is online. Someday we will have figures for the English
Wikipedia, fingers crossed :)

Cheers, Erik

> -----Original Message-----
> From: [hidden email] [mailto:wiki-
> [hidden email]] On Behalf Of Finn Aarup Nielsen
> Sent: Thursday, October 23, 2008 13:12
> To: Research into Wikimedia content and communities
> Subject: Re: [Wiki-research-l] "Regular contributor"

Re: "Regular contributor"

Felipe Ortega
In reply to this post by erikzachte
Hi, Erik, and all.

IMHO, it would be a good idea... but definitely not an urgent one. In our analyses of the top-ten Wikipedias, we found that bot contributions introduced very little noise into the data (to be statistically precise, the effect was not significant at all).

You also have the additional problem that some bots are not identified in the user_groups table.

My "practical impression" is that when you deal with overall figures, bots are irrelevant. However, if you want to focus on special metrics such as concentration indexes, their contribution DOES MATTER, since a very active bot in one month may ruin your measurements.
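Felipe does not say which concentration index he uses. As an assumption, the sketch below uses the Gini coefficient, one common choice, to show how a single unflagged, very active bot can shift such an index. The edit counts are invented.

```python
from typing import Sequence


def gini(edits: Sequence[float]) -> float:
    """Gini coefficient of per-user edit counts (0 = perfectly even, near 1 = concentrated)."""
    xs = sorted(edits)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Closed form for sorted data, with ranks i = 1..n.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


humans = [10, 20, 30, 40]      # invented monthly edit counts of four human editors
with_bot = humans + [5000]     # the same month with one very active, unflagged bot
print(round(gini(humans), 3), round(gini(with_bot), 3))
```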

Regards,

Felipe.


--- On Wed, 22/10/08, Erik Zachte <[hidden email]> wrote:

> From: Erik Zachte <[hidden email]>
> Subject: [Wiki-research-l] "Regular contributor"
> To: [hidden email]
> Date: Wednesday, 22 October 2008, 9:55

Re: "Regular contributor"

Ziko van Dijk
Hello Felipe,

Maybe we are talking about different things now. From
http://stats.wikimedia.org/EN/BotActivityMatrix.htm:

de    ja    fr    it    pl    es    nl    pt    ru    zh    sv    fi
8%    6%    22%   25%   26%   15%   29%   30%   26%   15%   23%   22%

The bot share of all edits is not that insignificant.

Ziko


2008/11/13 Felipe Ortega <[hidden email]>



--
Ziko van Dijk
NL-Silvolde


Re: "Regular contributor"

erikzachte

Hi Felipe,

 

I can’t follow your reasoning as to why bots are insignificant.

As Ziko pointed out, the matrix of bot contributions (and our general experience) says otherwise.

On larger Wikipedias bots account for 5-30% of edits; on smaller wikis, anything up to 50-70%, or even more in rare cases.

 

Think of the bots that add interwiki links as the prime example of an activity that accounts for a massive number of edits.

These edits may be insignificant on popular articles with thousands of edits, but most articles have very few edits (‘the long tail’, one might call it), and there they add up.
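The long-tail argument in numbers: a minimal Python sketch with invented counts, assuming one interwiki bot touches every article a couple of times. The bot share is negligible on the popular article but large across the tail, and the overall share lands in between.

```python
from typing import List, Tuple

# (human_edits, bot_edits) per article; all numbers are invented.
Article = Tuple[int, int]


def article_bot_share(articles: List[Article]) -> float:
    """Bot edits as a fraction of all edits to the given articles."""
    human = sum(h for h, _ in articles)
    bot = sum(b for _, b in articles)
    total = human + bot
    return bot / total if total else 0.0


popular = [(1000, 5)]   # one heavily edited article with 5 interwiki-bot edits
tail = [(3, 2)] * 100   # 100 rarely edited articles, 2 bot edits each
print(f"popular: {article_bot_share(popular):.1%}")
print(f"tail: {article_bot_share(tail):.1%}")
print(f"overall: {article_bot_share(popular + tail):.1%}")
```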

 

Cheers, Erik

 

 

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Ziko van Dijk
Sent: Thursday, November 13, 2008 23:37
To: [hidden email]; Research into Wikimedia content and communities
Subject: Re: [Wiki-research-l] "Regular contributor"

 


Re: "Regular contributor"

erikzachte
In reply to this post by Ziko van Dijk

Felipe, about your second argument, that not all bots are registered as such (or are no longer registered; that may change): yes, that is a problem.

I can only hope that really active bots are ‘caught’ and registered on large wikis.

 

Many bots that are active on many wikis are not registered as such on smaller wikis.

Therefore I treat any user name that is registered as a bot on 10+ wikis as a bot on all wikis.

This is of course again a correction that is not 100% accurate, but close, I hope.
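A minimal Python sketch of the 10+ wikis rule, assuming the bot flags for each wiki are available as sets of user names. The data structures, function names and sample data are hypothetical, not taken from wikistats.

```python
from collections import Counter
from typing import Dict, Set

# bot_flags[wiki] = user names carrying the bot flag on that wiki.
BotFlags = Dict[str, Set[str]]


def cross_wiki_bots(bot_flags: BotFlags, min_wikis: int = 10) -> Set[str]:
    """User names flagged as bot on at least `min_wikis` different wikis."""
    per_name = Counter(name for names in bot_flags.values() for name in names)
    return {name for name, n in per_name.items() if n >= min_wikis}


def is_bot(name: str, wiki: str, bot_flags: BotFlags, global_bots: Set[str]) -> bool:
    """Bot if flagged on this wiki, or flagged on enough other wikis."""
    return name in bot_flags.get(wiki, set()) or name in global_bots


# Invented data: an interwiki bot flagged on 12 wikis but not on "vo".
flags: BotFlags = {f"wiki{i}": {"InterwikiBot"} for i in range(12)}
flags["eo"] = {"LocalBot"}
flags["vo"] = set()
everywhere = cross_wiki_bots(flags)
print(is_bot("InterwikiBot", "vo", flags, everywhere), is_bot("LocalBot", "vo", flags, everywhere))
```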

Single User Logon can help in this respect some day.

 

In theory we could spot some bots by their behavior, say a user who edits 24 hours per day, or manages 5 updates per second for a long time, or adds thousands of articles in a short period.

But I’m not sure it would be worth the effort, and it would be low priority in any case.
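Two of these behavioral signals can be sketched in a few lines of Python, assuming each user's edit times are available as Unix timestamps. The thresholds (all 24 hours covered on one UTC day, 5 edits per second sustained over 600 seconds) are illustrative guesses, not tested values, and the "thousands of new articles" signal is left out.

```python
from datetime import date, datetime, timezone
from typing import Dict, List, Set


def edits_around_the_clock(timestamps: List[int]) -> bool:
    """True if on some UTC day the user edited in all 24 hours."""
    hours_by_day: Dict[date, Set[int]] = {}
    for ts in timestamps:
        moment = datetime.fromtimestamp(ts, tz=timezone.utc)
        hours_by_day.setdefault(moment.date(), set()).add(moment.hour)
    return any(len(hours) == 24 for hours in hours_by_day.values())


def peak_rate(timestamps: List[int], window: int) -> float:
    """Highest edits per second inside any `window`-second span (two-pointer sweep)."""
    ts = sorted(timestamps)
    best = start = 0
    for end in range(len(ts)):
        while ts[end] - ts[start] >= window:
            start += 1
        best = max(best, end - start + 1)
    return best / window


def looks_like_bot(timestamps: List[int], rate: float = 5.0, window: int = 600) -> bool:
    """Hypothetical rule: round-the-clock editing, or >= `rate` edits/s over `window` seconds."""
    return edits_around_the_clock(timestamps) or peak_rate(timestamps, window) >= rate


round_the_clock = [h * 3600 for h in range(24)]  # one edit every hour on 1 Jan 1970
burst = [i // 5 for i in range(3000)]            # 5 edits per second for 600 seconds
human = [0, 100, 5000]
print(looks_like_bot(round_the_clock), looks_like_bot(burst), looks_like_bot(human))
```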

 

Erik

 


Re: "Regular contributor"

Felipe Ortega
In reply to this post by erikzachte
--- On Fri, 14/11/08, Erik Zachte <[hidden email]> wrote:

> From: Erik Zachte <[hidden email]>
> Subject: RE: [Wiki-research-l] "Regular contributor"
> To: "'Research into Wikimedia content and communities'" <[hidden email]>, [hidden email]
> Date: Friday, 14 November 2008, 2:29
> Hi Felipe,
>
>  
>
> I can’t follow your reasoning how bots are insignificant.
>
> Just as  Ziko pointed out, the matrix of bot contributions
> (and our general
> experience) tells otherwise.
>
> On larger wikipedias bots account for 5-30% of edits on
> smaller wikis
> anything up to 50-70% or even more in rare cases.
>
>

Hmm, then something really strange is going on here. I thought I had a graph of the evolution of the bot share of edits relative to the total number of edits per month, but I think I have to generate it again. However, my "impression" from looking at the temporal tables and results was that the share was not that high.

Actually, I'm not the only one who has said that. Niki Kittur, in another good paper:
http://www.parc.com/research/publications/files/5904.pdf

pointed out the same, though for enwiki (and we don't have figures to compare with that).

All in all, I think this does not affect our results or model since, as a bare minimum, I always add a "where rev_user not in (select ug_user from user_groups where ug_group='bot')" clause to my base queries.
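The filter Felipe quotes can be tried on a toy database. This sketch uses Python's built-in sqlite3 in place of MySQL and reduces the MediaWiki `revision` and `user_groups` tables to the columns the filter needs; the sample rows are invented.

```python
import sqlite3

# The filter quoted above, wrapped in a minimal per-user edit count.
HUMAN_EDITS = """
    SELECT rev_user, COUNT(*) AS edits
    FROM revision
    WHERE rev_user NOT IN (SELECT ug_user FROM user_groups WHERE ug_group = 'bot')
    GROUP BY rev_user
    ORDER BY rev_user
"""

conn = sqlite3.connect(":memory:")
# Only the columns the filter touches; the real MediaWiki tables have many more.
conn.executescript("""
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_user INTEGER NOT NULL);
    CREATE TABLE user_groups (ug_user INTEGER NOT NULL, ug_group TEXT NOT NULL);
""")
# rev_user 0 stands for anonymous edits, as in MediaWiki.
conn.executemany("INSERT INTO revision (rev_user) VALUES (?)",
                 [(0,), (1,), (1,), (2,), (2,), (2,), (3,)])
conn.executemany("INSERT INTO user_groups VALUES (?, ?)",
                 [(2, "bot"), (3, "sysop")])

rows = conn.execute(HUMAN_EDITS).fetchall()
print(rows)
```

Note that anonymous edits (rev_user = 0) pass this filter, and bots that lack the flag in user_groups slip through as well.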

I will try to post a graph soon so that we have quantitative arguments rather than mere "impressions". Perhaps I'm missing something, but if so, right now I could not say what.
>
> Think of the bots that add interwiki links as primary
> example of activities
> that account for massive amount of edits.
>

That's precisely why I was quite surprised/concerned about my findings. They are counterintuitive.

> These may be insignificant on popular articles with
> 1000’s of edits, but
> most articles have very few edits, ‘the long tail’ one
> might call it and
> there it adds up.
>

Yep, dead right. Right now, however, I'm concentrating not on "per article" statistics but on "per user" ones.

Best,

F.
 


Re: "Regular contributor"

Felipe Ortega
In reply to this post by erikzachte
--- On Fri, 14/11/08, Erik Zachte <[hidden email]> wrote:

> From: Erik Zachte <[hidden email]>
> Subject: RE: [Wiki-research-l] "Regular contributor"
> To: "'Research into Wikimedia content and communities'" <[hidden email]>, [hidden email]
> Date: Friday, 14 November 2008, 2:40
>
> Many bots that are active on many wikis are not registered
> as such on
> smaller wikis.
>
> Therefore I treat any user name that is registered as bot
> on 10+ wikis as
> bot on all wikis.
>

Seems very reasonable :).

> It is of course again an correction which is not 100%
> accurate, but close I
> might hope.
>

Paraphrasing one of my research colleagues: something is better
than nothing at all :).

> Single User Logon can help in this respect some day.
>

Wow, man. That would make my model jump to light speed.
If only I were able to trace users across different
languages...

>  
>
> In theory we could spot some bots by their behavior, say a
> user that edits
> 24 hours per day, of manages 5 updates per second for a
> long time, or added
> thousands of articles in a short period.
>
> But I’m not sure it would be worth the effort, and it
> would low priority in
> any case.

I also have my doubts about the filtering conditions. For
instance, in eswiki, 'BOTpolicia' is not registered as a bot,
and it's responsible for more than 90,000 edits so far. On
the other hand, a famous user in eswiki (retired for the
moment, id=13770 to be precise) is responsible for
100,000 edits, and was erroneously identified as a
bot many times :). We have similar cases in other
languages.

Filtering by number of edits/hour or similar may require
a lot of time/resources, especially in larger Wikipedias
(sorry, but for my thesis I'm mainly focused on the top-ten
Wikipedias :) ).

Honestly, I don't have a good answer for this right now.

Best.

F.


Share of monthly bot edits

Felipe Ortega
In reply to this post by Felipe Ortega
Hi all.

Maybe tomorrow, if I'm not too busy, I will have a detailed graph of what's going on, at least in the top-10 (or top-20) editions.

In the meantime, some basic MySQL queries show this for eswiki:
http://pastebin.com/m19859c24

Here perc_logged_revs is the percentage of bot edits over the total number of edits by logged-in users (per month), and perc_revs is the percentage of bot edits over the total number of revisions (per month, including anonymous users).
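Felipe's pastebin query is not reproduced in the thread, so the following is only a Python sketch of the two ratios as he defines them. It assumes revisions are given as (month, rev_user) pairs with rev_user = 0 for anonymous edits, plus a known set of bot user ids; the names and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def monthly_bot_shares(
    revisions: Iterable[Tuple[str, int]], bots: Set[int]
) -> Dict[str, Tuple[float, float]]:
    """Map month -> (perc_logged_revs, perc_revs).

    revisions holds (month, rev_user) pairs; rev_user 0 is an anonymous edit.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # month -> [bot, logged-in, all]
    for month, user in revisions:
        c = counts[month]
        c[2] += 1
        if user != 0:
            c[1] += 1
            if user in bots:
                c[0] += 1
    shares = {}
    for month in sorted(counts):
        bot, logged, total = counts[month]
        perc_logged = 100.0 * bot / logged if logged else 0.0
        shares[month] = (perc_logged, 100.0 * bot / total)
    return shares


revs = [("2008-10", 0), ("2008-10", 0), ("2008-10", 5), ("2008-10", 7),
        ("2008-11", 7), ("2008-11", 5)]
print(monthly_bot_shares(revs, bots={7}))
```

perc_revs can never exceed perc_logged_revs, because its denominator also counts anonymous edits.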

The magic of numbers is that they are (most of the time) completely objective. Erik, your numbers seem to be fine, also when looking at the monthly trends. Maybe my impression (and other past graphs) is only valid for enwiki, but looking at these results I am beginning to have my doubts. We will see tomorrow.

All the same, we all agree that bots should be filtered out, but it was an intriguing question for me.

Best,

F.


     


Re: Share of monthly bot edits

Felipe Ortega
Done, link:

http://meta.wikimedia.org/wiki/WikiXRay#Share_of_bots_edits

As you can see, enwiki and jawiki have much lower rates of bot edits than other language editions. Perhaps this deserves some attention; I will certainly mention it in my thesis :).

Thanks to Erik and Ziko for pointing this out.

Best,

F.


--- On Sat, 15/11/08, Felipe Ortega <[hidden email]> wrote:

> From: Felipe Ortega <[hidden email]>
> Subject: [Wiki-research-l] Share of monthly bot edits
> To: "'Research into Wikimedia content and communities'" <[hidden email]>, "Erik Zachte" <[hidden email]>
> Date: Saturday, 15 November 2008, 8:49

Re: "Regular contributor"

Platonides
In reply to this post by Felipe Ortega
Felipe Ortega wrote:
> I also have my doubts about the filtering conditions. For
> instance, in eswiki, 'BOTpolicia' is not registered as such
> and it's responsible for more than 90.000 edits, so far. On
> the other hand, a famous user in eswiki (retired for this
> moment, id=13770 to be precise)

He has returned, ~500 edits this week ;)


> Filtering by number of edits/hour or similar may require
> a lot of time/resources, specially in larger Wikipedias,
> (sorry, but for my thesis I'm mainly focused on the top-ten
> Wikipedias :) ).

The problem is that here you need the edits *per user*, not per page.
I understand from the WikiXRay page that you're recreating the MediaWiki
tables. It would just be a matter of querying each user's contributions and
checking the time difference between consecutive edits.
With indexes in place, the query time would be good enough.

Where it may get terribly slow is when applying it to all users, as that
would make the algorithm quadratic.
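A minimal sketch of the per-user check Platonides describes, using Python's sqlite3 instead of MySQL. It assumes timestamps stored as integer seconds (MediaWiki itself stores them as 14-character strings) and an index on (rev_user, rev_timestamp), so each lookup reads only one user's edits in time order. The table layout, index name and 1-second threshold are assumptions.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_user INTEGER, rev_timestamp INTEGER);
    -- Lets each per-user lookup read one user's edits in time order without a full scan.
    CREATE INDEX user_time ON revision (rev_user, rev_timestamp);
""")


def edit_gaps(conn: sqlite3.Connection, user: int) -> List[int]:
    """Seconds between consecutive edits of one user."""
    rows = conn.execute(
        "SELECT rev_timestamp FROM revision WHERE rev_user = ? ORDER BY rev_timestamp",
        (user,),
    ).fetchall()
    times = [t for (t,) in rows]
    return [b - a for a, b in zip(times, times[1:])]


def fast_fraction(conn: sqlite3.Connection, user: int, max_gap: int = 1) -> float:
    """Share of a user's edits that follow the previous edit within max_gap seconds."""
    gaps = edit_gaps(conn, user)
    return sum(g <= max_gap for g in gaps) / len(gaps) if gaps else 0.0


# Invented data: user 1 edits every second, user 2 once an hour.
conn.executemany("INSERT INTO revision (rev_user, rev_timestamp) VALUES (?, ?)",
                 [(1, 0), (1, 1), (1, 2), (1, 3), (2, 0), (2, 3600), (2, 7200)])
print(fast_fraction(conn, 1), fast_fraction(conn, 2))
```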



Re: "Regular contributor"

Felipe Ortega



--- On Mon, 17/11/08, Platonides <[hidden email]> wrote:

> From: Platonides <[hidden email]>
> Subject: Re: [Wiki-research-l] "Regular contributor"
> To: [hidden email]
> Date: Monday, 17 November 2008, 9:42
> Felipe Ortega wrote:
> > I also have my doubts about the filtering conditions.
> For
> > instance, in eswiki, 'BOTpolicia' is not
> registered as such
> > and it's responsible for more than 90.000 edits,
> so far. On
> > the other hand, a famous user in eswiki (retired for
> this
> > moment, id=13770 to be precise)
>
> He has returned, ~500 edits this week ;)
>

Wow, this is getting interesting :D

>
> > Filtering by number of edits/hour or similar may
> require
> > a lot of time/resources, specially in larger
> Wikipedias,
> > (sorry, but for my thesis I'm mainly focused on
> the top-ten
> > Wikipedias :) ).
>
> The problem is that here you need the edits *per user*, not
> per page.
> I understand from the WikiXRay page that you're
> recreating the mediawiki
> tables.

Yep, but only as an initial stage. Then I create some new
intermediate tables to speed up the data mining.

> It'd just to query each user contributions and
> check the time
> difference.
> With indexes in place, you would get a time good enough.
>
> When it may get terribly slow is if applying to all users,
> as you would
> make the algorithm quadratic.

I agree, but then we would still need some basic criteria to decide which
users to probe for hidden bots. I suppose a good starting point would be
looking for BOT patterns in the user name? Hmm, or perhaps directly at
the number of revisions.
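A sketch of the two starting criteria Felipe mentions, assuming per-user edit counts and the set of already flagged bots are at hand. The regular expression, the 10,000-edit threshold and the example names and counts (other than BOTpolicia, mentioned earlier in the thread) are hypothetical. As the eswiki case shows, high-volume human editors will also turn up, so the output is a list to inspect, not a verdict.

```python
import re
from typing import Dict, List, Set

# Matches "bot" anywhere in a name, case-insensitively; this also catches
# innocent names such as "Abbott", so results are only candidates to inspect.
BOT_PATTERN = re.compile(r"bot", re.IGNORECASE)


def bot_candidates(edit_counts: Dict[str, int], flagged: Set[str],
                   min_edits: int = 10_000) -> List[str]:
    """Unflagged users whose name looks bot-like or whose edit count is very high."""
    return sorted(
        name for name, edits in edit_counts.items()
        if name not in flagged and (BOT_PATTERN.search(name) or edits >= min_edits)
    )


counts = {"BOTpolicia": 90_001, "KnownBot": 5_000, "Abbott": 12,
          "ProlificHuman": 100_000, "Casual": 40}
print(bot_candidates(counts, flagged={"KnownBot"}))
```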

I will try to have a closer look at this after the thesis
(I need to plan my next "entertainments" :) ).

Cheers,

F.
