¿Model to automatically classify if one user is bot or not?

ABEL SERRANO JUSTE
Hello, fellow wiki researchers!

I have often observed that wiki users who are not in the bot groups are
actually behaving like bots. Since the MediaWiki API doesn't prevent
normal users from automating tasks, you might find a "normal" user who is
actually doing bot work. I would like to identify such users and treat
them as bots.

Is anyone aware of an already implemented model that classifies whether a
user is a bot or not?

Thanks, and have a nice weekend!

--
Regards,
Abel.
_______________________________________________
Wiki-research-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Re: ¿Model to automatically classify if one user is bot or not?

Physikerwelt
Hi Abel,

I think you need a third category: cyborg ;-)

More seriously, there is research on identifying contributor types.
See section 2.2.3 of our review, http://wikiworkshop.org/2017/papers/p1627-dahm.pdf,
on this topic.
For example, Ron Meier is an account that makes heavy use of scripts.
However, upon manual investigation, we got the impression that this was
an actual human.

That said, I have not looked at the literature on bot and vandalism
detection recently; it is probably a good starting point.

All the best
physikerwelt

On Sat, Jan 19, 2019 at 11:25 AM ABEL SERRANO JUSTE <[hidden email]> wrote:


Re: ¿Model to automatically classify if one user is bot or not?

WereSpielChequers-2
In reply to this post by ABEL SERRANO JUSTE
Aside from the sensitivities of this (and yes, in case there was any doubt,
calling an editor a bot is not something one should do lightly), a bot
isn't an easy thing either to define or to identify. Making bot edits from
a non-bot account is a big deal on Wikipedia; I have seen an admin
desysopped and then blocked for it. Please be aware that labelling
good-faith non-bot editors as bots is unethical and liable to cause another
clash between the community and researchers.

Edits per minute might at first glance look like a safe metric, but then
you realise that some people spend a long time manually building up to a
situation where they click a button that completes dozens of edits almost
simultaneously.
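A minimal Python sketch of the edits-per-minute heuristic and the pitfall described above. It assumes you already have each account's edit timestamps (for example from the MediaWiki API's `list=usercontribs`); the function name, the one-minute window and both edit histories are hypothetical. The peak rate singles out the human's batch, not the steady bot:

```python
from datetime import datetime, timedelta


def max_edits_in_window(timestamps, window=timedelta(minutes=1)):
    """Return the largest number of edits that fall inside any sliding window."""
    times = sorted(timestamps)
    best = 0
    start = 0
    for end, t in enumerate(times):
        # Advance the left edge until the window spans at most `window`.
        while t - times[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


# Hypothetical human: long manual preparation, then one click saves 30 edits in 30 s.
human_batch = [datetime(2019, 1, 19, 12, 0, 0) + timedelta(seconds=s) for s in range(30)]
# Hypothetical bot: one edit every 2 minutes for 24 hours.
steady_bot = [datetime(2019, 1, 19, 0, 0) + timedelta(minutes=2 * i) for i in range(720)]

print(max_edits_in_window(human_batch))  # 30
print(max_edits_in_window(steady_bot))   # 1
```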

The type of edit and the similarity of a series of edits might look like a
good way to go, but what you will have difficulty identifying is that a
person who seems to be making a series of edits without individual
consideration may be working through a list of possible edits and clicking
save or skip on each one as a manual decision. Judging the saved edits
without knowing what led up to saving them won't tell you whether an edit
was a bot edit.
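A rough sketch of one "similarity of a series of edits" feature: the share of an account's edits that carry its most common edit summary. The function and data are hypothetical, and edit summaries are only an assumed proxy for edit similarity. As argued above, the feature sees only saved edits, so it cannot tell a bot run from a human who reviewed each proposal:

```python
from collections import Counter


def summary_repetition(summaries):
    """Share of a user's edits that carry the single most common edit summary."""
    if not summaries:
        return 0.0
    top_count = Counter(summaries).most_common(1)[0][1]
    return top_count / len(summaries)


# Hypothetical histories. The bot saved all 50 fixes it generated; the human
# reviewed 60 proposed fixes, skipped 10 and saved 50. Skips leave no trace in
# the edit history, so the two saved histories look identical.
bot_history = ["Sort ingredients"] * 50
human_history = ["Sort ingredients"] * 50

print(summary_repetition(bot_history), summary_repetition(human_history))  # 1.0 1.0
```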

What you can do is look for dormant accounts that are no longer flagged as
bots. On the English-language Wikipedia we have a list of them at
https://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_edits/Unflagged_bots;
other language versions may have similar lists and are likely to have the
same process of removing bot flags from bot accounts that retire.

Regards

Jonathan

On Sat, 19 Jan 2019 at 10:24, ABEL SERRANO JUSTE <[hidden email]> wrote:


Re: ¿Model to automatically classify if one user is bot or not?

ABEL SERRANO JUSTE
I want to remove bot users from my research, since they inject a lot of
noise into the data and do not represent human collaboration or the actual
state of the community. The aim of the model would be to detect users that
are bots (or that mostly behave like bots) but are not in the MediaWiki
*bot* group, just to remove them from my analysis; it would not be meant
to label users within the MediaWiki communities.

This question came up while I was studying the wiki
https://cocktails.wikia.com. I found that one of the most prolific users is
"IngredientSortBot", which, besides its name, has an edit history very
characteristic of a bot:
https://cocktails.wikia.com/wiki/Special:Contributions/IngredientSortBot.
However, it is not in any bot group, so it was included in my analysis and
biased it.
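A minimal sketch of the filtering step described above, assuming each edit is a dict with a `user` key, and that the bot-group members and a hand-maintained list of known unflagged bots (such as IngredientSortBot) have been collected separately. The helper name and the sample edits are hypothetical:

```python
def drop_bot_edits(edits, bot_group_members, known_unflagged_bots):
    """Keep only edits whose author is neither in a bot group nor on a manual exclusion list."""
    excluded = set(bot_group_members) | set(known_unflagged_bots)
    return [edit for edit in edits if edit["user"] not in excluded]


# Hypothetical edits; only "IngredientSortBot" comes from the thread.
edits = [
    {"user": "IngredientSortBot", "title": "Mojito"},
    {"user": "SomeHuman", "title": "Mojito"},
    {"user": "FlaggedBot", "title": "Negroni"},
]
kept = drop_bot_edits(
    edits,
    bot_group_members=["FlaggedBot"],
    known_unflagged_bots=["IngredientSortBot"],
)
print([edit["user"] for edit in kept])  # ['SomeHuman']
```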

On Sat, 19 Jan 2019 at 20:42, WereSpielChequers (<[hidden email]>) wrote:

--
Regards,
Abel.

Re: ¿Model to automatically classify if one user is bot or not?

WereSpielChequers-2
The most recent of IngredientSortBot's 764 edits was in 2007, so if that
wiki has a bot-flagging system, the bot flag would likely have been removed
in the last decade. But if 764 edits make it significant on that wiki, I
doubt that wiki ever introduced bot flagging.
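The reasoning above suggests a simple heuristic for surfacing dormant, formerly automated accounts. This sketch assumes a mapping from user name to (edit count, last edit time); the thresholds, the helper name and the sample accounts are hypothetical, and the output is meant as a shortlist for manual review rather than a label:

```python
from datetime import datetime, timedelta


def dormant_unflagged_candidates(users, bot_group, now, min_edits=500, dormant_days=5 * 365):
    """List prolific, long-inactive accounts that are not in a bot group.

    `users` maps a user name to (edit_count, last_edit_datetime).
    The result is a shortlist for manual review, not a verdict.
    """
    cutoff = now - timedelta(days=dormant_days)
    return sorted(
        name
        for name, (edit_count, last_edit) in users.items()
        if name not in bot_group and edit_count >= min_edits and last_edit < cutoff
    )


# Hypothetical data: the 764 edits and a last edit in 2007 come from the thread;
# the exact day, the other accounts and both thresholds are made up.
users = {
    "IngredientSortBot": (764, datetime(2007, 6, 1)),
    "ActiveHuman": (1200, datetime(2019, 1, 20)),
    "CasualEditor": (12, datetime(2006, 3, 1)),
}
print(dormant_unflagged_candidates(users, bot_group=set(), now=datetime(2019, 1, 25)))
# ['IngredientSortBot']
```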

You can assume that editors whose names end in "Bot" are bots, and on
English-language wikis you are pretty safe. If you assumed that accounts
ending in "bot" were bots, you would lose a bit: three of the 5,000 most
active accounts on the English Wikipedia are longstanding accounts whose
names include "bot" but that were created before the rule reserving
usernames ending in "bot" for bots.
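The difference between the two name rules above comes down to case sensitivity. A minimal sketch with hypothetical user names:

```python
def ends_with_capital_bot(name):
    """Strict rule: the name ends in "Bot" with a capital B."""
    return name.endswith("Bot")


def ends_with_any_bot(name):
    """Loose rule: the name ends in "bot" in any capitalisation."""
    return name.lower().endswith("bot")


# Hypothetical user names.
names = ["IngredientSortBot", "Talbot", "Abbot", "SomeHuman"]
print([n for n in names if ends_with_capital_bot(n)])  # ['IngredientSortBot']
print([n for n in names if ends_with_any_bot(n)])      # ['IngredientSortBot', 'Talbot', 'Abbot']
```

Neither rule catches automated accounts whose names do not end in "bot".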

If you want to filter out edits that *do not represent human collaboration
or the actual state of the community*, then you might also want to filter
out edits flagged as "minor", or better, give them a low weighting. That
feature is heavily used on Wikipedia.
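A sketch of the down-weighting option, assuming each edit record carries a boolean `minor` flag; the weight of 0.25 is an arbitrary, hypothetical choice, and a weight of 0 reduces to filtering minor edits out:

```python
def weighted_edit_count(edits, minor_weight=0.25):
    """Count edits, giving those flagged as minor a reduced weight."""
    return sum(minor_weight if edit["minor"] else 1.0 for edit in edits)


# Hypothetical edit records: two regular edits and two minor ones.
edits = [{"minor": False}, {"minor": True}, {"minor": True}, {"minor": False}]
print(weighted_edit_count(edits))                  # 2.5
print(weighted_edit_count(edits, minor_weight=0))  # 2.0
```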

Jonathan



On Fri, 25 Jan 2019 at 21:08, ABEL SERRANO JUSTE <[hidden email]> wrote:


Re: ¿Model to automatically classify if one user is bot or not?

ABEL SERRANO JUSTE
On Fri, 25 Jan 2019 at 22:24, WereSpielChequers (<[hidden email]>) wrote:

> The most recent of IngredientSortBot's 764 edits was in 2007, so if that
> wiki has a bot-flagging system, the bot flag would likely have been removed
> in the last decade. But if 764 edits make it significant on that wiki, I
> doubt that wiki ever introduced bot flagging.
>

What is that bot-flagging system about? How does it work?
In Wikia there are users in the "bot" group or the "bot-global" group
(see, for instance, the bots for the Cocktails wiki:
<https://cocktails.wikia.com/api.php?action=query&list=groupmembers&gmgroups=bot|bot-global&gmlimit=500>),
and these have the capabilities corresponding to bots in the MediaWiki API,
but I don't know of any other flagging system :S
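The group query in the URL above can be rebuilt from its parameters, which makes it easy to point at another wiki. The sketch below reproduces that Wikia `list=groupmembers` URL; whether a given wiki still serves that module is an assumption to check. On a standard MediaWiki installation, core's `list=allusers` with `augroup` is the usual equivalent; the parser assumes the JSON `query.allusers` response shape, and all helper names are hypothetical:

```python
from urllib.parse import urlencode


def bot_group_query_url(api_url, groups=("bot", "bot-global"), limit=500):
    """Rebuild the Wikia list=groupmembers query quoted in the thread."""
    params = {
        "action": "query",
        "list": "groupmembers",
        "gmgroups": "|".join(groups),
        "gmlimit": limit,
    }
    # Keep "|" unescaped so the URL matches the form used in the thread.
    return api_url + "?" + urlencode(params, safe="|")


def core_allusers_url(api_url, groups=("bot",), limit=500):
    """Standard MediaWiki equivalent: list=allusers filtered by augroup."""
    params = {
        "action": "query",
        "list": "allusers",
        "augroup": "|".join(groups),
        "aulimit": limit,
        "format": "json",
    }
    return api_url + "?" + urlencode(params, safe="|")


def names_from_allusers(response):
    """Extract user names from a JSON-decoded list=allusers response."""
    return [user["name"] for user in response["query"]["allusers"]]


print(bot_group_query_url("https://cocktails.wikia.com/api.php"))
```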


>
> You can assume that editors whose names end in "Bot" are bots, and on
> English-language wikis you are pretty safe. If you assumed that accounts
> ending in "bot" were bots, you would lose a bit: three of the 5,000 most
> active accounts on the English Wikipedia are longstanding accounts whose
> names include "bot" but that were created before the rule reserving
> usernames ending in "bot" for bots.
>

Not really: I just successfully created an account ending in "bot". I find
this criterion for filtering out bots quite naive and inaccurate; it also
misses non-flagged "bots" without the substring "bot" in their names.


>
> If you want to filter out edits that *do not represent human collaboration
> or the actual state of the community*, then you might also want to filter
> out edits flagged as "minor", or better, give them a low weighting. That
> feature is heavily used on Wikipedia.
>

Hmm, I am not sure how widely that feature is used in Wikia. It might
depend heavily on the experience or policy of each specific wiki and, for
me, a minor edit could still be an indicator of human collaboration, so I
would rather leave them in.


>
> Jonathan
>

Thank you all for your answers!
--
Regards,
Abel.