sockpuppets and how to find them sooner

sockpuppets and how to find them sooner

Kerry Raymond
Currently, to open a sockpuppet investigation, you must name the two (or
more) accounts that you believe to be sockpuppets with "clear, behavioural
evidence of sock puppetry" which is typically in the form of pairs of edits
that demonstrate similar edit behaviours that are unlikely to naturally
occur. Now if you spend enough time on-wiki, you develop an intuition about
behaviours you see on your watchlist and in article edit histories. Often I
am highly suspicious that an account is a sockpuppet, but I cannot report
them because I don't know which other account is involved.

 

As an example, I recently encountered User:Shelati, an account about 1 day
old at that time, with nearly 100 edits in that day, all about 1-2 minutes
apart, mostly making a similar change to a large number of Australian place
infoboxes.

 

https://en.wikipedia.org/w/index.php?title=Special:Contributions/Shelati&offset=20190728053057&limit=100&target=Shelati
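
A minimal sketch of how a burst like the one described above (many edits, a
minute or two apart) might be flagged automatically. The function name, the
thresholds and the timestamps are all assumptions made up for illustration;
real timestamps would have to come from the account's contribution history.

```python
from datetime import datetime, timedelta
from statistics import median


def looks_like_burst(timestamps, min_edits=50, max_median_gap=timedelta(minutes=2)):
    """Return True if there are many edits and the typical gap is very short.

    Both thresholds are illustrative guesses, not tuned values.
    """
    ordered = sorted(timestamps)
    if len(ordered) < min_edits:
        return False
    # Gap in seconds between each edit and the next one.
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    return median(gaps) <= max_median_gap.total_seconds()


# Made-up data: 100 edits spaced 90 seconds apart.
start = datetime(2019, 7, 27, 5, 0, 0)
burst = [start + timedelta(seconds=90 * i) for i in range(100)]
# Made-up data: 100 edits spaced six hours apart.
slow = [start + timedelta(hours=6 * i) for i in range(100)]

print(looks_like_burst(burst))
print(looks_like_burst(slow))
```

A check like this only says "unusually fast"; it says nothing about which
other account might be involved, which is the harder part.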

 

Genuine new users do not edit that quickly, do not use templates and do not
mess structurally with infoboxes (at most they try to change the values). It
"smelled" like a sockpuppet. However, as I did not recognise that pattern of
edit behaviour as being that of any other user I was familiar with, it
wasn't something I could report for sockpuppet investigation. Anyhow after
about 2 weeks, the user was blocked as a sockpuppet. Someone must have
noticed and figured out the other account:

 

https://en.wikipedia.org/wiki/Wikipedia:Sockpuppet_investigations/Meganesia/Archive

 

Two weeks and 1,279 edits later ... that's over 1000 possibly problematic
edits after I first suspected them. But that's nothing compared with another
ongoing situation in which a very large number of different IPs are engaged
in a pattern of problem edits on mostly Australian articles (a few different
types of edits but an obvious "quack like a duck" situation). The IP number
changes frequently (and one assumes deliberately). The edits potentially go
back to 2013 but appear to have intensified in 2018/2019. Many thousands of
edits are involved; here is one user's summary of all the IP addresses and
the extent to which they have been cleaned up:

 

https://en.wikipedia.org/wiki/User:IamNotU/History_cleanup

 

As well as the damage done to the content (which harms the readers), these
IP sockpuppets are consuming enormous amounts of effort to track them down
and revert them, which could be more productively used to improve the
content. We need better tools to foil these pests. So I want to put that
challenge out to this list.

 

Kerry

 

 

 

_______________________________________________
Wiki-research-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Re: sockpuppets and how to find them sooner

Kerry Raymond
To reply to my own question ...

 

Can we find a way to create a "signature" of an account's pattern of
editing? Perhaps it might be a set of signatures, maybe one for the
categories that the account appears to be active in, another for the type of
edit, etc. Then if these signatures were calculated for all banned accounts
or currently blocked accounts (or at least ones with a long enough
contribution history to make it worthwhile - we're not interested in
one-edit vandals), then we could have a tool that could be run to quickly
compare one account against the signatures of banned/blocked accounts as
well as the cumulative edits of a set of known sockpuppets (i.e. treat them
as a single account) to determine if this may be a sockpuppet case meriting
further investigation. I imagine that it would be too expensive
computationally to actually run comparisons of the contribution histories of
all "bad guy" accounts against the suspicious account which is why I propose
a "signature" approach (but I'm happy to be told otherwise).
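
A rough sketch of what such a signature comparison might look like. Everything
here is an assumption for illustration: the signature is taken to be a
normalised count of (category, edit type) pairs, cosine similarity is used as
the comparison, and the account names and edits are invented.

```python
import math
from collections import Counter


def signature(edits):
    """Build a normalised profile from (category, edit_type) pairs."""
    counts = Counter(edits)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def similarity(sig_a, sig_b):
    """Cosine similarity between two sparse profiles (0.0 to 1.0)."""
    dot = sum(w * sig_b.get(key, 0.0) for key, w in sig_a.items())
    norm_a = math.sqrt(sum(w * w for w in sig_a.values()))
    norm_b = math.sqrt(sum(w * w for w in sig_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented edit histories for two known sockpuppets of one master.
sock_1 = [("Australian places", "infobox")] * 8 + [("Australian places", "text")] * 2
sock_2 = [("Australian places", "infobox")] * 5 + [("Rivers", "infobox")] * 1
# Invented history for an unrelated blocked account.
other = [("Football", "text")] * 9 + [("Football", "citation")] * 3

# Signatures are computed once and stored; the two socks are pooled and
# treated as a single account.
stored = {
    "SockFarmA": signature(sock_1 + sock_2),
    "BlockedB": signature(other),
}

# A new suspect account is compared against every stored signature.
suspect = signature([("Australian places", "infobox")] * 20 + [("Rivers", "infobox")] * 2)
ranked = sorted(((similarity(suspect, sig), name) for name, sig in stored.items()), reverse=True)
for score, name in ranked:
    print(f"{name}: {score:.2f}")
```

Only the top few matches would then be worth a closer look, by machine or by
a human.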

 

If we had such a tool and it proved reasonably reliable in identifying
likely sockpuppets (not asking for guarantees, but close enough not to be a
waste of time to investigate), then we could routinely run it on new
accounts or reactivating accounts (i.e. possible sleeper accounts) once they
have a long enough editing history for the tool to operate effectively. That
would provide automated early warning of new/reactivating accounts that
appear suspiciously similar to "bad guy" accounts.

 

But this is a hard problem, both technically and socially (Assume Good
Faith, Privacy, etc), so I welcome the thoughts of others.

 

Kerry





 

 

 


Re: sockpuppets and how to find them sooner

RhinosF1 Wikipedia
In reply to this post by Kerry Raymond
Just a note that you can still go through warnings for vandalism etc. and
report to AIV.

Or, at that edit speed, you may have a chance reporting the account at AN
for bot-like edits, which will draw attention to it.

If you ever need help, things like #wikipedia-en-help on Freenode IRC exist
so you can ask other users.

RhinosF1
Miraheze Volunteer

On Fri, 23 Aug 2019 at 06:57, Kerry Raymond <[hidden email]> wrote:

> [...]

Re: sockpuppets and how to find them sooner

Timothy Wood
You are correct that in all but the most obvious cases, filing an SPI can
be exceptionally time consuming. I'm afraid there is no obvious technical
solution there that would not involve a complicated AI that is probably
beyond the ability of the foundation to produce.

There is quite a bit of data available in the form of years of SPIs, but it
seems like you're talking about Facebook or Google levels of machine
learning, and even years of SPIs is tiny compared to the amount of data
they work with.

On a separate note, frequently changing IP addresses is most often an
indicator of nothing more than someone who is editing on a mobile
connection. This can usually be easily verified with an online IP lookup.

V/r
TJW/GMG



On Fri, Aug 23, 2019, 02:44 RhinosF1 <[hidden email]> wrote:

> [...]

Re: sockpuppets and how to find them sooner

Aaron Halfaker-3
In reply to this post by Kerry Raymond
I think embeddings[1] would be a nice way to create a signature.
Essentially, we could dump data about a person's activities into it (words
added, namespaces edited, time of day of edits, temporal frequency of
editing, # of revisions per session, frequency of citation by type, etc.)
and get a signature that could represent several aspects of behavior.   The
vectors that come out of an embedding would allow us to provide a distance
measure between one editor's behavior and another editor's behavior.
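
To make the distance idea concrete, here is a small sketch. It is not a
learned embedding: a hand-built feature vector (share of edits per hour of day
and per namespace) stands in for one, purely to show how a distance between
two editors could be computed. The feature choice, the namespace list and the
sample editors are all assumptions.

```python
import math
from collections import Counter

NAMESPACES = ("article", "talk", "user", "template")  # illustrative subset


def behaviour_vector(edits):
    """Turn (hour_of_day, namespace) pairs into a fixed-length vector.

    The first 24 entries are the share of edits made in each hour; the
    remaining entries are the share of edits in each namespace above.
    """
    n = len(edits)
    if n == 0:
        return [0.0] * (24 + len(NAMESPACES))
    hours = Counter(hour for hour, _ in edits)
    spaces = Counter(ns for _, ns in edits)
    return [hours[h] / n for h in range(24)] + [spaces[ns] / n for ns in NAMESPACES]


def distance(vec_a, vec_b):
    """Euclidean distance between two behaviour vectors."""
    return math.dist(vec_a, vec_b)


# Invented editors: A and B edit at similar times in similar namespaces,
# C edits late at night on talk pages only.
editor_a = behaviour_vector([(9, "article")] * 6 + [(10, "template")] * 4)
editor_b = behaviour_vector([(9, "article")] * 5 + [(10, "template")] * 5)
editor_c = behaviour_vector([(22, "talk")] * 10)

print(distance(editor_a, editor_b) < distance(editor_a, editor_c))
```

A real embedding would learn which aspects of behaviour matter instead of
having them picked by hand, but the comparison step would look much the same.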

That said, I think it is more likely that we would be able to match
behaviors that look more like experienced editors generally than one
specific editor who might be the primary account of the sock puppet.
Still, this might be useful for many aspects of newcomer support and
patrolling work.  E.g. if a new account looks like an experienced editor,
they might not need an invite to the Teahouse.  In fact such an editor
account may be a sock or a legitimate alternative account.  On the other
hand, if a newcomer account is getting reverted or warned a lot but doesn't
behave like an advertiser or a POV-pusher, we probably want to reach out to
them to help.

I'm really interested in investing in embedding-based strategies for
tracking the topic-space of content and clustering behaviors but I don't
have the resources on the Scoring Platform team[2] to do any sort of
serious engineering work with embeddings right now.  In the meantime, I'm
interested in talking to external researchers about collaborations and
possibly even short term contracts to dig into these types of modeling
problems.  If anyone out there is interested in that, please reach out.

In the meantime, we're working on more rudimentary AIs that can help us
sort vandals from everyone else. :)

1. https://en.wikipedia.org/wiki/Embedding
2. https://www.mediawiki.org/wiki/Wikimedia_Scoring_Platform_team

-Aaron

On Fri, Aug 23, 2019 at 12:27 AM Kerry Raymond <[hidden email]>
wrote:

> [...]


--

Aaron Halfaker

Principal Research Scientist

Head of the Scoring Platform team
Wikimedia Foundation

Re: sockpuppets and how to find them sooner

Kerry Raymond
In reply to this post by Timothy Wood
That's why I think we need "signatures" which is my shorthand for things like a hash function or a bounding box, a means by which many non-matching accounts can be eliminated at low cost, reserving the high cost comparisons (machine or human) only for high probability candidates. It is machine-computed and *stored* on the banning/blocking of a user. When a suspect user is presented, it calculates their signature and then compares them against the pre-calculated signatures of the bad users. I don't think it is too expensive if we can find the right "signature". CPU cycles are pretty fast. I only have an average laptop CPU-wise but I burn through loads of comparisons of geographic boundaries (complex polygons with many points) thanks to bounding boxes which reduce the complex shape to the smallest rectangle that contains it. Testing intersection of polygons is expensive, testing the intersection of rectangles is trivial.
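
For anyone unfamiliar with the bounding-box trick, a minimal sketch follows.
The shapes and names are invented, and the expensive polygon test itself is
left out; the point is only that a cheap rectangle check decides which pairs
are worth the expensive comparison.

```python
def bounding_box(points):
    """Smallest axis-aligned rectangle containing all (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_overlap(a, b):
    """True if two (min_x, min_y, max_x, max_y) rectangles intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidate_pairs(shapes):
    """Yield name pairs whose boxes overlap; only these need the costly test."""
    boxes = {name: bounding_box(pts) for name, pts in shapes.items()}
    names = sorted(boxes)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if boxes_overlap(boxes[first], boxes[second]):
                yield first, second


# Invented polygons: A and B are close together, C is far away.
shapes = {
    "A": [(0, 0), (4, 1), (3, 5), (1, 4)],
    "B": [(3, 3), (6, 4), (5, 7)],
    "C": [(10, 10), (12, 11), (11, 13)],
}

pairs = list(candidate_pairs(shapes))
print(pairs)
```

Overlapping boxes do not prove the polygons intersect, just as a matching
account signature would not prove sockpuppetry; both only narrow down where
to spend the expensive effort.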

I think we can probably ignore the myriad of trivial bad guys for the purposes of signature collecting, e.g. those blocked for vandalism after their first few edits. Sock puppets or their masters don't immediately appear as bad guys on individual edits. It's often more about long-term behaviours like POV pushing, refusal to engage in consensus building, slow burning edit wars, etc., that do not show on individual edits.

Kerry


> On 23 Aug 2019, at 11:42 pm, Timothy Wood <[hidden email]> wrote:
>
> [...]

Re: sockpuppets and how to find them sooner

Timothy Wood
Then again, apparently the Foundation has a PR team whose only job is to
compile the latest marketing buzzwords, and they seem to really love AI.
You might get some buy in. Never know.

V/r
TJW/GMG

On Fri, Aug 23, 2019, 11:23 Kerry Raymond <[hidden email]> wrote:

> [...]

Re: sockpuppets and how to find them sooner

Nick Wilson (Quiddity)
On Fri, Aug 23, 2019 at 5:23 PM Kerry Raymond <[hidden email]>
wrote:

> That's why I think we need "signatures" which is my shorthand for things
> like a hash function or a bounding box, a means by which many non-matching
> accounts can be eliminated at low cost, reserving the high cost comparisons
> (machine or human) only for high probability candidates. [...]
>
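
The "signature" idea quoted above can be illustrated with a minimal sketch. This is only one possible reading, not a method anyone in this thread has proposed in detail: it assumes the signature is the median gap between an account's edits, rounded into coarse buckets, and that only accounts sharing a bucket are passed on to the expensive comparison. The function names, the 60-second bucket size and the sample data are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def signature(edit_timestamps, bucket_seconds=60):
    """Cheap signature: the median gap between edits, in coarse buckets.

    Timestamps are seconds, in ascending order. Returns None when there
    are fewer than two edits, because no gap can be measured.
    """
    gaps = [later - earlier
            for earlier, later in zip(edit_timestamps, edit_timestamps[1:])]
    if not gaps:
        return None
    return int(median(gaps) // bucket_seconds)


def candidate_pairs(accounts):
    """Group accounts by signature and return only same-bucket pairs.

    Every pair that lands in different buckets is eliminated without
    ever being compared directly.
    """
    buckets = defaultdict(list)
    for name, timestamps in accounts.items():
        sig = signature(timestamps)
        if sig is not None:
            buckets[sig].append(name)

    pairs = []
    for names in buckets.values():
        names = sorted(names)
        for i, first in enumerate(names):
            for second in names[i + 1:]:
                pairs.append((first, second))
    return sorted(pairs)


# Hypothetical data: account name -> edit times in seconds.
accounts = {
    "AccountA": [0, 90, 180, 270],    # edits about 1.5 minutes apart
    "AccountB": [0, 100, 200, 300],   # similar rhythm
    "AccountC": [0, 3600, 7200],      # one edit per hour
}
print(candidate_pairs(accounts))
```

A real signature would need more than one feature, and hard bucket edges can separate two accounts that are actually very close, so this should be read as an illustration of the cheap-filter-first idea rather than a usable detector.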

The Wikimedia Scoring Platform team
(https://www.mediawiki.org/wiki/Wikimedia_Scoring_Platform_team) might
have some insights into these questions. I believe its current and some
former members are active on this mailing list, so they might chime in
here.

On Fri, Aug 23, 2019 at 11:52 PM Timothy Wood <[hidden email]>
wrote:

> Then again, apparently the Foundation has a PR team whose only job is to
> [...]
>

Please do not denigrate groups of people. Communicating about the
movement's mission and activities with large parts of the outside world,
and helping others in the movement to also do so, is an important role (and
is just part of their role). It is similar to your own role in OTRS.
However, that is all off-topic in this thread.

I hope everyone has a pleasant weekend.
Quiddity

Re: sockpuppets and how to find them sooner

Timothy Wood
Is that what they do? I thought we mostly did that.

TJW/GMG


Re: sockpuppets and how to find them sooner

Federico Leva (Nemo)
In reply to this post by Aaron Halfaker-3
Please everyone avoid using jargon specific to the English Wikipedia on
this cross-language and cross-wiki mailing list.

Aaron Halfaker, 23/08/19 17:36:
> I think embeddings[1] would be a nice way to create a signature.
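
As a rough illustration of how a vector could serve as a signature, here is a minimal sketch. It does not use a learned embedding such as the one referenced above; it substitutes a simple hashing-trick bag of tokens, which is much cruder. The token lists, the vector size and the function names are assumptions made only for this example.

```python
import math
import zlib

DIM = 64  # hypothetical vector size


def embed(tokens, dim=DIM):
    """Map a list of tokens to a fixed-size, unit-length vector.

    Each token is hashed into one of `dim` slots (the "hashing trick").
    zlib.crc32 is used because it gives the same value on every run.
    """
    vec = [0.0] * dim
    for token in tokens:
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(u, v):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical tokens drawn from two accounts' edit summaries.
account_a = ["infobox", "australian", "place", "template"]
account_b = ["infobox", "australian", "place", "population"]

similarity = cosine(embed(account_a), embed(account_b))
print(round(similarity, 3))
```

Accounts whose vectors are far apart could be ruled out cheaply, leaving only the close ones for a human or a heavier model to examine. Unrelated tokens can collide in the same slot, so the similarity is approximate.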

There is some discussion of acceptable user fingerprinting (presumably
to be available to CheckUsers only), other than the usual over-reliance
on IP addresses, in particular at
<https://meta.wikimedia.org/wiki/Talk:IP_Editing:_Privacy_Enhancement_and_Abuse_Mitigation>.

Federico


Re: sockpuppets and how to find them sooner

Jonathan Morgan
Nemo,

Can you please elaborate on what use of language, and whose use of
language, you are criticizing? It is not clear from your email what
"jargon" you refer to, and why you feel it is inappropriate.

Jonathan

--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
(Uses He/Him)

Re: sockpuppets and how to find them sooner

Leila Zia
In reply to this post by Timothy Wood
Kerry, thanks for kicking this off. One update on our end:

There is general alignment among a few different teams/departments
in WMF that this is an important problem, and that checkusers need
better support with it than we provide today.

I gave a presentation at Wikimania about the research on sockpuppet
detection [1], which is primarily conducted by Srijan Kumar. The goal
of the research is to build models that use public data to identify
accounts that are predicted to be sockpuppets as soon as possible.
Srijan has made significant progress on this front and we'll be
presenting the results of the model to checkusers shortly to get their
feedback. Check out the slide deck [2] if you're interested to learn
more. More updates about the project will appear in [3].

Best,
Leila

[1] https://wikimania.wikimedia.org/wiki/2019:Research/Sockpuppet_detection_in_the_English_Wikipedia
[2] https://wikimania.wikimedia.org/wiki/File:Wikimania2019_research_presentation_sockpuppetDetection.pdf
[3] https://meta.wikimedia.org/wiki/Research:Sockpuppet_detection_in_Wikimedia_projects
