A potential new way to deal with spambots


A potential new way to deal with spambots

Pine W
This sounds like an interesting potential approach to deal with spambots,
and hopefully to deter the people who make them.

https://techcrunch.com/2019/02/05/kasada-bots/

I don't know how practical it would be to implement an approach like this
in the Wikiverse, and whether licensing proprietary technology would be
required.

I would be interested in decreasing the quantity and effectiveness of
spambots that misuse WMF infrastructure, damage the quality of Wikimedia
content, and drain significant cumulative time from the limited supply of
good faith contributors.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: A potential new way to deal with spambots

Gergo Tisza

They are talking about Polyform [1], a reverse proxy that filters traffic
with a combination of browser fingerprinting, behavior analysis, and proof
of work.
Proof of work is not really useful unless you have huge levels of bot
traffic from a single bot operator (and it means locking out users with no
JavaScript); browser and behavior analysis very likely cannot be outsourced
to a third party for privacy reasons. Maybe we could do it ourselves
(although that would still raise interesting privacy questions), but it
would be a huge undertaking.


[1] https://www.kasada.io/product/
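For readers unfamiliar with the term, "proof of work" here means making the client burn CPU before each request is accepted. A minimal hashcash-style sketch (not Polyform's actual scheme; the difficulty value is an illustrative assumption):

```python
import hashlib
import itertools

DIFFICULTY = 12  # required leading zero bits; real systems tune this per threat level

def solve(challenge: str) -> int:
    """Client side: brute-force a nonce until the hash has enough
    leading zero bits. This is the 'work' (~2^DIFFICULTY hashes)."""
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") >> (256 - DIFFICULTY) == 0:
            return nonce

def verify(challenge: str, nonce: int) -> bool:
    """Server side: a single hash suffices to check the work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - DIFFICULTY) == 0
```

The asymmetry (thousands of hashes to solve, one to verify) only hurts an attacker who makes huge numbers of requests, which is why it does little against low-volume spambots while still penalizing JavaScript-less human users.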

Re: A potential new way to deal with spambots

Pine W
OK. Yesterday I was looking with a few other ENWP people at what I think
was a series of edits by either a vandal bot or an inadequately designed
and unapproved good faith bot. I read that it made approximately 500 edits
before someone who knew enough about ENWP saw what was happening and did
something about it. I don't know how many problematic bots we have, in
addition to vandal bots, but I am confident that they drain a nontrivial
amount of time from stewards, admins, and patrollers.

I don't know how much of a priority WMF places on detecting and stopping
unwelcome bots, but I think that the question of how to decrease the
numbers and effectiveness of unwelcome bots would be a good topic for WMF
to research.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )



Re: A potential new way to deal with spambots

Pine W
To clarify the types of unwelcome bots that we have, here are the ones that
I think are most common:

1) Spambots

2) Vandalbots

3) Unauthorized bots, which may be intended to act in good faith but can
cause problems that would probably have been identified during standard
testing in Wikimedia communities that have a relatively well-developed bot
approval process. (See
https://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval.)

Maybe unwelcome bots are not a priority for WMF at the moment, in which
case I could add this subject to a backlog. I am sorry if I sound grumpy
at WMF regarding this subject; this is a problem, but I know that there are
millions of problems and I don't expect a different project to be dropped
in order to address this one.

While it is a rough analogy, I think that this movie clip helps to
illustrate a problem of bad bots. Although the clip is amusing, I am not
amused by unwelcome bots causing problems on ENWP or anywhere else in the
Wikiverse. https://www.youtube.com/watch?v=lokKpSrNqDA

Thanks,

Pine
( https://meta.wikimedia.org/wiki/User:Pine )




Re: A potential new way to deal with spambots

bawolff
Sure, it's certainly a front we can do better on.

I don't think Kasada is a product that's appropriate at this time. Ignoring
the ideological aspect of it being non-free software, there are a lot of
easy things we could and should try first.

However, I'd caution against viewing this as purely a technical problem.
Wikimedia is not like other websites - we have allowable bots. For many
commercial websites, the only good bot is a dead bot; Wikimedia has many
good bots. On enwiki they usually have to be approved, though I don't think
that's true on all wikis. We also consider it perfectly OK to do limited
testing of bots before they are approved. We also encourage the creation of
alternative "clients", which from a server perspective look like bots.
Unlike other websites where anything non-human is evil, here we need to
ensure our blocking corresponds to the social norms of the community. This
may not sound that hard, but I think it complicates bot-blocking more than
is obvious at first glance.

Second, this sort of thing tends to fall through the cracks at WMF. AFAIK
the last time there was a team responsible for admin tools & anti-abuse was
2013 (https://www.mediawiki.org/wiki/Admin_tools_development). I believe
(correct me if I'm wrong) that the anti-harassment team is all about human
harassment, not anti-abuse in this sense. Security is adjacent to this
problem, but traditionally has not considered it in scope. Even core tools
like checkuser have been largely ignored by the foundation for many, many
years.

I guess this is a long-winded way of saying: I think there should be a
team responsible for this sort of stuff at WMF, but there isn't one. I
think there are a lot of rather easy things we can try (off the top of my
head: better captchas; more adaptive rate limits that adjust based on how
evilish you look; etc.), but they definitely require close involvement with
the community to ensure that we do the right thing.
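The adaptive-rate-limit idea above could be sketched as a token bucket whose refill rate shrinks as a client accumulates abuse signals. This is purely illustrative; the scoring, thresholds, and class are all invented here, not an existing MediaWiki mechanism:

```python
import time

class AdaptiveRateLimiter:
    """Token bucket that refills more slowly as a client's 'suspicion
    score' grows. Score weights and rates are made-up illustrations;
    a real deployment would derive signals from things like captcha
    failures or edit-filter hits."""

    def __init__(self, base_rate: float = 10.0, capacity: float = 10.0):
        self.base_rate = base_rate    # tokens/second for a trusted client
        self.capacity = capacity
        self.tokens = capacity
        self.suspicion = 0.0          # 0.0 = trusted, 1.0 = near-certain bot
        self.last = time.monotonic()

    def _refill(self) -> None:
        now = time.monotonic()
        # Suspicious clients earn tokens (much) more slowly, down to 5%.
        rate = self.base_rate * max(0.05, 1.0 - self.suspicion)
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * rate)
        self.last = now

    def allow(self) -> bool:
        """Spend one token per action; refuse when the bucket is empty."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def report_signal(self, weight: float) -> None:
        """Raise suspicion when an abuse signal fires (e.g. a failed captcha)."""
        self.suspicion = min(1.0, self.suspicion + weight)
```

The appeal of this shape is that it degrades gracefully: a good-faith bot or alternative client that behaves well keeps a normal rate, while a client racking up abuse signals is throttled rather than hard-blocked, which fits the community-norms concern above.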

--
Brian
(p.s. Consider this a volunteer hat email)


Re: A potential new way to deal with spambots

Jonathan Morgan
This may be naive, but... isn't the wishlist filling this need? And if not
through a consensus-driven method like the wishlist, how should a WMF team
prioritize which power user tools it needs to focus on?

Or is it just a matter of "Yes, wishlist, but more of it"?

- Jonathan


--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>

Re: A potential new way to deal with spambots

Aaron Halfaker-3
We've been working on unflagged bot detection on my team.  It's far from a
real product integration, but we have shown that it works in practice.  We
tested this in Wikidata, but I don't see a good reason why a similar
strategy wouldn't work for English Wikipedia.

Hall, A., Terveen, L., & Halfaker, A. (2018). Bot Detection in Wikidata
Using Behavioral and Other Informal Cues.
*Proceedings of the ACM on Human-Computer Interaction*, *2*(CSCW), 64.  pdf
<https://dl.acm.org/ft_gateway.cfm?id=3274333&type=pdf>

In theory, we could get this into ORES if there were strong demand.  As Pine
points out, we'd need to delay some other projects.  For reference, the
next thing on the backlog that I'm looking at is setting up article quality
prediction for Swedish Wikipedia.
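To give a flavor of what "behavioral cues" can mean, here is a crude heuristic in the spirit of that line of work, though it is NOT the model from the Hall et al. paper: automated accounts tend to edit both very fast and very regularly, so unusually small and uniform inter-edit gaps are suspicious. All thresholds are invented for illustration:

```python
import statistics

def looks_like_unflagged_bot(edit_timestamps, min_edits=20,
                             max_median_gap=5.0, max_cv=0.3):
    """Flag an account whose edit timing is very fast (median gap in
    seconds below max_median_gap) and very regular (coefficient of
    variation of gaps below max_cv). Hypothetical thresholds; a real
    classifier would combine many more signals."""
    if len(edit_timestamps) < min_edits:
        return False  # not enough history to judge
    gaps = [b - a for a, b in zip(edit_timestamps, edit_timestamps[1:])]
    median_gap = statistics.median(gaps)
    mean_gap = statistics.mean(gaps)
    cv = statistics.stdev(gaps) / mean_gap if mean_gap > 0 else 0.0
    return median_gap <= max_median_gap and cv <= max_cv
```

A production version would of course be probabilistic (a score, not a boolean) so that it could feed into something like ORES alongside other features.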

-Aaron

--

Aaron Halfaker

Principal Research Scientist

Head of the Scoring Platform team
Wikimedia Foundation

Re: A potential new way to deal with spambots

Wikipedia Developers mailing list
In reply to this post by Jonathan Morgan
Stewards are just 34 people, and are not enough of a voting bloc on the wishlist to compete with enwiki users. What we actually need cannot get through that way.

--
Yongmin
Sent from my iPhone

Text licensed under CC BY ND 2.0 KR
Please note that this address is list-only address and any non-mailing list mails will be treated as spam.
Please use https://encrypt.to/0x947f156f16250de39788c3c35b625da5beff197a

>> with
>>>>> no
>>>>> Javascript); browser and behavior analysis very likely cannot be
>>>>> outsourced
>>>>> to a third party for privacy reasons. Maybe we could do it ourselves
>>>>> (although it would still bring up interesting questions privacy-wise)
>>> but
>>>>> it would be a huge undertaking.
>>>>>
>>>>>
>>>>> [1] https://www.kasada.io/product/
>>>>> _______________________________________________
>>>>> Wikitech-l mailing list
>>>>> [hidden email]
>>>>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>>>>
>>>>
>
>
>
> --
> Jonathan T. Morgan
> Senior Design Researcher
> Wikimedia Foundation
> User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
> _______________________________________________
> Wikitech-l mailing list
> [hidden email]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: A potential new way to deal with spambots

Pine W
Thanks for the replies.

I think that detailed discussion of the pros and cons of the Tech Wishlist
should be separate from this thread, but I agree that one way to get a
subject like unflagged bot detection addressed could be through the Tech
Wishlist, assuming that WMF is willing to devote resources to that topic if
it ranks in the top X places.

It sounds like there are a few different ways that work in this area could
be resourced:

1. As mentioned above, making it a tech wishlist item and having
Community Tech work on it;
2. Having the Anti-Harassment Tools team work on it;
3. Having the Security team work on it;
4. Having the ORES team work on it;
5. Funding work through a WMF grants program;
6. Funding through a mentorship program like GSOC. I believe that GSOC
previously supported work on CAPTCHA improvements.

Of the above options I suggest first considering 2 and 4. Having AHAT staff
work on unflagged bot detection might be scope creep under the existing
AHAT charter but perhaps AHAT's charter could be modified into something
that would resemble the charter for an "Administrators' Tools Team". And if
the ORES team has already done some work on unflagged bot detection then
perhaps ORES and AHAT staff could collaborate on this topic.

In the first half of the next WMF fiscal year, I think that planning for an
existing WMF team or combination of staff from existing teams to work on
unflagged bot detection would be good. If WMF does not resource this topic,
and community people still want unflagged bot detection to be resourced, we
can consider other options such as 1 and 5.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )

Re: A potential new way to deal with spambots

David Barratt
http://gph.is/2lnp32Z


Re: A potential new way to deal with spambots

bawolff
In reply to this post by Jonathan Morgan
The tech wishlist is awesome, and they do a lot of great work.

However, I don't think this type of democratically driven development is
appropriate for everything. If it were, we would just get rid of all the
other dev teams and have only a wishlist. In this case what is needed is
an anti-abuse strategy, not just a one-off feature. That involves
development of many features over the long term, maintenance, long-term
product management, integration into the whole, etc. Even in real life,
nobody ever votes for maintenance until it's way too late and everything is
about to explode. Not to mention the product research aspect of it: the
wishlist inherently encourages people to think inside the box, since it is
basically asking what's wrong with the current box. You
can't vote for something if you don't realize it's a choice.

As others have mentioned, majority rule is also sometimes not the
appropriate way to choose what to do. Sometimes there are things that only
affect a minority, but it's an important minority. Sometimes things that
affect everyone slightly win over things that affect a
small class significantly (of course both types are important).
Sometimes there are things that are important long term but unimportant
short term. [Not saying that people can't vote rationally for long-term
tasks, just that the wishlist is mostly built around the idea of short-term
tasks, short enough that you can do about 10 of them in a year.]

--
Brian

On Mon, Feb 11, 2019 at 5:18 PM Jonathan Morgan <[hidden email]>
wrote:

> This may be naive, but... isn't the wishlist filling this need? And if not
> through a consensus-driven method like the wishlist, how should a WMF team
> prioritize which power user tools it needs to focus on?
>
> Or is just a matter of "Yes, wishlist, but more of it"?
>
> - Jonathan
>
> On Mon, Feb 11, 2019 at 2:34 AM bawolff <[hidden email]> wrote:
>
>> Sure its certainly a front we can do better on.
>>
>> I don't think Kasada is a product that's appropriate at this time.
>> Ignoring
>> the ideological aspect of it being non-free software, there's a lot of
>> easy
>> things we could and should try first.
>>
>> However, I'd caution against viewing this as purely a technical problem.
>> Wikimedia is not like other websites - we have allowable bots. For many
>> commercial websites, the only good bot is a dead bot. Wikimedia has many
>> good bots. On enwiki usually they have to be approved, I don't think
>> that's
>> true on all wikis. We also consider it perfectly ok to do limited testing
>> of bots before it is approved. We also encourage the creation of
>> alternative "clients", which from a server perspective looks like a bot.
>> Unlike other websites where anything non-human is evil, here we need to
>> ensure our blocking corresponds to social norms of the community. This may
>> sound not that hard, but I think it complicates botblocking more than is
>> obvious at first glance.
>>
>> Second, this sort of thing is something that tends to fall through the
>> cracks at WMF. AFAIK the last time there was a team responsible for admin
>> tools & anti-abuse was 2013 (
>> https://www.mediawiki.org/wiki/Admin_tools_development). I believe
>> (correct
>> me if I'm wrong) that the anti-harassment team is all about human harassment
>> and not anti-abuse in this sense. Security is adjacent to this problem,
>> but
>> traditionally has not considered this problem in scope. Even core tools
>> like checkuser have been largely ignored by the foundation for many many
>> years.
>>
>> I guess this is a long winded way of saying - I think there should be a
>> team responsible for this sort of stuff at WMF, but there isn't one. I
>> think there's a lot of rather easy things we can try (Off the top of my
>> head: Better captchas. More adaptive rate limits that adjust based on how
>> evilish you look, etc), but they definitely require close involvement with
>> the community to ensure that we do the actual right thing.
>>
>> --
>> Brian
>> (p.s. Consider this a volunteer hat email)
>>
>
>
>
> --
> Jonathan T. Morgan
> Senior Design Researcher
> Wikimedia Foundation
> User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
>
>

Re: A potential new way to deal with spambots

Pine W
In reply to this post by David Barratt
Hi David, do you have a question? I saw the GIF but I don't know how to
interpret it in the context of this conversation.

Thanks,

Pine
( https://meta.wikimedia.org/wiki/User:Pine )


On Tue, Feb 12, 2019 at 5:49 AM David Barratt <[hidden email]>
wrote:

> http://gph.is/2lnp32Z
>

Re: A potential new way to deal with spambots

Jonathan Morgan
In reply to this post by Pine W
Couple thoughts:

1. ORES platform (ores.wikimedia.org) was designed to host a wide range of
machine learning models, not just the ones built by Aaron Halfaker himself.
So, if there is a computer scientist out there who is interested in
training and maintaining a new bot-detection model, it can be hosted on and
surfaced through ORES. Then anyone with some bot- or web-development skills
can build tools on top of that model. Noting this because that's one of the
main points of having a "scoring platform": it separates the (necessarily
WMF-led) work of production platform development from the development of
purpose-built tools.
2. If anyone knows a computer scientist who is interested in developing and
piloting a model like this, please send them our way. Members of the
Research team, or Aaron, *may* have capacity to support a formal
collaboration.
3. This seems way too complex for a GSOC project to me, but I'd love to be
wrong about that. If there are students who are interested in working on
this, please send them our way (no promises, obvs).
4. Modifying the charter of an existing WMF product team seems somewhat out
of scope for this ask, task, and venue. :)
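[Illustration, re point 1: here is a minimal sketch of how a tool might query the public ORES scoring endpoint for an existing model (the `damaging` model on `enwiki`); a hypothetical bot-detection model, once hosted on the platform, would presumably be surfaced the same way. The URL shape and response envelope follow my reading of the v3 API; treat the details as assumptions, not a definitive client.]

```python
import json
import urllib.request

ORES = "https://ores.wikimedia.org/v3/scores"

def score_url(context, rev_ids, model):
    """Build a v3 ORES scoring URL for one model and a batch of revision IDs."""
    revids = "|".join(str(r) for r in rev_ids)
    return f"{ORES}/{context}/?models={model}&revids={revids}"

def extract_predictions(payload, context, model):
    """Pull {rev_id: prediction} out of the v3 response envelope."""
    scores = payload[context]["scores"]
    return {
        rev_id: entry[model]["score"]["prediction"]
        for rev_id, entry in scores.items()
    }

def fetch_predictions(context, rev_ids, model="damaging"):
    """Fetch and decode predictions for a batch of revisions."""
    with urllib.request.urlopen(score_url(context, rev_ids, model)) as resp:
        return extract_predictions(json.load(resp), context, model)
```

[The point of the separation above is the one Jonathan makes: the scoring platform serves model output over HTTP, and anyone with modest web-development skills can build patrolling tools on top of it.]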

- J




--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>

Re: A potential new way to deal with spambots

Pine W
In reply to this post by bawolff
Since we're discussing how the Tech Wishlist works then I will comment on a
few points specifically regarding that wishlist.

1. A gentle correction: the recommendations are ranked by vote, not by
consensus. This has pros and cons.

2a. If memory serves me correctly, the wishlist process was designed by WMF
rather than designed by community consensus. I may be wrong about this, but
in my search of historical records I have not found evidence to the
contrary. I think that redesigning the process would be worth considering,
and I hope that a redesign would help to account for the types of needs
that bawolff described in his second paragraph.

2b. I think that it's an overstatement to say that "nobody ever votes for
maintenance until it's way too late and everything is about to explode". Many
non-WMF people are aware of our backlogs, the endless
requests for help and conflict resolution, and the many challenges of
maintaining what we have with the current population of skilled and good
faith non-WMF people. However, I have the impression that there is a common
*tendency* among humans in general to chase shiny new features instead of
doing mostly thankless work, and I agree that the tech wishlist, even in a
redesigned form, is unlikely to be well suited for long-term planning. I think
that WMF's strategy process may be a better way to plan for the long term,
including for maintenance activities that are mostly thankless and do not
necessarily correlate with increasing someone's personal power, making
their resume look better, or having fun. Fortunately, the volunteer
mentality of many non-WMF people means that we do have people who are
willing to do mostly thankless, mundane, and/or stressful work; some
of us feel that our work is important for maintaining the
encyclopedia even when we do not enjoy it. But we have a finite supply of
time from such people.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )

Re: A potential new way to deal with spambots

bawolff
I actually meant a different type of maintenance.

Maintaining the encyclopedia (and other wiki projects) is of course an
activity that needs software support.

But software is also something that needs maintenance. Technology,
standards, and circumstances change over time; software left alone will
"bitrot". A long-term technical strategy needs to
account for that and plan for that. One-off feature development does not.
Democratically directed one-off feature development accounts for it even
less.

In response to Jonathan:
So let's say that ORES/magic AI detects that something is a bot. Then what?
Detection is a small part of the picture. In fact, you don't even need AI for
this: plenty of the vandal bots have generic programming-language
user agents (AI could of course be useful for the long tail here, but there's
much simpler stuff to start with). Do we expose this to AbuseFilter
somehow? Do we add a tag to mark it in RC/watchlist? Do we block it? Do we
rate limit it? What rate of false positives is acceptable? What is the
UI for all this? To what extent is this hard-coded, and to what extent do
communities control the feature? Etc.
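[Illustration of the "you don't even need AI" point: a toy heuristic that flags default programming-language HTTP clients by user-agent substring. The marker list is purely illustrative, not a real blocklist, and on Wikimedia any such signal would have to respect legitimate bots and API clients before any action is taken.]

```python
# Illustrative markers of generic programming-language HTTP clients; a real
# deployment would tune this list and combine it with many other signals.
GENERIC_CLIENT_MARKERS = (
    "python-requests", "python-urllib", "curl/", "wget/",
    "okhttp", "go-http-client", "java/", "libwww-perl",
)

def looks_like_generic_client(user_agent):
    """True if the UA string resembles a default programming-language client."""
    ua = (user_agent or "").lower()
    if not ua or ua == "-":
        return True  # a missing user agent is itself a weak suspicion signal
    return any(marker in ua for marker in GENERIC_CLIENT_MARKERS)
```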

We don't need products to detect bots. Making products to detect bots is
easy. We need product managers to come up with socio-technical systems that
make sense in our special context.

--
Brian

On Tue, Feb 12, 2019 at 8:36 PM Pine W <[hidden email]> wrote:

> Since we're discussing how the Tech Wishlist works then I will comment on a
> few points specifically regarding that wishlist.
>
> 1. A gentle correction: the recommendations are ranked by vote, not by
> consensus. This has pros and cons.
>
> 2a. If memory serves me correctly, the wishlist process was designed by WMF
> rather than designed by community consensus. I may be wrong about this, but
> in my search of historical records I have not found evidence to the
> contrary. I think that redesigning the process would be worth considering,
> and I hope that a redesign would help to account for the types of needs
> that bawolff described in his second paragraph.
>
> 2b.. I think that it's an overstatement to say that "nobody ever votes for
> maintenance until its way too late and everything is about to explode". I
> think that many non-WMF people are aware of our backlogs, the endless
> requests for help and conflict resolution, and the many challenges of
> maintaining what we have with the current population of skilled and good
> faith non-WMF people. However, I have the impression that there is a common
> *tendency* among humans in general to chase shiny new features instead of
> doing mostly thankless work, and I agree that the tech wishlist is unlikely
> even in a redesigned form to be well suited for long term planning. I think
> that WMF's strategy process may be a better way to plan for the long term,
> including for maintenance activities that are mostly thankless and do not
> necessarily correlate with increasing someone's personal power, making
> their resume look better, or having fun. Fortunately the volunteer
> mentality of many non-WMF people means that we do have people who are
> willing to do mostly thankless, mundane, and/or stressful work, and I think
> that some of us feel that our work is important for maintaining the
> encyclopedia even when we do not enjoy it, but we have a finite supply of
> time from such people.
>
> Pine
> ( https://meta.wikimedia.org/wiki/User:Pine )
> _______________________________________________
> Wikitech-l mailing list
> [hidden email]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: A potential new way to deal with spambots

John Erling Blad
In reply to this post by Aaron Halfaker-3
It is extremely easy to detect a bot unless the bot operator chooses to make
it hard. Just build a model of how the user interacts with the input
devices, and do anomaly detection. That does imply the use of JavaScript,
but users not running JS are either very dubious or quite well known. There
are nearly no new users who do not use JS.
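[A minimal sketch of the idea, under stated assumptions: suppose client-side JS has collected pointer-event timestamps. Scripted replays tend to fire events on an unnaturally regular schedule, so the coefficient of variation of inter-event intervals is one crude anomaly signal. The 0.05 threshold is an arbitrary placeholder; a real system would learn thresholds and richer features from observed human traffic.]

```python
from statistics import mean, stdev

def timing_regularity(timestamps_ms):
    """Coefficient of variation of inter-event intervals.

    Human input timing is noisy (CV well above zero); a script firing
    events on a fixed schedule yields near-zero variation.
    """
    intervals = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    if len(intervals) < 2 or mean(intervals) == 0:
        return None  # not enough signal to judge
    return stdev(intervals) / mean(intervals)

def looks_scripted(timestamps_ms, cv_threshold=0.05):
    """Flag event streams whose timing is suspiciously regular."""
    cv = timing_regularity(timestamps_ms)
    return cv is not None and cv < cv_threshold
```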

Reused a previous tex-file, and did not clean it up? "Magnetic Normal Modes
of Bi-Component Permalloy Structures" ;)


On Mon, Feb 11, 2019 at 6:47 PM Aaron Halfaker <[hidden email]>
wrote:
>
> We've been working on unflagged bot detection on my team.  It's far from a
> real product integration, but we have shown that it works in practice.  We
> tested this in Wikidata, but I don't see a good reason why a similar
> strategy wouldn't work for English Wikipedia.
>
> Hall, A., Terveen, L., & Halfaker, A. (2018). Bot Detection in Wikidata
> Using Behavioral and Other Informal Cues.
> *Proceedings of the ACM on Human-Computer Interaction*, *2*(CSCW), 64. pdf
> <https://dl.acm.org/ft_gateway.cfm?id=3274333&type=pdf>
>
> In theory, we could get this into ORES if there was strong demand.  As Pine
> points out, we'd need to delay some other projects.  For reference, the
> next thing on the backlog that I'm looking at is setting article quality
> prediction for Swedish Wikipedia.
>
> -Aaron
>
> On Mon, Feb 11, 2019 at 11:19 AM Jonathan Morgan <[hidden email]>
> wrote:
>
> > This may be naive, but... isn't the wishlist filling this need? And if
not
> > through a consensus-driven method like the wishlist, how should a WMF
team

> > prioritize which power user tools it needs to focus on?
> >
> > Or is just a matter of "Yes, wishlist, but more of it"?
> >
> > - Jonathan
> >
> > On Mon, Feb 11, 2019 at 2:34 AM bawolff <[hidden email]> wrote:
> >
> > > Sure its certainly a front we can do better on.
> > >
> > > I don't think Kasada is a product that's appropriate at this time.
> > Ignoring
> > > the ideological aspect of it being non-free software, there's a lot of
> > easy
> > > things we could and should try first.
> > >
> > > However, I'd caution against viewing this as purely a technical
problem.
> > > Wikimedia is not like other websites - we have allowable bots. For
many
> > > commercial websites, the only good bot is a dead bot. Wikimedia has
many
> > > good bots. On enwiki usually they have to be approved, I don't think
> > that's
> > > true on all wikis. We also consider it perfectly ok to do limited
testing
> > > of bots before it is approved. We also encourage the creation of
> > > alternative "clients", which from a server perspective looks like a
bot.
> > > Unlike other websites where anything non-human is evil, here we need
to
> > > ensure our blocking corresponds to social norms of the community. This
> > may
> > > sound not that hard, but I think it complicates botblocking more than
is
> > > obvious at first glance.
> > >
> > > Second, this sort of thing is something that tends to far through the
> > > cracks at WMF. AFAIK the last time there was a team responsible for
admin
> > > tools & anti-abuse was 2013 (
> > > https://www.mediawiki.org/wiki/Admin_tools_development). I believe
> > > (correct
> > > me if I'm wrong) that anti-harrasment team is all about human
harassment
> > > and not anti-abuse in this sense. Security is adjacent to this
problem,
> > but
> > > traditionally has not considered this problem in scope. Even core
tools
> > > like checkuser have been largely ignored by the foundation for many
many
> > > years.
> > >
> > > I guess this is a long winded way of saying - I think there should be
a
> > > team responsible for this sort of stuff at WMF, but there isn't one. I
> > > think there's a lot of rather easy things we can try (Off the top of
my
> > > head: Better captchas. More adaptive rate limits that adjust based on
how

> > > evilish you look, etc), but they definitely require close involvement
> > with
> > > the community to ensure that we do the actual right thing.
> > >
> > > --
> > > Brian
> > > (p.s. Consider this a volunteer hat email)
> > >
> > > On Sun, Feb 10, 2019 at 6:06 AM Pine W <[hidden email]> wrote:
> > >
> > > > To clarify the types of unwelcome bots that we have, here are the
ones

> > > that
> > > > I think are most common:
> > > >
> > > > 1) Spambots
> > > >
> > > > 2) Vandalbots
> > > >
> > > > 3) Unauthorized bots which may be intended to act in good faith but
> > which
> > > > may cause problems that could probably have been identified during
> > > standard
> > > > testing in Wikimedia communities which have a relatively well
developed
> > > bot
> > > > approval process. (See
> > > > https://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval.)
> > > >
> > > > Maybe unwelcome bots are not a priority for WMF at the moment, in
which
> > > > case I could add this subject into a backlog. I am sorry if I sound
> > > grumpy
> > > > at WMF regarding this subject; this is a problem but I know that
there
> > > are
> > > > millions of problems and I don't expect a different project to be
> > dropped
> > > > in order to address this one.
> > > >
> > > > While it is a rough analogy, I think that this movie clip helps to
> > > > illustrate a problem of bad bots. Although the clip is amusing, I am
> > not
> > > > amused by unwelcome bots causing problems on ENWP or anywhere else
in

> > the
> > > > Wikiverse. https://www.youtube.com/watch?v=lokKpSrNqDA
> > > >
> > > > Thanks,
> > > >
> > > > Pine
> > > > ( https://meta.wikimedia.org/wiki/User:Pine )
> > > >
> > > >
> > > >
> > > > On Sat, Feb 9, 2019, 1:40 PM Pine W <[hidden email] wrote:
> > > >
> > > > > OK. Yesterday I was looking with a few other ENWP people at what I
> > > > > think was a series of edits by either a vandal bot or an
> > > > > inadequately designed and unapproved good faith bot. I read that it
> > > > > made approximately 500 edits before someone who knew enough about
> > > > > ENWP saw what was happening and did something about it. I don't
> > > > > know how many problematic bots we have, in addition to vandal bots,
> > > > > but I am confident that they drain a nontrivial amount of time from
> > > > > stewards, admins, and patrollers.
> > > > >
> > > > > I don't know how much of a priority WMF places on detecting and
> > > > > stopping unwelcome bots, but I think that the question of how to
> > > > > decrease the numbers and effectiveness of unwelcome bots would be
> > > > > a good topic for WMF to research.
> > > > >
> > > > > Pine
> > > > > ( https://meta.wikimedia.org/wiki/User:Pine )
> > > > >
> > > > >
> > > > > On Sat, Feb 9, 2019 at 9:24 PM Gergo Tisza <[hidden email]> wrote:
> > > > >
> > > > >> On Fri, Feb 8, 2019 at 6:20 PM Pine W <[hidden email]> wrote:
> > > > >>
> > > > >> > I don't know how practical it would be to implement an approach
> > > > >> > like this in the Wikiverse, and whether licensing proprietary
> > > > >> > technology would be required.
> > > > >> >
> > > > >>
> > > > >> They are talking about Polyform [1], a reverse proxy that filters
> > > > >> traffic with a combination of browser fingerprinting, behavior
> > > > >> analysis and proof of work.
> > > > >> Proof of work is not really useful unless you have huge levels of
> > > > >> bot traffic from a single bot operator (also it means locking out
> > > > >> users with no Javascript); browser and behavior analysis very
> > > > >> likely cannot be outsourced to a third party for privacy reasons.
> > > > >> Maybe we could do it ourselves (although it would still bring up
> > > > >> interesting questions privacy-wise) but it would be a huge
> > > > >> undertaking.
> > > > >>
> > > > >>
> > > > >> [1] https://www.kasada.io/product/
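The proof-of-work mechanism Gergo describes can be illustrated with a minimal hashcash-style sketch (my own illustration in Python; this is not Polyform's actual scheme, and the function names and difficulty values are arbitrary):

```python
# Hashcash-style proof of work: the client must find a nonce whose hash,
# combined with a server-issued challenge, falls below a difficulty target.
import hashlib
from itertools import count

def solve(challenge: str, difficulty_bits: int = 12) -> int:
    """Brute-force a nonce meeting the difficulty; costs ~2**difficulty_bits hashes."""
    target = 1 << (256 - difficulty_bits)
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce

def verify(challenge: str, nonce: int, difficulty_bits: int = 12) -> bool:
    """Check a submitted nonce; costs a single hash."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))
```

The asymmetry is the point: verifying costs one hash, while solving costs about 2**difficulty_bits hashes on average, which is cheap for a single human request but expensive at bulk-bot scale; as noted above, though, it also penalizes users with no Javascript.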
> >
> >
> >
> > --
> > Jonathan T. Morgan
> > Senior Design Researcher
> > Wikimedia Foundation
> > User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
>
>
>
> --
>
> Aaron Halfaker
>
> Principal Research Scientist
>
> Head of the Scoring Platform team
> Wikimedia Foundation
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: A potential new way to deal with spambots

Jonathan Morgan
In reply to this post by bawolff
Brian,

I think we may be talking past each other. I'm Mr. Socio-technical systems.
I thought what was being requested was a way to detect bots.

I maintain my own bots, work extensively with product teams, and have a
deep and abiding familiarity with the complexity of designing effective
tools for Wikipedia.

- J

On Wed, Feb 13, 2019 at 4:14 AM bawolff <[hidden email]> wrote:

> I actually meant a different type of maintenance.
>
> Maintaining the encyclopedia (and other wiki projects) is of course an
> activity that needs software support.
>
> But software is also something that needs maintenance. Technology,
> standards, circumstances change over time. Software left alone will
> "bitrot" over time. A long term technical strategy to do anything needs to
> account for that, plan for that. One-off feature development does not.
> Democratically directed one-off feature development accounts for that even
> less.
>
> In response to Jonathan:
> So let's say that ORES/magic AI detects something is a bot. Then what?
> That's a small part of the picture. In fact you don't even need AI to do
> this; plenty of the vandal bots have generic programming-language
> user-agents (AI could of course be useful for the long tail here, but
> there's much simpler stuff to start off with). Do we expose this to
> AbuseFilter somehow? Do we add a tag to mark it in RC/watchlist? Do we
> block it? Do we rate limit it? What amount of false positives is
> acceptable? What is the UI for all this? To what extent is this hard
> coded, and to what extent do communities control the feature? Etc.
>
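The "much simpler stuff" mentioned above, matching generic programming-language user-agents, could be sketched like this (the patterns and function name are hypothetical illustrations, not an actual MediaWiki or AbuseFilter interface):

```python
# Hypothetical heuristic: flag requests whose User-Agent matches a generic
# HTTP client library, for tagging or rate limiting rather than blocking.
import re

GENERIC_CLIENT_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"^python-requests/", r"^python-urllib/", r"^curl/", r"^wget/",
        r"^libwww-perl/", r"^java/", r"^go-http-client/", r"^okhttp/",
    )
]

def looks_like_unflagged_bot(user_agent: str) -> bool:
    """True if the User-Agent matches a known generic client library."""
    return any(p.match(user_agent or "") for p in GENERIC_CLIENT_PATTERNS)
```

As the paragraph above notes, detection like this is the easy part; the open questions are what to do with a match and who controls the policy.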
> We don't need products to detect bots. Making products to detect bots is
> easy. We need product managers to come up with socio-technical systems that
> make sense in our special context.
>
> --
> Brian
>
> On Tue, Feb 12, 2019 at 8:36 PM Pine W <[hidden email]> wrote:
>
> > Since we're discussing how the Tech Wishlist works, I will comment on a
> > few points specifically regarding that wishlist.
> >
> > 1. A gentle correction: the recommendations are ranked by vote, not by
> > consensus. This has pros and cons.
> >
> > 2a. If memory serves me correctly, the wishlist process was designed by
> > WMF rather than designed by community consensus. I may be wrong about
> > this, but in my search of historical records I have not found evidence
> > to the contrary. I think that redesigning the process would be worth
> > considering, and I hope that a redesign would help to account for the
> > types of needs that bawolff described in his second paragraph.
> >
> > 2b. I think that it's an overstatement to say that "nobody ever votes
> > for maintenance until it's way too late and everything is about to
> > explode". I think that many non-WMF people are aware of our backlogs,
> > the endless requests for help and conflict resolution, and the many
> > challenges of maintaining what we have with the current population of
> > skilled and good faith non-WMF people. However, I have the impression
> > that there is a common *tendency* among humans in general to chase
> > shiny new features instead of doing mostly thankless work, and I agree
> > that the tech wishlist is unlikely, even in a redesigned form, to be
> > well suited for long-term planning. I think that WMF's strategy process
> > may be a better way to plan for the long term, including for
> > maintenance activities that are mostly thankless and do not necessarily
> > correlate with increasing someone's personal power, making their resume
> > look better, or having fun. Fortunately the volunteer mentality of many
> > non-WMF people means that we do have people who are willing to do
> > mostly thankless, mundane, and/or stressful work, and I think that some
> > of us feel that our work is important for maintaining the encyclopedia
> > even when we do not enjoy it, but we have a finite supply of time from
> > such people.
> >
> > Pine
> > ( https://meta.wikimedia.org/wiki/User:Pine )
> > _______________________________________________
> > Wikitech-l mailing list
> > [hidden email]
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> _______________________________________________
> Wikitech-l mailing list
> [hidden email]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: A potential new way to deal with spambots

Pine W
In reply to this post by bawolff
On Wed, Feb 13, 2019 at 12:13 PM bawolff <[hidden email]> wrote:

> I actually meant a different type of maintenance.
>
> Maintaining the encyclopedia (and other wiki projects) is of course an
> activity that needs software support.
>
> But software is also something that needs maintenance. Technology,
> standards, circumstances change over time. Software left alone will
> "bitrot" over time. A long term technical strategy to do anything needs to
> account for that, plan for that. One-off feature development does not.
> Democratically directed one-off feature development accounts for that even
> less.
>

I understand. I was intending to comment on maintenance activities in
general, whether that be maintenance of a city's water system, maintenance
of the text of encyclopedia articles, or maintenance of software. My train
of thought proceeded into a somewhat detailed commentary regarding
maintenance of non-software Wikimedia elements. I think that the tendency
to under-resource maintenance in favor of novelties is similar in many
domains of human activity, but I also think that humans collectively are
not so unwise that we will prefer novelties over maintenance every time
that there is a referendum on whether to maintain an existing service or to
create something new. {{Citation needed}}

I think that multiple good points have been raised in this thread regarding
the subjects of technical and human systems for detecting and intervening
against possible unflagged bots. I am wondering what a good way would be to
get a WMF product manager or someone similar to dedicate time to this
topic. My preference remains that one or more WMF people, or teams, add
this to their list of topics to address in a future quarter such as Q1 of
the WMF 2019-2020 fiscal year. I don't know how the WMF Community Tech team
plans for maintenance of features after the features are initially built,
debugged, and deployed, and based on the current state of this discussion I
don't currently have a strong opinion regarding whether Community Tech or a
different team would be best suited to work on the topic of unflagged bots.
I also don't know how WMF makes decisions about what goals are for teams
other than Community Tech for future quarters, but that information could
be helpful to have for this conversation.

Thanks,

Pine
( https://meta.wikimedia.org/wiki/User:Pine )
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l