This sounds like an interesting potential approach to deal with spambots, and hopefully to deter the people who make them. https://techcrunch.com/2019/02/05/kasada-bots/

I don't know how practical it would be to implement an approach like this in the Wikiverse, and whether licensing proprietary technology would be required. I would be interested in decreasing the quantity and effectiveness of spambots that misuse WMF infrastructure, damage the quality of Wikimedia content, and drain significant cumulative time from the limited supply of good-faith contributors.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On Fri, Feb 8, 2019 at 6:20 PM Pine W <[hidden email]> wrote:
> I don't know how practical it would be to implement an approach like this
> in the Wikiverse, and whether licensing proprietary technology would be
> required.

They are talking about Polyform [1], a reverse proxy that filters traffic with a combination of browser fingerprinting, behavior analysis, and proof of work.

Proof of work is not really useful unless you have huge levels of bot traffic from a single bot operator (also it means locking out users with no Javascript); browser and behavior analysis very likely cannot be outsourced to a third party for privacy reasons. Maybe we could do it ourselves (although it would still bring up interesting questions privacy-wise) but it would be a huge undertaking.

[1] https://www.kasada.io/product/
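[For readers unfamiliar with proof of work as an anti-bot mechanism: the idea is that the server hands the client a cheap-to-verify puzzle that is expensive to solve, so high-volume automated requests become costly. Below is a minimal hashcash-style sketch in Python. This is a generic illustration only, not Kasada's actual scheme; the function names and the SHA-256 leading-zero-bits construction are assumptions for the demo.]

```python
import hashlib
import os


def make_challenge() -> str:
    # Server side: a random challenge string the client must "pay" for.
    return os.urandom(16).hex()


def solve(challenge: str, difficulty_bits: int = 20) -> int:
    # Client side: brute-force a nonce until the hash of challenge+nonce
    # falls below a target. Expected cost grows as 2**difficulty_bits.
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1


def verify(challenge: str, nonce: int, difficulty_bits: int = 20) -> bool:
    # Server side: a single hash to check, regardless of difficulty.
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))
```

[The asymmetry - one hash to verify, ~2^difficulty hashes to solve - is what makes the scheme only effective against a single high-volume operator, as Gergo notes: a distributed botnet pays the cost once per node and barely notices.]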
OK. Yesterday I was looking with a few other ENWP people at what I think was a series of edits by either a vandal bot or an inadequately designed and unapproved good-faith bot. I read that it made approximately 500 edits before someone who knew enough about ENWP saw what was happening and did something about it. I don't know how many problematic bots we have, in addition to vandal bots, but I am confident that they drain a nontrivial amount of time from stewards, admins, and patrollers.

I don't know how much of a priority WMF places on detecting and stopping unwelcome bots, but I think that the question of how to decrease the numbers and effectiveness of unwelcome bots would be a good topic for WMF to research.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )
To clarify the types of unwelcome bots that we have, here are the ones that I think are most common:

1) Spambots

2) Vandalbots

3) Unauthorized bots which may be intended to act in good faith but which may cause problems that could probably have been identified during standard testing in Wikimedia communities which have a relatively well-developed bot approval process. (See https://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval.)

Maybe unwelcome bots are not a priority for WMF at the moment, in which case I could add this subject into a backlog. I am sorry if I sound grumpy at WMF regarding this subject; this is a problem, but I know that there are millions of problems and I don't expect a different project to be dropped in order to address this one.

While it is a rough analogy, I think that this movie clip helps to illustrate a problem of bad bots. Although the clip is amusing, I am not amused by unwelcome bots causing problems on ENWP or anywhere else in the Wikiverse. https://www.youtube.com/watch?v=lokKpSrNqDA

Thanks,

Pine
( https://meta.wikimedia.org/wiki/User:Pine )
Sure, it's certainly a front we can do better on.

I don't think Kasada is a product that's appropriate at this time. Ignoring the ideological aspect of it being non-free software, there are a lot of easy things we could and should try first.

However, I'd caution against viewing this as purely a technical problem. Wikimedia is not like other websites - we have allowable bots. For many commercial websites, the only good bot is a dead bot. Wikimedia has many good bots. On enwiki they usually have to be approved; I don't think that's true on all wikis. We also consider it perfectly OK to do limited testing of a bot before it is approved. We also encourage the creation of alternative "clients", which from a server perspective look like bots. Unlike other websites where anything non-human is evil, here we need to ensure our blocking corresponds to the social norms of the community. This may not sound that hard, but I think it complicates bot-blocking more than is obvious at first glance.

Second, this sort of thing is something that tends to fall through the cracks at WMF. AFAIK the last time there was a team responsible for admin tools & anti-abuse was 2013 ( https://www.mediawiki.org/wiki/Admin_tools_development ). I believe (correct me if I'm wrong) that the anti-harassment team is all about human harassment and not anti-abuse in this sense. Security is adjacent to this problem, but traditionally has not considered this problem in scope. Even core tools like checkuser have been largely ignored by the foundation for many, many years.

I guess this is a long-winded way of saying - I think there should be a team responsible for this sort of stuff at WMF, but there isn't one. I think there are a lot of rather easy things we can try (off the top of my head: better captchas; more adaptive rate limits that adjust based on how evilish you look; etc.), but they definitely require close involvement with the community to ensure that we do the actual right thing.

--
Brian
(p.s. Consider this a volunteer hat email)
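[The "adaptive rate limits that adjust based on how evilish you look" idea could be sketched roughly as a token bucket whose refill rate and burst size shrink as a client's suspicion score rises. This is purely a hypothetical illustration - the class name, the linear scaling rule, and the notion of a precomputed suspicion score are all assumptions for the demo, not anything MediaWiki actually implements:]

```python
import time


class AdaptiveRateLimiter:
    """Token bucket whose throughput shrinks as a client's suspicion rises.

    The suspicion score (0.0 = trusted, 1.0 = almost certainly abusive)
    would come from signals like account age, captcha failures, or edit
    patterns; how it is computed is out of scope for this sketch.
    """

    def __init__(self, base_rate: float = 10.0, base_burst: float = 20.0):
        self.base_rate = base_rate    # tokens/second for a fully trusted client
        self.base_burst = base_burst  # max bucket size for a fully trusted client
        self.tokens = base_burst
        self.last = time.monotonic()

    def allow(self, suspicion: float) -> bool:
        # Scale both refill rate and burst capacity down with suspicion,
        # with a small floor so nobody is blocked outright.
        scale = max(0.05, 1.0 - suspicion)
        rate = self.base_rate * scale
        burst = self.base_burst * scale
        now = time.monotonic()
        self.tokens = min(burst, self.tokens + (now - self.last) * rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

[A trusted client gets the full burst; a highly suspicious one is throttled to a trickle without being hard-blocked, which fits the community-norms point above - suspicion can be dialed back for approved bots rather than treating everything non-human as evil.]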
This may be naive, but... isn't the wishlist filling this need? And if not through a consensus-driven method like the wishlist, how should a WMF team prioritize which power-user tools it needs to focus on?

Or is it just a matter of "Yes, wishlist, but more of it"?

- Jonathan

--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
We've been working on unflagged bot detection on my team. It's far from a real product integration, but we have shown that it works in practice. We tested this in Wikidata, but I don't see a good reason why a similar strategy wouldn't work for English Wikipedia.

Hall, A., Terveen, L., & Halfaker, A. (2018). Bot Detection in Wikidata Using Behavioral and Other Informal Cues. *Proceedings of the ACM on Human-Computer Interaction*, *2*(CSCW), 64. pdf <https://dl.acm.org/ft_gateway.cfm?id=3274333&type=pdf>

In theory, we could get this into ORES if there was strong demand. As Pine points out, we'd need to delay some other projects. For reference, the next thing on the backlog that I'm looking at is setting up article quality prediction for Swedish Wikipedia.

-Aaron

--
Aaron Halfaker
Principal Research Scientist
Head of the Scoring Platform team
Wikimedia Foundation
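[For readers curious what detection from "behavioral and other informal cues" might look like in code, here is a toy sketch of the general idea: extract simple per-session features from an editor's activity and flag unusually fast, repetitive sessions. The feature names and thresholds below are invented for illustration; the paper's actual model is more sophisticated than this heuristic.]

```python
from statistics import median


def session_features(edit_timestamps, edit_comments):
    """Extract simple behavioral cues from one editor's session.

    Inputs: a sorted list of Unix timestamps and the matching edit
    comments. Inter-edit delay and comment repetitiveness are the kind of
    informal cues meant here; this exact feature set is illustrative.
    """
    gaps = [b - a for a, b in zip(edit_timestamps, edit_timestamps[1:])]
    return {
        "median_gap_seconds": median(gaps) if gaps else float("inf"),
        "distinct_comment_ratio": len(set(edit_comments)) / max(1, len(edit_comments)),
        "edit_count": len(edit_timestamps),
    }


def looks_like_bot(features, gap_threshold=5.0, comment_threshold=0.2):
    # Heuristic stand-in for a trained classifier: long, very fast, very
    # repetitive sessions get flagged. Thresholds are made up for the demo.
    return (
        features["edit_count"] >= 20
        and features["median_gap_seconds"] < gap_threshold
        and features["distinct_comment_ratio"] < comment_threshold
    )
```

[In a real deployment the heuristic would be replaced by a model trained on labeled bot/human sessions, and the score could feed something like ORES rather than triggering blocks directly.]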
Stewards are just 34 people, and are not enough to be a big voting power in the wishlist the way enwiki people are. What we actually need cannot get through that way.
--
Yongmin
Sent from my iPhone
Text licensed under CC BY ND 2.0 KR

Please note that this address is a list-only address and any non-mailing-list mail will be treated as spam.
Please use https://encrypt.to/0x947f156f16250de39788c3c35b625da5beff197a
Thanks for the replies.
I think that detailed discussion of the pros and cons of the Tech Wishlist
should be separate from this thread, but I agree that one way to get a
subject like unflagged bot detection addressed could be through the Tech
Wishlist, assuming that WMF is willing to devote resources to the topic if
it ranks in the top X places.

It sounds like there are a few different ways that work in this area could
be resourced:

1. As mentioned above, making it a Tech Wishlist item and having Community
Tech work on it;
2. Having the Anti-Harassment Tools team work on it;
3. Having the Security team work on it;
4. Having the ORES team work on it;
5. Funding work through a WMF grants program;
6. Funding work through a mentorship program like GSoC. I believe that GSoC
previously supported work on CAPTCHA improvements.

Of the above options I suggest first considering 2 and 4. Having AHAT staff
work on unflagged bot detection might be scope creep under the existing
AHAT charter, but perhaps AHAT's charter could be modified into something
that would resemble the charter for an "Administrators' Tools Team". And if
the ORES team has already done some work on unflagged bot detection, then
perhaps ORES and AHAT staff could collaborate on this topic.

In the first half of the next WMF fiscal year, I think that planning for an
existing WMF team, or a combination of staff from existing teams, to work
on unflagged bot detection would be good. If WMF does not resource this
topic, and community people still want unflagged bot detection to be
resourced, we can consider other options such as 1 and 5.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )
http://gph.is/2lnp32Z
In reply to this post by Jonathan Morgan
The tech wishlist is awesome, and they do a lot of great work.
However, I don't think this type of democratically driven development is
appropriate for all things. If it were, we would just get rid of all the
other dev teams and have only a wishlist. In this case what is needed is an
anti-abuse strategy, not just a one-off feature. This involves development
of many features over the long term, maintenance, long-term product
management, integration into the whole, etc. Even in real life, nobody ever
votes for maintenance until it's way too late and everything is about to
explode. Not to mention the product research aspect of it: a wishlist
inherently encourages people to think inside the box, as it is basically
asking what's wrong with the current box. You can't vote for something if
you don't realize it's a choice.

As others have mentioned, majority rule is also sometimes not the
appropriate way to choose what to do. Sometimes there are things that only
affect a minority, but it's an important minority. Sometimes there are
things that affect everyone slightly, and they win over things that affect
a small class significantly (of course both types of things are important).
Sometimes there are things that are long-term important but short-term
unimportant. [Not saying that people can't vote rationally for long-term
tasks, just that the wishlist is mostly built around the idea of short-term
tasks, short enough that you can do about 10 of them in a year.]

--
Brian

On Mon, Feb 11, 2019 at 5:18 PM Jonathan Morgan <[hidden email]> wrote:

> This may be naive, but... isn't the wishlist filling this need? And if not
> through a consensus-driven method like the wishlist, how should a WMF team
> prioritize which power user tools it needs to focus on?
>
> Or is it just a matter of "Yes, wishlist, but more of it"?
>
> - Jonathan
>
> On Mon, Feb 11, 2019 at 2:34 AM bawolff <[hidden email]> wrote:
>
>> Sure, it's certainly a front we can do better on.
>>
>> I don't think Kasada is a product that's appropriate at this time.
>> Ignoring the ideological aspect of it being non-free software, there are
>> a lot of easy things we could and should try first.
>>
>> However, I'd caution against viewing this as purely a technical problem.
>> Wikimedia is not like other websites: we have allowable bots. For many
>> commercial websites, the only good bot is a dead bot. Wikimedia has many
>> good bots. On enwiki they usually have to be approved; I don't think
>> that's true on all wikis. We also consider it perfectly OK to do limited
>> testing of a bot before it is approved. We also encourage the creation of
>> alternative "clients", which from a server perspective look like bots.
>> Unlike on other websites, where anything non-human is evil, here we need
>> to ensure our blocking corresponds to the social norms of the community.
>> This may not sound that hard, but I think it complicates bot blocking
>> more than is obvious at first glance.
>>
>> Second, this sort of thing tends to fall through the cracks at WMF. AFAIK
>> the last time there was a team responsible for admin tools & anti-abuse
>> was 2013 (https://www.mediawiki.org/wiki/Admin_tools_development). I
>> believe (correct me if I'm wrong) that the anti-harassment team is all
>> about human harassment and not anti-abuse in this sense. Security is
>> adjacent to this problem, but traditionally has not considered it in
>> scope. Even core tools like checkuser have been largely ignored by the
>> foundation for many, many years.
>>
>> I guess this is a long-winded way of saying: I think there should be a
>> team responsible for this sort of stuff at WMF, but there isn't one. I
>> think there are a lot of rather easy things we can try (off the top of my
>> head: better captchas; more adaptive rate limits that adjust based on how
>> evil you look; etc.), but they definitely require close involvement with
>> the community to ensure that we do the actual right thing.
>>
>> --
>> Brian
>> (p.s. Consider this a volunteer hat email)
>>
>> On Sun, Feb 10, 2019 at 6:06 AM Pine W <[hidden email]> wrote:
>>
>>> To clarify the types of unwelcome bots that we have, here are the ones
>>> that I think are most common:
>>>
>>> 1) Spambots
>>>
>>> 2) Vandalbots
>>>
>>> 3) Unauthorized bots which may be intended to act in good faith but
>>> which may cause problems that could probably have been identified during
>>> standard testing in Wikimedia communities which have a relatively well
>>> developed bot approval process. (See
>>> https://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval.)
>>>
>>> Maybe unwelcome bots are not a priority for WMF at the moment, in which
>>> case I could add this subject to a backlog. I am sorry if I sound grumpy
>>> at WMF regarding this subject; this is a problem, but I know that there
>>> are millions of problems and I don't expect a different project to be
>>> dropped in order to address this one.
>>>
>>> While it is a rough analogy, I think that this movie clip helps to
>>> illustrate a problem of bad bots. Although the clip is amusing, I am not
>>> amused by unwelcome bots causing problems on ENWP or anywhere else in
>>> the Wikiverse. https://www.youtube.com/watch?v=lokKpSrNqDA
>>>
>>> Thanks,
>>>
>>> Pine
>>> ( https://meta.wikimedia.org/wiki/User:Pine )
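The "more adaptive rate limits" idea mentioned above can be sketched as a
token bucket whose refill rate shrinks as a client's suspicion score grows.
This is purely a hypothetical illustration, not an existing MediaWiki
mechanism; the class name, the suspicion input, and the thresholds are all
invented:

```python
import time


class AdaptiveRateLimiter:
    """Token bucket whose refill rate shrinks as a client looks more
    suspicious. A sketch of the adaptive-rate-limit idea from the thread;
    the scoring inputs and thresholds are invented for illustration."""

    def __init__(self, base_rate=10.0, capacity=20.0):
        self.base_rate = base_rate  # tokens per second for a benign client
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def effective_rate(self, suspicion):
        # suspicion in [0, 1]: 0 = looks human, 1 = looks like an evil bot.
        # Linearly throttle down to 1% of the base rate.
        return self.base_rate * max(0.01, 1.0 - suspicion)

    def allow(self, suspicion):
        # Refill at the suspicion-adjusted rate, then try to spend a token.
        now = time.monotonic()
        self.tokens = min(
            self.capacity,
            self.tokens + (now - self.last) * self.effective_rate(suspicion),
        )
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A benign-looking client refills at the full rate, while a suspicious one is
throttled without being hard-blocked, which leaves room for false positives
to recover.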
In reply to this post by David Barratt
Hi David, do you have a question? I saw the GIF but I don't know how to
interpret it in the context of this conversation.

Thanks,

Pine
( https://meta.wikimedia.org/wiki/User:Pine )

On Tue, Feb 12, 2019 at 5:49 AM David Barratt <[hidden email]> wrote:

> http://gph.is/2lnp32Z
In reply to this post by Pine W
Couple thoughts:
1. The ORES platform (ores.wikimedia.org) was designed to host a wide range
of machine learning models, not just the ones built by Aaron Halfaker
himself. So, if there is a computer scientist out there who is interested
in training and maintaining a new bot-detection model, it can be hosted on
and surfaced through ORES. Then anyone with some bot- or web-development
skills can build tools on top of that model. Noting this because that's one
of the main points of having a "scoring platform": it separates the
(necessarily WMF-led) work of production platform development from the
development of purpose-built tools.

2. If anyone knows a computer scientist who is interested in developing and
piloting a model like this, please send them our way. Members of the
Research team, or Aaron, *may* have capacity to support a formal
collaboration.

3. This seems way too complex for a GSoC project to me, but I'd love to be
wrong about that. If there are students who are interested in working on
this, please send them our way (no promises, obvs).

4. Modifying the charter of an existing WMF product team seems somewhat out
of scope for this ask, task, and venue. :)

- J

--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
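Point 1 above is the practical appeal of the scoring platform: a hosted
model is exposed over a plain HTTP API, so a tool builder only has to
construct a URL and read JSON. A client-side sketch follows; the v3 URL
and response shape reflect the ORES API as documented at the time, but
treat the details as an assumption, and note that the "damaging" model is
used only as a stand-in, since a bot-detection model of this kind did not
yet exist on the platform:

```python
def ores_score_url(context: str, rev_id: int, model: str) -> str:
    """Build the ORES v3 scoring URL for one revision and one model,
    e.g. context='enwiki', model='damaging'."""
    return f"https://ores.wikimedia.org/v3/scores/{context}/{rev_id}/{model}"


def extract_probability(response: dict, context: str, rev_id: int,
                        model: str, outcome: str) -> float:
    """Pull one class probability out of an ORES v3 response dict
    (the JSON body returned for the URL above)."""
    score = response[context]["scores"][str(rev_id)][model]["score"]
    return score["probability"][outcome]
```

A tool would fetch the URL with any HTTP client and feed the decoded JSON
to `extract_probability`; nothing on the client side needs to know how the
model works internally.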
In reply to this post by bawolff
Since we're discussing how the Tech Wishlist works, I will comment on a few
points specifically regarding that wishlist.

1. A gentle correction: the recommendations are ranked by vote, not by
consensus. This has pros and cons.

2a. If memory serves me correctly, the wishlist process was designed by WMF
rather than by community consensus. I may be wrong about this, but in my
search of historical records I have not found evidence to the contrary. I
think that redesigning the process would be worth considering, and I hope
that a redesign would help to account for the types of needs that bawolff
described in his second paragraph.

2b. I think that it's an overstatement to say that "nobody ever votes for
maintenance until it's way too late and everything is about to explode". I
think that many non-WMF people are aware of our backlogs, the endless
requests for help and conflict resolution, and the many challenges of
maintaining what we have with the current population of skilled and good
faith non-WMF people. However, I have the impression that there is a common
*tendency* among humans in general to chase shiny new features instead of
doing mostly thankless work, and I agree that the Tech Wishlist is
unlikely, even in a redesigned form, to be well suited to long-term
planning. I think that WMF's strategy process may be a better way to plan
for the long term, including for maintenance activities that are mostly
thankless and do not necessarily correlate with increasing someone's
personal power, making their resume look better, or having fun.
Fortunately, the volunteer mentality of many non-WMF people means that we
do have people who are willing to do mostly thankless, mundane, and/or
stressful work, and I think that some of us feel that our work is important
for maintaining the encyclopedia even when we do not enjoy it. But we have
a finite supply of time from such people.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )
I actually meant a different type of maintenance.
Maintaining the encyclopedia (and the other wiki projects) is of course an
activity that needs software support.

But software is also something that needs maintenance. Technology,
standards, and circumstances change over time. Software left alone will
"bitrot". A long-term technical strategy to do anything needs to account
for that and plan for that. One-off feature development does not.
Democratically directed one-off feature development accounts for it even
less.

In response to Jonathan: so let's say that ORES/magic AI detects that
something is a bot. Then what? That's a small part of the picture. In fact
you don't even need AI to do this; plenty of the vandal bots have generic
programming-language user agents (AI could of course be useful for the long
tail here, but there's much simpler stuff to start with). Do we expose this
to AbuseFilter somehow? Do we add a tag to mark it in RC/watchlists? Do we
block it? Do we rate limit it? What amount of false positives is
acceptable? What is the UI for all this? To what extent is this hard-coded,
and to what extent do communities control the feature? Etc.

We don't need products to detect bots. Making products to detect bots is
easy. We need product managers to come up with socio-technical systems that
make sense in our special context.

--
Brian

On Tue, Feb 12, 2019 at 8:36 PM Pine W <[hidden email]> wrote:
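The point that the simple cases need no AI can be illustrated with a
user-agent heuristic. This is a hypothetical sketch: the function name and
the list of library signatures are invented, and in practice user-agent
data is private and would have to be evaluated server-side:

```python
import re

# Substrings typical of default user agents from common HTTP libraries.
# An illustrative, invented list, not a vetted blocklist.
GENERIC_CLIENT_UA = re.compile(
    r"python-requests|python-urllib|libwww-perl|java/|go-http-client|curl/|wget/",
    re.IGNORECASE,
)


def looks_like_unflagged_bot(user_agent: str, is_flagged_bot: bool) -> bool:
    """Flag an edit whose client identifies as a generic programming-language
    HTTP library while the editing account lacks the bot flag."""
    if is_flagged_bot:
        return False  # approved bots are expected to look like this
    return bool(GENERIC_CLIENT_UA.search(user_agent or ""))
```

As the thread notes, the hard part is not this check but what happens next:
whether a match feeds a tag, a rate limit, or an AbuseFilter condition, and
who controls that.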
In reply to this post by Aaron Halfaker-3
It is extremely easy to detect a bot unless the bot operator chooses to
make it hard. Just build a model of how the user interacts with the input
devices, and do anomaly detection. That implies the use of JavaScript,
though, but users not running JS are either very dubious or quite well
known. There are nearly no new users who do not use JS.

Reused a previous TeX file, and did not clean it up? "Magnetic Normal Modes
of Bi-Component Permalloy Structures" ;)

On Mon, Feb 11, 2019 at 6:47 PM Aaron Halfaker <[hidden email]> wrote:
>
> We've been working on unflagged bot detection on my team. It's far from a
> real product integration, but we have shown that it works in practice. We
> tested this on Wikidata, but I don't see a good reason why a similar
> strategy wouldn't work for English Wikipedia.
>
> Hall, A., Terveen, L., & Halfaker, A. (2018). Bot Detection in Wikidata
> Using Behavioral and Other Informal Cues.
> *Proceedings of the ACM on Human-Computer Interaction*, *2*(CSCW), 64.
> <https://dl.acm.org/ft_gateway.cfm?id=3274333&type=pdf>
>
> In theory, we could get this into ORES if there were strong demand. As
> Pine points out, we'd need to delay some other projects. For reference,
> the next thing on the backlog that I'm looking at is setting up article
> quality prediction for Swedish Wikipedia.
>
> -Aaron
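The anomaly-detection idea above can be sketched very simply: collect
timestamps of a session's input events and flag sessions whose inter-event
timing is implausibly regular. Humans type and click with noisy delays; a
naive bot firing on a timer does not. The coefficient-of-variation
threshold below is invented for illustration; a real model would draw on
many more behavioral cues:

```python
from statistics import mean, pstdev


def timing_features(event_times):
    """Mean and standard deviation of the gaps between input events."""
    gaps = [b - a for a, b in zip(event_times, event_times[1:])]
    return mean(gaps), pstdev(gaps)


def looks_automated(event_times, min_cv=0.05):
    """Flag a session whose inter-event timing is suspiciously regular.

    cv = coefficient of variation (sd/mean) of the gaps. Human input is
    noisy (cv well above 0.05), while a fixed-timer bot has cv near 0.
    The 0.05 threshold is invented for illustration only."""
    if len(event_times) < 3:
        return False  # not enough signal to judge
    avg, sd = timing_features(event_times)
    return avg > 0 and (sd / avg) < min_cv
```

A metronomic session (events every 0.5 s) trips the check, while an
irregular human-like session does not; the JavaScript caveat above still
applies, since the timestamps have to come from the client.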
Brian,
I think we may be talking past each other. I'm Mr. Socio-technical Systems. I thought what was being requested was a way to detect bots. I maintain my own bots, work extensively with product teams, and have a deep and abiding familiarity with the complexity of designing effective tools for Wikipedia.

- J

On Wed, Feb 13, 2019 at 4:14 AM bawolff <[hidden email]> wrote:

> I actually meant a different type of maintenance.
>
> Maintaining the encyclopedia (and other wiki projects) is of course an activity that needs software support.
>
> But software is also something that needs maintenance. Technology, standards, and circumstances change over time. Software left alone will "bitrot". A long-term technical strategy needs to account for that and plan for it. One-off feature development does not, and democratically directed one-off feature development accounts for it even less.
>
> In response to Jonathan:
> So let's say that ORES/magic AI detects that something is a bot. Then what? Detection is a small part of the picture. In fact, you don't even need AI for this: plenty of the vandal bots have generic programming-language user agents (AI could of course be useful for the long tail, but there is much simpler stuff to start with). Do we expose this to AbuseFilter somehow? Do we add a tag to mark it in RC/watchlist? Do we block it? Do we rate-limit it? What rate of false positives is acceptable? What is the UI for all of this? To what extent is it hard-coded, and to what extent do communities control the feature? Etc.
>
> We don't need products to detect bots. Making products to detect bots is easy. We need product managers to come up with socio-technical systems that make sense in our special context.
>
> --
> Brian
>
> On Tue, Feb 12, 2019 at 8:36 PM Pine W <[hidden email]> wrote:
>
> > Since we're discussing how the Tech Wishlist works, I will comment on a few points specifically regarding that wishlist.
> >
> > 1.
A gentle correction: the recommendations are ranked by vote, not by consensus. This has pros and cons.
> >
> > 2a. If memory serves me correctly, the wishlist process was designed by WMF rather than by community consensus. I may be wrong about this, but in my search of the historical record I have not found evidence to the contrary. I think that redesigning the process would be worth considering, and I hope that a redesign would help account for the types of needs that bawolff described in his second paragraph.
> >
> > 2b. I think that it's an overstatement to say that "nobody ever votes for maintenance until it's way too late and everything is about to explode". I think that many non-WMF people are aware of our backlogs, the endless requests for help and conflict resolution, and the many challenges of maintaining what we have with the current population of skilled and good-faith non-WMF people. However, I have the impression that there is a common *tendency* among humans in general to chase shiny new features instead of doing mostly thankless work, and I agree that the tech wishlist, even in a redesigned form, is unlikely to be well suited to long-term planning. I think that WMF's strategy process may be a better way to plan for the long term, including for maintenance activities that are mostly thankless and do not necessarily correlate with increasing someone's personal power, improving their resume, or having fun. Fortunately, the volunteer mentality of many non-WMF people means that we do have people who are willing to do mostly thankless, mundane, and/or stressful work, and I think that some of us feel our work is important for maintaining the encyclopedia even when we do not enjoy it; but we have a finite supply of time from such people.
> >
> > Pine
> > ( https://meta.wikimedia.org/wiki/User:Pine )

--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l |
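[Editorial aside: as a concrete illustration of bawolff's point that many vandal bots announce themselves with generic programming-language user agents, here is a rough sketch of the "much simpler stuff". The pattern list and function name are invented for illustration; this is not an actual MediaWiki feature, and as the thread stresses, deciding what to *do* with a match is the hard socio-technical part.]

```python
# Flag requests whose User-Agent is a well-known programming-language
# default. Illustrative only; the pattern list is not from MediaWiki.
import re

GENERIC_UA_PATTERNS = [
    r"^python-requests/",   # Python `requests` default
    r"^python-urllib/",     # Python stdlib default
    r"^curl/",
    r"^Wget/",
    r"^Java/",              # java.net.HttpURLConnection default
    r"^Go-http-client/",    # Go net/http default
    r"^libwww-perl/",
    r"^$",                  # empty User-Agent
]
_GENERIC_UA = re.compile("|".join(GENERIC_UA_PATTERNS))

def looks_like_unflagged_bot(user_agent: str) -> bool:
    """True if the User-Agent matches a generic client-library default.

    A match only says "probably a script", not "vandal": legitimate
    tools can send these too, and sophisticated bots spoof browser UAs,
    which is where behavior analysis or AI for the long tail comes in.
    """
    return bool(_GENERIC_UA.match(user_agent.strip()))
```

A heuristic like this would only produce a signal; the open questions bawolff lists (expose it to AbuseFilter? tag it in RC? block? rate-limit?) are about what the surrounding system does with that signal.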
On Wed, Feb 13, 2019 at 12:13 PM bawolff <[hidden email]> wrote:
> I actually meant a different type of maintenance.
>
> Maintaining the encyclopedia (and other wiki projects) is of course an activity that needs software support.
>
> But software is also something that needs maintenance. Technology, standards, and circumstances change over time. Software left alone will "bitrot". A long-term technical strategy needs to account for that and plan for it. One-off feature development does not, and democratically directed one-off feature development accounts for it even less.

I understand. I was intending to comment on maintenance activities in general, whether that is maintenance of a city's water system, of the text of encyclopedia articles, or of software. My train of thought proceeded into a somewhat detailed commentary on the maintenance of non-software Wikimedia elements. I think that the tendency to under-resource maintenance in favor of novelties is similar in many domains of human activity, but I also think that humans collectively are not so unwise that we will prefer novelties over maintenance every time there is a referendum on whether to maintain an existing service or to create something new. {{Citation needed}}

I think that multiple good points have been raised in this thread regarding technical and human systems for detecting and intervening against possible unflagged bots. I am wondering what a good way would be to get a WMF product manager, or someone similar, to dedicate time to this topic. My preference remains that one or more WMF people or teams add this to their list of topics to address in a future quarter, such as Q1 of the WMF 2019-2020 fiscal year.
I don't know how the WMF Community Tech team plans for maintenance of features after they are initially built, debugged, and deployed, and based on the current state of this discussion I don't have a strong opinion on whether Community Tech or a different team would be best suited to work on the topic of unflagged bots. I also don't know how WMF decides on goals for teams other than Community Tech in future quarters, but that information could be helpful to have for this conversation.

Thanks,

Pine
( https://meta.wikimedia.org/wiki/User:Pine )
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l |