Re: [WikiEN-l] JarlaxleArtemis/Grawp


Brian J Mingus
Potthast, Stein, and Gerling (2008). Automatic Vandalism Detection in Wikipedia. http://www.uni-weimar.de/medien/webis/publications/downloads/papers/stein_2008c.pdf

Abstract. We present results of a new approach to detect destructive article revisions, so-called vandalism, in Wikipedia. Vandalism detection is a one-class classification problem, where vandalism edits are the target to be identified among all revisions. Interestingly, vandalism detection has not been addressed in the Information Retrieval literature by now. In this paper we discuss the characteristics of vandalism as humans recognize it and develop features to render vandalism detection as a machine learning task. We compiled a large number of vandalism edits in a corpus, which allows for the comparison of existing and new detection approaches. Using logistic regression we achieve 83% precision at 77% recall with our model. Compared to the rule-based methods that are currently applied in Wikipedia, our approach increases the F-measure performance by 49% while being faster at the same time.


Open the PDF and scan to page 667. This classifier outperforms MartinBot, T-850 Robotic Assistant, WerdnaAntiVandalBot, Xenophon, ClueBot, CounterVandalismBot, PkgBot, MiszaBot, and AntiVandalBot. It outperforms the best of those (AntiVandalBot) by a very wide margin.

So why waste the ISP's time and the police's time when the best of the passive, technical approaches have not been explored? With machine learning you pit the vandals against themselves: every time they perform a particular kind of vandalism, the bot learns to recognize that pattern, so the same kind of edit is unlikely to succeed again.
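To make the idea concrete, here is a minimal sketch of the approach the paper describes: extract numeric features from an edit and score it with logistic regression. The feature names below (uppercase ratio, longest character run, size ratio) are illustrative assumptions loosely inspired by the kinds of features such systems use; they are not the exact feature set from Potthast et al.

```python
import math

def edit_features(old_text: str, new_text: str) -> list:
    """Toy edit-level features; the real system uses a richer set."""
    # If the edit is a pure append, look only at the inserted text.
    inserted = new_text[len(old_text):] if new_text.startswith(old_text) else new_text
    letters = [c for c in inserted if c.isalpha()]
    # Shouting ("WIKIPEDIA SUCKS") gives a high uppercase ratio.
    upper_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    # Longest run of one repeated character (e.g. "!!!!!!!!").
    longest_run, run, prev = 0, 0, None
    for c in inserted:
        run = run + 1 if c == prev else 1
        prev = c
        longest_run = max(longest_run, run)
    # How much the edit grows or shrinks the article (+1 avoids div by zero).
    size_ratio = (len(new_text) + 1) / (len(old_text) + 1)
    return [upper_ratio, float(longest_run), size_ratio]

def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.1, epochs=2000):
    """Plain stochastic gradient descent on the logistic loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = _sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            g = p - yi  # gradient of the log-loss w.r.t. the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, x) -> float:
    """Probability that the edit is vandalism."""
    return _sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

Trained on labeled revision pairs, `predict` returns a vandalism probability per edit; a bot would revert (or flag for review) edits above some threshold. The paper's contribution is a large labeled corpus plus a much stronger feature set, which is what yields the 83% precision at 77% recall.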

Cheers,
_______________________________________________
Wiki-research-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l