The Revision Scoring weekly update

Aaron Halfaker
Hey folks!

This update covers weeks 32 through 41 of the revision scoring team's weekly
updates to this mailing list.  We've been busy, but our reporting fell
behind.  So here I am getting us caught up!  This is going to be a long
one.  Bear with me.

One major thing we've done in the past few weeks is drafted and presented a
proposal to increase the resourcing for the ORES project in the 2017 Fiscal
Year.  Currently, we're just one fully funded staff member (halfak) and one
partially funded contractor (Amir1) working with a bunch of volunteers.
We're proposing to staff the team with full-time engineers, a liaison, and a
tech writer.  See a full draft of our proposal and pitch deck here:

*New development:*

We've expanded support for our "editquality" models to more wikis and
improved the performance of some of the models.

   - We scaled up the number of observations for Indonesian Wikipedia to
   100k[1]

   - We added language support for Romanian[2] and built the basic
   "reverted" model[3]

   - We trained and tested "damaging" and "goodfaith" models for Czech
   Wikipedia[4]

   - We added a '--workers' parameter to our cv_train training utility to
   control memory usage[5]

   - We deployed all of the above to Wikimedia Labs[6].  A production
   deployment is coming soon.

Prompted by the 2016 community wishlist[7], we've implemented a
"draftquality" model for evaluating new page creations.

   - We researched deletion reasons on English Wikipedia[8] and created a
   labeled dataset using the deletion log.

   - We engineered a set of features to predict the quality of new
   articles[9] and built a model[10]

   - We generated a set of datasets[11,12,13] to make it easier for
   volunteers and external researchers to help us audit the performance of the
   model

   - We deployed the model on WMFLabs[14] and announced its presence to a
   few interested patrollers in English Wikipedia

   - We've started the process of deploying the model in production[15,16]

We completed a project exploring the use of advanced natural-language
processing strategies to extract new signal about vandalism, article
quality and problematic new articles.  Regretfully, memory issues prevent
us from trivially putting this into production[17], so we're looking into
alternative strategies[18].

   - We implemented a strategy for extracting sentences from Wikitext[19]

   - We built sentence banks for personal attacks[20], vandalism[21],
   spam[22], and Featured Articles[23].

   - We built PCFG-based models[24] and analyzed their ability to
   differentiate Featured Articles, spam, vandalism, and personal attacks[25]

We've been working with the Collaboration Team[26] on their Edit Review
Improvements project[27].

   - We defined and implemented a set of new precision-based test
   statistics that will inform thresholds used in their new user interface[28]

   - We also decided to continue reporting recall-based test
   statistics[29]
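For readers less familiar with the distinction, here is a rough sketch of how precision-based and recall-based thresholds can be read off a labeled test set. It is illustrative Python only, not the code in our repos; the scores, labels, and function names below are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical example data: model scores (probability of "damaging")
# and the true labels for six edits.
scores = [0.95, 0.9, 0.8, 0.6, 0.4, 0.2]
labels = [True, True, False, True, False, False]


def precision_recall_at(threshold: float, scores: List[float],
                        labels: List[bool]) -> Tuple[float, float]:
    """Precision and recall if every edit scoring >= threshold is flagged."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def threshold_for_min_precision(scores: List[float], labels: List[bool],
                                target: float) -> Optional[float]:
    """Lowest threshold whose precision meets the target.

    Recall can only shrink as the threshold rises, so the lowest
    qualifying threshold keeps the most recall.
    """
    for t in sorted(set(scores)):
        precision, _ = precision_recall_at(t, scores, labels)
        if precision >= target:
            return t
    return None


def threshold_for_min_recall(scores: List[float], labels: List[bool],
                             target: float) -> Optional[float]:
    """Highest threshold (fewest flagged edits) whose recall meets the target."""
    for t in sorted(set(scores), reverse=True):
        _, recall = precision_recall_at(t, scores, labels)
        if recall >= target:
            return t
    return None


print(threshold_for_min_precision(scores, labels, 0.9))  # 0.9
print(threshold_for_min_recall(scores, labels, 0.9))     # 0.6
```

A UI that wants "mostly right when it flags something" would pick the precision-based threshold; one that wants "catch nearly everything" would pick the recall-based one.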

Based on advice from engineers on the Collaboration Team, we've begun the
process of converting Wiki labels[30] to a stand-alone tool in labs.

   - We generalized the standalone gadget interface[31]

   - We implemented a means to auto-configure wikis based on the
   dbname[32,33] and that allowed us to simplify configuration[34]

   - We also implemented some performance improvements with minification
   and bundling of Wiki labels assets[35]


In the past few weeks, we've set up labeling campaigns for a few wikis.

   - We deployed an edit types campaign for Catalan Wikipedia[36]

   - We deployed edit quality campaigns for Chinese[37] and Romanian[38]
   Wikipedias

   - We deployed a new type of campaign for English Wikipedia --
   "discussion quality" asks editors to label talk posts as "toxic" or not[39]

*Maintenance and robustness:*

We've solved a large set of problems related to logging and compatibility
with Wikibase, and we've made minor performance improvements.

   - We addressed a few bugs in the ORES Review Tool[40,44]

   - We quieted some errors from our logging in ORES[41,45]

   - We updated our code to work with a Wikibase schema change[42]

   - We fixed a language fallback pattern in Wiki labels[43]

   - We set up monitoring on ORES database disk sizes[46]

   - We fixed some issues with scap, Phabricator's Diffusion, and other
   supporting systems so that we can continue deploying to beta labs[47]

   - We split our assets repo so that we can let our WMFLabs deploy get
   ahead of the Production deployment[48]

   - ORES can now minify its JSON responses[49]

   - We identified a bug in flask-assets and worked around it in our local
   installation of Wiki labels[50]
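As a rough illustration of what minifying a JSON response means (not ORES's actual serialization code), Python's standard json module can emit the same data with or without whitespace. The score document below is a made-up example, not the real ORES response schema.

```python
import json

# Hypothetical score document, for illustration only.
response = {
    "enwiki": {
        "scores": {
            "12345": {
                "damaging": {
                    "score": {
                        "prediction": False,
                        "probability": {"false": 0.9, "true": 0.1},
                    }
                }
            }
        }
    }
}

pretty = json.dumps(response, indent=4)
# separators=(",", ":") drops the spaces json.dumps inserts by default
minified = json.dumps(response, separators=(",", ":"))

# Both encodings decode to the same data; only the byte count differs.
assert json.loads(pretty) == json.loads(minified) == response
print(len(pretty), len(minified))
```

For high-volume consumers, those saved bytes add up across every scored revision.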

*Communications and outreach:*

We had a big presence at the Wikimedia Developer Summit, we've drafted a
resourcing proposal, and we've made some announcements about upcoming plans
for the ORES Review tool.

   - We facilitated the "Artificial Intelligence to build and navigate
   content" track[51]

   - We ran a session for building an AI wishlist[52] and captured notes
   about more than 20 new AI proposals on a new tag in Phabricator[53]

   - We also ran a session discussing the ethics and dangers of advanced
   algorithms mediating our processes[54]

   - We helped facilitate a session about where to surface current AIs in
   Wikimedia Projects[55]

   - We held a discussion with Legal about licensing labeled data that
   comes out of Wiki labels[56] and updated the interface to state the CC0
   license clearly[57]

   - We worked with the Reading Infrastructure team to analyze the
   consumption of "oresscores" through the MediaWiki API[58]

   - We drafted a pitch for increasing the resources for our team[59]

   - We worked with the Collaboration team to announce that they'll be
   experimenting with a new RecentChanges filtering strategy in the ORES
   Review Tool[60,61]

1. Scale up the number of observations for idwiki to 100k
2. Add language support for
3. Build reverted model for Romanian Wikipedia
4. Train and test damaging/goodfaith models for Czech Wikipedia
5. Add '--workers' param to cv_train utility
6. Clean up dependencies and deploy newest ORES & Models in labs
9. Build feature set for draft quality model
10. [Epic] Build draft quality model (spam, vandalism, attack, or OK)
11. Extract features for deleted page (draft quality model)
12. Generate scored dataset for 2016-08 - 2017-01
13. Generate extracted features for 2016-08 - 2017-01
14. Deploy draftquality models to WMFLabs
15. Create package stuff for
16. Create new repo:
17. Memory footprint is enormous!
18. [Spike] Investigate use of Apertium LTtoolbox API in labs/production
19. Implement sentences
20. Sentence bank for personal
21. Sentence bank for vandalism
22. Sentence bank for spam
23. Sentence bank for Featured
24. Generate PCFG sentence
25. Analyze differentiation of FA, Spam, Vandalism, and Attack models/sentences.
28. Implement new precision-based test stats for editquality models
29. Restore recall-threshold-based metrics for editquality models.
31. Generalize standalone gadget interface
32. Auto config wikilabels using dbnames
33. Use module loader to load JS/CSS from wikis
34. Remove host from wikilabels config -- infer from request
35. Minification and bundling for wikilabels assets
36. Deploy cawiki edit types
37. Deploy zhwiki edit quality
38. Deploy edit quality campaign for Romanian Wikipedia
39. Deploy "Discussion quality" campaign in wikilabels
40. Undefined method
41. Quiet TimeoutError in celery logging
42. Quantity changes broke ORES
43. Chinese translations are not being loaded
44. Fatal exception of type "DBQueryError" on sorting ORES contributions
45. ores logspam: Model contains an error
46. Set up monitoring for ORES redis database
47. Fix broken beta-labs deploy
48. Split wheels repo into Prod/WMFLabs branches and maintain independence
49. Minify json responses
50. assets url return empty
51. Artificial Intelligence to build and navigate content
52. What should an AI do you for you? Building an AI Wishlist.
54. Algorithmic dangers and transparency -- Best practices
55. Where to surface AI in Wikimedia Projects
56. Licensing of labeled data
57. Add notice of CC0 status of Wikilabels data to UI & Docs
58. Identify baseline api.php Action API consumption
59. Draft proposal/pitch for ORES resourcing
60. Gather assets for post about ORES review tool including ERI filters
61. Post about ORES review tool including ERI filters

Aaron from the Revision Scoring / Scoring Platform team
Wikitech-l mailing list
Re: The Revision Scoring weekly update

Aaron Halfaker
Hey folks!

I should really stop calling this a weekly update because it's getting a
bit silly at this point.  :)   But if it were a weekly update, it would
cover weeks 42 through 46.


   - 3 new models: Finnish Wikipedia (reverted) and Estonian Wikipedia
   (damaging & goodfaith)

   - We estimated and agreed on funding for ORES servers in the next year
   with Operations

   - We published a paper about vandalism detection in Wikidata and a blog
   post about the massive effect of some initiatives on coverage of Women
   Scientists in Wikipedia.

*New development:*

   - We added recall-based threshold metrics to the new draftquality model,
   which should help tool devs decide which new page creations to highlight
   for review[1]

   - We added optional notices for ORES pages, which will help us visually
   distinguish our experimental install in WMFLabs from the Prod install[2]

   - We added basic language support for Finnish (Thanks 4shadoww)[3] and
   deployed a 'reverted' model[4]

   - We led a discussion in Wikidata about "item quality" that resulted in
   a Wikipedia 1.0-like scale for Wikidata quality[5,6] and designed a
   Wikilabels form to capture the gist of it[7]

   - We enabled the ORES Review Tool on Czech Wikipedia[8]

   - We configured ChangeProp to use our new minified JSON output to save

   - We extended the Estonian language assets (Thanks Cumbril)[10] and
   deployed the 'damaging' and 'goodfaith' models[11,12]

   - We enabled a testing model for 'goodfaith' on the Beta Cluster to make
   it easier for the Collaboration team to run tests with their new filter

   - We created a new "precache" endpoint that will allow us to
   de-duplicate configuration with ChangeProp and handle all routing in
   ORES[14]


   - We completed a 2-year estimate of ORES resource needs and discussed
   funding (capital expenditure) for ORES in the coming fiscal year[15].  This
   will allow us to continue to grow ORES both in number of models and in
   scoring capacity.


   - Amir improved the KDD paper based on review feedback[16] and got it

   - We published a blog post about our measurements of WikiProject Women
   Scientists[18,19] -- "The Keilana Effect"

   - Thanks to Cumbril's work, the Estonian labeling campaign was
   completed[20]


   - In early February, we deployed a new set of translations to Wikilabels
   (specifically targeting Romanian Wikipedia)[21]

   - In mid-February, we deployed some fixes to ORES documentation and
   response formatting[22]

   - In mid-March, we deployed 3 new scoring models and ORES notices[23]

*Maintenance and robustness:*

   - We fixed a serious issue in the "mwoauth" library that Wikilabels
   depends on[24]

   - We reduced the number of revisions per request that we could receive
   via api.php[25]

   - We investigated a scap issue that broke ORES deployment[26]

   - We fixed a minor issue with JSON minification behavior[27] and
   hard-coding of the location of ORES in the documentation[28]

   - We improved performance of ORES filters on MediaWiki[29]

   - We improved the language describing ORES behavior on

   - We added a notice to the wiki pages that Dexbot maintains, warning that
   Dexbot overwrites manual changes to our tracking table[31]

   - We added notices to ores-wmflabs-deploy about its experimental
   nature[32]

   - We fixed some issues with testing Finnish language assets[33]

   - We fixed some styling issues that resulted from an upgrade of OOJS-UI
   in Wiki labels[34]

1. Add recall based thresholds to draftquality model
2. Add an optional notice to ORES main and ui pages
3. Add language support for
4. Train/test reverted model for fiwiki
5. [Discuss] item quality in
7. Design item_quality form for Wikidata
8. Enable ORES Review Tool on Czech Wikipedia
9. Use minified JSON format in
10. Extend estonian language assets from Wiki page
11. Train/test damaging/goodfaith models for etwiki
12. Deploy edit quality models for etwiki
13. Enable 'goodfaith' on testwiki on Beta Cluster
14. Create generalized "precache" endpoint for ORES
15. Estimate ORES capex for
16. Improve the KDD paper based on the review
18. Blog post about wp10 measurements of Women Scientists
20. Complete etwiki edit quality campaign
21. Deploy Romanian translations for Wiki labels
22. Prod deployment of ORES
23. Deploy ores in prod
24. mwoauth is broken
25. Reduce the number of revisions that can be requested in one batch
26. Investigate failed ORES
27. Investigate default JSON minification behavior in production
28. ORES swagger is hard-coded for wmflabs
29. rcshow=oresreview is slow
30. Fix message in
31. Add notice about Dexbot overwriting manual changes to our tracking table.
32. Add a notice to ores-wmflabs-deploy about "experimental" nature
33. Fix testing issues in finnish language assets
34. Fix minor styling issues with OOJS-UI in wikilabels

Aaron from the Scoring Platform team