People with knowledge of English swear words needed :o

Petr Bena
Are you good at swearing? WE NEED YOU

Huggle 3 comes with vandalism prediction, because it precaches diffs,
including their contents, even before the edits are enqueued. Each edit
has a so-called "score": a numerical value where a higher score means
the edit is more likely to be vandalism.
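As a rough illustration of the idea (not Huggle's actual implementation, which is written in C++), a word-based score can be computed by summing a weight for every listed word found in the text an edit adds. The word weights and the `score_edit` helper below are hypothetical, and whether Huggle counts repeated words more than once is an assumption.

```python
import re

# Hypothetical word weights; in Huggle these would come from the
# per-wiki "score words" lists in the configuration.
WORD_SCORES = {"stupid": 200, "sucks": 200, "lol": 50}


def tokenize(text: str) -> list[str]:
    """Split text into lowercase words (letters, digits and apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def score_edit(added_text: str, word_scores: dict[str, int]) -> int:
    """Sum the weight of every listed word occurrence in the added text.

    Assumption: each occurrence counts; Huggle may count a word only once.
    """
    return sum(word_scores.get(word, 0) for word in tokenize(added_text))


print(score_edit("This article SUCKS, lol", WORD_SCORES))  # 250
```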

If you want to help us improve this feature, we need a "score words"
list defined for every wiki where Huggle is going to be used, for
example the English Wikipedia.

Each list has the following syntax:

(see https://en.wikipedia.org/w/index.php?title=Wikipedia:Huggle/Config&diff=573615259&oldid=573615075)


score-words(score):
    list of words separated by commas; the list may span several lines,
    but the commas must be present

Example:

score-words(200):
    these, are, some, words, which, presence, of, increases, the, score,
    each, word, by, 200,
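A minimal, error-tolerant parser for the `score-words(N):` syntax might look like the sketch below. It is not Huggle's real config parser; it assumes that each header starts a line, that words run until the next header or a blank line, and that a word listed in several sections keeps the last score seen. The name `parse_score_words` is made up for illustration.

```python
import re

# Header such as "score-words(200):", optionally followed by words.
HEADER = re.compile(r"^\s*score-words\((-?\d+)\):\s*(.*)$")


def parse_score_words(config: str) -> dict[str, int]:
    """Map each listed word to its score.

    Words are comma-separated and may span several lines; empty items
    (e.g. from a trailing comma) are ignored.
    """
    scores: dict[str, int] = {}
    current = None  # score of the section being read; None outside a section
    for line in config.splitlines():
        match = HEADER.match(line)
        if match:
            current = int(match.group(1))
            line = match.group(2)  # words may follow the header on the same line
        elif not line.strip():
            current = None  # assumption: a blank line ends the section
            continue
        if current is None:
            continue  # ignore text outside any score-words section
        for item in line.split(","):
            word = item.strip().lower()
            if word:
                scores[word] = current  # assumption: last section wins
    return scores


example = """score-words(200):
    these, are, some, words, which, presence, of, increases, the, score,
    each, word, by, 200,
"""
print(parse_score_words(example)["presence"])  # 200
```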

So, if you know English better than I do, which you likely do, go ahead
and improve the configuration file there. No worries: Huggle's config
parser is very tolerant of syntax errors.

If you have any other suggestions on how to improve Huggle's prediction,
go ahead and tell us!

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: People with knowledge of English swear words needed :o

C. Scott Ananian
Perhaps we could use some math here?  Can we grab a list of the last, say,
100,000 edits reverted for vandalism, look at the diffs, and compute a
frequency score based on that?
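One way to turn that proposal into numbers is a smoothed log-ratio of how often each word appears in text added by reverted edits versus text added by edits that were kept. This is only a sketch: it assumes both lists of added text have already been fetched, and the function names and the `min_count` cut-off are illustrative choices, not anything the thread specifies.

```python
import math
import re
from collections import Counter


def words(text: str) -> list[str]:
    """Lowercase word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_scores(reverted: list[str], kept: list[str],
                     min_count: int = 2) -> dict[str, float]:
    """Smoothed log-ratio of word frequency in reverted vs. kept additions.

    Positive scores mean the word is relatively more common in edits that
    were reverted for vandalism. Rare words (seen fewer than `min_count`
    times overall) are skipped.
    """
    bad = Counter(w for text in reverted for w in words(text))
    good = Counter(w for text in kept for w in words(text))
    vocab = set(bad) | set(good)
    # Add-one (Laplace) smoothing so unseen words don't give log(0).
    bad_total = sum(bad.values()) + len(vocab)
    good_total = sum(good.values()) + len(vocab)
    return {
        w: math.log((bad[w] + 1) / bad_total) - math.log((good[w] + 1) / good_total)
        for w in vocab
        if bad[w] + good[w] >= min_count
    }


scores = frequency_scores(["you suck", "suck suck lol"],
                          ["added a citation", "fixed a typo"])
print(sorted(scores))  # ['a', 'suck']
```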
 --scott


Re: People with knowledge of English swear words needed :o

Antoine Musso-3
In reply to this post by Petr Bena
On 19/09/13 11:35, Petr Bena wrote:
<snip>

> Huggle 3 comes with vandalism-prediction as it is precaching the diffs
> even before they are enqueued including their contents. Each edit has
> so called "score" which is a numerical value that if higher, the edit
> is more likely a vandalism.
>
> If you want to help us improve this feature, it is necessary to define
> a "score words" list for every wiki where huggle is about to be used,
> for example on English wiki.
>
> Each list has following syntax:
>
> (see https://en.wikipedia.org/w/index.php?title=Wikipedia:Huggle/Config&diff=573615259&oldid=573615075)

The good thing about reinventing the wheel is that you can reuse
existing material :-]

Cluebot-NG has such a list: http://review.cluebot.cluenet.org and it's
quite an active one:
 http://en.wikipedia.org/wiki/Special:Contributions/ClueBot_NG


It uses a variety of algorithms to determine the score of an edit:
 http://en.wikipedia.org/wiki/User:ClueBot_NG#Vandalism_Detection_Algorithm


Maybe get in touch with them and reuse their engine?


--
Antoine "hashar" Musso



Re: People with knowledge of English swear words needed :o

Chad
In reply to this post by Petr Bena
On Thu, Sep 19, 2013 at 2:35 AM, Petr Bena <[hidden email]> wrote:

> Are you good in swearing? WE NEED YOU
>
>
I know 7 words you can add ;-)

[[w:Seven dirty words]]

-Chad

Re: People with knowledge of English swear words needed :o

Petr Bena
In reply to this post by Antoine Musso-3
Hi, cool, I was actually expecting someone to come up with suggestions
like this. I didn't know about that, but now I do. In fact, closer
cooperation with ClueBot is on our TO-DO list :-) Any good algorithm for
calculating a vandalism score is appreciated. In fact, this might be the
first thing we should create hooks for, so that people can implement
their own algorithms as either C++ or Python plugins that compute the
score however they like... (unfortunately, I haven't managed to get the
Python engine working in the Windows build yet)
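Such a hook could look roughly like the following Python sketch, in which each plugin is a callable that receives the text added by an edit and returns a score contribution, and the total is the sum. The `ScoreRegistry` class, the scorer signature and the two toy scorers are hypothetical and do not reflect Huggle's actual plugin API.

```python
from typing import Callable

# Hypothetical hook signature: a scorer receives the text added by an edit
# and returns a score contribution (positive = more likely vandalism).
Scorer = Callable[[str], int]


class ScoreRegistry:
    """Collects scoring plugins and sums their contributions."""

    def __init__(self) -> None:
        self._scorers: dict[str, Scorer] = {}

    def register(self, name: str, scorer: Scorer) -> None:
        """Add (or replace) a named scoring plugin."""
        self._scorers[name] = scorer

    def score(self, added_text: str) -> int:
        """Total score of an edit: the sum over all registered plugins."""
        return sum(scorer(added_text) for scorer in self._scorers.values())


registry = ScoreRegistry()
# Toy plugins: long all-caps text, and runs of exclamation marks.
registry.register("shouting", lambda text: 100 if text.isupper() and len(text) > 10 else 0)
registry.register("exclamations", lambda text: 10 * text.count("!"))
print(registry.score("THIS PAGE IS WRONG!!!"))  # 130
```

Keeping plugins as plain callables means a C++ or Python implementation only has to agree on "text in, number out"; how Huggle would actually bridge the two languages is outside this sketch.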

On Thu, Sep 19, 2013 at 4:47 PM, Antoine Musso <[hidden email]> wrote:

> On 19/09/13 11:35, Petr Bena wrote:
> <snip>
>> Huggle 3 comes with vandalism-prediction as it is precaching the diffs
>> even before they are enqueued including their contents. Each edit has
>> so called "score" which is a numerical value that if higher, the edit
>> is more likely a vandalism.
>>
>> If you want to help us improve this feature, it is necessary to define
>> a "score words" list for every wiki where huggle is about to be used,
>> for example on English wiki.
>>
>> Each list has following syntax:
>>
>> (see https://en.wikipedia.org/w/index.php?title=Wikipedia:Huggle/Config&diff=573615259&oldid=573615075)
>
> The good thing while reinventing the wheel, is that you can reuse
> existing material :-]
>
> Cluebot-NG has such a list: http://review.cluebot.cluenet.org  and its a
> quite active one:
>  http://en.wikipedia.org/wiki/Special:Contributions/ClueBot_NG
>
>
> It uses a variety of algorithms to determine the score of an edit:
>  http://en.wikipedia.org/wiki/User:ClueBot_NG#Vandalism_Detection_Algorithm
>
>
> Maybe get in touch with them and reuse their engine?
>
>
> --
> Antoine "hashar" Musso

Re: People with knowledge of English swear words needed :o

Chris Steipp
In reply to this post by C. Scott Ananian
On Thu, Sep 19, 2013 at 7:19 AM, C. Scott Ananian <[hidden email]> wrote:

> Perhaps we could use some Math here?  Can we grab a list of the last, say,
> 100,000 edits reverted for vandalism, look at the diff, and compute a
> frequency score based on that?
>  --scott
>

This is pretty much what my GSoC student implemented in the Bayesian filter
extension. If that gets some use, those lists could easily be fed back.
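If a Bayesian filter (or the frequency approach suggested earlier in the thread) produces per-word weights, feeding them back could be as simple as bucketing the weights into `score-words(N):` sections. This sketch assumes the weights are already on Huggle's score scale; `to_score_words`, `step` and `min_score` are made-up names, not part of the extension or of Huggle.

```python
from collections import defaultdict


def to_score_words(weights: dict[str, float], step: int = 50,
                   min_score: int = 50) -> str:
    """Render per-word weights as Huggle-style score-words sections.

    Weights are rounded to the nearest multiple of `step`; words scoring
    below `min_score` are dropped. Both parameters are illustrative.
    """
    buckets: dict[int, list[str]] = defaultdict(list)
    for word, weight in weights.items():
        score = int(round(weight / step)) * step
        if score >= min_score:
            buckets[score].append(word)
    sections = []
    for score in sorted(buckets, reverse=True):  # highest scores first
        words = ", ".join(sorted(buckets[score]))
        sections.append(f"score-words({score}):\n    {words},")
    return "\n\n".join(sections)


print(to_score_words({"suck": 212.0, "lol": 48.0, "citation": -90.0}))
```

With these inputs, "suck" lands in a `score-words(200):` section, "lol" in `score-words(50):`, and "citation" is dropped because its rounded score is negative.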




Re: People with knowledge of English swear words needed :o

Neil Harris
In reply to this post by Petr Bena
On 19/09/13 10:35, Petr Bena wrote:

> Are you good in swearing? WE NEED YOU
>
> Huggle 3 comes with vandalism-prediction as it is precaching the diffs
> even before they are enqueued including their contents. Each edit has
> so called "score" which is a numerical value that if higher, the edit
> is more likely a vandalism.
>
> If you want to help us improve this feature, it is necessary to define
> a "score words" list for every wiki where huggle is about to be used,
> for example on English wiki.
>
> Each list has following syntax:
>
> (see https://en.wikipedia.org/w/index.php?title=Wikipedia:Huggle/Config&diff=573615259&oldid=573615075)
>
>
> score-words(score):
>      list of words separated by comma, can contain newlines but comma
> must be present
>
> example
>
> score-words(200):
>      these, are, some, words, which, presence, of, increases, the, score,
>      each, word, by, 200,
>

[[en:User:/DeltaQuad/UAA/Blacklist]] contains a fairly comprehensive
overview of English-language profanity and general trash-talk formatted
as regexps, mixed in with other non-sweary blocking patterns that are
specific to that blacklist's needs.
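Patterns from a regex blacklist like that one could be reused for scoring by giving each pattern a weight. The patterns and weights below are invented placeholders, not taken from that page, and Python's `re` syntax may differ from the blacklist's regex flavor.

```python
import re

# Hypothetical (pattern, score) pairs in the spirit of a regex blacklist.
PATTERNS = [
    (re.compile(r"\bpoo+p?\b", re.IGNORECASE), 100),
    (re.compile(r"\b(?:you|u)\s+suck\b", re.IGNORECASE), 200),
]


def regex_score(added_text: str) -> int:
    """Add a pattern's score once for every match in the added text."""
    # findall returns whole matches here because the groups are non-capturing.
    return sum(score * len(pattern.findall(added_text))
               for pattern, score in PATTERNS)


print(regex_score("U suck, pooop"))  # 300
```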

Neil



Re: People with knowledge of English swear words needed :o

Amir Sarabadani-2
About swear words in English: sorry, I can't help, but I'm very good at
Persian :D
We have an abuse filter for Persian swear words, which is hidden from the public:
https://fa.wikipedia.org/wiki/%D9%88%DB%8C%DA%98%D9%87:%D9%BE%D8%A7%D9%84%D8%A7%DB%8C%D9%87_%D8%AE%D8%B1%D8%A7%D8%A8%DA%A9%D8%A7%D8%B1%DB%8C/4

And it works pretty well, so if you need to i18n Huggle, this page will be
a good help.

Best


On Thu, Sep 19, 2013 at 8:59 PM, Neil Harris <[hidden email]> wrote:

> On 19/09/13 10:35, Petr Bena wrote:
>
>> Are you good in swearing? WE NEED YOU
>>
>> Huggle 3 comes with vandalism-prediction as it is precaching the diffs
>> even before they are enqueued including their contents. Each edit has
>> so called "score" which is a numerical value that if higher, the edit
>> is more likely a vandalism.
>>
>> If you want to help us improve this feature, it is necessary to define
>> a "score words" list for every wiki where huggle is about to be used,
>> for example on English wiki.
>>
>> Each list has following syntax:
>>
>> (see https://en.wikipedia.org/w/index.php?title=Wikipedia:Huggle/Config&diff=573615259&oldid=573615075)
>>
>>
>> score-words(score):
>>      list of words separated by comma, can contain newlines but comma
>> must be present
>>
>> example
>>
>> score-words(200):
>>      these, are, some, words, which, presence, of, increases, the, score,
>>      each, word, by, 200,
>>
>>
> [[en:User:/DeltaQuad/UAA/Blacklist]] contains a fairly comprehensive
> overview of English-language profanity and general trash-talk formatted as
> regexps, mixed in with other non-sweary blocking patterns that are specific
> to that blacklist's needs.
>
> Neil

--
Amir

Re: People with knowledge of English swear words needed :o

Helder .
In reply to this post by C. Scott Ananian
On Thu, Sep 19, 2013 at 11:19 AM, C. Scott Ananian
<[hidden email]> wrote:
> Perhaps we could use some Math here?  Can we grab a list of the last, say,
> 100,000 edits reverted for vandalism, look at the diff, and compute a
> frequency score based on that?
>  --scott

I did something like that in JavaScript:
https://github.com/he7d3r/mw-gadget-WordFrequencyOnRevertedEdits/

Helder
