AntiSpoof issues


AntiSpoof issues

Tim Starling-2
We've been having quite a few complaints about false positives from the
AntiSpoof extension -- an extension which attempts to prevent registration
of names which are confusingly similar to names already registered. Brion
responded to these complaints with "get a sysop to make the account for
you", but I don't think that's a very good solution. So I've been working on
the AntiSpoof extension today, attempting to make it a bit more relaxed.

The most fundamental problem is the merging of sets. Say we want to treat
visually similar characters as members of one set, and we also want to treat
letters which differ only in case as members of one set. Suppose, for
example, that we have the following pairs:

Η (capital eta) = H (latin)
Η (capital eta) = η (lowercase eta)
η (lowercase eta) = n (latin)

If we merge all these pairs into a set, following the relations, we obtain
the result that latin n is the same as latin H. This is incorrect, and is
the cause of most of the bizarre false positives that we see with AntiSpoof.
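
To make the conflation concrete, here is a minimal sketch of pair merging with a union-find structure. It is not the extension's actual code; the function name build_sets and the PAIRS list are made up for illustration, and only the three pairs from the post are used.

```python
# Hypothetical sketch: merge character pairs into equivalence sets the way a
# transitive "equivalence set" table would. Not AntiSpoof's real code.

def build_sets(pairs):
    """Return a dict mapping each character to its set representative."""
    parent = {}

    def find(ch):
        parent.setdefault(ch, ch)
        while parent[ch] != ch:
            parent[ch] = parent[parent[ch]]  # path halving
            ch = parent[ch]
        return ch

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two sets

    return {ch: find(ch) for ch in list(parent)}


PAIRS = [
    ("\u0397", "H"),       # capital eta looks like latin H
    ("\u0397", "\u03b7"),  # capital eta / lowercase eta (case folding)
    ("\u03b7", "n"),       # lowercase eta looks like latin n
]

reps = build_sets(PAIRS)
same_set = reps["n"] == reps["H"]  # the unwanted conflation
```

No single pair is wrong here; the problem only appears once the pairs are chained together.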

The problem is that merging sets is fairly fundamental to the way AntiSpoof
works -- i.e. by calculating a canonical representation of the username,
storing it and indexing it. So it's not going to change any time soon unless
we get really clever. But there are some things we can do to minimise its
effects.
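
A rough sketch of what "calculating a canonical representation, storing it and indexing it" can look like, and why it forces transitivity. The REPRESENTATIVE table, the canonical function and the NameIndex class are all hypothetical names invented for this illustration.

```python
from collections import defaultdict

# Hypothetical representative table: every character in a merged set maps to
# one arbitrary representative. Built by hand from the three pairs in the post.
REPRESENTATIVE = {"\u0397": "H", "\u03b7": "H", "n": "H"}


def canonical(name):
    """Replace each character by its set representative."""
    return "".join(REPRESENTATIVE.get(ch, ch) for ch in name)


class NameIndex:
    """Maps canonical forms to the registered names that produce them."""

    def __init__(self):
        self._by_key = defaultdict(list)

    def register(self, name):
        self._by_key[canonical(name)].append(name)

    def conflicts(self, name):
        return list(self._by_key.get(canonical(name), []))


index = NameIndex()
index.register("Hal")
hits = index.conflicts("nal")  # "Hal" is reported: a false positive from n = H
```

Because the lookup is a single equality test on the stored key, two names either share a key or they do not, so the relation between characters has to be an equivalence relation.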

The first and most obvious thing to do was to remove the transliteration
pairs. These are pairs of characters where one member of the pair is a
common phonetic transliteration of the other, e.g. cyrillic en "Н" = latin
N. This was the cause of most of the spurious conflations between latin
characters. This should now be done.

There are now three remaining categories of conflated character pairs: case
folding, visual similarity and chinese traditional/simplified conversion.

The second thing to do is to minimise cross-script pairs. Since cross-script
usernames are disallowed, cross-script pairs are mostly redundant. You could
make a case to leave some of them in, for example some latin usernames can
be spoofed entirely using cyrillic characters. And some communities may have
a special need for allowing a certain pair of scripts in a username (e.g.
latin and hiragana). It's best if we can just keep the pairs which are
visually very similar, and consciously avoid including cross-script pairs
which will cause false conflations within scripts.

I've done some work on this, but I think it's time to hand over the job to
the community, if the community wants it. I've created a page with a big
list of pairs, at:

http://www.mediawiki.org/wiki/AntiSpoof/Equivalence_sets

You can edit this page. I will update the live copy on request.

Really clever ideas on how to avoid merging sets while maintaining good
performance would be appreciated.

Another misfeature in AntiSpoof which was causing false positives was the
fact that it merged sequences of repeated characters. For example, Yuma was
considered to be equal to Uma, because Y=U (from a transliteration pair),
and UUma = Uma. I've removed this behaviour.
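
The Yuma/Uma case can be reproduced with a small sketch. The functions old_canonical and new_canonical and the OLD_PAIRS table are hypothetical stand-ins, assuming case is folded by uppercasing.

```python
from itertools import groupby

# Hypothetical transliteration pair that used to be in the table: Y = U.
OLD_PAIRS = {"Y": "U"}


def old_canonical(name):
    """Old behaviour: map characters, then squash runs of equal characters."""
    mapped = "".join(OLD_PAIRS.get(ch, ch) for ch in name.upper())
    return "".join(ch for ch, _run in groupby(mapped))


def new_canonical(name):
    """After both changes: no transliteration pair, no run squashing."""
    return name.upper()


collides_before = old_canonical("Yuma") == old_canonical("Uma")
collides_after = new_canonical("Yuma") == new_canonical("Uma")
```

Under these assumptions the two names collide only in the old scheme, where "YUMA" becomes "UUMA" and then "UMA".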

I should really get a blog...

-- Tim Starling

_______________________________________________
Wikitech-l mailing list
[hidden email]
http://mail.wikipedia.org/mailman/listinfo/wikitech-l

Re: AntiSpoof issues

David Gerard-2
On 12/11/06, Tim Starling <[hidden email]> wrote:

> I should really get a blog...


wikitech.livejournal.com

Even lj user=brionv posts there ... sometimes ...


- d.

Re: AntiSpoof issues

Steve Summit
In reply to this post by Tim Starling-2
Tim Starling wrote:
> If we merge all these pairs into a set, following the relations, we obtain
> the result that latin n is the same as latin H. This is incorrect, and is
> the cause of most of the bizarre false positives that we see with AntiSpoof.
>
> The problem is that merging sets is fairly fundamental to the way AntiSpoof
> works....

Clearly a more flexible/sophisticated approach, rather than
calling all these characters "equivalent", would be to assign
some quantitative visual difference between them, and when
traversing a chain such as n -> eta -> Eta -> H, to sum the
numbers (or something) rather than considering the equivalences
to be a purely transitive relationship.
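
One way to read "sum the numbers" is as a shortest-path search over a weighted graph of character pairs. The sketch below assumes that reading; the cost values, the THRESHOLD and every name in it are invented for illustration.

```python
import heapq

# Hypothetical visual-difference costs (0 = identical, larger = less alike).
COSTS = {
    ("n", "\u03b7"): 0.2,       # latin n vs lowercase eta
    ("\u03b7", "\u0397"): 0.9,  # lowercase vs capital eta: same letter, different shape
    ("\u0397", "H"): 0.0,       # capital eta vs latin H
}


def build_graph(costs):
    """Turn the undirected cost table into an adjacency list."""
    graph = {}
    for (a, b), cost in costs.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    return graph


def visual_distance(graph, start, goal):
    """Cheapest summed cost from start to goal (Dijkstra); inf if unreachable."""
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        dist, ch = heapq.heappop(queue)
        if ch == goal:
            return dist
        if dist > best.get(ch, float("inf")):
            continue  # stale queue entry
        for nxt, cost in graph.get(ch, []):
            cand = dist + cost
            if cand < best.get(nxt, float("inf")):
                best[nxt] = cand
                heapq.heappush(queue, (cand, nxt))
    return float("inf")


THRESHOLD = 0.5  # assumed cut-off for "confusable"
GRAPH = build_graph(COSTS)
n_vs_H = visual_distance(GRAPH, "n", "H")
```

With these made-up costs, n and eta stay under the threshold while n and H do not, even though a chain connects them.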

But obviously that's much more expensive than computing, storing,
and indexing a single canonical representation for each string.

A hybrid approach I've contemplated (but not implemented, so I
can't prove it works) is to use the canonical representations to
generate expansive sets of candidate collisions, but then to do
a more sophisticated (perhaps distance-based) comparison of just
those candidates, to weed out the false positives.

Anyone interested in this issue should consult Unicode Technical
Standard #39, "Unicode Security Mechanisms", at
<http://www.unicode.org/reports/tr39/>.  In particular, its discussion of
"confusables" is basically the same issue we're talking about here.  See
also the Unicode data file "confusables.txt".

Re: AntiSpoof issues

Gregory Maxwell
In reply to this post by Tim Starling-2
On 11/12/06, Tim Starling <[hidden email]> wrote:
[snip]
> The problem is that merging sets is fairly fundamental to the way AntiSpoof
> works -- i.e. by calculating a canonical representation of the username,
> storing it and indexing it.
[snip]

Two pass:

Use the current high-compression function to locate candidate matches
nice and quickly from a non-unique index.

Then take the real potential match names and compare them directly
using a more intelligent comparison. (i.e. 'n'!='H').

The compression function could be made more lossy so that it will
identify a large but not unreasonable number of potentials.

We could even assign points to various kinds of matches and deny past
a threshold. This would also make it easier to support bi/trigram
triggers such as  cI ~= d .. which perhaps get more interesting when
we consider the entire Unicode character set.
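
A bigram trigger like cI ~= d could be handled by folding multi-character sequences before comparison. This is only a sketch of that one idea, not of the points system; apart from "cI", the rules in NGRAM_RULES are hypothetical examples, as is the function name.

```python
# Hypothetical multi-character lookalike rules: sequence -> single character.
# Only "cI" comes from the message above; the rest are illustrative guesses.
NGRAM_RULES = {"cI": "d", "cl": "d", "rn": "m", "vv": "w"}


def fold_ngrams(name, rules=NGRAM_RULES):
    """Scan left to right, replacing the longest matching sequence first."""
    longest = max(len(seq) for seq in rules)
    out = []
    i = 0
    while i < len(name):
        for size in range(longest, 1, -1):
            chunk = name[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += size
                break
        else:
            # No rule matched at this position: keep the character as-is.
            out.append(name[i])
            i += 1
    return "".join(out)


folded = fold_ngrams("cIavid")
```

A canonical-form index cannot express rules like these easily, since they change the length of the string; in a pairwise second pass they are cheap.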

Re: AntiSpoof issues

Gregory Maxwell
In reply to this post by Steve Summit
On 11/12/06, Steve Summit <[hidden email]> wrote:
[snip]
> A hybrid approach I've contemplated (but not implemented, so I
> can't prove it works) is to use the canonical representations to
> generate expansive sets of candidate collisions, but then to do
> a more sophisticated (perhaps distance-based) comparison of just
> those candidates, to weed out the false positives.
[snip]

Whoops.
/me reminds self to read thread before replying.

Yes, this is an interesting idea.  If anyone codes what's proposed, it
would be useful to extend it to support multiple compression
functions, for example in addition to the similar character metric it
would be useful to have a comparison based on double metaphone:

dmetaphone('Sterling') == dmetaphone('Starling')  //Indexed lookup
levenshtein('Tim Starling','Tim Sterling') == 1  //Second pass

(I have no clue if PHP has handy standard library functions for
dmetaphone and levenshtein distance.. I'm using the ones in
PostgreSQL.)
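
For reference, the second-pass distance is simple to write by hand. This is a plain dynamic-programming Levenshtein distance in Python, not the PostgreSQL function; double metaphone is left out because it has no standard-library equivalent in Python.

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


distance = levenshtein("Tim Starling", "Tim Sterling")
```

Only two rows of the table are kept at a time, so memory use is proportional to the length of the second string.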

Re: AntiSpoof issues

Simetrical
On 11/12/06, Gregory Maxwell <[hidden email]> wrote:
> (I have no clue if php has handy standard library functions for
> dmetaphone and levenshtein distance.. I'm using the ones in
> postgresql.)

Levenshtein has a library function:
http://us2.php.net/manual/en/function.levenshtein.php

DoubleMetaPhone has at least one PHP implementation, which appears to
be maybe free enough for us to use (and I'd guess the author would
license it to be free enough if it's not):
http://swoodbridge.com/DoubleMetaPhone/

Re: AntiSpoof issues

Gregory Maxwell
In reply to this post by Tim Starling-2
On 11/12/06, Tim Starling <[hidden email]> wrote:
> We've been having quite a few complaints about false positives from the
> AntiSpoof extension -- an extension which attempts to prevent registration

Sorry to post again, another thought on this:

It would probably be useful to restrict the comparison to users who have
at least one non-deleted edit. Who cares if someone spoofs a user with no
surviving edits?

A quick DELETE FROM antispoofwhatevertable WHERE NOT EXISTS (SELECT 1
FROM revision WHERE rev_user_text=antispoofwhatevertable.user LIMIT 1);    or
the like would accomplish that and cut down on the false positives.
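
The pruning query can be tried out against an in-memory SQLite database. Both table layouts below are simplified stand-ins and the table name spoof_names is made up; only revision.rev_user_text is taken from the message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-in tables, not the real MediaWiki schema.
    CREATE TABLE spoof_names (name TEXT PRIMARY KEY, canonical TEXT);
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_user_text TEXT);
    INSERT INTO spoof_names VALUES ('Alice', 'ALICE'), ('Bob', 'BOB');
    INSERT INTO revision (rev_user_text) VALUES ('Alice');
""")

# Drop spoof-table rows for users who have no surviving revisions.
conn.execute("""
    DELETE FROM spoof_names
    WHERE NOT EXISTS (
        SELECT 1 FROM revision WHERE rev_user_text = spoof_names.name
    )
""")
remaining = [row[0] for row in conn.execute("SELECT name FROM spoof_names")]
```

In this toy data set only the user with a revision keeps a row, so only that name can still trigger a conflict.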

Re: AntiSpoof issues

Neil Harris-2
In reply to this post by Steve Summit
Steve Summit wrote:

> Tim Starling wrote:
>  
>> If we merge all these pairs into a set, following the relations, we obtain
>> the result that latin n is the same as latin H. This is incorrect, and is
>> the cause of most of the bizarre false positives that we see with AntiSpoof.
>>
>> The problem is that merging sets is fairly fundamental to the way AntiSpoof
>> works....
>>    
>
> Clearly a more flexible/sophisticated approach, rather than
> calling all these characters "equivalent", would be to assign
> some quantitative visual difference between them, and when
> traversing a chain such as n -> eta -> Eta -> H, to sum the
> numbers (or something) rather than considering the equivalences
> to be a purely transitive relationship.
>
> But obviously that's much more expensive than computing, storing,
> and indexing a single canonical representation for each string.
>
> A hybrid approach I've contemplated (but not implemented, so I
> can't prove it works) is to use the canonical representations to
> generate expansive sets of candidate collisions, but then to do
> a more sophisticated (perhaps distance-based) comparison of just
> those candidates, to weed out the false positives.
>
>  
I have already discussed something exactly like the above in E-mail.

As you have suggested above, the idea was to use the big dumb
equivalence set table as a first hack to spot possible spoof candidates,
and then to apply more sophisticated processing using, among other
things, the UTR #39 confusables.txt tables on up to N of the spoof
candidates, falling back to the dumb algorithm if the number of
candidates exceeds N, where N is perhaps 20. (This limit is needed to
avoid denial of service attacks via the antispoof algorithm.)

Indeed, if this is implemented, the canonicalization function could be
made even more of a catch-all, allowing the catching of even more
nasties than the existing code, since the second, more sophisticated,
pass would then be able to clean up the larger number of false positives
that would be generated by a more aggressive first pass.

I'd be happy to code this up in Python, for translation into PHP.

> Anyone interested in this issue should consult Unicode Technical
> Standard #39, "Unicode Security Mechanisms", at
> <http://www.unicode.org/reports/tr39/>.  In particular, its discussion of
> "confusables" is basically the same issue we're talking about here.  See
> also the Unicode data file "confusables.txt".
>
>  

I'm actively working on this label-spoofing problem for another project,
so I'm well aware of UTR #39. As Tim has observed, the current
equivalence sets are the transitive closure of the equivalence relations
in UTR #39's confusables.txt file (plus some extra nasties), the Unicode
uppercasing relationships, and the relationships created by discarding
combining marks to uncover the base character. The script-mixing
constraints are also taken directly from UTR#39.
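
Two of the relations named above, uppercasing and discarding combining marks, can be approximated with Python's standard unicodedata module. This is a sketch of the idea only, not the code that generated the equivalence sets; strip_marks and fold are hypothetical names.

```python
import unicodedata


def strip_marks(text):
    """Decompose, then drop combining marks to expose the base characters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold(text):
    """Remove marks, then uppercase."""
    return strip_marks(text).upper()


folded = fold("Zo\u00eb")  # "Zoe" with a diaeresis on the e
```

Each of these steps is itself a many-to-one mapping, so composing them with the confusables pairs enlarges the merged sets further.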

I've also got some suggestions that could be added to tighten up the
existing integration into MediaWiki, by dealing with a couple of edge
cases that are currently less than optimal.

-- Neil


Re: AntiSpoof issues

Neil Harris
In reply to this post by Tim Starling-2
Tim Starling wrote:

> [full quote of the original post snipped]

Hi Tim;

I've already thought of this (see my recent E-mail on the Wikitech list
-- for some reason, I can't find the lengthy E-mail I thought I'd sent
earlier that I refer to there).

Fortunately, not much "real cleverness" is needed.

The basic idea is the one suggested by multiple posters on the list:
* an aggressive canonicalization process (which must still have the
transitivity requirement above)
* looking up candidates with matching canonical forms (up to some limit,
perhaps 20, to stop denial-of-service attacks)
* if #(candidates) > limit, treat as a spoof, to fail-safe
* then a second pass to do the checking _much_ more carefully, without
any need for transitivity or over-compression
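
The four steps above can be sketched as follows. This is not the are_confusable_strings() implementation offered below; the canonical and looks_confusable arguments are placeholders for whatever first-pass and second-pass functions are eventually chosen, and the demo functions are deliberately crude.

```python
MAX_CANDIDATES = 20  # the limit suggested in the list above


def check_username(new_name, index, canonical, looks_confusable):
    """Return (allowed, reason).

    index: dict mapping canonical form -> list of registered names
    canonical: aggressive first-pass function (placeholder)
    looks_confusable: careful pairwise second-pass function (placeholder)
    """
    candidates = index.get(canonical(new_name), [])
    if len(candidates) > MAX_CANDIDATES:
        return False, "too many candidates, failing safe"
    for existing in candidates:
        if looks_confusable(new_name, existing):
            return False, "confusable with " + existing
    return True, "ok"


def demo_canonical(name):
    # Deliberately over-aggressive: conflates n and h as well as case.
    return name.upper().replace("N", "H")


def demo_confusable(a, b):
    # Stand-in careful pass: only case-insensitive equality counts.
    return a.casefold() == b.casefold()


demo_index = {demo_canonical("Hal"): ["Hal"]}
result_nal = check_username("nal", demo_index, demo_canonical, demo_confusable)
result_hal = check_username("HAL", demo_index, demo_canonical, demo_confusable)
```

In the demo, "nal" reaches the second pass because of the over-aggressive key and is then cleared, while "HAL" is still refused.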

I'd be happy to E-mail you an implementation in Python of the very
simple but more careful second-pass code, as a function
are_confusable_strings() that takes two Python strings as input, and
returns a boolean value. This can then be called from the PHP pass.

If we do this, we should be able to make the first pass even more
aggressive than it is currently, to catch more possible spoof
candidates, whilst still eliminating false positives in the second pass,
thus improving both the false-positive and false-negative rates to a
fraction of their current levels.

We should _not_ remove the cross-script pairs from the list, as there
are still whole-script confusables, eg "caxap", "soccer" --
surprisingly, 3% of English dictionary words have matching Cyrillic
spoofs, and 1% have Greek spoofs -- however, the second pass should
completely eliminate any problems caused by the transitivity in the
first pass.
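
A whole-script confusable passes any mixed-script check because it contains only one script. The sketch below shows this for the Russian word for sugar; the four-letter mapping is an illustrative subset, and guessing the script from the first word of the Unicode character name is a rough assumption, not how UTR #39 defines scripts.

```python
import unicodedata

# A few Cyrillic letters and the Latin letters they resemble (illustrative subset).
CYRILLIC_TO_LATIN = {"\u0441": "c", "\u0430": "a", "\u0445": "x", "\u0440": "p"}


def scripts(text):
    """Rough script guess: first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}


def latin_skeleton(text):
    """Replace known Cyrillic lookalikes by their Latin counterparts."""
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in text)


sakhar = "\u0441\u0430\u0445\u0430\u0440"  # Russian for "sugar", all Cyrillic
single_script = scripts(sakhar)
skeleton = latin_skeleton(sakhar)
```

Both the Cyrillic word and its Latin lookalike are single-script strings, which is why the cross-script pairs are needed to catch them.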

-- Neil
 

Re: AntiSpoof issues

Tim Starling-2
Neil Harris wrote:

> Hi Tim;
>
> I've already thought of this (see my recent E-mail on the Wikitech list
> -- for some reason, I can't find the lengthy E-mail I thought I'd sent
> earlier that I refer to there).
>
> Fortunately, not much "real cleverness" is needed.
>
> The basic idea is the one suggested by multiple posters on the list:
> * an aggressive canonicalization process (which must still have the
> transitivity requirement above)
> * looking up candidates with matching canonical forms (up to some limit,
> perhaps 20, to stop denial-of-service attacks)
> * if #(candidates) > limit, treat as a spoof, to fail-safe
> * then a second pass to do the checking _much_ more carefully, without
> any need for transitivity or over-compression
>
> I'd be happy to E-mail you an implementation in Python of the very
> simple but more careful second-pass code, as a function
> are_confusable_strings() that takes two Python strings as input, and
> returns a boolean value. This can then be called from the PHP pass.

Sure, email away.

> If we do this, we should be able to make the first pass even more
> aggressive than it is currently, to catch more possible spoof
> candidates, whilst still eliminating false positives in the second pass,
> thus improving both the false-positive and false-negative rates to a
> fraction of their current levels.

Generally speaking, you can't tell whether a given pair of names is an
attempted spoof just by comparing the strings. You need to know the
motivation of the person who created it. On the one hand we have users who
want to find the minimal variation of their given name or Internet nickname
that isn't already taken, and on the other hand, we have trolls who want to
find the minimal variation of an existing username that isn't disallowed by
the software. Both users wish to evade the software restrictions, but one of
them has a motivation that we will tolerate, and one of them does not.

As Gregory suggested, one useful heuristic would be to look at the number of
edits of the target user. Another one that I proposed on IRC yesterday is a
length heuristic -- i.e. collisions of short usernames are more likely to be
accidental than collisions of long ones.
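
The two heuristics could be combined into a simple score. Everything in this sketch is hypothetical, in particular the thresholds of 5 characters and 10 edits, which are placeholders and not proposed values.

```python
def spoof_suspicion(name_length, target_edit_count,
                    short_name=5, active_edits=10):
    """Toy score from 0 to 2; all thresholds are made-up placeholders."""
    score = 0
    if name_length > short_name:
        score += 1  # long names rarely collide by accident
    if target_edit_count >= active_edits:
        score += 1  # an active target is more worth impersonating
    return score


low = spoof_suspicion(name_length=3, target_edit_count=0)
high = spoof_suspicion(name_length=12, target_edit_count=500)
```

A score like this could decide whether a collision is refused outright or merely flagged for review.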

> We should _not_ remove the cross-script pairs from the list, as there
> are still whole-script confusables, eg "caxap", "soccer" --
> surprisingly, 3% of English dictionary words have matching Cyrillic
> spoofs, and 1% have Greek spoofs -- however, the second pass should
> completely eliminate any problems caused by the transitivity in the
> first pass.

We have to remove some of the cross-script pairs until the software is
changed, to fix the spurious within-script conflations. I'm not going to
make everyone suffer while we have our leisurely chat about possible
long-term fixes.

There is a need for judgement, regardless of the software in use. Trolls
will go on trolling regardless of what anti-spoofing restrictions we have in
place. Our aim should be to minimise their impact, and heuristic systems
with a high false positive rate do quite the opposite.

-- Tim Starling


Re: AntiSpoof issues

Gregory Maxwell
On 11/12/06, Tim Starling <[hidden email]> wrote:
[snip]
> There is a need for judgement, regardless of the software in use. Trolls
> will go on trolling regardless of what anti-spoofing restrictions we have in
> place. Our aim should be to minimise their impact, and heuristic systems
> with a high false positive rate do quite the opposite.

This note brings to mind an interesting homework assignment for the list...

Can we think of a good way to implement "interactive intervention" in
MediaWiki which adds neither weird backend requirements (it must work with
the non-persistence of PHP) nor odd client requirements (no Java or
the like)?

The idea is that we have hundreds of people in IRC.. many people RC
patrolling.   There are *many* sorts of activities which software can
mark as suspect but which require judgement.  Is there a reasonable
way for us to get that judgement in real-time?

Re: AntiSpoof issues

Brion Vibber
In reply to this post by Gregory Maxwell
Gregory Maxwell wrote:
> On 11/12/06, Tim Starling <[hidden email]> wrote:
>> We've been having quite a few complaints about false positives from the
>> AntiSpoof extension -- an extension which attempts to prevent registration
>
> Sorry to post again, another thought on this:
>
> It would probably be useful to reduce the comparison to the set of
> users with no non-deleted edits. Who care if someone spoofs a deleted
> user?

We can expect that, say, [[User:Jimbo Wales]] won't actually have any
edits on the majority of our wikis. Does that mean there's no need to
check for spoofs of that username?

-- brion vibber (brion @ pobox.com)

Re: AntiSpoof issues

Alphax (Wikipedia email)
Brion Vibber wrote:

> Gregory Maxwell wrote:
>> On 11/12/06, Tim Starling <[hidden email]> wrote:
>>> We've been having quite a few complaints about false positives from the
>>> AntiSpoof extension -- an extension which attempts to prevent registration
>> Sorry to post again, another thought on this:
>>
>> It would probably be useful to reduce the comparison to the set of
>> users with no non-deleted edits. Who care if someone spoofs a deleted
>> user?
>
> We can expect that, say, [[User:Jimbo Wales]] won't actually have any
> edits on the majority of our wikis. Does that mean there's no need to
> check for spoofs of that username?
>
Yet Another Reason why SUL is needed yesterday...

--
Alphax - http://en.wikipedia.org/wiki/User:Alphax
Contributor to Wikipedia, the Free Encyclopedia
"We make the internet not suck" - Jimbo Wales
Public key: http://en.wikipedia.org/wiki/User:Alphax/OpenPGP



Re: AntiSpoof issues

Brion Vibber
Alphax (Wikipedia email) wrote:

> Brion Vibber wrote:
>> Gregory Maxwell wrote:
>>> On 11/12/06, Tim Starling <[hidden email]> wrote:
>>>> We've been having quite a few complaints about false positives from the
>>>> AntiSpoof extension -- an extension which attempts to prevent registration
>>> Sorry to post again, another thought on this:
>>>
>>> It would probably be useful to reduce the comparison to the set of
>>> users with no non-deleted edits. Who care if someone spoofs a deleted
>>> user?
>> We can expect that, say, [[User:Jimbo Wales]] won't actually have any
>> edits on the majority of our wikis. Does that mean there's no need to
>> check for spoofs of that username?
>>
>
> Yet Another Reason why SUL is needed yesterday...

I'll be putting up the merging UI for localization & testing tomorrow.

-- brion vibber (brion @ pobox.com)

Re: AntiSpoof issues

Tim Starling-2
In reply to this post by Brion Vibber
Brion Vibber wrote:
> We can expect that, say, [[User:Jimbo Wales]] won't actually have any
> edits on the majority of our wikis. Does that mean there's no need to
> check for spoofs of that username?

Add an exception list. You can't expect a simple heuristic to get it right
in every case. [[User:Michael]] once created LiveJournal accounts
impersonating myself and a number of other English Wikipedia sysops. Do you
think any anti-spoof technology on LJ's site would have prevented this?

We have lots of users with no edits. Among them, famous people are surely in
the minority.

-- Tim Starling


Re: AntiSpoof issues

Jay Ashworth-2
In reply to this post by Tim Starling-2
On Mon, Nov 13, 2006 at 03:15:01PM +1100, Tim Starling wrote:
> Generally speaking, you can't tell whether a given pair of names is an
> attempted spoof just by comparing the strings. You need to know the
> motivation of the person who created it.

I think there's a function for that in the Python 2.6 libraries...

Cheers,
-- jra
--
Jay R. Ashworth                                                [hidden email]
Designer                          Baylink                             RFC 2100
Ashworth & Associates        The Things I Think                        '87 e24
St Petersburg FL USA      http://baylink.pitas.com             +1 727 647 1274

        "That's women for you; you divorce them, and 10 years later,
          they stop having sex with you."  -- Jennifer Crusie; _Fast_Women_

Re: AntiSpoof issues

Arne 'Timwi' Heizmann
In reply to this post by Gregory Maxwell
Gregory Maxwell wrote:

> On 11/12/06, Tim Starling <[hidden email]> wrote:
> [snip]
>
>>There is a need for judgement, regardless of the software in use. Trolls
>>will go on trolling regardless of what anti-spoofing restrictions we have in
>>place. Our aim should be to minimise their impact, and heuristic systems
>>with a high false positive rate do quite the opposite.
>
> This note brings to mind an interesting homework assignment for the list...
>
> Can we think of a good way to impliment "interactive intervention" in
> mediawiki which neither adds weird backend requirements (works with
> the nonpersistantness of php) or odd client requirements (no java or
> the like).
>
> The idea is that we have hundreds of people in IRC.. many people RC
> patrolling.   There are *many* sorts of activities which software can
> mark as suspect but which require judgement.  Is there a reasonable
> way for us to get that judgement in real-time?

But that's easy...

* User tries to create an account
* Software responds, "The username you chose is very similar to the
   username of an existing user. In order to ensure that you are not
   trying to impersonate someone else, an administrator will have to
   approve your username manually. Approval is usually processed within
   <average timeframe>. How do you wish to proceed?"
   [ Request approval ] [ Try a different username ]
* User clicks "Request approval". Software responds, "Your request for
   approval has been sent off to the administrators. You will receive an
   e-mail as soon as approval has been granted or rejected."
* Either an e-mail is sent to a mailing list, or a wiki page is updated,
   or (my preferred way) a special dedicated feature in MediaWiki is
   invoked, which alerts volunteers to the awaiting approval.
* An administrator accepts or rejects the request. If it is accepted,
   the normal welcome e-mail with the confirmation link is sent to the
   user. Otherwise, an e-mail informs the user of the rejection.
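
The workflow above amounts to a small state machine per request. A minimal sketch, assuming each pending request is stored as a record; the class names, fields and message strings are all hypothetical and nothing here is an existing MediaWiki feature.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ApprovalRequest:
    username: str
    similar_to: str
    email: str
    status: Status = Status.PENDING

    def decide(self, approve):
        """Record an administrator's decision; return which e-mail to send."""
        if self.status is not Status.PENDING:
            raise ValueError("request already decided")
        self.status = Status.APPROVED if approve else Status.REJECTED
        if approve:
            return "welcome e-mail with confirmation link"
        return "rejection notice"


request = ApprovalRequest("nal", similar_to="Hal", email="user@example.org")
outcome = request.decide(approve=True)
```

Because the request is just a stored row, nothing needs to stay running between the user's click and the administrator's decision.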

Timwi


Re: AntiSpoof issues

Mark Clements (HappyDog)
"Timwi" <[hidden email]> wrote in message
news:ejd0fd$pug$[hidden email]...
> Gregory Maxwell wrote:
> > On 11/12/06, Tim Starling <[hidden email]> wrote:
> > [snip]
> >
> >>There is a need for judgement, regardless of the software in use. Trolls
> >>will go on trolling regardless of what anti-spoofing restrictions we have in
> >>place. Our aim should be to minimise their impact, and heuristic systems
> >>with a high false positive rate do quite the opposite.
> >
> > This note brings to mind an interesting homework assignment for the list...
> >
> > Can we think of a good way to impliment "interactive intervention" in
> > mediawiki which neither adds weird backend requirements (works with
> > the nonpersistantness of php) or odd client requirements (no java or
> > the like).
> >
> > The idea is that we have hundreds of people in IRC.. many people RC
> > patrolling.   There are *many* sorts of activities which software can
> > mark as suspect but which require judgement.  Is there a reasonable
> > way for us to get that judgement in real-time?
>
> But that's easy...
>
> * User tries to create an account
> * Software responds, "The username you chose is very similar to the
>    username of an existing user. In order to ensure that you are not
>    trying to impersonate someone else, an administrator will have to
>    approve your username manually. Approval is usually processed within
>    <average timeframe>. How do you wish to proceed?"
>    [ Request approval ] [ Try a different username ]
> * User clicks "Request approval". Software responds, "Your request for
>    approval has been sent off to the administrators. You will receive an
>    e-mail as soon as approval has been granted or rejected."
> * Either an e-mail is sent to a mailing list, or a wiki page is updated,
>    or (my preferred way) a special dedicated feature in MediaWiki is
>    invoked, which alerts volunteers to the awaiting approval.
> * An administrator accepts or rejects the request. If it is accepted,
>    the normal welcome e-mail with the confirmation link is sent to the
>    user. Otherwise, an e-mail informs the user of the rejection.
>
> Timwi


This requires that the user supplies an e-mail address - not currently a
requirement, so far as I know...

--
- Mark Clements (HappyDog)




Re: AntiSpoof issues

Arne 'Timwi' Heizmann
Mark Clements wrote:

> "Timwi" <[hidden email]> wrote in message
> news:ejd0fd$pug$[hidden email]...
>
>>Gregory Maxwell wrote:
>>
>>>Can we think of a good way to impliment "interactive intervention" in
>>>mediawiki which neither adds weird backend requirements (works with
>>>the nonpersistantness of php) or odd client requirements (no java or
>>>the like).
>>>
>>>The idea is that we have hundreds of people in IRC.. many people RC
>>>patrolling.   There are *many* sorts of activities which software can
>>>mark as suspect but which require judgement.  Is there a reasonable
>>>way for us to get that judgement in real-time?
>>
>>But that's easy... [long suggestion snipped]
>
> This requires that the user supplies an e-mail address - not currently a
> requirement, so far as I know...

Do you have any better ideas?

Note that it is still perfectly possible to register without an e-mail
address if you don't trigger the anti-spoof system, which I would hope
would be the vast majority of cases. If someone is determined to have a
certain username because it's their Internet handle (and not because
they're trying to impersonate someone), they normally wouldn't mind
supplying at least a temporary e-mail address.

Timwi


Re: AntiSpoof issues

Johannes Ernst-2
Is this a good time to mention that there's a very nice OpenID  
extension to MediaWiki?

See in action here:
     http://openid.net/wiki/

On Nov 15, 2006, at 12:17, Timwi wrote:

> Mark Clements wrote:
>> "Timwi" <[hidden email]> wrote in message
>> news:ejd0fd$pug$[hidden email]...
>>
>>> Gregory Maxwell wrote:
>>>
>>>> Can we think of a good way to impliment "interactive intervention" in
>>>> mediawiki which neither adds weird backend requirements (works with
>>>> the nonpersistantness of php) or odd client requirements (no java or
>>>> the like).
>>>>
>>>> The idea is that we have hundreds of people in IRC.. many people RC
>>>> patrolling.   There are *many* sorts of activities which software can
>>>> mark as suspect but which require judgement.  Is there a reasonable
>>>> way for us to get that judgement in real-time?
>>>
>>> But that's easy... [long suggestion snipped]
>>
>> This requires that the user supplies an e-mail address - not currently a
>> requirement, so far as I know...
>
> Do you have any better ideas?
>
> Note that it is still perfectly possible to register without an e-mail
> address if you don't trigger the anti-spoof system, which I would hope
> would be the vast majority of cases. If someone is determined to have a
> certain username because it's their Internet handle (and not because
> they're trying to impersonate someone), they normally wouldn't mind
> supplying at least a temporary e-mail address.
>
> Timwi