Importing Google Map causes XML parse error


Re: jibberish

Eric K-2
I agree, you're right. To be more accurate, a CAPTCHA only makes certain that a human is editing the page (and, to get more technical, sophisticated bots can solve CAPTCHAs). Throttling is also necessary: anything to stop bots from doing what they do well.
   
 

Rob Church <[hidden email]> wrote:
  On 16/10/2007, Eric K wrote:
> There is definitely no way to check if an edit is spam or not, except for CAPTCHA.

I have to point out the flaw in that statement, tenuous though it is:
a CAPTCHA does *not* constitute an anti-spam acid test; all it does
is confirm, to the best of the test's ability (which might not
count for anything), that we are dealing with a human being rather
than an automated program.

A human could quite easily post spam to his/her heart's content and
would be able to pass a CAPTCHA (we hope). The default configuration
settings for ConfirmEdit, on which the CAPTCHA extensions are based,
allow registered users to skip these tests. So, in theory, one could
set up a spam bot with a few minutes of initial human assistance,
which is why we supplement such things with throttles and "heuristics"
(regular expressions aren't that great in terms of configurability,
but I cling to the hope that one day we'll have decent spam-edit
detection heuristics, even if just for the basics).
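A rough illustration of the throttles and ConfirmEdit settings discussed here, written as a LocalSettings.php fragment. This is a sketch, assuming a MediaWiki install with the ConfirmEdit extension enabled; the limits are arbitrary examples, and the defaults for these settings have changed between versions, so compare them with your own DefaultSettings.php and ConfirmEdit configuration before relying on them.

```php
// LocalSettings.php fragment (sketch; the numbers are arbitrary examples)

// Throttle edits: at most 8 edits per 60 seconds, counted across all
// anonymous users, and per account for new (not yet autoconfirmed) users.
$wgRateLimits['edit']['anon']   = array( 8, 60 );
$wgRateLimits['edit']['newbie'] = array( 8, 60 );

// ConfirmEdit: ask for a CAPTCHA on every edit, not only on edits that add URLs.
$wgCaptchaTriggers['edit'] = true;

// ConfirmEdit: do not let ordinary logged-in users bypass the CAPTCHA.
$wgGroupPermissions['user']['skipcaptcha'] = false;
$wgGroupPermissions['autoconfirmed']['skipcaptcha'] = false;
```

As noted above, a CAPTCHA on every edit only proves a human was present once; the rate limit is what slows down a bot that a human has helped past the first test.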


Rob Church

_______________________________________________
MediaWiki-l mailing list
[hidden email]
http://lists.wikimedia.org/mailman/listinfo/mediawiki-l


       

Re: jibberish

Chuck-45
In reply to this post by 2007@gmask.com
[hidden email] wrote:

> This is what is happening to me as well.. but the inserted words are
> always at the beginning of the page which gives me hope in blocking
> these types of bot edits with a regex.

Right. This is the same bot we're having problems with.

Chuck


How does the Show/Hide functionality work on this template?

Shah, Nikhil
In reply to this post by Eric K-2
I want to add functionality similar to what is available here:

 http://en.wikipedia.org/wiki/Template:Anarchism

Any idea how the Show/Hide feature works?

Thanks,

Nikhil


Re: jibberish REGEX syntax help

2007@gmask.com
In reply to this post by Chuck-45
So what would the syntax be to match something that begins at the start
of the page?

What I'm thinking of is matching anonymous users who add fewer than a
certain number of characters to the beginning of a page.

But it seems like regex is limited to matching the beginning of a line.

-Adrian

--- Chuck <[hidden email]> wrote:

> [hidden email] wrote:
>
> > This is what is happening to me as well.. but the inserted words are
> > always at the beginning of the page which gives me hope in blocking
> > these types of bot edits with a regex.
>
> Right. This is the same bot we're having problems with.
>
> Chuck
>
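For what it's worth, PCRE (which PHP's preg_* functions use) is not limited to line starts: without the /m modifier, ^ matches only at the start of the whole subject string, and \A always does. Below is a minimal sketch of the check described above, written for modern PHP (7.4+). The function name isShortPrepend and the 20-character limit are hypothetical, and deciding whether the editor is anonymous is left to the caller.

```php
<?php
// In PCRE (PHP's preg_* functions), ^ matches only at the start of the whole
// subject unless the /m modifier is used, and \A always means "start of subject":
//   preg_match('/^copasnotra/',  "Intro.\ncopasnotra") returns 0
//   preg_match('/^copasnotra/m', "Intro.\ncopasnotra") returns 1

// Hypothetical check: did this edit only add one short lowercase word at the
// very top of the page? (Whether the editor is anonymous is checked elsewhere.)
function isShortPrepend(string $oldText, string $newText, int $maxChars = 20): bool
{
    $oldLen = strlen($oldText);
    $newLen = strlen($newText);
    // The new revision must be the old text with something added in front.
    if ($newLen <= $oldLen || substr($newText, $newLen - $oldLen) !== $oldText) {
        return false;
    }
    $added = substr($newText, 0, $newLen - $oldLen);
    // \A ... \z: the added text must be a single word plus optional whitespace.
    return strlen($added) <= $maxChars
        && preg_match('/\A[a-z]+\s*\z/', $added) === 1;
}
```

For example, isShortPrepend("Some article.", "copasnotra\nSome article.") returns true, while the same word appended at the end of the page returns false.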



Re: jibberish

Michael Daly-3
In reply to this post by Chuck-45
[hidden email] wrote:

> This is what is happening to me as well.. but the inserted words are
> always at the beginning of the page which gives me hope in blocking
> these types of bot edits with a regex.

I was thinking that this could be checked against a dictionary.  If the
first "word" inserted is not in the dictionary (for the page's
language), require the user to confirm the save.  A bot won't confirm.

This would have to be smart enough to skip wikitext (e.g. don't worry
about "[[Image:").  Similarly, it would choke on obscure acronyms, but a
real person would not likely complain too much.

This could be a hook into the "save" code that would only need to check
the first word.  However, the bot writer can switch to posting at the end
of the article...  Possibly, a scan of the entire page to reject
exceptionally bad spelling might suffice, but it will put off some
contributors (and annoy US vs Canadian vs British spellers if the
spelling algorithm isn't smart enough to treat honour vs honor as
acceptable).
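A minimal sketch of the dictionary check, assuming the inserted text has already been extracted from the edit and that the word list is a PHP array keyed by lowercase word (for example built with array_flip()). The function names are hypothetical, and the markup stripping only handles simple, non-nested [[...]] and {{...}} constructs.

```php
<?php
// First plain word of some inserted wikitext, ignoring simple [[links]]/[[Image:...]]
// and {{templates}} (non-nested only) and apostrophe markup like '''bold'''.
function firstPlainWord(string $inserted): ?string
{
    $plain = preg_replace(['/\[\[[^\]]*\]\]/', '/\{\{[^}]*\}\}/'], ' ', $inserted);
    // A word is letters, optionally with inner apostrophes ("don't");
    // a run of ''' on its own never matches because a letter must come first.
    if (preg_match("/[A-Za-z]+(?:'[A-Za-z]+)*/", $plain, $m) !== 1) {
        return null;
    }
    return strtolower($m[0]);
}

// Hypothetical save-hook helper: ask for confirmation if the first word of the
// inserted text is not in the dictionary (keys = lowercase words).
function needsConfirmation(string $inserted, array $dictionary): bool
{
    $word = firstPlainWord($inserted);
    return $word !== null && !isset($dictionary[$word]);
}
```

With this, "[[Image:Foo.jpg]] The history" is judged by "the" rather than by the image link; obscure acronyms or names missing from the list would still trigger a confirmation.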

Mike





Re: jibberish

David A. Desrosiers-2
In reply to this post by Chuck-45
On Tue, 2007-10-16 at 10:04 -0500, Chuck wrote:
> My wikis are getting spammed with short text strings like "copasnotra"
> and "romonboel". Based on my limited understanding of spambots, it
> seems like the bots are making these changes as a prelude to doing
> something else

What they're doing is polluting the database of heuristics by inserting
either common or nonsense words. For example, if (prior to this tactic)
"spammy" words (Viagra, etc.) made up 80% of the total number of words
in the table, they fill the database with common, nonsense words to
degrade the filter enough to let the spammy words back through, by
pushing their share below that threshold.
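A toy model of the dilution described above, treating the filter as a simple share-of-spammy-words threshold. Real Bayesian filters such as dspam score individual tokens, so this only illustrates the arithmetic; the function names and the 50% threshold used below are hypothetical.

```php
<?php
// Toy model: share of "spammy" words among all words in the filter's table.
function spamSharePercent(int $spamWords, int $totalWords): float
{
    return 100.0 * $spamWords / $totalWords;
}

// Smallest number of padding (nonsense) words n such that the spammy share
// drops strictly below $thresholdPct, i.e. 100 * spam < thresholdPct * (total + n).
// Integer arithmetic avoids floating-point rounding at the boundary.
function paddingNeeded(int $spamWords, int $totalWords, int $thresholdPct): int
{
    return max(0, intdiv(100 * $spamWords, $thresholdPct) + 1 - $totalWords);
}
```

With 800 spammy words out of 1,000 (80%) and a 50% threshold, paddingNeeded(800, 1000, 50) gives 601: after 601 nonsense words the share is 800/1601, just under 50%.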

I've seen this tactic for years while using dspam, but thankfully for us,
dspam has kept us 100% spam-free: not a single spam email or other
garbage in any user's mailbox in years, with only very minimal false
positives.

Perhaps it would be worthwhile to look at their methods and roll them
into MediaWiki's anti-spam approach?


--
David A. Desrosiers
[hidden email]
[hidden email]
http://projects.plkr.org/
Skype...: 860-967-3820


Re: jibberish

Dan Bolser-3
In reply to this post by Michael Daly-3
On 16/10/2007, Michael Daly <[hidden email]> wrote:

> [hidden email] wrote:
>
> > This is what is happening to me as well.. but the inserted words are
> > always at the beginning of the page which gives me hope in blocking
> > these types of bot edits with a regex.
>
> I was thinking that this could be checked against a dictionary.  If the
> first "word" inserted is not in the dictionary (for the page's
> language), require the user to confirm the save.  A bot won't confirm.
>
> This would have to be smart enough to skip wikitext (e.g. don't worry
> about "[[Image:").  Similarly, it would choke on obscure acronyms, but a
> real person would not likely complain too much.
>
> This could be a hook into the "save" code that would only need to check
> the first word.  However, the bot writer can switch to posting at the end
> of the article...  Possibly, a scan of the entire page to reject
> exceptionally bad spelling might suffice, but it will put off some
> contributors (and annoy US vs Canadian vs British spellers if the
> spelling algorithm isn't smart enough to treat honour vs honor as
> acceptable).

So: 1) We are all seeing the same kind of spam. 2) We need something
that looks at the whole edit and isn't based on some trivial aspect
of the particular spam attack (which could easily be changed). 3) We
need something that goes beyond an 'are you a human?' CAPTCHA, because
such tests are either too infrequent to be useful or too common to be
tenable.

4) What is wrong with a Bayesian (email style) spam filter?

Each edit gets certain attributes set (username and email or IP
address, number of good edits from this user, edit frequency of this
user, edit diff text, etc.), and then the Bayesian filter flags the
edit with a 'level of spamminess'. Depending on configuration, spammy
edits can be rejected outright, with multiple spams leading to
automatic bans. Alternatively, potential spam can be queued in a special
list of edits for review (the review process being key to learning the
patterns of spam). Such a filter could equally be applied to
vandalism. Also (while I am at it), sysops would have the option to
'mark edit as spam', providing more data for the training algorithm.
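A minimal naive Bayes sketch of the idea, scoring only the edit text. This is an illustration under stated assumptions, not MediaWiki code: the class, method, and function names are hypothetical, it uses add-one (Laplace) smoothing, it keeps counts in memory rather than in a database table, and it ignores the other attributes listed above (user, IP, edit frequency), which could be fed in as extra pseudo-tokens. train() plays the role of a sysop's 'mark edit as spam' action, and decide() shows a reject/review/accept split with made-up thresholds.

```php
<?php
// Naive Bayes "spamminess" score for edit text (sketch; names are hypothetical).
final class EditSpamFilter
{
    /** @var array token counts per label */
    private array $counts = ['spam' => [], 'ham' => []];
    /** @var array number of training edits per label */
    private array $docs = ['spam' => 0, 'ham' => 0];

    // Lowercase alphanumeric tokens from the edit (diff) text.
    private static function tokens(string $text): array
    {
        preg_match_all('/[a-z0-9]+/', strtolower($text), $m);
        return $m[0];
    }

    // Record a reviewed edit, e.g. when a sysop marks an edit as spam.
    public function train(string $text, bool $isSpam): void
    {
        $label = $isSpam ? 'spam' : 'ham';
        $this->docs[$label]++;
        foreach (self::tokens($text) as $t) {
            $this->counts[$label][$t] = ($this->counts[$label][$t] ?? 0) + 1;
        }
    }

    // Estimated P(spam | text) with add-one (Laplace) smoothing.
    public function spamminess(string $text): float
    {
        $vocab = count(array_unique(array_merge(
            array_keys($this->counts['spam']),
            array_keys($this->counts['ham'])
        )));
        $totalDocs = $this->docs['spam'] + $this->docs['ham'];
        $logScore = [];
        foreach (['spam', 'ham'] as $label) {
            $tokenTotal = array_sum($this->counts[$label]);
            $score = log(($this->docs[$label] + 1) / ($totalDocs + 2));
            foreach (self::tokens($text) as $t) {
                $count = $this->counts[$label][$t] ?? 0;
                $score += log(($count + 1) / ($tokenTotal + $vocab + 1));
            }
            $logScore[$label] = $score;
        }
        // Logistic of the log-score difference gives the spam probability.
        return 1.0 / (1.0 + exp($logScore['ham'] - $logScore['spam']));
    }
}

// Hypothetical policy: reject near-certain spam, queue doubtful edits for review.
function decide(float $p, float $reject = 0.99, float $review = 0.5): string
{
    if ($p >= $reject) {
        return 'reject';
    }
    if ($p >= $review) {
        return 'review';
    }
    return 'accept';
}
```

An untrained filter returns exactly 0.5; once some spam and good edits have been recorded with train(), spamminess() moves toward 0 or 1 and decide() maps the result to reject, review, or accept.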

So there is only one problem: where should we start?

Some Googling for PHP code to nick looks promising...

http://www.phpclasses.org/browse/file/9319.html : Guestbook Example with SpamFilter
http://www.squirrelmail.org/plugin_view.php?id=115 : a SquirrelMail plugin that
uses a Bayesian algorithm to determine what you consider to be spam.


Re: How does the Show/Hide functionality work on this template?

Platonides
In reply to this post by Shah, Nikhil
Shah, Nikhil wrote:
> I want to add functionality similar to what is available here:
>
>  http://en.wikipedia.org/wiki/Template:Anarchism
>
> Any idea how the Show/Hide feature works?
>
> Thanks,
>
> Nikhil

With JavaScript:
http://en.wikipedia.org/wiki/Wikipedia:NavFrame



Re: How does the Show/Hide functionality work on this template?

Shah, Nikhil
Thank you, Platonides, this is exactly what I was looking for.

The common.css and common.js shown on
http://en.wikipedia.org/wiki/Wikipedia:NavFrame are very different from
the ones I currently have.

I am wondering whether the best way to implement this is to merge the
files manually. Is there any other shortcut?
 

-----Original Message-----
From: [hidden email]
[mailto:[hidden email]] On Behalf Of Platonides
Sent: Wednesday, October 17, 2007 08:35
To: [hidden email]
Subject: Re: [Mediawiki-l] How does the Show/Hide functionality work
on this template?

Shah, Nikhil wrote:
> I want to add functionality similar to what is available here:
>
>  http://en.wikipedia.org/wiki/Template:Anarchism
>
> Any idea how the Show/Hide feature works?
>
> Thanks,
>
> Nikhil

With JavaScript:
http://en.wikipedia.org/wiki/Wikipedia:NavFrame



Re: jibberish

Karl Schmidt
In reply to this post by Dan Bolser-3
Dan Bolser wrote:

>
> 4) What is wrong with a Bayesian (email style) spam filter?

Look at bogofilter. There is no reason you couldn't pipe all changes through it
and create HAM and SPAM files for sorting and training. I use it for email with very good results.
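A sketch of piping an edit through bogofilter from PHP, assuming the bogofilter binary is installed on the PATH and its default wordlist has already been trained. bogofilter reads a message on standard input and reports its verdict in the exit status (0 = spam, 1 = non-spam, 2 = unsure, 3 = error), and the -s and -n flags register a message as spam or non-spam for training. The PHP function names are hypothetical, and feeding bare wiki text instead of an e-mail message is an assumption that may affect accuracy.

```php
<?php
// Map bogofilter's exit status to a verdict.
function bogofilterVerdict(int $exitCode): string
{
    switch ($exitCode) {
        case 0:
            return 'spam';
        case 1:
            return 'ham';
        case 2:
            return 'unsure';
        default:
            return 'error';
    }
}

// Run bogofilter with the given flags, feeding $text on stdin; returns the exit status.
// Use [] to classify, ['-s'] to train as spam, ['-n'] to train as non-spam.
function runBogofilter(string $text, array $flags = []): int
{
    $cmd = 'bogofilter ' . implode(' ', array_map('escapeshellarg', $flags));
    $spec = [0 => ['pipe', 'r'], 1 => ['pipe', 'w'], 2 => ['pipe', 'w']];
    $proc = proc_open($cmd, $spec, $pipes);
    if (!is_resource($proc)) {
        return 3; // treat a failed launch like bogofilter's own error status
    }
    // Assumption: a leading blank line (empty header section) makes bogofilter
    // read the edit text as a message body.
    fwrite($pipes[0], "\n" . $text);
    fclose($pipes[0]);
    stream_get_contents($pipes[1]);
    stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);
    return proc_close($proc);
}

function classifyEdit(string $text): string
{
    return bogofilterVerdict(runBogofilter($text));
}
```

Training would call runBogofilter($text, ['-s']) when an edit is marked as spam and runBogofilter($text, ['-n']) for good edits; if the binary is missing, the shell's non-zero status maps to 'error'.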

----------------------------------------------------------------
Karl Schmidt                         EMail [hidden email]
Transtronics, Inc.                     WEB http://xtronics.com
3209 West 9th Street                    Ph (785) 841-3089
Lawrence, KS 66049                     FAX (785) 841-0434

Why are so many spending time watching dark movies about
hopelessness, the macabre, and perversion; why are they reading
books about unfaithfulness and self destruction?  Why is nothing
uplifting, also considered 'cool' or entertaining? -kps

----------------------------------------------------------------


Re: jibberish REGEX syntax help

Christensen, Courtney
In reply to this post by 2007@gmask.com
How about something similar to this?
// Split the revision text into lines and test only the first one
$lines = explode("\n", $revision->getText());
if (preg_match($gibberishRegex, $lines[0])) {
        return "bad user";
} else {
        return "ok";
}

-----Original Message-----
From: [hidden email]
[mailto:[hidden email]] On Behalf Of
[hidden email]
Sent: Tuesday, October 16, 2007 2:34 PM
To: [hidden email]
Subject: Re: [Mediawiki-l] jibberish REGEX syntax help

So what would the syntax be to match something that begins at the start
of the page?

What I'm thinking of is matching anonymous users who add fewer than a
certain number of characters to the beginning of a page.

But it seems like regex is limited to matching the beginning of a line.

-Adrian

--- Chuck <[hidden email]> wrote:

> [hidden email] wrote:
>
> > This is what is happening to me as well.. but the inserted words are
> > always at the beginning of the page which gives me hope in blocking
> > these types of bot edits with a regex.
>
> Right. This is the same bot we're having problems with.
>
> Chuck
>



Re: jibberish REGEX syntax help

Dan Bolser-3
What will you do when the pattern of spam immediately changes?

On 19/10/2007, Christensen, Courtney <[hidden email]> wrote:

> How about similar to this?
> $text = explode("\n", $revision->getText());
> If (preg_match($gibberishRegex, $text[0]) ) {
>         Return "bad user";
> } else {
>         Return "ok";
> }


--
hello
