Captcha for non-English speakers II

Everton Zanella Alvarenga
Hi all,

how are you? I'd like to know about the possibility of solving an old
issue with CAPTCHA for Wikipedias in languages other than English.
This bug

https://bugzilla.wikimedia.org/show_bug.cgi?id=5309

was created in 2006. There is a discussion here about having CAPTCHA
in other languages from February 2012

http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/51951/

but it seems there was no conclusion. After working on campus with new
editors in Brazil, I've confirmed that this is a real obstacle, since most
people here cannot read English at all.

I'd like to know if there are plans to solve this issue. I hope I
don't sound rude; this may look like a minor issue when we don't see
the difficulties people from a different place can face. I think this
is important for Wikipedias other than the English one (just read
people's comments in the bug), and we may be losing new contributors
because of their first impressions. Thanks,

Tom

--
Everton Zanella Alvarenga (also Tom)
Wikimedia Brasil
Wikimedia Foundation

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: Captcha for non-English speakers II

Everton Zanella Alvarenga
2012/7/26 Everton Zanella Alvarenga <[hidden email]>:

> was created in 2006. There is a discussion here about having CAPTCHA
> in other languages from February 2012
>
> http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/51951/

Sorry, I meant 2011.

--
Everton Zanella Alvarenga (also Tom)
Wikimedia Brasil
Wikimedia Foundation


Re: Captcha for non-English speakers II

Hunter Fernandes
Is there such a thing as localized captchas?

And should turning off account/IP creation throttling for events also
turn off the captcha requirement?
- Hunter F.




Re: Captcha for non-English speakers II

Federico Leva (Nemo)
Ehm, I know that I'll sound like a broken record, but look at the
WikiCAPTCHA proposal: it's just a proposal, but it could address the
problem "just" by fetching books from the relevant Wikisource.
Links in: https://www.mediawiki.org/wiki/CAPTCHA

Nemo


Re: Captcha for non-English speakers II

Neil Harris
In reply to this post by Hunter Fernandes
On 26/07/12 14:58, Hunter Fernandes wrote:
> Is there a such thing as localized captchas?
>
> And should turning off account/ip creation throttling for events also
> turn off the captcha requirement?
> - Hunter F.
>

It's really a matter of configuration; the core captcha code is
intrinsically language-agnostic.

The existing captcha code takes input from a file containing a few thousand
short words, then generates the captchas from a pair of those words.

To localize the captcha, all that is needed is to arrange that a
different word list (and image pool) is used for each language.

If you have a language you want the captcha implemented in, a good first
thing to do would be to create a list of, say, 4,000 to 5,000 short words in
that language for use by the captcha code.
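
As a rough illustration of the scheme described above (a sketch only, not the actual generator): assume one word list per language code, held here in a hypothetical in-memory `WORDLISTS` mapping rather than the files the real code reads, and build the captcha solution from a pair of words. The names `WORDLISTS` and `captcha_text`, the sample words and the fallback to English are all assumptions made for this example.

```python
import secrets

# Hypothetical per-language word lists; real lists would hold a few
# thousand short words each and be loaded from files.
WORDLISTS = {
    "en": ["house", "river", "stone", "cloud"],
    "pt": ["casa", "pedra", "nuvem", "livro"],
}

def captcha_text(lang, wordlists=WORDLISTS, fallback="en"):
    """Return the captcha solution: two words drawn from the list for `lang`."""
    # Unknown languages fall back to the English list (an assumption).
    words = wordlists.get(lang) or wordlists[fallback]
    # secrets.choice avoids the predictable default PRNG.
    return secrets.choice(words) + secrets.choice(words)
```

Rendering the text into a distorted image is a separate step and is left out here.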

-- N.



Re: Captcha for non-English speakers II

Platonides
In reply to this post by Everton Zanella Alvarenga
On 26/07/12 15:53, Everton Zanella Alvarenga wrote:

> Hi all,
>
> how are you? I'd like to know about the possibility of solving an old
> issue with CAPTCHA for Wikipedias in languages other than English.
> This bug
>
> https://bugzilla.wikimedia.org/show_bug.cgi?id=5309
>
> was created in 2006. There is a discussion here about having CAPTCHA
> in other languages from February 2012
>
> http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/51951/
>
> but it seems there was no conclusion. After working on campus with new
> editors in Brazil, I've checked this is a real obstacle, since most
> people here cannot ready English at all.

They don't need to read English. They just need to type the letters they
see on the image. Sure, you can have a small advantage if you know what
letters could make a valid English word (or if you have the captcha
dictionary installed), but a Brazilian who can read Wikipedia should
have no problems typing the captcha.

That said, it's easy enough to make a different set of captchas if we
are provided a suitable dictionary of words (note that we don't want
non-ASCII letters such as ç in the captcha in case it's seen by a foreign
user who doesn't have that letter on their keyboard).
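
The letter restriction mentioned above could be applied to a candidate dictionary along these lines. This is a minimal sketch, assuming the dictionary is a plain list of words; the function name `ascii_only` and the choice to drop (rather than transliterate) words with diacritics are assumptions for illustration.

```python
def ascii_only(words):
    """Keep words made up solely of the letters a-z (after lowercasing)."""
    kept = []
    for word in words:
        w = word.strip().lower()
        # Reject empty entries and anything with accents, digits or hyphens.
        if w and all("a" <= ch <= "z" for ch in w):
            kept.append(w)
    return kept

print(ascii_only(["casa", "maçã", "livro", "pão", "Pedra"]))
# → ['casa', 'livro', 'pedra']
```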




Re: Captcha for non-English speakers II

Everton Zanella Alvarenga
2012/7/26 Platonides <[hidden email]>:

> Thet don't need to read English. They just need to type the letters they
> see on the image. Sure, you can have a small advantage if you know what
> letters could make a valid English word (or if you have the captcha
> dictionary installed), but a Brazilian which can read wikipedia should
> have no problems typing the captcha.

If that is the case, why don't we change the CAPTCHA to random letters?

--
Everton Zanella Alvarenga (also Tom)
Wikimedia Brasil
Wikimedia Foundation


Re: Captcha for non-English speakers II

Yury Katkov
In reply to this post by Everton Zanella Alvarenga
I think that making Russian, Korean and Arabic captchas is a really bad idea.
An English keyboard layout is installed by default in all operating systems, as
far as I know. Moreover, very interesting problems can appear if this
feature is implemented. Who will decide what captcha language is
used? We can look at the user's IP address - then sometimes foreigners will
be in trouble. We can use a Ukrainian captcha for the Ukrainian websites - thus
assuming that every person who knows Ukrainian has the Ukrainian keyboard
layout, which is not true.
I think that the assumption that "everyone on the internet is able to type
English letters while looking at a noisy example of them" is not a very bold
assumption.

Re: Captcha for non-English speakers II

Martijn Hoekstra
Maybe present three or four different captchas with different scripts,
requiring only one to be filled out?
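
One way to read this suggestion, sketched under assumptions: the server keeps the expected solution for each challenge shown, and the submission passes if any single non-empty answer matches. The function name `passes_any` and the challenge ids are hypothetical.

```python
def passes_any(expected, submitted):
    """True if at least one submitted answer matches its expected solution.

    Both arguments map a challenge id (e.g. a script name) to a string;
    challenges the user left blank are simply ignored.
    """
    for challenge_id, answer in submitted.items():
        answer = answer.strip()
        if answer and answer == expected.get(challenge_id):
            return True
    return False

expected = {"latin": "casapedra", "cyrillic": "домкамень"}
print(passes_any(expected, {"latin": "", "cyrillic": "домкамень"}))
# → True
```

Note that each extra challenge also gives an automated solver one more target to attack, so this trades some strength for usability.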



Re: Captcha for non-English speakers II

Max Semenik
In reply to this post by Yury Katkov
On 27.07.2012, 22:09 Yury wrote:

> I think that making Russian, Korean and Arabian captcha is really bad idea.
> English keyboad layout is installed by default in all operation systems, as
> far as I know. Moreover very interesting problems can appear if this
> feature would be implemented. Who will decide what captcha language is
> used? We can look at user IP address - then sometimes the foreigners will
> be in trouble. We can use Ukrainian capcha for the Ukrainian wesites - thus
> assuming that every person who knows Ukrainian has the Ukrainian keyboard
> layout, which is not true.
> I think that the assumption that "everyone in the internet is able to print
> English letters loking at their noised example" is not very bold assumption.

Even funnier: imagine a European trying to just read a Chinese
captcha :)

--
Best regards,
  Max Semenik ([[User:MaxSem]])



Re: Captcha for non-English speakers II

Strainu
2012/7/28 Max Semenik <[hidden email]>:

> On 27.07.2012, 22:09 Yury wrote:
>
>> I think that making Russian, Korean and Arabian captcha is really bad idea.
>> English keyboad layout is installed by default in all operation systems, as
>> far as I know. Moreover very interesting problems can appear if this
>> feature would be implemented. Who will decide what captcha language is
>> used? We can look at user IP address - then sometimes the foreigners will
>> be in trouble. We can use Ukrainian capcha for the Ukrainian wesites - thus
>> assuming that every person who knows Ukrainian has the Ukrainian keyboard
>> layout, which is not true.
>> I think that the assumption that "everyone in the internet is able to print
>> English letters loking at their noised example" is not very bold assumption.
>
> Even funnier: imagine a Eeuropean trying to just read a Chinese
> captcha:)

Funny as it may be, this is a non-problem. You can easily have a "give
me an English CAPTCHA" link... And that would be one more step for a
robot to learn, that is, one more (thin) line of defence.

Strainu


Re: Captcha for non-English speakers II

Platonides
In reply to this post by Everton Zanella Alvarenga
On 27/07/12 16:31, Everton Zanella Alvarenga wrote:
> 2012/7/26 Platonides <[hidden email]>:
>
>> Thet don't need to read English. They just need to type the letters they
>> see on the image. Sure, you can have a small advantage if you know what
>> letters could make a valid English word (or if you have the captcha
>> dictionary installed), but a Brazilian which can read wikipedia should
>> have no problems typing the captcha.
>
> If that is the case, why don't we change the CAPTCH for random letters?

You should probably ask Neil Harris, the author of the captcha generator
we use.

From his 06/02/2011 mail:
> The wordlists themselves need not be secret: they are only needed to
> create easily-typed strings that are sufficiently large in number to
> provide a moderate challenge to brute force guessing.


I have added a random captcha at http://test.wikipedia.beta.wmflabs.org/
You can try adding URLs at
http://test.wikipedia.beta.wmflabs.org/w/index.php?title=Main_Page&action=edit
and http://en.wikipedia.beta.wmflabs.org/wiki/Wikipedia:Sandbox to
compare the captchas presented.

(yes, testwikibeta is quite broken right now, but the captchas show)
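
To put "a moderate challenge to brute force guessing" into numbers, here is a back-of-the-envelope sketch. It assumes a 5,000-word list (the upper figure suggested earlier in this thread), ordered word pairs with repeats allowed, and, for the random variant, a hypothetical length of 8 lowercase ASCII letters. None of these values is confirmed to match the deployed configuration or the test wiki.

```python
import secrets
import string

def word_pair_space(wordlist_size):
    """Number of ordered word pairs, assuming repeats are allowed."""
    return wordlist_size ** 2

def random_letter_space(length, alphabet=string.ascii_lowercase):
    """Number of distinct random strings of the given length."""
    return len(alphabet) ** length

def random_letters(length, alphabet=string.ascii_lowercase):
    """Generate a random-letter captcha solution."""
    return "".join(secrets.choice(alphabet) for _ in range(length))

print(word_pair_space(5000))   # → 25000000
print(random_letter_space(8))  # → 208827064576
```

Under these assumptions the random strings give a much larger guessing space, at the cost of the word-recognition help that real words offer to human readers.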



Re: Captcha for non-English speakers II

Everton Zanella Alvarenga
"Usability of CAPTCHAs Or usability issues in CAPTCHA design", Jeff
Yan and Ahmad Salah El Ahmad (Newcastle University, UK)

http://homepages.cs.ncl.ac.uk/jeff.yan/soups08.pdf

Pages 3 and 4:

"Friendly to foreigners? In theory, text-based CAPTCHAs are
intuitive to world-wide users and have little localization issues –
these were recognised by many researchers (e.g. [5]) as major
advantages of text-based CAPTCHAs over other schemes.
However, in a small scale test carried out with 20 students in the
first author’s class in October 2007, we observed that many
foreign students whose mother tongue does not use the Latin
alphabet performed much worse than those whose first language
is based on Latin alphabet (e.g. native English speakers), when
asked to recognise distorted challenges generated by BaffleText
[6], an early text-based scheme. The former found it hard to
recognise (or even guess) distorted letters in the scheme."

[...]

"The performance difference between foreigners and natives does
not appear to be large in the case of reCAPTCHA. However,
given the size of population using this service (hundreds of
thousands websites serving millions of people at least, for
example, popular sites such as Facebook and Twitter are amongst
subscribers of this service), this “being friendly to foreigners”
issue can be a serious usability concern. Moreover, for schemes
whose designers were unaware of this issue, usability problems
caused can be even worse."

[...]

In the conclusion:

"Contrary to the common belief, text-based CAPTCHAs can be difficult
for foreigners."

It is worth reading, and the same likely goes for the references therein. The
first sentence matches what I have experienced in 3 classes. People
begin to get anxious and usually say "If I type it wrong again,
I'll give up". I've had 3 students say this to me.

Even if an experiment hypothetically showed that only 1% of foreigners
face difficulties with a CAPTCHA in a foreign language (from real-life
experience, I bet it's much more), how many users would this
represent on one of the most accessed sites in the world?

Tom

--
Everton Zanella Alvarenga (also Tom)
Wikimedia Brasil
Wikimedia Foundation


Re: Captcha for non-English speakers II

Platonides
On 28/07/12 16:55, Everton Zanella Alvarenga wrote:

> In the conclusion:
>
> "Contrary to the common belief, text-based CAPTCHAs can be difficult
> for foreigners."
>
> It is worth reading and likely the same for references there in. The
> first sentence is similar to what I have experience in 3 classes. And
> people begin to get anxious and usually say "If I type wrongly again,
> I'll give up". I've seen 3 students saying this to me.
>
> Even if hypothetically had in an experiment that only 1% of foreigners
> will face difficulties with CAPTCHA in a foreign language (I bet it's
> much more from real life experience), how much users this would
> represent in one of the most accessed sites in the world?
>
> Tom

There are two types of "foreigners" here:
- Speakers of another language written in the Latin script (such as
Brazilians).
- Those who use a different writing script, such as Russians
or Greeks.

In the first case, they should have little problem. Native speakers of
the language used for the wordlist get extra help, because they are
more likely to recognise the words, which can also help them with
error recovery.

It would be nice to provide a captcha with a native wordlist, but by
limiting it to ASCII characters, it can be made pretty universal.

Distortion where a letter looks like a different one is still
problematic. Even people with English knowledge can have trouble with
it, so being a native speaker doesn't magically make you invulnerable to
captcha errors.
On 16th July 2007 Arnomane reported a case where distortion made an "o"
look like an "a"; in August I reported another where an "s" looked
like a "g".
I expect that using random characters would make it worse, though.

People with other scripts are a different matter.
* They may not be able to recognise the Latin characters.
* You may be forcing them to change keyboard layouts to solve the
captcha.
* Foreign visitors may not be able to pass your captcha.
** Lack of an appropriate keyboard layout.
** Unable to differentiate the characters (you want me to differentiate
ت  and ث distorted in a noisy background?)
** No fonts installed for viewing the characters (e.g. 𓀝 vs 𓀞), such as
if you were trying to browse the script's characters in the character map
(potentially hundreds!) looking for a visual match.

Yet, there are reports such as this by Liangent (native Chinese speaker)
on this list on 5th February 2011:
> I hate the case that I'm asked with a Chinese captcha when I'm surfing
> some Chinese websites without IME available.
>
> Besides I don't prefer Chinese captchas personally because Chinese
> characters usually require more key hits.


At least for those languages I think we would need a switch to get a
captcha in a different "language".

We should also add the "button to get a new captcha" (bug 14230), which
should help when you get the wrong captcha.
And I think we should also add a "Problems solving the captcha? Mail us"
link for those cases when people can't pass the captcha.
Not that it would solve their problems, but it would at least provide a
way to lighten their frustration.



Re: Captcha for non-English speakers II

Pau Giner
From the UX perspective, a captcha is always an obstacle for the
interaction flow.
Reducing the complexity of user interaction when solving the captcha can
benefit all kinds of users and also solve problems for non-English speakers.

Checkbox and honeypot-based captchas avoid most of the problems of
text-based captchas since interaction is simplified to the minimum for the
user:
http://uxmovement.com/forms/captchas-vs-spambots-why-the-checkbox-captcha-wins/


Simple questions where the user can select an answer (not type) will solve
some of the input-related issues for non-English speakers.
These questions can be of different kinds (e.g., "Which one does not belong
to the group: Red, Green, Skateboard, Blue?", "Is fire hot or cold?") and
they can be based on text or image selection.
An example of image-based captcha is available at
http://www.picatcha.com/captcha/
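
A selection-based question of the kind described above could be modelled roughly as follows. This is only a sketch: the function names are hypothetical, the question text would still need to be localized, and with four options a blind guess succeeds one time in four, so a real deployment would need more than a single question.

```python
import random

def build_challenge(question, correct, distractors, rng=random):
    """Return (question, shuffled options, index of the correct option)."""
    options = [correct] + list(distractors)
    # Shuffle so the correct answer is not always in the same position.
    rng.shuffle(options)
    return question, options, options.index(correct)

def check_selection(correct_index, selected_index):
    """The user passes by selecting the right option; nothing is typed."""
    return selected_index == correct_index

question, options, answer = build_challenge(
    "Which one does not belong to the group?",
    "Skateboard",
    ["Red", "Green", "Blue"],
)
```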

Tagging media can be also used as a captcha. Google has been experimenting
with asking users to tag videos as a captcha:
http://cups.cs.cmu.edu/soups/2009/proceedings/a14-kleuver.pdf  [PDF]


In any case, some experimentation would be required to determine whether any of the
above approaches (or a combination of several) provides an appropriate
security-usability balance for the specific needs of Wikipedia.


Pau






--
Pau Giner
Interaction Designer
Wikimedia Foundation

Re: Captcha for non-English speakers II

Daniel Friesen-4
Those checkbox and honeypot captchas look like junk to me.

Firstly the checkbox captcha. It relies entirely on the assumption that  
spambots don't have JavaScript. It also assumes that spambots won't simply  
get wise and throw a few regexp tests to figure out when the plugin is  
sitting on the page inserting a form element. If people actually start  
using checkbox captchas they will inevitably become useless.
Additionally it imposes the requirement that the client has JavaScript  
enabled simply to make an edit. This is something we consider unacceptable.

honeypot-captchas... yeah, we already have that:
https://www.mediawiki.org/wiki/Extension:SimpleAntiSpam
If it weren't for the fact that it's useless for login-only and private  
wikis I'd bake it right into core.
honeypot-captchas aren't actually captchas. As a testament to that, a real
captcha and SimpleAntiSpam can be installed at the same time.
And I do recommend you do that. SimpleAntiSpam trips up the trivial bots
while the captcha deals with the non-trivial link-inserting bots.
But that's all they do. Beyond the most worthless of spambots,  
honeypot-captchas have absolutely no value. If a bot is capable of  
breaking any normal captcha it is already sophisticated enough that a  
honeypot-captcha will do absolutely nothing.
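
For readers unfamiliar with the general honeypot technique, a minimal sketch follows. It is not the code of SimpleAntiSpam; the field name `website` and the function name are hypothetical. The idea is that a form field hidden from humans stays empty in genuine submissions, while a naive bot that fills in every field gives itself away.

```python
HONEYPOT_FIELD = "website"  # hypothetical name; hidden from humans via CSS

def looks_like_bot(form):
    """Flag a submission whose hidden honeypot field was filled in."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())

print(looks_like_bot({"text": "An edit", "website": "http://spam.example"}))
# → True
```

As noted above, this only stops bots that fill fields blindly; anything that parses the form can skip the hidden field.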

Need I remind people we have bots walking around that know how to register
and log in to MediaWiki. Know how to deal with image captchas. Know how to
wait for autoconfirmed status. Know how to confirm an AbuseFilter warning
page. And even know how to upload an image and use it in wikitext.

--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]


Re: Captcha for non-English speakers II

Platonides
In reply to this post by Pau Giner
On 30/07/12 15:28, Pau Giner wrote:
> From the UX perspective, a captcha is always an obstacle for the
> interaction flow.
I agree.
But when you're spammed to death if there's no captcha, you end up
accepting it as a necessary evil.
But don't let this pessimistic view stop you from proposing new
alternatives.


> Reducing the complexity of user interaction when solving the captcha can
> benefit all kinds of users but also solve problems for non-English speakers.
>
> Checkbox and honeypot-based captchas avoid most of the problems of
> text-based captchas since interaction is simplified to the minimum for the
> user:
> http://uxmovement.com/forms/captchas-vs-spambots-why-the-checkbox-captcha-wins/

No. Those work against generic spambots. For a small site, pretty much
any custom-made captcha will work.
When someone designs against your captcha, you need to provide a hard test.
If we were comparing against a math captcha, a checkbox is more usable
while only slightly weaker. Neither of them has a chance against a bot
designed against them.
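
To illustrate the difference between a generic and a targeted bot, here is a
minimal Python sketch of a honeypot check. Everything in it is an assumption
made for the example: the hidden field name "website_url", the function names
and the form layout are hypothetical and are not taken from any real
extension.

```python
HONEYPOT_FIELD = "website_url"  # hypothetical hidden field name


def passes_honeypot(form):
    """Accept the submission only if the hidden field was left empty."""
    return form.get(HONEYPOT_FIELD, "") == ""


def generic_bot(field_names):
    """Fills in every field it finds, including the hidden one."""
    return {name: "spam" for name in field_names}


def targeted_bot(field_names):
    """Written for this one site: knows which field to leave blank."""
    return {name: "" if name == HONEYPOT_FIELD else "spam" for name in field_names}


fields = ["username", "comment", HONEYPOT_FIELD]
print(passes_honeypot(generic_bot(fields)))   # False: the generic bot is caught
print(passes_honeypot(targeted_bot(fields)))  # True: the targeted bot gets through
```

The check costs a human nothing, but it stops working as soon as one
person writes a bot for this particular form.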

If you run Wikipedia, bad guys will work to defeat your captcha and
spam/vandalise/annoy you.
If you are developing MediaWiki, a wiki engine used on thousands of sites [1],
spammers will work to make bots capable of spamming those many MediaWiki
installs (cf. DantMan's reply).
If you are Open Source, then it's much harder to make a captcha (you cannot
rely on security by obscurity, neither for the code nor for the challenges
themselves...).


1- http://www.google.com/search?q=%22powered%20by%20mediawiki%22
~201.000.000 results



> Simple questions where the user can select an answer (not type) will solve
> some of the input-related issues for non-English speakers.
> These questions can be of different kinds (e.g., "Which one does not belong
> to the group: Red, Green, Skateboard, Blue?", "Is fire hot or cold?") and
> they can be based on text or image selection.
> An example of image-based captcha is available at
> http://www.picatcha.com/captcha/

No.
Those are *harder* since you need a knowledge of English language and terms.

I can fill in a text captcha on a foreign-language site since its very
appearance (after being trained by hundreds of sites!) shows what is
expected of me.
If I go to http://www.picatcha.com/captcha/, I am asked to "Select ALL
the images of «concept»". Which is fine, but requires me to know what
that «concept» is. I might, e.g., think that hourglasses are a kind of
spectacles (eyeglasses) and get very annoyed at not being able to pass it.

Also, making good questions is tricky. You need to produce loads of
questions of that kind, with their answers. If you made just a few hundred
(e.g. because they are written by a human), I could make a list of the
questions with their answers (manually solved) and spam you as many times
as I want.
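
A minimal Python sketch of this replay attack, for illustration only. The
ReplayBot class and the normalise helper are hypothetical names invented for
the example; the two questions are the samples quoted earlier in this message.

```python
def normalise(question):
    """Collapse case and whitespace so trivially altered copies still match."""
    return " ".join(question.lower().split())


class ReplayBot:
    """Answers challenges from a table of manually solved questions."""

    def __init__(self):
        self.solved = {}

    def learn(self, question, answer):
        self.solved[normalise(question)] = answer

    def respond(self, question):
        # Returns None for an unseen question; a human solves it once
        # and it is then added with learn().
        return self.solved.get(normalise(question))


bot = ReplayBot()
bot.learn("Is fire hot or cold?", "hot")
bot.learn(
    "Which one does not belong to the group: Red, Green, Skateboard, Blue?",
    "Skateboard",
)
print(bot.respond("is fire  hot or cold?"))
print(bot.respond("Is ice hot or cold?"))
```

With a pool of only a few hundred questions, the table is complete after a
few hundred manual answers, and every later request is free for the spammer.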

You want to make intelligent questions that are hard for bots, but anyone
should be able to solve them, even if they are young, uneducated or foreign.
I may know that I have to rule colors out, but I don't know which of
skateboard vs turquoise is the color.
And yet, you can't dumb it down so much that a computer will be able to
answer it.

Suppose you are asking questions of the type "Is X Y or Z?" and have
made thousands of pairs (that you can't share!).
A naive approach would be just to answer Y or Z at random, accepting a 50%
failure rate (bots don't mind resending their requests many times; a captcha
that blocks only 50% is broken). But we can do better: when you ask my bot
"Is fire hot or cold?" it could go and search Google for those concepts:
* fire hot 1.210.000.000 results
* fire cold 656.000.000 results

There's a very clear correlation of fire with hot rather than with cold,
thus it chooses 'hot', and defeats your captcha. :)
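
A minimal Python sketch of such a bot, as an illustration only. The hit
counts are the two figures quoted above, hard-coded in a hypothetical
HIT_COUNTS table rather than fetched from a search engine; the function
names and the question pattern are assumptions made for this example.

```python
import re

# Hit counts quoted above, hard-coded for the example.
# A real bot would query a search engine instead (not shown here).
HIT_COUNTS = {
    ("fire", "hot"): 1_210_000_000,
    ("fire", "cold"): 656_000_000,
}

QUESTION = re.compile(r"^Is (\w+) (\w+) or (\w+)\?$")


def answer(question, hit_counts=HIT_COUNTS):
    """Pick the option that co-occurs more often with the subject."""
    match = QUESTION.match(question)
    if match is None:
        raise ValueError("not an 'Is X Y or Z?' question")
    subject, first, second = match.groups()
    hits_first = hit_counts.get((subject, first), 0)
    hits_second = hit_counts.get((subject, second), 0)
    return first if hits_first >= hits_second else second


def pass_probability(attempts, success_rate=0.5):
    """Chance that at least one of `attempts` independent guesses passes."""
    return 1 - (1 - success_rate) ** attempts


print(answer("Is fire hot or cold?"))
print(pass_probability(5))
```

Even the naive random strategy gets through almost every time after a handful
of retries, which is why a 50% blocking rate counts as broken; the search-based
guess only reduces the number of retries needed.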



> Tagging media can be also used as a captcha. Google has been experimenting
> with asking users to tag videos as a captcha:
> http://cups.cs.cmu.edu/soups/2009/proceedings/a14-kleuver.pdf  [PDF]

If we were doing this with Wikimedia Commons videos
a) The video set is known, as are the descriptions. Ergo, match the
video with its file and .
b) IMHO having to watch a video (even if short) is *more* annoying than
typing a text captcha.*
c) No/poor localisation.


* This needs to be balanced with how much you want to enter the
captcha-walled garden, of course. I may accept watching your CEO
boasting about your service (from which you then ask me the captcha**)
in exchange for a gmail-like mail account or multigigabyte dropbox
storage, but not to watch one every time I sign in!

** Don't complain if he's tagged by most users as 'boring'. :)


> In any case, some experimentation would be required to determine any of the
> above approaches (or combination of several) provides an appropriate
> security-usability balance for the specific needs of the Wikipedia.

We would first need an evaluation of what is considered spam, and how to
measure it. If we get lots of bots the day after you enable it, it's clearly
broken, but how much time would we need before being x% confident that
it is secure enough, when you are just waiting for some random guy to decide
to code against your challenge?



Re: Captcha for non-English speakers II

Tei-2
Sounds like captchas are something you want to make plug and play, using
some external project that is evolving quickly to stay on the
winning side of an arms race.
It also sounds like captchas are something you want to be handled by
locals, to avoid the situation of a Chinese wiki with an English captcha.

It is pretty much proven that "small self-made captchas" don't work for
something like MediaWiki, because attackers target it and it is a huge,
delicious target.


As people's experience with AI and computing power rises, perhaps
this will become a lost battle*. The other options are: anons can't edit
articles, ...anon edits are invisible and waiting for moderation,
..anon changes are sanitized in some way (perhaps not allowing new
external links / modifying links).


* I can imagine the ability of bots to understand captchas will grow,
but not the ability of humans.


--
--
ℱin del ℳensaje.


Re: Captcha for non-English speakers II

James Forrester-4
In reply to this post by Platonides
On 30 July 2012 15:22, Platonides <[hidden email]> wrote:
> On 30/07/12 15:28, Pau Giner wrote:
>> From the UX perspective, a captcha is always an obstacle for the
>> interaction flow.
>
> I agree. But when you're spammed to death if there's no captcha,
> you end up accepting it as a necessary evil.

Just to jump in here, it's not actually clear that our CAPTCHAs work
at all at this point (per Tim's e-mail from last year of being able to
robotically break ours 75% of the time).

On https://www.mediawiki.org/wiki/Admin_tools_development (created
last week), we in WMF Engineering noted that we'd want to look
properly at some data around these CAPTCHAs and how they're working.
This might show us that it would be sensible to just turn them off
(which of course would help usability for all users), as long as we're
happy that the tools for preventing the vandalism they were intended
to stop are working well.

Yours,
--
James D. Forrester
Product Manager for Visual Editor and Flagged Revisions
Wikimedia Foundation, Inc.

[hidden email] | @jdforrester | +1 415-839-6885 x6844


Re: Captcha for non-English speakers II

Risker
On 31 July 2012 13:53, James Forrester <[hidden email]> wrote:

> On 30 July 2012 15:22, Platonides <[hidden email]> wrote:
> > On 30/07/12 15:28, Pau Giner wrote:
> >> From the UX perspective, a captcha is always an obstacle for the
> >> interaction flow.
> >
> > I agree. But when you're spammed to death if there's no captcha,
> > you end up accepting it as a necessary evil.
>
> Just to jump in here, it's not actually clear that our CAPTCHAs work
> at all at this point (per Tim's e-mail from last year of being able to
> robotically break ours 75% of the time).
>
> On https://www.mediawiki.org/wiki/Admin_tools_development (created
> last week), we in WMF Engineering noted that we'd want to look
> properly at some data around these CAPTCHAs and how they're working.
> This might show us that it would be sensible to just turn them off
> (which of course would help usability for all users), as long as we're
> happy that the tools for preventing the vandalism they were intended
> to stop are working well.
>
> Yours,
> -
>
>
Putting on my checkuser hat for a moment - yes, please please look at
finding a different CAPTCHA process - the cross-wiki spamming by bots that
are able to "break" the CAPTCHA is becoming overwhelming.  This issue has
been reported separately, and there may be a different fix, but this is a
pretty big deal as a few hundred volunteer hours a month are going into the
despamming effort.

Risker/Anne