Invisible malicious changes.

Tei-2
Possibly off-topic.


Here is a script that replaces normal whitespace with one of the
other Unicode whitespace characters (others are
&#32;&#160;&#5760;&#8192;&#8193;&#8194;&#8195;&#8196;&#8198;&#8199;&#8203;&#8239;&#8287;).

I have made a few vandalism tests here:
http://en.wikipedia.org/wiki/User:Tei/lalaland

What do you guys think? Could this be a problem? You can break links
like [[Mr Thonson]] by replacing the space with one of these
look-alike characters, so the link still reads [[Mr Thonson]].

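#!/usr/bin/perl
use strict;
use warnings;

# Replace every ASCII space with U+2000 (EN QUAD), UTF-8 bytes E2 80 80.
while (my $line = <DATA>) {
    foreach my $ch (split //, $line) {
        if ($ch eq " ") {
            # "C3" packs three unsigned bytes ("c" is signed and wraps above 127).
            print pack("C3", 0xe2, 0x80, 0x80);
        } else {
            print $ch;
        }
    }
}

__DATA__
Text to be vandalized goes here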
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: Invisible malicious changes.

Aryeh Gregor
On Thu, Oct 2, 2008 at 6:33 PM, Tei <[hidden email]> wrote:
> Heres is a script that replace normal whitespace with one of the
> whitespaces supported by UTF8 ( Others are
> &#32;&#160;&#5760;&#8192;&#8193;&#8194;&#8195;&#8196;&#8198;&#8199;&#8203;&#8239;&#8287;).
>
> I have made a few vandalization test here:
> http://en.wikipedia.org/wiki/User:Tei/lalaland
>
> What do you guys think? could this be a problem? You can break links
> like [[Mr Thonson]] replacing it by [[Mr Thonson]]

We don't want to ban all Unicode whitespace.  Some of it is useful,
which is why it's in Unicode.  :)  For the specific case of titles,
see bug 1414:

https://bugzilla.wikimedia.org/show_bug.cgi?id=1414

Re: Invisible malicious changes.

Charlotte Webb
Comment  #3 From Brion Vibber  2005-04-25 06:10:15 UTC  -------
> It might make sense to explicitly disallow the Zl and Zp chars (line separator
> and paragraph separator), and normalize all the Zs
> chars to spaces (well, underscores) in title processing.

At first glance this seems like a trivial and uncontroversial change,
so I'm curious why it wasn't done 3 1/2 years ago.

On the other hand some browsers apparently convert esoteric whitespace
literals back to \u0020 in the <textarea> anyway whether the original
change was malicious or not.

http://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)/Archive_42#Problem_with_non-breaking_space
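The rule quoted from bug 1414 can be sketched in a few lines. This is only an illustration in Python, not MediaWiki's actual title code; the function name `normalize_title` and the choice to raise an error for Zl/Zp are assumptions made for the example.

```python
import unicodedata


def normalize_title(title: str) -> str:
    """Reject Zl/Zp characters and turn every Zs character into an underscore."""
    out = []
    for ch in title:
        category = unicodedata.category(ch)
        if category in ("Zl", "Zp"):
            # Line and paragraph separators are refused outright.
            raise ValueError(f"separator U+{ord(ch):04X} is not allowed in titles")
        # Any space separator, ASCII or not, becomes an underscore.
        out.append("_" if category == "Zs" else ch)
    return "".join(out)


print(normalize_title("Mr\u2000Thonson"))  # Mr_Thonson
```

Note that U+200B (zero width space, &#8203; in the list above) is in category Cf rather than Zs in current Unicode data, so a check based on Zs alone would leave it untouched.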

-_-

—C.W.

Re: Invisible malicious changes.

Nikola Smolenski
In reply to this post by Aryeh Gregor
On Friday 03 October 2008 02:23:37 Aryeh Gregor wrote:

> On Thu, Oct 2, 2008 at 6:33 PM, Tei <[hidden email]> wrote:
> > Heres is a script that replace normal whitespace with one of the
> > whitespaces supported by UTF8 ( Others are
> > &#32;&#160;&#5760;&#8192;&#8193;&#8194;&#8195;&#8196;&#8198;&#8199;&#8203;&#8239;&#8287;).
> >
> > I have made a few vandalization test here:
> > http://en.wikipedia.org/wiki/User:Tei/lalaland
> >
> > What do you guys think? could this be a problem? You can break links
> > like [[Mr Thonson]] replacing it by [[Mr Thonson]]
>
> We don't want to ban all Unicode whitespace.  Some of it is useful,
> which is why it's in Unicode.  :)  For the specific case of titles,

Thinking a bit about it, why not? Upon saving, convert all spaces to the ASCII
space. If someone legitimately needs another space, they can and should use
an HTML entity. Someone who uses another space simply creates confusion for
other editors, who have no way to tell it apart from the ordinary space.

Re: Invisible malicious changes.

Max Semenik
On 04.10.2008, 23:27 Nikola wrote:

> On Friday 03 October 2008 02:23:37 Aryeh Gregor wrote:
>> On Thu, Oct 2, 2008 at 6:33 PM, Tei <[hidden email]> wrote:
>> > Heres is a script that replace normal whitespace with one of the
>> > whitespaces supported by UTF8 ( Others are
>> > &#32;&#160;&#5760;&#8192;&#8193;&#8194;&#8195;&#8196;&#8198;&#8199;&#8203;&#8239;&#8287;).
>> >
>> > I have made a few vandalization test here:
>> > http://en.wikipedia.org/wiki/User:Tei/lalaland
>> >
>> > What do you guys think? could this be a problem? You can break links
>> > like [[Mr Thonson]] replacing it by [[Mr?Thonson]]
>>
>> We don't want to ban all Unicode whitespace.  Some of it is useful,
>> which is why it's in Unicode.  :)  For the specific case of titles,

> Thinking a bit about it, why not? Upon saving, convert all spaces to the ASCII
> space. If someone legitimately needs another space, he can and should use
> HTML entity. Someone who uses another space simply creates confusion for
> other editors who have no way to differ it from the ordinary space.

Just not nbsp - it's widely used on Russian Wikipedia, and no-one
wants to replace it with an entity.
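A convert-on-save step with an exemption for the no-break space might look roughly like this. It is a sketch only: `KEEP`, `SPACE_TABLE` and `normalize_spaces` are hypothetical names, and the assumption is that "all spaces" means the Unicode Zs category.

```python
import sys
import unicodedata

# Assumption: only the no-break space is exempt from conversion.
KEEP = {"\u00a0"}

# Translation table mapping every other Zs code point to the ASCII space.
SPACE_TABLE = {
    cp: " "
    for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Zs" and chr(cp) not in KEEP
}


def normalize_spaces(text: str) -> str:
    """Return text with exotic space separators replaced by U+0020."""
    return text.translate(SPACE_TABLE)


print(normalize_spaces("a\u2003b\u00a0c") == "a b\u00a0c")  # True
```

Building the table once and reusing it keeps the per-save cost to a single `str.translate` call.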

--
Best regards,
  Max Semenik ([[User:MaxSem]])


Re: Invisible malicious changes.

Gregory Maxwell
In reply to this post by Nikola Smolenski
On Sat, Oct 4, 2008 at 3:27 PM, Nikola Smolenski <[hidden email]> wrote:

> On Friday 03 October 2008 02:23:37 Aryeh Gregor wrote:
>> On Thu, Oct 2, 2008 at 6:33 PM, Tei <[hidden email]> wrote:
>> > Heres is a script that replace normal whitespace with one of the
>> > whitespaces supported by UTF8 ( Others are
>> > &#32;&#160;&#5760;&#8192;&#8193;&#8194;&#8195;&#8196;&#8198;&#8199;&#8203;&#8239;&#8287;).
>> >
>> > I have made a few vandalization test here:
>> > http://en.wikipedia.org/wiki/User:Tei/lalaland
>> >
>> > What do you guys think? could this be a problem? You can break links
>> > like [[Mr Thonson]] replacing it by [[Mr Thonson]]
>>
>> We don't want to ban all Unicode whitespace.  Some of it is useful,
>> which is why it's in Unicode.  :)  For the specific case of titles,
>
> Thinking a bit about it, why not? Upon saving, convert all spaces to the ASCII
> space. If someone legitimately needs another space, he can and should use
> HTML entity. Someone who uses another space simply creates confusion for
> other editors who have no way to differ it from the ordinary space.

You could say that about a lot of Unicode characters: "it simply
creates confusion", "should use the HTML entity".

My keyboard mapping types the non-breaking space just fine (I press
greek-space) and I find it pretty useful.

If you were going to do any conversion, I'd suggest it be TO the
correct HTML entity. But I think it would be far better to not convert at
all and instead give the editing and diff views some kind of ability to
colorize interesting characters.
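One way to make such characters visible, short of real colorizing in the diff view, is to show them as numeric character references. The sketch below is an assumption about what "interesting" means (non-ASCII separators plus format characters), and `reveal_invisible` is a made-up name.

```python
import unicodedata


def reveal_invisible(text: str) -> str:
    """Replace non-ASCII space, separator and format characters with numeric entities."""
    out = []
    for ch in text:
        if ch != " " and unicodedata.category(ch) in ("Zs", "Zl", "Zp", "Cf"):
            out.append(f"&#{ord(ch)};")
        else:
            out.append(ch)
    return "".join(out)


print(reveal_invisible("[[Mr\u2000Thonson]]"))  # [[Mr&#8192;Thonson]]
```

A diff view could wrap the same matches in a highlighted span instead of rewriting them, which leaves the stored text unchanged.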

Re: Invisible malicious changes.

Ilmari Karonen
In reply to this post by Aryeh Gregor
Aryeh Gregor wrote:

> On Thu, Oct 2, 2008 at 6:33 PM, Tei <[hidden email]> wrote:
>> Heres is a script that replace normal whitespace with one of the
>> whitespaces supported by UTF8 ( Others are
>> &#32;&#160;&#5760;&#8192;&#8193;&#8194;&#8195;&#8196;&#8198;&#8199;&#8203;&#8239;&#8287;).
>>
>> I have made a few vandalization test here:
>> http://en.wikipedia.org/wiki/User:Tei/lalaland
>>
>> What do you guys think? could this be a problem? You can break links
>> like [[Mr Thonson]] replacing it by [[Mr Thonson]]
>
> We don't want to ban all Unicode whitespace.  Some of it is useful,
> which is why it's in Unicode.  :)  For the specific case of titles,
> see bug 1414:
>
> https://bugzilla.wikimedia.org/show_bug.cgi?id=1414

On the English Wikipedia we've (actually, I did) set the TitleBlacklist
extension to block those.  We also block the bidirectional override
characters which can be even more problematic.  (Nothing as fun as an
invisible character that makes all following text render right to left.)

http://en.wikipedia.org/wiki/MediaWiki:Titleblacklist
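For illustration, a check for the directional controls could be as simple as the following. The actual blacklist entries are not shown in this thread, so the range U+202A to U+202E and the name `find_bidi_controls` are assumptions.

```python
import unicodedata

# U+202A..U+202E: the embedding and override controls plus POP DIRECTIONAL FORMATTING.
BIDI_CONTROLS = {chr(cp) for cp in range(0x202A, 0x202F)}


def find_bidi_controls(title: str) -> list:
    """Return (index, character name) for each directional control in the title."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(title)
        if ch in BIDI_CONTROLS
    ]


print(find_bidi_controls("abc\u202edef"))  # [(3, 'RIGHT-TO-LEFT OVERRIDE')]
```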

--
Ilmari Karonen

Re: Invisible malicious changes.

Marco Schuster-2
What about the direction-reversing characters in the texts?
Okay, these are probably needed on the Arabic/Hebrew wikis, but can't they be a
bit confusing in the article sources?

Marco
2008/10/5 Ilmari Karonen <[hidden email]>

> Aryeh Gregor wrote:
> > On Thu, Oct 2, 2008 at 6:33 PM, Tei <[hidden email]> wrote:
> >> Heres is a script that replace normal whitespace with one of the
> >> whitespaces supported by UTF8 ( Others are
> >> &#32;&#160;&#5760;&#8192;&#8193;&#8194;&#8195;&#8196;&#8198;&#8199;&#8203;&#8239;&#8287;).
> >>
> >> I have made a few vandalization test here:
> >> http://en.wikipedia.org/wiki/User:Tei/lalaland
> >>
> >> What do you guys think? could this be a problem? You can break links
> >> like [[Mr Thonson]] replacing it by [[Mr Thonson]]
> >
> > We don't want to ban all Unicode whitespace.  Some of it is useful,
> > which is why it's in Unicode.  :)  For the specific case of titles,
> > see bug 1414:
> >
> > https://bugzilla.wikimedia.org/show_bug.cgi?id=1414
>
> On the English Wikipedia we've (actually, I did) set the TitleBlacklist
> extension to block those.  We also block the bidirectional override
> characters which can be even more problematic.  (Nothing as fun as an
> invisible character that makes all following text render right to left.)
>
> http://en.wikipedia.org/wiki/MediaWiki:Titleblacklist
>
> --
> Ilmari Karonen
>

Re: Invisible malicious changes.

Aryeh Gregor
On Sun, Oct 5, 2008 at 1:54 AM, Marco Schuster
<[hidden email]> wrote:
> What about the direction-reverse stuff in the the texts?
> Okay, these are probably needed in Arab/Hebrew wikis, but can't they be a
> bit confusing in the article sources?

Not as confusing, in my experience, as having punctuation marks and
things in totally the wrong places when you're editing in the text
box.  (But RTL editors who are less tech-savvy would quite likely
disagree with that, I'm guessing: I wouldn't be at all surprised if
they were specifically banned on the RTL wikis.)

Re: Invisible malicious changes.

Ilmari Karonen
Aryeh Gregor wrote:

> On Sun, Oct 5, 2008 at 1:54 AM, Marco Schuster
> <[hidden email]> wrote:
>> What about the direction-reverse stuff in the the texts?
>> Okay, these are probably needed in Arab/Hebrew wikis, but can't they be a
>> bit confusing in the article sources?
>
> Not as confusing, in my experience, as having punctuation marks and
> things in totally the wrong places when you're editing in the text
> box.  (But RTL editors who are less tech-savvy would quite likely
> disagree with that, I'm guessing: I wouldn't be at all surprised if
> they were specifically banned on the RTL wikis.)

Yes, and page text doesn't (usually) end up in places like logs and
recent changes.  See for example (warning, ugly URL follows):

http://en.wikipedia.org/w/index.php?title=%E2%80%AA%E2%80%AB%E2%80%AC%E2%80%AD%E2%80%AE%E2%80%AA%E2%80%AB%E2%80%AC%E2%80%AD%E2%80%AA%E2%80%AB%E2%80%AC%E2%80%AD%E2%80%AE%E2%80%AA%E2%80%AB%E2%80%AC%E2%80%AD%E2%80%AE%D2%89&action=edit

--
Ilmari Karonen

Re: Invisible malicious changes.

Ilmari Karonen
Ilmari Karonen wrote:

> Aryeh Gregor wrote:
>> On Sun, Oct 5, 2008 at 1:54 AM, Marco Schuster
>> <[hidden email]> wrote:
>>> What about the direction-reverse stuff in the the texts?
>>> Okay, these are probably needed in Arab/Hebrew wikis, but can't they be a
>>> bit confusing in the article sources?
>> Not as confusing, in my experience, as having punctuation marks and
>> things in totally the wrong places when you're editing in the text
>> box.  (But RTL editors who are less tech-savvy would quite likely
>> disagree with that, I'm guessing: I wouldn't be at all surprised if
>> they were specifically banned on the RTL wikis.)
>
> Yes, and page text doesn't (usually) end up in places like logs and
> recent changes.  See for example (warning, ugly URL follows):

Actually, make that:

http://en.wikipedia.org/wiki/%E2%80%AA%E2%80%AB%E2%80%AC%E2%80%AD%E2%80%AE%E2%80%AA%E2%80%AB%E2%80%AC%E2%80%AD%E2%80%AA%E2%80%AB%E2%80%AC%E2%80%AD%E2%80%AE%E2%80%AA%E2%80%AB%E2%80%AC%E2%80%AD%E2%80%AE%D2%89

so it works right for non-admins too.  Sorry.
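To see what such a title is made of, the percent-encoded path can be decoded and each character named. This is a small sketch using only the Python standard library; `describe_encoded_title` is a hypothetical helper, and the sample string is a shortened piece of the URL above.

```python
import unicodedata
from urllib.parse import unquote


def describe_encoded_title(encoded: str) -> list:
    """Percent-decode a title (as UTF-8) and label each code point."""
    title = unquote(encoded)
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in title
    ]


# First control, last control and final character of the title in the URL.
for line in describe_encoded_title("%E2%80%AA%E2%80%AE%D2%89"):
    print(line)
```

Run over the full URL, this lists a long run of characters from the U+202A to U+202E block followed by a single combining mark.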

--
Ilmari Karonen

Re: Invisible malicious changes.

Tei-2
On Mon, Oct 6, 2008 at 10:00 PM, Ilmari Karonen <[hidden email]> wrote:

> Ilmari Karonen wrote:
> > Aryeh Gregor wrote:
> >> On Sun, Oct 5, 2008 at 1:54 AM, Marco Schuster
> >> <[hidden email]> wrote:
> >>> What about the direction-reverse stuff in the the texts?
> >>> Okay, these are probably needed in Arab/Hebrew wikis, but can't they be a
> >>> bit confusing in the article sources?
> >> Not as confusing, in my experience, as having punctuation marks and
> >> things in totally the wrong places when you're editing in the text
> >> box.  (But RTL editors who are less tech-savvy would quite likely
> >> disagree with that, I'm guessing: I wouldn't be at all surprised if
> >> they were specifically banned on the RTL wikis.)
> >
> > Yes, and page text doesn't (usually) end up in places like logs and
> > recent changes.  See for example (warning, ugly URL follows):
>
> Actually, make that:
>
>
> http://en.wikipedia.org/wiki/%E2%80%AA%E2%80%AB%E2%80%AC%E2%80%AD%E2%80%AE%E2%80%AA%E2%80%AB%E2%80%AC%E2%80%AD%E2%80%AA%E2%80%AB%E2%80%AC%E2%80%AD%E2%80%AE%E2%80%AA%E2%80%AB%E2%80%AC%E2%80%AD%E2%80%AE%D2%89
>
> so it works right for non-admins too.  Sorry.
>


I have created an online hex viewer that may prove handy for digging into
binary problems in our texts.

http://zerror.com/bin/wex/?url=http://en.wikipedia.org/wiki/MediaWiki:Titleblacklist
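For readers who cannot reach the viewer, the same idea can be approximated offline. The sketch below is not the tool's code; `hex_dump` and its output layout are assumptions chosen for the example.

```python
def hex_dump(text: str, width: int = 8) -> str:
    """Format the UTF-8 bytes of text as offset, hex bytes and printable ASCII."""
    data = text.encode("utf-8")
    pad = width * 3 - 1  # width of a full row of hex bytes
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:04x}  {hex_part:<{pad}}  {ascii_part}")
    return "\n".join(lines)


print(hex_dump("Mr\u2000Thonson"))
```

The substituted space shows up as the three bytes e2 80 80 where a single 20 would be expected.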


--
ℱin del ℳensaje.

Re: Invisible malicious changes.

Charlotte Webb
On 10/7/08, Tei <[hidden email]> wrote:
> I have created a online hex viewer, that may prove handy to seek into binary
> problems on our texts.
>
> http://zerror.com/bin/wex/?url=http://en.wikipedia.org/wiki/MediaWiki:Titleblacklist

Just curious, can you limit this to [[homoglyph]]ic characters and
make a javascript gadget for it?
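As a rough idea of what such a filter might check, the sketch below flags words that mix scripts, a common sign of homoglyph substitution. It is written in Python rather than as a gadget, it guesses the script from the first word of each character's Unicode name, and the helper names are invented; a serious check would need a proper confusables table.

```python
import unicodedata


def scripts_in(word: str) -> set:
    """Rough script guess: first word of each letter's Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in word
        if ch.isalpha()
    }


def mixed_script_words(text: str) -> list:
    """Return the whitespace-separated words that contain more than one script."""
    return [w for w in text.split() if len(scripts_in(w)) > 1]


# The second word ends in U+0430 CYRILLIC SMALL LETTER A instead of Latin "a".
print(mixed_script_words("Wikipedia Wikipedi\u0430"))
```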

—C.W.

Re: Invisible malicious changes.

Mark
In reply to this post by Marco Schuster-2
Marco Schuster wrote:
> What about the direction-reverse stuff in the the texts?
> Okay, these are probably needed in Arab/Hebrew wikis, but can't they be a
> bit confusing in the article sources?
>  
Probably, but on Latin-script Wikipedias they're sometimes necessary to
keep the direction auto-detection from screwing up. For example, I tried
to do this once:

'''Person''' (Arabic: [arabic script here]; 1935-1970)

Since the only thing immediately after the Arabic script is a semicolon
and digits, they're interpreted as part of the right-to-left text block, so
it's rendered something like:

Person (Arabic: 1935-1970 ;[arabic script here)

which is clearly not what was intended. =]  The other workaround is to
gratuitously add in some Latin script characters we wouldn't usually
use, like "b. 1935; d. 1970".
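The effect can be inspected with the bidirectional classes from the Unicode database. In the sketch below the Arabic word is a placeholder, and the use of U+200E (LEFT-TO-RIGHT MARK) as the directional character is an assumption, since the post does not say which one was used.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK, an invisible strong left-to-right character


def bidi_classes(text: str) -> list:
    """Return the Unicode bidirectional class of each character."""
    return [unicodedata.bidirectional(ch) for ch in text]


arabic = "\u0639\u0631\u0628\u064a"  # placeholder Arabic word, not from the thread
risky = f"(Arabic: {arabic}; 1935-1970)"
fixed = f"(Arabic: {arabic}{LRM}; 1935-1970)"

# The semicolon, space and digits carry no strong direction of their own,
# so they follow the Arabic letters unless a strong character intervenes.
print(bidi_classes(arabic + "; 1935"))
print(bidi_classes(arabic + LRM + "; 1935"))
```

In the second list the mark contributes a strong left-to-right class directly after the Arabic letters, which is what lets the punctuation and digits resolve with the surrounding Latin text.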

-Mark

