Code detecting bots?

Code detecting bots?

Nicholas Moreau
Does the MediaWiki software, or any independently-running 'bots, look
for code placed into pages of the Foundation projects? This article
claims that we're a security risk...

http://www.itworldcanada.com/a/News/036ff0b8-a384-4019-944c-bf09be58eec5.html

Nick

_______________________________________________
foundation-l mailing list
[hidden email]
http://lists.wikimedia.org/mailman/listinfo/foundation-l

Re: Code detecting bots?

David Gerard-2
On 02/08/07, Nicholas Moreau <[hidden email]> wrote:

> Does the MediaWiki software, or any independently-running 'bots, look
> for code placed into pages of the Foundation projects? This article
> claims that we're a security risk...
> http://www.itworldcanada.com/a/News/036ff0b8-a384-4019-944c-bf09be58eec5.html


Rubbish. I've commented accordingly.


- d.

Re: Code detecting bots?

Gregory Maxwell
On 8/2/07, David Gerard <[hidden email]> wrote:
> On 02/08/07, Nicholas Moreau <[hidden email]> wrote:
> > Does the MediaWiki software, or any independently-running 'bots, look
> > for code placed into pages of the Foundation projects? This article
> > claims that we're a security risk...
> > http://www.itworldcanada.com/a/News/036ff0b8-a384-4019-944c-bf09be58eec5.html
>
> Rubbish. I've commented accordingly.

Only mostly rubbish:

People can, and have, externally linked to malicious software from our sites.

Of course, that can happen anywhere on the net and users (and their
browser software) should be smart enough not to execute such code, but
Wikipedia looks rather respectable so people may be more inclined to
bypass security measures based on something on our site.

At the moment there are 209 external links to .exe files from the main
namespace of English Wikipedia.
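
A count like this can be reproduced from any dump of external-link URLs. The following is only a minimal sketch, assuming the links are available as plain strings; the function name and the extension list are illustrative and not the query that produced the figure above.

```python
from urllib.parse import urlsplit

# Hypothetical list of extensions treated as "executable looking".
EXECUTABLE_EXTENSIONS = (".exe", ".scr", ".bat", ".msi")


def looks_executable(url: str) -> bool:
    """Return True if the URL path ends in an executable-looking extension."""
    path = urlsplit(url).path.lower()
    return path.endswith(EXECUTABLE_EXTENSIONS)


links = [
    "http://example.org/downloads/setup.EXE",
    "http://example.org/product.html",
    "http://example.org/get.php?file=tool.exe",
]
flagged = [u for u in links if looks_executable(u)]
print(len(flagged), flagged)
```

Only the URL path is inspected, so a download script such as the third link is not flagged; a name-based check cannot see what the server actually sends.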

I don't think we should worry about malicious software specifically.
Instead view any external link to malicious code as part of the larger
problem of weak oversight of external links.

A while back I ran clamav against all 'executable' looking external
links and found one nasty file. It would be really nice if the
mechanism that updates externalinks table spat out a running log of
external link additions and removals that we could hook an ongoing
scanner into.
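
Absent such a log from the software itself, one rough approximation is to compare two snapshots of the external links. This sketch assumes each snapshot is simply a set of URLs; `diff_links` and `scan` are hypothetical names, and `scan` is a placeholder rather than a real clamav integration.

```python
def diff_links(before: set[str], after: set[str]) -> list[tuple[str, str]]:
    """Return ("added" | "removed", url) entries, sorted for stable output."""
    added = [("added", u) for u in sorted(after - before)]
    removed = [("removed", u) for u in sorted(before - after)]
    return added + removed


def scan(url: str) -> None:
    # Placeholder for a real scanner hook (fetching the file, running a
    # virus scanner on it, reporting the result).
    print("would scan:", url)


old = {"http://example.org/a.html", "http://example.org/tool.exe"}
new = {"http://example.org/a.html", "http://example.org/other.exe"}

log = diff_links(old, new)
for action, url in log:
    if action == "added":
        scan(url)
```

Only newly added links are handed to the scanner, which keeps the ongoing cost proportional to the rate of change rather than to the total number of links.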

It's also possible to rename malicious content as one of our accepted
formats for upload and upload it. If your client will execute an 'exe'
renamed to 'ogg' and sent with the Ogg MIME type, your client is
broken, but broken clients do exist.  I do not recall ever seeing an
example of something malicious distributed that way on our sites.

Re: Code detecting bots?

Nicholas Moreau
In reply to this post by Nicholas Moreau
> People can, and have, externally linked to malicious software from our sites.

I remember when that hit the news about three months ago; almost all
outlets wrote that the software was actually uploaded to our site.

> Of course, that can happen anywhere on the net and users (and their
> browser software) should be smart enough not to execute such code, but
> Wikipedia looks rather respectable so people may be more inclined to
> bypass security measures based on something on our site.

Okay, so none of this stuff would be automatically loading, it would
all be "This site is requesting you activate ****.*** [Yes] [No]" sort
of thing?

> At the moment there are 209 external links to .exe files from the main
> namespace of English Wikipedia.

Is there a list of where these links are, so they can be reviewed? Or
have they indeed already been reviewed? If they're linking to freeware
or open source programs, for example, they likely should all be
linking to a product page, not directly to the download.

Re: Code detecting bots?

Thomas Dalton
> > At the moment there are 209 external links to .exe files from the main
> > namespace of English Wikipedia.
>
> Is there a list of where these links are, so they can be reviewed? Or
> have they indeed already been reviewed? If they're linking to freeware
> or open source programs, for example, they likely should all be
> linking to a product page, not directly to the download.

Indeed. I can't see any reason for a direct link to an exe in a
Wikipedia article.

Re: Code detecting bots?

Dan Rosenthal

On Aug 2, 2007, at 10:23 AM, Thomas Dalton wrote:

>>> At the moment there are 209 external links to .exe files from the  
>>> main
>>> namespace of English Wikipedia.
>>
>> Is there a list of where these links are, so they can be reviewed? Or
>> have they indeed already been reviewed? If they're linking to  
>> freeware
>> or open source programs, for example, they likely should all be
>> linking to a product page, not directly to the download.
>
> Indeed. I can't see any reason for a direct link to an exe in a
> Wikipedia article.


Does the bot/script detect things like, say, .php download pages that
automatically download a .exe file upon loading?

-Dan Rosenthal

Re: Code detecting bots?

David Gerard-2
In reply to this post by Gregory Maxwell
On 02/08/07, Gregory Maxwell <[hidden email]> wrote:

> It's also possible to rename malicious content as one of our accepted
> formats for upload and upload it. If your client will execute an 'exe'
> renamed to 'ogg' and sent with the Ogg MIME type, your client is
> broken, but broken clients do exist.  I do not recall ever seeing an
> example of something malicious distributed that way on our sites.


Really? I thought we ran "file" on uploads as well as looking at the extension.

Though I suppose that wouldn't protect against the "specially crafted
malicious file" of security notice fame.


- d.

Re: Code detecting bots?

Thomas Dalton
In reply to this post by Dan Rosenthal
> Does the bot/script detect things like say, .php download pages that
> automatically download a .exe file upon loading?

I doubt it, but no decent browser would fall for that without giving
the user a pretty big warning.

Re: Code detecting bots?

Gregory Maxwell
In reply to this post by Nicholas Moreau
On 8/2/07, Nicholas Moreau <[hidden email]> wrote:
> > People can, and have, externally linked to malicious software from our sites.
>
> I remember the time that hit the news about three months ago, and
> almost all outlets wrote the software was actually uploaded to our
> site.

Yes and that wasn't accurate.


> > Of course, that can happen anywhere on the net and users (and their
> > browser software) should be smart enough not to execute such code, but
> > Wikipedia looks rather respectable so people may be more inclined to
> > bypass security measures based on something on our site.
>
> Okay, so none of this stuff would be automatically loading, it would
> all be "This site is requesting you activate ****.*** [Yes] [No]" sort
> of thing?


Right. It would be a 'click the link', then your browser would
download and say 'Are you sure you want to run this probably malicious
software, "Brittney_spears_boobies.exe"?', then the user clicks yes.
;)

> > At the moment there are 209 external links to .exe files from the main
> > namespace of English Wikipedia.
>
> Is there a list of where these links are, so they can be reviewed?

I've listed them in the past and went through and fixed a bunch of
them myself. I think there were far fewer then and I removed many of
them... :(

I've put up a list:
http://en.wikipedia.org/wiki/User:Gmaxwell/extff/exe

You can see the older version in the history of the page. I think
that might have been the list after I'd already made one pass at
removing them.

> Or
> have they indeed already been reviewed? If they're linking to freeware
> or open source programs, for example, they likely should all be
> linking to a product page, not directly to the download.

You are absolutely correct.

I'd say we should deny, by policy and possibly technical means,
external linking to URLs with certain names or which transmit certain
MIME types...

Actually pulling it off might be hard: a number of the exe's are
really just ZIP files converted into self-extracting archives. The
data in them may not be easily available in other forms. There is
almost certainly a launch page for these, but finding them when all
you know is the deep link name can be hard.

Re: Code detecting bots?

Gregory Maxwell
In reply to this post by David Gerard-2
On 8/2/07, David Gerard <[hidden email]> wrote:
> Really? I thought we ran "file" on uploads as well as looking at the extension.

We do. And if it doesn't match what we think it will be... we put a
notice that no one notices on the image page.

Example:
http://commons.wikimedia.org/wiki/Image:Edvard_Grieg_-_03_-_In_The_Hall_Of_The_Mountain_King.ogg

(that file is an mp3 renamed to ogg, normally I just transcode all
these but I've left that one sitting around because the copyright
status on it looked suspect)

There is a constant slow stream of misnamed files that come in....
every few months I hit one and get a wild hair to go convert or delete
all the ones on commons and enwiki.  I've found some weird stuff, even
suspicious, but not yet something which I am confident was malicious.

"[[User:Gmaxwell]] gets bored and takes a pushbroom to it once a year"
is not a scalable method of handling this stuff.

> Though I suppose that wouldn't protect against the "specially crafted
> malicious file" of security notice fame.

Even if we had the more aggressive filtering that you thought we had
the risk of such files would remain.  For the most part the handlers
for the formats we do support tend to be pretty robust, internet grade
stuff though..

Re: Code detecting bots?

Gregory Maxwell
In reply to this post by Dan Rosenthal
On 8/2/07, Dan Rosenthal <[hidden email]> wrote:
> Does the bot/script detect things like say, .php download pages that
> automatically download a .exe file upon loading?

I'm not aware of anyone who has checked our external links for sites
which give executable MIME types for URLs we wouldn't expect to be
executable.

It could be done and should be done.. but doing it in bulk takes time
because we have a *huge* number of external links.
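
For illustration, a check of this kind could issue a HEAD request per link and inspect the Content-Type header. This is a sketch under stated assumptions: the set of MIME types below is an assumed, incomplete list, and the function names are hypothetical.

```python
import urllib.request

# Assumed set of MIME types treated as executable; extend as needed.
EXECUTABLE_MIME_TYPES = {
    "application/x-msdownload",
    "application/x-msdos-program",
    "application/x-dosexec",
    "application/vnd.microsoft.portable-executable",
}


def is_executable_mime(content_type: str) -> bool:
    """Check a Content-Type header value, ignoring parameters and case."""
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime in EXECUTABLE_MIME_TYPES


def fetch_content_type(url: str, timeout: float = 10.0) -> str:
    """Issue a HEAD request and return the Content-Type header ("" if absent)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type", "")
```

A link would be flagged when `is_executable_mime(fetch_content_type(url))` is true for a URL whose name does not look executable. Many servers send executables as the generic `application/octet-stream`, so a header check alone will miss some downloads, and one request per link is what makes a bulk run slow.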

Re: Code detecting bots?

Thomas Dalton
In reply to this post by Gregory Maxwell
> We do. And if it doesn't match what we think it will be... we put a
> notice that no one notices on the image page.

The reason nobody notices it is because it looks generic. It should
say why the file is suspicious. As it is, it looks like a message that
could be applied to any file.

Re: Code detecting bots?

Brion Vibber-3
In reply to this post by Gregory Maxwell
Gregory Maxwell wrote:
> On 8/2/07, David Gerard <[hidden email]> wrote:
>> Really? I thought we ran "file" on uploads as well as looking at the extension.
>
> We do. And if it doesn't match what we think it will be... we put a
> notice that no one notices on the image page.

That's incorrect.

If the detected filetype doesn't match the defined filetype for the
extension, then the upload is rejected.
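
The rule described here can be sketched as follows. This is not MediaWiki's upload code; it is a minimal illustration that assumes a small magic-number table, and the names `detect_type` and `accept_upload` are hypothetical.

```python
from typing import Optional

# Illustrative magic-number table; real detection (e.g. the "file" utility)
# knows far more formats than these.
MAGIC_PREFIXES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpg": b"\xff\xd8\xff",
    "pdf": b"%PDF-",
    "ogg": b"OggS",
    "mid": b"MThd",
}


def detect_type(data: bytes) -> Optional[str]:
    """Return the extension whose magic prefix matches, else None."""
    for ext, prefix in MAGIC_PREFIXES.items():
        if data.startswith(prefix):
            return ext
    return None


def accept_upload(filename: str, data: bytes) -> bool:
    """Accept only if the detected type equals the filename's extension."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return ext in MAGIC_PREFIXES and detect_type(data) == ext
```

With this rule a Windows executable (which begins with the bytes `MZ`) renamed to `cmd.ogg` is rejected, because no entry in the table matches its content.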

(However note that at this moment we don't have very solid detection for
OGG.)

The warning on image pages about malicious code is bullshit -- we should
remove it, since it has nothing to do with reality.

Greg, don't be afraid to pop things into bugzilla or work with us over
in SVN to fix things up. :)

-- brion vibber (brion @ wikimedia.org)

Re: Code detecting bots?

Brion Vibber-3
In reply to this post by Thomas Dalton
Thomas Dalton wrote:
>> Does the bot/script detect things like say, .php download pages that
>> automatically download a .exe file upon loading?
>
> I doubt it, but no decent browser would fall for that without giving
> the user a pretty big warning.

Just the same warning you get when you click a link that ends in .exe.

-- brion vibber (brion @ wikimedia.org)

Re: Code detecting bots?

Gregory Maxwell
In reply to this post by Brion Vibber-3
On 8/2/07, Brion Vibber <[hidden email]> wrote:
> > We do. And if it doesn't match what we think it will be... we put a
> > notice that no one notices on the image page.
>
> That's incorrect.
> If the detected filetype doesn't match the defined filetype for the
> extension, then the upload is rejected.
>
> (However note that at this moment we don't have very solid detection for
> OGG.)

O_o. I still find a lot of random crud uploaded as other things on commons.

We reliably detect Ogg as far as I can tell, at least in the sense
that when I've checked in the past all the files on commons that had
the bad mime data in the database were actually not ogg files.

I'll have to check more carefully but if we are, as I believe,
correctly detecting Ogg files then we could turn on limiting on those
files.

> The warning on image pages about malicious code is bullshit -- we should
> remove it, since it has nothing to do with reality.

I just conducted a test:
[gmaxwell@bessel ~]$ file ./.wine/drive_c/windows/system32/cmd.exe
./.wine/drive_c/windows/system32/cmd.exe: MS-DOS executable PE  for MS Windows (console) Intel 80386

http://commons.wikimedia.org/wiki/Image:Winecmdexe.sxd
http://commons.wikimedia.org/wiki/Image:Winecmdexe.svg
http://commons.wikimedia.org/wiki/Image:Winecmdexe.xcf
http://commons.wikimedia.org/wiki/Image:Winecmdexe.mid
http://commons.wikimedia.org/wiki/Image:Winecmdexe.sxw
http://commons.wikimedia.org/wiki/Image:Winecmdexe.pdf
http://commons.wikimedia.org/wiki/Image:Winecmdexe.ogg

It did reject the exe renamed to both png and jpg, but that's it.

> Greg, don't be afraid to pop things into bugzilla or work with us over
> in SVN to fix things up. :)

I'm not, but I honestly thought this was 'works as designed'.

At least in the Ogg case we may already have reliable enough
detection. If something is lacking there it should be trivial to fix;
Ogg is easy to detect robustly. I don't know about the other file
types.
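
As an illustration of why Ogg is easy to detect, the first page header of an Ogg stream has a fixed layout that can be checked directly. This is a sketch only, with a hypothetical function name; a fuller check would also validate the page checksum and segment table.

```python
def looks_like_ogg(data: bytes) -> bool:
    """Check the first Ogg page header: capture pattern, version, BOS flag."""
    if len(data) < 27:  # the fixed part of an Ogg page header is 27 bytes
        return False
    if data[0:4] != b"OggS":  # capture pattern
        return False
    if data[4] != 0:  # stream structure version
        return False
    return bool(data[5] & 0x02)  # first page must begin a logical stream
```

An executable renamed to .ogg fails at the capture-pattern check, since it starts with `MZ` rather than `OggS`.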

Re: Code detecting bots?

Brion Vibber-3
Gregory Maxwell wrote:
[snip]

Let's conduct this discussion on bugzilla instead of the Wikimedia
Foundation list, maybe? :)

-- brion vibber (brion @ wikimedia.org)

Re: Code detecting bots?

Uber Halogen
In reply to this post by Gregory Maxwell
On 02/08/2007, Gregory Maxwell <[hidden email]> wrote:
> On 8/2/07, Dan Rosenthal <[hidden email]> wrote:
> > Does the bot/script detect things like say, .php download pages that
> > automatically download a .exe file upon loading?
>
> I'm not aware of anyone who has checked our EL's for sites which give
> executable mime types for URLs we wouldn't expect to be executable.
>
> It could be done and should be done.. but doing it in bulk takes time
> because we have a *huge* number of external links.

I have started work on a script to do this. It will take me a few more
days to complete (trying to track down some funny bugs), but it will
probably take even longer for the script to actually run.

_UH.