Anonymous editors & IP addresses

Gilles Dubuc
This interesting bot showed up on Hacker News today:
https://news.ycombinator.com/item?id=8018284

While in this instance access to anonymous editors' IP addresses is
definitely useful for identifying edits with a probable conflict of
interest, it makes me wonder about the history behind identifying
anonymous editors by their IP addresses on WMF-hosted wikis.

IP addresses are closely guarded for registered users, so why aren't
anonymous users identified by a hash of their IP address to protect their
privacy as well? The exact same ability to see all edits by a given
anonymous IP would still exist; the IP itself just wouldn't be publicly
available, and it would be protected with the same access rights as
registered users' IPs.
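The hashing idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MediaWiki code: `pseudonym_for_ip`, the `Anon-` prefix, the 12-hex-digit truncation and `SECRET_KEY` are all hypothetical. It uses a keyed hash (HMAC-SHA256) rather than a bare hash, because an unkeyed hash of an IPv4 address can be reversed by simply hashing all 2^32 possible addresses.

```python
import hashlib
import hmac
import ipaddress

# Hypothetical server-side secret. In a real deployment it would live in
# private configuration, never in the page history or a public database dump.
SECRET_KEY = b"example-secret-change-me"


def pseudonym_for_ip(ip: str, key: bytes = SECRET_KEY) -> str:
    """Map an IP address to a stable display label without revealing it.

    The address is normalised first, so different spellings of the same
    IPv6 address (with or without zero compression) get the same label.
    """
    normalised = ipaddress.ip_address(ip).compressed
    digest = hmac.new(key, normalised.encode("ascii"), hashlib.sha256).hexdigest()
    # Keep only 12 hex digits (48 bits) so the label stays readable in histories.
    return "Anon-" + digest[:12]


if __name__ == "__main__":
    # The same address always yields the same label, so a per-label
    # contributions list behaves like today's per-IP list.
    print(pseudonym_for_ip("198.51.100.7"))
    print(pseudonym_for_ip("198.51.100.7"))
    print(pseudonym_for_ip("2001:db8:0:0::1") == pseudonym_for_ip("2001:db8::1"))  # True
```

Truncating to 48 bits trades collision resistance for readability: with around 20 million distinct addresses, the odds of at least one collision are roughly even.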

The "use case" that makes me think of this is someone living under a
totalitarian regime who makes a sensitive edit and forgets that they're
logged out, or who simply doesn't realize that editing anonymously on the
wiki doesn't stop their local authorities from figuring out who they are
based on IP address and time. Understanding that they're somewhat
protected when logged in and not when logged out requires a certain level
of technical understanding. The easy way out of this argument is to say
that these users should be using Tor or something similar. But I still
wonder why we have this double standard of protecting registered users'
IP addresses and not anonymous users', when simple hashing would do the
job.
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: Anonymous editors & IP addresses

Tyler Romeo
I agree that it’s a double standard, but looking on the bright side, it gives anonymous users a big incentive to register and log in. The Account Creation Experience Team (or whoever the hell is in charge of that) can correct me, but I would imagine that we would see a big drop in registered accounts if IPs were hashed.

Also, it’d be really annoying to have hashes as usernames, so we’d have to think of an alternative scheme that makes things more readable.
-- 
Tyler Romeo
0x405D34A7C86B42DF

From: Gilles Dubuc <[hidden email]>
Reply: Wikimedia developers <[hidden email]>
Date: July 11, 2014 at 9:34:18
To: Wikimedia developers <[hidden email]>
Subject: [Wikitech-l] Anonymous editors & IP addresses

Re: Anonymous editors & IP addresses

Gilles Dubuc
>
> I would imagine that we would see a big drop in registered accounts if IPs
> were hashed.
>

Why? Most casual web users don't even know what an IP address is, let
alone what their own address is. In fact, browsers are increasingly hiding
even the URL. This is the sort of technical information that an
ever-shrinking portion of web users knows about these days.

> an alternative scheme that makes things more readable
>

A hash can take many forms. In fact, it could be formatted just like an IP
address. Even if the hash format mixes letters and numbers, as long as the
length is similar, I don't see how IP addresses are superior in terms of
readability.
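Gilles's point that a hash can be made to look like an IP address is easy to illustrate. In this hypothetical Python sketch (`ip_shaped_label` and `SECRET_KEY` are made up for the example), the first four bytes of an HMAC-SHA256 digest are printed as a dotted quad.

```python
import hashlib
import hmac

# Hypothetical server-side secret, as in any keyed-hash scheme.
SECRET_KEY = b"example-secret-change-me"


def ip_shaped_label(ip: str, key: bytes = SECRET_KEY) -> str:
    """Render a keyed hash of `ip` in the familiar dotted-quad shape."""
    digest = hmac.new(key, ip.encode("ascii"), hashlib.sha256).digest()
    # Use the first four bytes of the digest as four "octets" (0-255 each).
    return ".".join(str(b) for b in digest[:4])


if __name__ == "__main__":
    print(ip_shaped_label("203.0.113.42"))
```

Two caveats follow from this design. A label in this shape can easily be mistaken for a real address and fed into a geolocation or WHOIS lookup, so a distinct prefix or separator may be the safer choice. And only 32 bits survive, so collisions arrive early: around 77,000 distinct addresses already give roughly even odds of at least one collision.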


On Fri, Jul 11, 2014 at 10:25 AM, Tyler Romeo <[hidden email]> wrote:

Re: Anonymous editors & IP addresses

Risker
In reply to this post by Tyler Romeo
This is one of those perennial proposals that never quite seems to take
off; I can remember having some version of this discussion back in 2008,
and I know that some of our earliest edits show a partially obscured IP
address, not the whole thing. It might require Brion or Tim or someone else
of that length of experience to explain the original thinking.

Some of the "pros" of keeping the IP address as the "username" for
unregistered users:

   - Even in this day and age, there are plenty of people with stable IPs;
   they choose to edit as unregistered users for philosophical reasons, and
   their IP's edit history is essentially their own editing history
   - Especially on smaller projects (but also big ones), range blocks are
   usually calculated and applied by administrators, not checkusers/stewards.
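Calculating a range block, as Risker describes administrators doing, is arithmetic on the raw addresses. As a rough Python illustration (the `smallest_covering_range` helper is hypothetical and IPv4-only), the narrowest CIDR block covering a handful of addresses can be found by widening the prefix one bit at a time:

```python
import ipaddress
from typing import Iterable


def smallest_covering_range(addresses: Iterable[str]) -> ipaddress.IPv4Network:
    """Return the narrowest IPv4 CIDR block containing every address given."""
    ips = [ipaddress.IPv4Address(a) for a in addresses]
    if not ips:
        raise ValueError("need at least one address")
    # Start at /32 and widen one bit at a time until every address fits.
    for prefix in range(32, -1, -1):
        candidate = ipaddress.IPv4Network((ips[0], prefix), strict=False)
        if all(ip in candidate for ip in ips):
            return candidate
    raise AssertionError("unreachable: /0 contains every IPv4 address")


if __name__ == "__main__":
    print(smallest_covering_range(["198.51.100.14", "198.51.100.200"]))  # 198.51.100.0/24
```

The narrowest block is not always a sensible block: two addresses that differ in a high-order bit produce a very wide range, which is exactly where an administrator has to weigh collateral damage.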


Some of the "cons" of publishing the IP address as the username:

   - Privacy - IPv6 addresses in particular increasingly include very
   specific information that could be used to link a real-life name with
   the edits. (My own ISP now gives enough information in many cases to
   narrow geolocation down to a one-block radius - a big change from 2
   years ago, when geolocation was about an 800-mile radius.)
   - Privacy - more and more jurisdictions consider a person's IP address
   to be "private" information.  Our page histories could be considered one
   gigantic privacy violation.
   - Increasingly dynamic IP addresses, often rotating within very large
   ranges that no longer link with any certainty to geolocation
   - Freaked-out new users who didn't realize that their IP address was
   going to be very publicly displayed.
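Risker does not say which IPv6 details she means, but one well-known case is the modified EUI-64 interface identifier used by classic IPv6 stateless address autoconfiguration (RFC 4291, Appendix A), which embeds the network card's MAC address in the low 64 bits of the address. The hypothetical `embedded_mac` helper below is a Python sketch that recovers it when that pattern is present. Many current operating systems use temporary or stable-opaque identifiers instead (RFC 4941, RFC 7217), so this only catches some addresses, and a random identifier could match the `ff:fe` pattern by coincidence.

```python
import ipaddress
from typing import Optional


def embedded_mac(addr: str) -> Optional[str]:
    """Return the MAC address embedded in a modified EUI-64 IPv6 address, if any."""
    packed = ipaddress.IPv6Address(addr).packed
    iid = packed[8:]  # lower 64 bits: the interface identifier
    # Modified EUI-64 inserts 0xFF, 0xFE between the two halves of the MAC...
    if iid[3] != 0xFF or iid[4] != 0xFE:
        return None
    # ...and flips the universal/local bit (0x02) of the first octet.
    mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:8]
    return ":".join(f"{b:02x}" for b in mac)


if __name__ == "__main__":
    print(embedded_mac("2001:db8::21a:2bff:fe3c:4d5e"))  # 00:1a:2b:3c:4d:5e
    print(embedded_mac("2001:db8::1"))  # None
```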


I'm pretty sure there are a whole pile more pros and cons that we can pull
out of the archives from various mailing lists, and I know that there have
periodically been discussions amongst developers and the rest of the
engineering team to try to come up with a "better way" - but like many
other interesting, good and even potentially necessary ideas, it's never
made it to the top of the priority heap.

Putting on my checkuser hat for just a minute... IP data is essential
for having any chance at all of identifying multiple accounts or pattern
editing; however, the tables used by checkusers are non-public, so
checkusers continuing to have access to IP data should not be an issue.

Risker/Anne

Re: Anonymous editors & IP addresses

Gilles Dubuc
> Even in this day and age, there are plenty of people with stable IPs
>

With hashing, a given IP would always give the same hash. So this
uniqueness property would remain for people with stable IPs.


On Fri, Jul 11, 2014 at 10:55 AM, Risker <[hidden email]> wrote:

Re: Anonymous editors & IP addresses

Tyler Romeo
As a quick implementation note, we would not be using a hash for the IP address.

Most likely, we would encrypt the IP with AES or something using a configuration-based secret key. That way checkusers can still reverse the hash back into normal IP addresses without having to store the mapping in the database.
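Tyler's reversible scheme can be illustrated without storing a mapping table. AES is not in Python's standard library, and an AES block is 128 bits, so encrypting a 32-bit IPv4 address directly would produce a much longer token. This sketch therefore swaps in a toy 4-round Feistel network over 32 bits, using HMAC-SHA256 as the round function: a keyed permutation that anyone holding the key can run backwards. The function names, the round count and the key are hypothetical, and this illustrates the idea only; it is not a vetted cipher.

```python
import hashlib
import hmac
import ipaddress

ROUNDS = 4  # hypothetical round count for this toy construction


def _round(key: bytes, rnd: int, half: int) -> int:
    """Pseudo-random 16-bit round function derived from HMAC-SHA256."""
    msg = bytes([rnd]) + half.to_bytes(2, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:2], "big")


def encrypt_ipv4(ip: str, key: bytes) -> int:
    """Map an IPv4 address to a 32-bit token with a keyed Feistel permutation."""
    value = int(ipaddress.IPv4Address(ip))
    left, right = value >> 16, value & 0xFFFF
    for rnd in range(ROUNDS):
        left, right = right, left ^ _round(key, rnd, right)
    return (left << 16) | right


def decrypt_ipv4(token: int, key: bytes) -> str:
    """Invert encrypt_ipv4 by undoing the rounds in reverse order."""
    left, right = token >> 16, token & 0xFFFF
    for rnd in reversed(range(ROUNDS)):
        left, right = right ^ _round(key, rnd, left), left
    return str(ipaddress.IPv4Address((left << 16) | right))


if __name__ == "__main__":
    key = b"example-config-key"  # hypothetical configuration secret
    token = encrypt_ipv4("198.51.100.7", key)
    print(token)
    print(decrypt_ipv4(token, key))  # 198.51.100.7
```

Because the mapping is a permutation of the 32-bit space, each address gets exactly one token and the token is no wider than an IPv4 address.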

-- 
Tyler Romeo
0x405D34A7C86B42DF

From: Gilles Dubuc <[hidden email]>
Reply: Wikimedia developers <[hidden email]>
Date: July 11, 2014 at 10:59:55
To: Wikimedia developers <[hidden email]>
Subject: Re: [Wikitech-l] Anonymous editors & IP addresses

With hashing, a given IP would always give the same hash. So this
uniqueness property would remain for people with stable IPs.

Re: Anonymous editors & IP addresses

Ole Palnatoke Andersen
In reply to this post by Gilles Dubuc
To my knowledge, there are currently six of these Twitter bots
(Canada, Denmark, France, Sweden, UK, US). I have collected them in a
Twitter list: https://twitter.com/palnatoke/lists/wikiedit

Please speak up if you notice more, so I can include them in the list, too.


Regards,
Ole

--
http://palnatoke.org * @palnatoke * +4522934588

Re: Anonymous editors & IP addresses

Daniel Kinzler
In reply to this post by Tyler Romeo
On 11.07.2014 17:19, Tyler Romeo wrote:
> Most likely, we would encrypt the IP with AES or something using a
> configuration-based secret key. That way checkusers can still reverse the
> hash back into normal IP addresses without having to store the mapping in the
> database.

There are two problems with this, I think.

1) No forward secrecy. If that key is ever leaked, all IPs become "plain". And
it will be, sooner or later. This would probably not be obvious, so this feature
would instill a false sense of security.

2) No range blocks. It's often quite useful to be able to block a range of IPs.
This is an important tool in the fight against spammers, taking it away would be
a problem.

-- daniel

Re: Anonymous editors & IP addresses

Brion Vibber
In reply to this post by Risker
On Friday, July 11, 2014, Risker <[hidden email]> wrote:

> This is one of those perennial proposals that never quite seems to take
> off; I can remember having some version of this discussion back in 2008,
> and I know that some of our earliest edits show a partially obscured IP
> address, not the whole thing. It might require Brion or Tim or someone else
> of that length of experience to explain the original thinking.


As I recall, UseModWiki (the Perl-based wiki software we used before
switching to a custom solution which evolved into MediaWiki) obscured the
last octet of the IP address, which still left you with enough information
in most cases to track down an ISP or school/business/govt institution. I
think UseMod also exposed the IP addresses of logged-in users, but the way
logins worked was very different and it was possible to set your name to
someone else's name or some such oddities...

I'm not sure offhand if there was explicit discussion of switching to not
obscuring the last octet in the PHP software/nascent MediaWiki... But this
was back in 2001 when the internet was a little younger and everybody was
spewing their IP addresses all over their email and newsgroup posts too.
Folks are a lot more paranoid about that today.


In general I favor migrating away from publicly exposing IP addresses, but
I'm not sure what exactly would be best to migrate to... I kinda like the
idea of an anonymous-but-consistent "proto-account" that can be
transformed into a named login if desired, but it needs to be thought out
in more detail to resolve potential difficulties.

-- brion

Re: Anonymous editors & IP addresses

Chris Steipp
In reply to this post by Daniel Kinzler
On Friday, July 11, 2014, Daniel Kinzler <[hidden email]> wrote:

> On 11.07.2014 17:19, Tyler Romeo wrote:
> > Most likely, we would encrypt the IP with AES or something using a
> > configuration-based secret key. That way checkusers can still reverse the
> > hash back into normal IP addresses without having to store the mapping
> > in the database.
>
> There are two problems with this, I think.
>
> 1) No forward secrecy. If that key is ever leaked, all IPs become "plain".
> And it will be, sooner or later. This would probably not be obvious, so
> this feature would instill a false sense of security.
>

This is probably the biggest issue. Even if we HMAC it, it's trivial to
brute force the entire IPv4 range (and, with intelligent assumptions about
how addresses are generated, most of the IPv6 range) in seconds if the key
is ever known.
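Chris's point is that a keyed hash only protects the address while the key stays secret, because the input space is small enough to enumerate. The hypothetical helpers below sketch the attack in Python against a /20 (4,096 addresses) so it finishes quickly. Covering all of IPv4 is the same loop over 2^32 candidates; pure Python would be far slower than the optimized tooling Chris has in mind, but the number of candidates to try is the same.

```python
import hashlib
import hmac
import ipaddress
from typing import Optional


def keyed_hash(ip: str, key: bytes) -> str:
    """HMAC-SHA256 of an address string, as a hex digest."""
    return hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()


def recover_ip(target: str, key: bytes, search_space: str) -> Optional[str]:
    """Try every address in `search_space` (a CIDR block) until one matches."""
    for candidate in ipaddress.ip_network(search_space):
        if keyed_hash(str(candidate), key) == target:
            return str(candidate)
    return None


if __name__ == "__main__":
    leaked_key = b"example-secret-change-me"  # hypothetical leaked secret
    published = keyed_hash("10.1.2.3", leaked_key)  # what a page history would show
    print(recover_ip(published, leaked_key, "10.1.0.0/20"))  # 10.1.2.3
```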


>
> 2) No range blocks. It's often quite useful to be able to block a range
> of IPs. This is an important tool in the fight against spammers, taking
> it away would be a problem.
>

Range blocks, I imagine, would continue working the same way they do now.
Someone would have to identify the correct range (which is very difficult
when administrators can't see IPs), but on submission, we have the IP
address to check against the blocks. (Unless someone proposes storing
block ranges as hashes; that would definitely get rid of range blocks.)
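Chris's observation that range blocks keep working can be sketched in Python: the block list is checked against the raw address the server sees at submission time, so nothing depends on what is displayed publicly. The `is_blocked` helper and the idea of storing blocks as CIDR strings are assumptions for illustration, not how MediaWiki actually stores blocks.

```python
import ipaddress
from typing import Iterable


def is_blocked(client_ip: str, blocked_ranges: Iterable[str]) -> bool:
    """Check a raw client address against stored CIDR range blocks."""
    addr = ipaddress.ip_address(client_ip)
    for block in blocked_ranges:
        net = ipaddress.ip_network(block)
        # Skip blocks of the other IP version; only same-version ranges apply.
        if addr.version == net.version and addr in net:
            return True
    return False


if __name__ == "__main__":
    print(is_blocked("198.51.100.77", ["198.51.100.0/24"]))  # True
```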

Re: Anonymous editors & IP addresses

Gilles Dubuc
In reply to this post by Brion Vibber
> I kinda like the idea of an
> anonymous-but-consistent "proto-account" that can be transformed into a
> named login if desired, but it needs to be thought out in more detail to
> resolve potential difficulties.


One could automatically create a pseudo-account ("Anonymous #12345") upon
first edit, and that account would always be authenticated automatically
upon future edits coming from the same IP address. I don't think it should
be possible to turn those pseudo-accounts into proper accounts, though;
they'd be marked as anonymous pseudo-accounts forever. Otherwise, a way to
upgrade to a proper account while keeping edits which were potentially
written by other people could get hairy, especially from a legal
standpoint.
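The pseudo-account idea above amounts to a small lookup table. In this hypothetical Python sketch, an in-memory dictionary stands in for whatever database table would really hold it; `PseudoAccountRegistry` and `name_for_edit` are made-up names. The IP-to-name mapping is exactly the sensitive data, so it would need the same protection as checkuser data.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PseudoAccountRegistry:
    """Hand out 'Anonymous #N' names, one per IP address, on first edit."""

    next_id: int = 1
    by_ip: Dict[str, str] = field(default_factory=dict)

    def name_for_edit(self, ip: str) -> str:
        key = ipaddress.ip_address(ip).compressed  # normalise the address
        if key not in self.by_ip:
            self.by_ip[key] = f"Anonymous #{self.next_id}"
            self.next_id += 1
        return self.by_ip[key]


if __name__ == "__main__":
    reg = PseudoAccountRegistry()
    print(reg.name_for_edit("198.51.100.7"))  # Anonymous #1
    print(reg.name_for_edit("203.0.113.9"))   # Anonymous #2
    print(reg.name_for_edit("198.51.100.7"))  # Anonymous #1
```

Sequential numbers reveal the order in which addresses first edited but nothing about the addresses themselves, and, as suggested above, nothing here converts a pseudo-account into a registered one.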

Maybe it's a cookie-based approach you had in mind? Where we automatically
create an account tied to the user agent. That would mitigate the issue of
converting a pseudo-account that might have been shared between several
people to a proper account, but not completely get rid of it.


On Fri, Jul 11, 2014 at 11:45 AM, Brion Vibber <[hidden email]>
wrote:

Re: Anonymous editors & IP addresses

Happy Melon
On 11 July 2014 17:10, Gilles Dubuc <[hidden email]> wrote:

>
> Maybe it's a cookie-based approach you had in mind? Where we automatically
> create an account tied to the user agent. That would mitigate the issue of
> converting a pseudo-account that might have been shared between several
> people to a proper account, but not completely get rid of it.
>

I'd have thought the chain of events "go to a library computer, do some
edits, decide to upgrade to a real account, do so, realise you've
inadvertently swept up all the unsalubrious penis vandalism that has been
made on that computer previously" would be unacceptably common.

--HM

Re: Anonymous editors & IP addresses

Matthew Flaschen
In reply to this post by Brion Vibber
On 07/11/2014 11:45 AM, Brion Vibber wrote:
> As I recall, UseModWiki (the perl-based wiki software we used before
> switching to a custom solution which evolved into MediaWiki) obscured the
> last octet of the IP address, which still left you with enough information
> in most cases to track down an ISP or school/business/govt institution. I
> think UseMod also exposed the IP addresses of logged-in users, but the way
> logins worked was very different and it was possible to set your name to
> someone else's name or some such oddities...

Yeah, the main benefit to the current setup (which probably doesn't
really require the last octet in most cases) is detecting casual abuse,
which includes (but is not limited to) both blatant vandalism and
conflict of interest edits. (People have lunch breaks, and I don't
claim every edit from an organizational IP is a conflict of interest, but
many true COI edits have been caught this way).
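To make the last-octet idea concrete: a minimal Python sketch, assuming IPv4 only, that masks the final octet in the style of the old 216.143.215.xxx entries, plus a check that two addresses share a /24. The helper names `mask_last_octet` and `same_slash24` are hypothetical; this is not UseMod or MediaWiki code.

```python
import ipaddress


def mask_last_octet(ip: str) -> str:
    """Replace the final octet of an IPv4 address with 'xxx'.

    Hypothetical helper mimicking the old 216.143.215.xxx display style.
    Raises ValueError for anything that is not a valid IPv4 address.
    """
    addr = ipaddress.IPv4Address(ip)
    first_three = str(addr).split(".")[:3]
    return ".".join(first_three + ["xxx"])


def same_slash24(a: str, b: str) -> bool:
    """True if both IPv4 addresses fall in the same /24, i.e. mask identically."""
    block = ipaddress.IPv4Network(f"{a}/24", strict=False)  # host bits of a dropped
    return ipaddress.IPv4Address(b) in block
```

Masking keeps the /24, which is often (though not always) enough to point at an ISP or institution, while hiding the individual host.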

If we look into something like proto-accounts or hashing or such, it
would be good to try to maintain this benefit (do the lookup on the
server, and expose who the IP block belongs to?), but I don't know if
it's possible to have it both ways.
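One way the server-side lookup could work, sketched in Python under loud assumptions: the server keeps a table mapping CIDR blocks to owner names and publicly shows only the owner of the most specific matching block. `BLOCK_OWNERS`, `public_label`, and the table contents are hypothetical; the ranges are RFC 5737 documentation blocks, and a real table would have to be built from WHOIS or registry data.

```python
import ipaddress
from typing import Optional

# Hypothetical ownership table (RFC 5737 documentation ranges, fictional owners).
BLOCK_OWNERS = {
    "192.0.2.0/24": "Example Corp",
    "198.51.100.0/24": "Example University",
    "198.51.100.128/25": "Example University Library",
}

_NETWORKS = [(ipaddress.ip_network(cidr), owner) for cidr, owner in BLOCK_OWNERS.items()]


def public_label(ip: str) -> Optional[str]:
    """Return the owner of the most specific known block containing ip, or None.

    Only this label would be displayed; the raw IP stays on the server.
    """
    addr = ipaddress.ip_address(ip)
    matches = [(net, owner) for net, owner in _NETWORKS
               if net.version == addr.version and addr in net]
    if not matches:
        return None
    # Longest prefix wins, as in a routing table.
    _, owner = max(matches, key=lambda m: m[0].prefixlen)
    return owner
```

Longest-prefix matching lets a sub-block, such as a library inside a university range, carry a more specific label than its parent.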

Matt Flaschen



Re: Anonymous editors & IP addresses

Gryllida
In reply to this post by Gilles Dubuc
On Fri, 11 Jul 2014, at 23:34, Gilles Dubuc wrote:
> IP addresses are closely guarded for registered users, why wouldn't
> anonymous users be identified by a hash of their IP address in order to
> protect their privacy as well?

While I don't horribly mind some changes in the direction you're writing, I think that:

1) Privacy is defined as "The state of being free from unsanctioned intrusion". An IP, as a fundamental identifier, has as much to do with privacy as a car number you see on a street. (Anyone can look up a name by car number, in my area, which I expect to be common.)

Firefox folks are, iirc, considering providing IP-based links in the new tab with one of the next releases. These links would include local shops and restaurants. I've seen some argue that such a decision goes against "privacy", but I think it's the wrong term.

2) There are other nicer things to enable for anonymous readers that would make their editing experience more efficient. Such things include enabling some preferences and features for these contributors, which may be useful to a group of people editing from one IP:

https://meta.wikimedia.org/wiki/Musings_about_unregistered_contributors#Examples

Gryllida.


Re: Anonymous editors & IP addresses

Nick White
(a little off topic diversion)

On Tue, Jul 15, 2014 at 06:22:17PM +1000, Gryllida wrote:
> An IP, as a fundamental identifier, has as much to do with privacy
> as a car number you see on a street. (Anyone can look up a name by
> car number, in my area, which I expect to be common.)

Actually numberplates were originally conceived as a privacy
enhancing technology. The first numberplates had people's names on
them, but that was considered too intrusive.


Re: Anonymous editors & IP addresses

Ricordisamoa
In reply to this post by Gilles Dubuc
The CC BY-SA license, used on most WMF projects, requires /attribution/.
Attribution for edits made by unregistered/logged-out users is done
exclusively by means of their IP address.
By clicking the 'Save' button, they agreed to release their edits under
CC BY-SA, and that their IP address would be the only form of
attribution of their changes to them.
While we can assume that there aren't any collisions between hashes of
IP addresses, and we could change the attribution requirements for new
edits, hiding or modifying the way IP addresses /of unregistered users
who edited before that change/ are shown would be a substantial CC BY-SA
infringement, as would be a change of registered users' names without
their consent and without public logs of that change.

On 11/07/2014 15:34, Gilles Dubuc wrote:

> This interesting bot showed up on hackernews today:
> https://news.ycombinator.com/item?id=8018284
>
> While in this instance the access to anonymous' editors IP addresses is
> definitely useful in terms of identifying edits with probable conflict of
> interest, it makes me wonder what the history is behind the fact that
> anonymous editors are identified by their IP addresses on WMF-hosted wikis.
>
> IP addresses are closely guarded for registered users, why wouldn't
> anonymous users be identified by a hash of their IP address in order to
> protect their privacy as well? The exact same functionality of being able
> to see all edits by a given anonymous IP would still exist, the IP itself
> just wouldn't be publicly available, protected with the same access rights
> as registered users'.
>
> The "use case" that makes me think of that is someone living in a
> totalitarian regime making a sensitive edit and forgetting that they're
> logged out. Or just being unaware that being anonymous on the wiki doesn't
> mean that their local authorities can figure out who they are based on IP
> address and time. Understanding that they're somewhat protected when logged
> in and not when logged out requires a certain level of technical
> understanding. The easy way out of this argument is to state that these
> users should be using Tor or something similar. But I still wonder why we
> have this double standard of protecting registered users' privacy in
> regards to IP addresses and not applying the same for anonymous users, when
> simple hashing would do the job.

Re: Anonymous editors & IP addresses

Brian Wolff
On 7/18/14, Ricordisamoa <[hidden email]> wrote:

> The CC BY-SA license, used on most WMF projects, requires /attribution/.
> Attribution for edits made by unregistered/unlogged users is done by the
> exclusive means of their IP address.
> By clicking the 'Save' button, they agreed to release their edits under
> CC BY-SA, and that their IP address would have been the only form of
> attribution of their changes to them.
> While we can assume that there aren't any collisions between hashes of
> IP addresses, and we could change the attribution requirements for new
> edits, hiding or modifying the way IP addresses /of unregistered users
> who edited before that change/ are shown would be a substantial CC BY-SA
> infringement, as would be a change of registered users' names without
> their consent and without public logs of that change.

Additionally, if we used the same hash function as for new edits, it
would make it pretty trivial to figure out what most of the hashes
are. I think it's safe to say we wouldn't modify old edits. After all,
you can still look at
https://en.wikipedia.org/wiki/Special:Contributions/216.143.215.xxx
despite us not using that scheme anymore.

--bawolff


Re: Anonymous editors & IP addresses

Adam Wight-2
In reply to this post by Chris Steipp
++the EFF for more ideas, they are actively doing great work on so-called
perfect forward secrecy.

There are simple things we could do to achieve a better balance between
privacy and sockpantsing, such as cryptolog [1], in which IP addresses are
hashed using a salt that changes every day.  In theory, nobody can reverse
the function to reveal the IP, but you can still correlate all of an
address's edits for the day, week, or whatever, making CheckUser possible.
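A minimal Python sketch of the daily-salt idea as described here, assuming HMAC-SHA256, a random 16-byte salt per UTC day held only in memory, and identifiers truncated to 16 hex characters. This is not EFF's cryptolog code; `DailySaltedHasher` is a hypothetical name.

```python
import datetime
import hashlib
import hmac
import ipaddress
import secrets
from typing import Optional


class DailySaltedHasher:
    """Pseudonymize IPs with a random salt that is replaced once per UTC day."""

    def __init__(self) -> None:
        self._day: Optional[datetime.date] = None
        self._salt = b""

    def _salt_for(self, day: datetime.date) -> bytes:
        if day != self._day:
            # New day: overwrite the salt. The old one is dropped from this
            # object's state, so yesterday's identifiers cannot be recomputed here.
            self._day = day
            self._salt = secrets.token_bytes(16)
        return self._salt

    def identifier(self, ip: str, day: Optional[datetime.date] = None) -> str:
        if day is None:
            day = datetime.datetime.now(datetime.timezone.utc).date()
        canonical = str(ipaddress.ip_address(ip))  # normalize e.g. IPv6 spellings
        digest = hmac.new(self._salt_for(day), canonical.encode("ascii"),
                          hashlib.sha256).hexdigest()
        return digest[:16]  # truncation length is an arbitrary display choice
```

The same address gets the same identifier all day and a fresh one after the day rolls over. While a day's salt is live it still needs protecting: anyone who obtains it can enumerate the IPv4 space, as Chris Steipp points out in the quoted message.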

IP range blocking obviously needs to happen up-front, before the IP is
mangled.  I have no suggestions, but maybe browser and preferences
fingerprinting would be more effective anyway, since: tor.
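Since range blocks have to be evaluated on the raw address, a plausible order of operations is to check the blocks first and pseudonymize only afterwards. A Python sketch, where `check_and_pseudonymize` and the caller-supplied `pseudonymize` function are hypothetical:

```python
import ipaddress
from typing import Callable, Iterable, Optional


def check_and_pseudonymize(ip: str,
                           blocked_ranges: Iterable[str],
                           pseudonymize: Callable[[str], str]) -> Optional[str]:
    """Return a pseudonym for ip, or None if ip falls inside a blocked range."""
    addr = ipaddress.ip_address(ip)
    for cidr in blocked_ranges:
        # Mixed IPv4/IPv6 comparisons simply report "not contained".
        if addr in ipaddress.ip_network(cidr):
            return None
    # Only after the block check is the raw address handed to the pseudonymizer.
    return pseudonymize(str(addr))
```

The check has to come first because hashing destroys the ordering of addresses: 192.0.2.1 and 192.0.2.2 end up with unrelated digests, so CIDR membership cannot be tested afterwards.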

-Adam

[1] https://git.eff.org/?p=cryptolog.git;a=summary


On Fri, Jul 11, 2014 at 8:45 AM, Chris Steipp <[hidden email]> wrote:

> On Friday, July 11, 2014, Daniel Kinzler <[hidden email]> wrote:
>
> > On 11.07.2014 17:19, Tyler Romeo wrote:
> > > Most likely, we would encrypt the IP with AES or something using a
> > > configuration-based secret key. That way checkusers can still reverse the
> > > hash back into normal IP addresses without having to store the mapping in
> > > the database.
> >
> > There are two problems with this, I think.
> >
> > 1) No forward secrecy. If that key is ever leaked, all IPs become "plain".
> > And it will be, sooner or later. This would probably not be obvious, so
> > this feature would instill a false sense of security.
> >
>
> This is probably the biggest issue. Even if we HMAC it, it's trivial to
> brute force the entire IPv4 (and with intelligent assumptions about
> generation, most of the IPv6) range in seconds, if the key was ever known.
>
>
> >
> > 2) No range blocks. It's often quite useful to be able to block a range of
> > IPs. This is an important tool in the fight against spammers, taking it
> > away would be a problem.
> >
>
> Range blocks, I imagine, would continue working the same way they do.
> Someone would have to identify the correct range (which is very difficult
> when administrators can't see IPs), but on submission, we have the IP
> address to check against the blocks. (Unless someone proposes to store
> block ranges as hashes, that would definitely get rid of range blocks).
>
>
> >
> > -- daniel
> >
>
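To illustrate the point about a leaked key, a Python sketch that recovers an address from its HMAC-SHA256 by simple enumeration. The key and the /16 search range are hypothetical and kept small so the example runs quickly; the whole IPv4 space is the same loop over 2**32 addresses.

```python
import hashlib
import hmac
import ipaddress
from typing import Optional


def keyed_hash(key: bytes, ip: str) -> str:
    """HMAC-SHA256 of an IP string, i.e. the 'hmac it' scheme under discussion."""
    return hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()


def recover_ip(key: bytes, target: str, search_space: str) -> Optional[str]:
    """Find the address in search_space whose keyed hash equals target.

    With the key known this is plain enumeration: a /16 is 65,536 tries.
    """
    for addr in ipaddress.ip_network(search_space):
        candidate = str(addr)
        if keyed_hash(key, candidate) == target:
            return candidate
    return None
```

Keying the hash only helps while the key stays secret; once it leaks, enumeration recovers every stored address.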

Re: Anonymous editors & IP addresses

Brian Wolff
> but maybe browser and preferences
> fingerprinting would be more effective anyway, since: tor.

Probably not as effective as straight up blocking Tor as we do now? :P
(Although seriously - I would love it if we didn't block Tor like we do
now. However, you can't abuse the site with Tor when you can't use Tor
at all.)

I'm somewhat doubtful about fingerprinting (Without doing any research
on it, so I may be out of tune here). We have millions of users,
mostly using commodity software. I'm doubtful we would be able to get
a fingerprint specific enough to uniquely identify a single user. Not
to mention that a sophisticated attacker would probably be able to
easily modify their fingerprint, especially if the fingerprint
criteria are open source [OTOH, a sophisticated attacker can get around
an IP block too].

The cryptolog approach - This has the property that there's a specific
time where all anon identifiers suddenly change (e.g. Midnight every
day in the setup cryptolog uses). Having an arbitrary point in time
where suddenly identifiers shift is probably an unwanted property.
(Although maybe it doesn't matter that much in practice? Someone who
actually deals with abuse on wiki would be better able to answer
that).

I suppose a related approach could be something like
* If this is the first time the IP has edited (recently), make a (pseudo?)
random salt for that IP and throw it in memcached with an expiry time of a week
* Hash the IP with the salt
* The next time the IP edits, if the salt can be accessed from memcached, use
that and update the expiry time so that it expires a week from this edit;
otherwise start over with a new salt.
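The bullet steps above, sketched in Python. A plain dict with expiry timestamps stands in for memcached (this is not a memcached client API), the one-week TTL comes from the message, and `SlidingSaltStore` and the injectable `clock` are hypothetical. As in the description, the cache is keyed by the raw IP.

```python
import hashlib
import hmac
import secrets
import time
from typing import Callable, Dict, Tuple

WEEK_SECONDS = 7 * 24 * 60 * 60


class SlidingSaltStore:
    """Per-IP random salts whose expiry is pushed back on every edit."""

    def __init__(self, ttl: float = WEEK_SECONDS,
                 clock: Callable[[], float] = time.time) -> None:
        self._ttl = ttl
        self._clock = clock
        # Stand-in for memcached: ip -> (salt, expires_at)
        self._entries: Dict[str, Tuple[bytes, float]] = {}

    def identifier(self, ip: str) -> str:
        now = self._clock()
        entry = self._entries.get(ip)
        if entry is None or entry[1] <= now:
            salt = secrets.token_bytes(16)  # first edit, or idle too long: new salt
        else:
            salt = entry[0]                 # still active: reuse the salt
        # Either way, reset the expiry to ttl from this edit.
        self._entries[ip] = (salt, now + self._ttl)
        return hmac.new(salt, ip.encode("ascii"), hashlib.sha256).hexdigest()[:16]
```

Each edit pushes the expiry back a week, so a continuously active IP keeps its identifier, while one idle for more than a week gets a fresh salt.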

This would have the property that if an IP is continuously editing,
their identifier doesn't change, but if they stop editing for a week,
then the identifier switches. Still has the downside that in order for
someone to effectively make a range block they would have to have
checkuser rights (Although perhaps one could make a checkuser-lite right
that just exposes IPs of anons, which normal admins get access to).
Also it would be much harder for admins to notice patterns, such as if
a specific subnet seems to be dealing out similar abuse, or if a
specific IP has been blocked once a month for the last 2 years.

--bawolff

On 7/29/14, Adam Wight <[hidden email]> wrote:

> ++the EFF for more ideas, they are actively doing great work on so-called
> perfect forward secrecy.
>
> There are simple things we could do to achieve a better balance between
> privacy and sockpantsing, such as cryptolog [1], in which IP addresses are
> hashed using a salt that changes every day.  In theory, nobody can reverse
> the function to reveal the IP, but you can still correlate all of an
> address's edits for the day, week, or whatever, making CheckUser possible.
>
> IP range blocking obviously needs to happen up-front, before the IP is
> mangled.  I have no suggestions, but maybe browser and preferences
> fingerprinting would be more effective anyway, since: tor.
>
> -Adam
>
> [1] https://git.eff.org/?p=cryptolog.git;a=summary