[Wikimedia-l] Data privacy, encrypted links and recent change captures

John Mark Vandenberg
We know the NSA wants Wikipedia data, as Wikipedia is listed in one of the
NSA slides:

https://commons.wikimedia.org/wiki/File:KS8-001.jpg

That slide is about HTTP, and the tech staff are moving the
user/reader base to HTTPS.

As we learn more about the NSA programs, we need to consider vectors
other than HTTP for the NSA to obtain the data they want.  And the
userbase needs to be aware of the current risks.

One question from the "Dells are backdored"[sic] thread that is worth
separate consideration is:

Are the Wikimedia transit links encrypted, especially for database replication?
MySQL supports replication over SSL, so I assume the answer is Yes.

If not, is encrypting them necessary or useful, and is it feasible?
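
As an illustration of how the replication question could be checked on a replica, here is a minimal Python sketch. It assumes the row returned by MySQL's `SHOW SLAVE STATUS` has already been fetched into a dict; the function name and the sample rows are hypothetical, and nothing here describes how the Wikimedia databases are actually configured. The only MySQL detail relied on is the `Master_SSL_Allowed` column, which reports `Yes`, `No` or `Ignored`.

```python
def replication_uses_ssl(slave_status: dict) -> bool:
    """Return True if a SHOW SLAVE STATUS row reports SSL as configured.

    `slave_status` is assumed to be one row of SHOW SLAVE STATUS output,
    keyed by column name. "Ignored" means SSL was requested but the
    replica has no SSL support, so it is treated as not encrypted.
    """
    value = str(slave_status.get("Master_SSL_Allowed", "No"))
    return value.strip().lower() == "yes"


# Hypothetical sample rows, for illustration only.
encrypted_row = {"Master_Host": "db-master.example", "Master_SSL_Allowed": "Yes"}
plain_row = {"Master_Host": "db-master.example", "Master_SSL_Allowed": "No"}

print(replication_uses_ssl(encrypted_row))
print(replication_uses_ssl(plain_row))
```

A `Yes` here only says that SSL is configured for the replication connection; it says nothing about certificate verification or about other traffic on the same transit link.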

However we also need to consider that SSL and other encryption may be
useless against NSA/etc, which means replicating non-public data
should be avoided wherever possible, as it becomes a single point of
failure.

Given how public our system is, we don't have a lot of non-public
data, so we might be able to design the architecture so that this
information isn't replicated, and also ensure it isn't accessed over
insecure links.  I think the only parts of the dataset that are
private & valuable are
* passwords/login cookies,
* checkuser info - IPs and useragents,
* WMF analytics, which includes readers iirc,
* hidden/deleted edits, and
* private wikis and mailing lists.

Have I missed any?

Are passwords and/or checkuser info replicated?

Is there a data policy on WMF analytics data which prevents it flowing
over insecure links, limits what is collected, and ensures
destruction of the data within reasonable timeframes?  For example,
how about not using cookies to track analytics of readers who are on
HTTP instead of HTTPS?
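
To make the two policy ideas above concrete (no tracking cookie over plain HTTP, and destruction of data after a fixed period), here is a small Python sketch. The function names and the 90-day retention window are assumptions chosen purely for illustration; they are not an existing WMF policy or API.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention window; the real value would be a policy decision.
RETENTION = timedelta(days=90)


def may_set_tracking_cookie(scheme: str) -> bool:
    """Allow an analytics cookie only when the request arrived over HTTPS."""
    return scheme.strip().lower() == "https"


def is_due_for_destruction(collected_at: datetime, now: datetime) -> bool:
    """Return True once a record is older than the retention window."""
    return now - collected_at > RETENTION


now = datetime(2014, 1, 1, tzinfo=timezone.utc)
old_record = datetime(2013, 6, 1, tzinfo=timezone.utc)
print(may_set_tracking_cookie("http"))
print(is_due_for_destruction(old_record, now))
```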

The private wikis can be restricted to https, depending on the value
of the data on those wikis in the wrong hands.  The private mailing
lists will be harder to secure, and at least the English Wikipedia
arbcom list contains a lot of valuable data about contributors.

Regarding hidden/deleted edits, replication isn't the only source
of this data.  All edits are also exposed via Recent Changes
(https/api/etc) as they occur, and the value of these edits is
determined by the fact they are hidden afterwards (e.g. don't appear
in dumps).  Is there any way to control who is effectively capturing
all edits via Recent Changes?
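
To show why this matters, here is a rough Python sketch of how little a third party needs in order to capture Recent Changes and later spot which edits were hidden. It builds a request URL for the public MediaWiki API (`action=query&list=recentchanges`) and compares a set of captured revision ids with the ids that are still public. The function names and the sample ids are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode


def recent_changes_url(api_base: str, limit: int = 50) -> str:
    """Build a MediaWiki API URL that lists the most recent changes."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "ids|title|user|timestamp",
        "rclimit": limit,
        "format": "json",
    }
    return api_base + "?" + urlencode(params)


def later_hidden(captured_rev_ids, still_public_rev_ids):
    """Return revision ids that were captured live but are no longer public."""
    return sorted(set(captured_rev_ids) - set(still_public_rev_ids))


print(recent_changes_url("https://en.wikipedia.org/w/api.php", limit=10))
# Hypothetical revision ids, for illustration only.
print(later_hidden([101, 102, 103], [101, 103]))
```

Anyone polling such a URL continuously holds a copy of every edit as it was made, so the difference between their capture and the public record is exactly the set of hidden edits.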

--
John Vandenberg

_______________________________________________
Wikimedia-l mailing list
[hidden email]
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, <mailto:[hidden email]?subject=unsubscribe>

Re: [Wikimedia-l] Data privacy, encrypted links and recent change captures

Jasper Deng
SSL makes interception more difficult; some private wikis are already
restricted to SSL. We also have to consider that irc.wikimedia.org has a
recent changes feed.

At minimum, the transit links should be encrypted if feasible. The main
argument against encrypting them is the extra performance overhead.


On Sun, Dec 29, 2013 at 11:10 PM, John Vandenberg <[hidden email]> wrote:

> We know NSA wants Wikipedia data, as Wikipedia is listed in one of the
> NSA slides:
>
> https://commons.wikimedia.org/wiki/File:KS8-001.jpg
>
> That slide is about HTTP, and the tech staff are moving the
> user/reader base to HTTPS.
>
> As we learn more about the NSA programs, we need to consider vectors
> other than HTTP for the NSA to obtain the data they want.  And the
> userbase needs to be aware of the current risks.
>
> One question from the "Dells are backdored"[sic] thread that is worth
> separate consideration is:
>
> Are the Wikimedia transit links encrypted, especially for database
> replication?
> MySQL has replication over SSL, so I assume the answer is Yes.
>
> If not, is this necessary or useful, and feasible ?
>
> However we also need to consider that SSL and other encryption may be
> useless against NSA/etc, which means replicating non-public data
> should be avoided wherever possible, as it becomes a single point of
> failure.
>
> Given how public our system is, we don't have a lot of non-public
> data, so we might be able to design the architecture so that
> information isnt replicated, and also ensure it isnt accessed over
> insecure links.  I think the only parts of the dataset that are
> private & valuable are
> * passwords/login cookies,
> * checkuser info - IPs and useragents,
> * WMF analytics, which includes readers iirc, and
> * hidden/deleted edits
> * private wikis and mailing lists
>
> Have I missed any?
>
> Are passwords and/or checkuser info replicated?
>
> Is there a data policy on WMF analytics data which prevents it flowing
> over insecure links, and limits what is collected and ensures
> destruction of the data within reasonable timeframes?  i.e. how about
> not using cookies to track analytics of readers who are on HTTP
> instead of HTTPS?
>
> The private wikis can be restricted to https, depending on the value
> of the data on those wikis in the wrong hands.  The private mailing
> lists will be harder to secure, and at least the English Wikipedia
> arbcom list contain a lot of valuable data about contributors.
>
> Regarding hidden/deleted edits, the replication isnt the only source
> of this data.  All edits are also exposed via Recent Changes
> (https/api/etc) as they occur, and the value of these edits is
> determined by the fact they are hidden afterwards (e.g. don't appear
> in dumps).  Is there any way to control who is effectively capturing
> all edits via Recent Changes?
>
> --
> John Vandenberg

Re: [Wikimedia-l] Data privacy, encrypted links and recent change captures

Federico Leva (Nemo)
In reply to this post by John Mark Vandenberg
John Vandenberg, 30/12/2013 08:10:
> Are the Wikimedia transit links encrypted, especially for database replication?
> MySQL has replication over SSL, so I assume the answer is Yes.
>
> If not, is this necessary or useful, and feasible ?

It's currently the last todo in
<https://wikitech.wikimedia.org/wiki/HTTPS/Future_work#Security_enhancements>,
AFAICS.
The status of that document/work is unknown.

Nemo

Re: [Wikimedia-l] Data privacy, encrypted links and recent change captures

John Mark Vandenberg
On 01/01/2014 9:11 AM, "Federico Leva (Nemo)" <[hidden email]> wrote:
>
> John Vandenberg, 30/12/2013 08:10:
>
>> Are the Wikimedia transit links encrypted, especially for database
>> replication?
>> MySQL has replication over SSL, so I assume the answer is Yes.
>>
>> If not, is this necessary or useful, and feasible ?
>
> It's currently the last todo in
> <https://wikitech.wikimedia.org/wiki/HTTPS/Future_work#Security_enhancements>,
> AFAICS.
> The status of that document/work is unknown.

Could we have an update on what has been done over the last year to protect
the privacy of user data sent between datacenters?

--
John Vandenberg

Re: [Wikimedia-l] Data privacy, encrypted links and recent change captures

Brion Vibber
On Tue, Mar 10, 2015 at 12:15 PM, John Mark Vandenberg <[hidden email]> wrote:

>
> Could we have an update on what is being done over the last year to protect
> the privacy of user data sent between datacenters?
>

Someone in ops could add more detail on the actual work in progress, but
you can watch the public tickets in Phabricator for some updates:

https://phabricator.wikimedia.org/tag/interdatacenter-ipsec/
^ there is some infrastructure work required on getting IPsec going on the
links between data centers; I am given to understand more work is coming
soon on this. Hopefully there will be updates on this ticket. :)

https://phabricator.wikimedia.org/tag/https-by-default/
^ some HTTPS frontend work items

-- brion