Wikipedia tracks user behaviour via third party companies #2

Peter Gervai-5
Hello,

I wasn't subscribed to this list, since I usually try to avoid the
politics around.

I was notified, however, that some interesting claims were made and
some steps taken (again) without any discussion whatsoever.

First, let me say here again, as I have said on a different list,
that I am extremely disappointed by the lack of discussion before
someone from outside seriously interferes with another project based
on, as it turns out, incorrect information. In the past, people with
privileges (if we ever considered them that way instead of people
with work to be done) were more cautious. I would like all you
fast-handed guys to slow down and talk first, get informed, and act
later.

I already commented elsewhere on vls; in summary, I miss the discussion
and I do not believe the case actually breached any privacy, but this
isn't my concern now (as I'm in a bit of a hurry).

Regarding huwp, it would have been pretty easy to find out who to ask.
Apart from the obvious choice of "anyone with any flags on huwp", it
could've been easy to identify who made the changes, and ask them.
Like, for example me.

As far as I can see, a lot of energy is being wasted, with people
planning how to block JavaScript, how to block counters, etc. That is
the wrong way. The right way, and I'm repeating myself again, is
FIRST to find out WHY these scripts are there in the first place,
what problem they have to solve. This is a crucial step, fellows,
which you neglected to take. (And we all know that the reason is to
create usage stats.)

The next step should be examining whether this violates anything,
like the Privacy Policy. In the case of Google this is debatable,
since I don't know the scope of the data retention.

However, I do know all about the Hungarian stats. Let me share
the real information here, briefly, since I have to go soon, but I do
not want to let you destroy something you're not aware of.

The stats (which have, by surprise, a dedicated domain under the hu
wikipedia domain) run on a dedicated server, with nothing else on it.
Its sole purpose is to gather and publish the stats. Basically nobody
has permission to log in to the server but me, and since I happen to
be a checkuser as well it wouldn't even be entertaining to read it, even
if it weren't big enough to make this useless. I happen to be the one
who created the Hungarian checkuser policy, which is, as far as I
know, the strictest one in WMF projects, and it's no joke, and I
intend to follow it. (And for those who are unfamiliar with me, I happen
to be the founder of huwp as well, apart from my job in computer
security.)

If you had gathered this knowledge (which means that the server
is closed and run by a user identified to WMF), then you could have
started the discussion.

As is obvious, holding off on any interfering moves while discussing it
for days, or even weeks, wouldn't have changed anything.

What have you achieved by removing the code? You killed our stats,
which provide us with the statistics WMF originally provided (same
data content) but later killed off.

We (huwp) will propose some solutions to the problem, but I really
have to go now. Tgr can help discuss it, and I thank him for his
help in advance. :-)

So, think about these over the weekend; I'm back on Monday. I hope there
can be a _useful_ discussion, with thinking people and not people
acting on impulses.

Peter Gervai
Hungary

_______________________________________________
foundation-l mailing list
[hidden email]
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l

Re: Wikipedia tracks user behaviour via third party companies #2

Nathan Awrich
I can understand your frustration, Peter, but perhaps hu.wp could also have
taken a more collaborative approach. If you would like to use a method for
collecting statistics that others will view as violating the privacy policy,
or as presenting risks normally not considered throughout the rest of the
Wikimedia community of projects, then you should propose your method for
consideration prior to simply implementing it. As you note ("some steps
taken (again)") this has happened before, so some consultation with the rest
of the community before going out on a limb is advised.

Others have since discussed more centralised and secure methods for
providing these statistics via the WMF - this is the ideal outcome, and one
that might have been achieved earlier had you proposed your method rather
than simply going ahead alone.

Nathan

Re: Wikipedia tracks user behaviour via third party companies #2

Unionhawk
Nathan wrote:

> I can understand your frustration, Peter, but perhaps hu.wp could also have
> taken a more collaborative approach. If you would like to use a method for
> collecting statistics that others will view as violating the privacy policy,
> or as presenting risks normally not considered throughout the rest of the
> Wikimedia community of projects, then you should propose your method for
> consideration prior to simply implementing it. As you note ("some steps
> taken (again)") this has happened before, so some consultation with the rest
> of the community before going out on a limb is advised.
>
> Others have since discussed more centralised and secure methods for
> providing these statistics via the WMF - this is the ideal outcome, and one
> that might have been achieved earlier had you proposed your method rather
> than simply going ahead alone.
>
> Nathan

From what I'm reading, the foundation already collects raw data
containing information collected by most normal websites (IP, I guess)
and such data can only be released under special circumstances.

External stats appear to violate the privacy policy, to me.
(http://meta.wikimedia.org/wiki/Meta:Privacy_policy)

Re: Wikipedia tracks user behaviour via third party companies #2

Bence Damokos
In reply to this post by Nathan Awrich
I'd like to note in the interest of facts that the Huwp stats have been
implemented (without complaint till now, June 2009) since October 2006; the
current version of the privacy policy has been available in English since
October 2008.

I think it might not be very productive to judge the action of implementing
a stats engine in light of a privacy policy that has been adopted later than
the action was performed nor might it be fruitful to shift blame for not
discussing something three years ago (which could even have been discussed
in some way).
 Best regards,
Bence Damokos
On Fri, Jun 5, 2009 at 8:15 PM, Nathan <[hidden email]> wrote:

>
> Others have since discussed more centralised and secure methods for
> providing these statistics via the WMF - this is the ideal outcome, and one
> that might have been achieved earlier had you proposed your method rather
> than simply going ahead alone.
>
>

Re: Wikipedia tracks user behaviour via third party companies #2

Effe iets anders
In reply to this post by Peter Gervai-5
2009/6/5 Peter Gervai <[hidden email]>

> <snip>
> The stats (which have, by surprise, a dedicated domain under the hu
> wikipedia domain) run on a dedicated server, with nothing else on it.
> Its sole purpose is to gather and publish the stats. Basically nobody
> has permission to log in to the server but me, and since I happen to
> be a checkuser as well it wouldn't even be entertaining to read it, even
> if it weren't big enough to make this useless. I happen to be the one
> who created the Hungarian checkuser policy, which is, as far as I
> know, the strictest one in WMF projects, and it's no joke, and I
> intend to follow it. (And for those who are unfamiliar with me, I happen
> to be the founder of huwp as well, apart from my job in computer
> security.)
> <snip>
>

Just a remark on the checkuser argument. Checkuser actions and checks are
logged, and can be double checked by other checkusers and stewards. This
server can not. I can imagine that this would pose a problem.

eia

Re: Wikipedia tracks user behaviour via third party companies #2

Alex Zaddach
effe iets anders wrote:

> 2009/6/5 Peter Gervai <[hidden email]>
>
>> <snip>
>> The stats (which have, by surprise, a dedicated domain under the hu
>> wikipedia domain) run on a dedicated server, with nothing else on it.
>> Its sole purpose is to gather and publish the stats. Basically nobody
>> has permission to log in to the server but me, and since I happen to
>> be a checkuser as well it wouldn't even be entertaining to read it, even
>> if it weren't big enough to make this useless. I happen to be the one
>> who created the Hungarian checkuser policy, which is, as far as I
>> know, the strictest one in WMF projects, and it's no joke, and I
>> intend to follow it. (And for those who are unfamiliar with me, I happen
>> to be the founder of huwp as well, apart from my job in computer
>> security.)
>> <snip>
>>
>
> Just a remark on the checkuser argument. Checkuser actions and checks are
> logged, and can be double checked by other checkusers and stewards. This
> server can not. I can imagine that this would pose a problem.
>

Checkuser also only stores the data for a known period of time (3
months) and, with the fairly recent exception of user->user email, only
records actions that are publicly logged by MediaWiki (edits and other
logged actions), not individual pageviews.

--
Alex (wikipedia:en:User:Mr.Z-man)


Re: Wikipedia tracks user behaviour via third party companies #2

Gergő Tisza
In reply to this post by Bence Damokos
Bence Damokos <bdamokos@...> writes:

> I'd like to note in the interest of facts that the Huwp stats have been
> implemented (without complaint till now, June 2009) since October 2006; the
> current version of the privacy policy has been available in English since
> October 2008.

It was implemented in October 2005, actually (not long after the knams stats
stopped IIRC); MediaWiki:Lastmodifiedat replaced an earlier message in 2006,
that is why the page history doesn't go back further.

More importantly, the privacy policy explicitly states that developers might
have access to the raw logs. The stat is thus in compliance with the letter of
the privacy policy, and I don't see why it would be contrary to its spirit. (As
stated, the only purpose is to provide statistics which include no personally
identifiable information; the operator is one of the most trusted users of the
hu.wp community, the founder of the community, the head of Wikimedia Hungary,
admin, bureaucrat, checkuser, whatnot; and the stat server was operated with the
knowledge and consent of the community. It is linked from the statistics page
and other relevant places, not exactly a secret.)



Re: Wikipedia tracks user behaviour via third party companies #2

Bennó
 
And that without any complaint from 2005 onward (practically from the
beginning of huwiki's real existence).

B.

-----Original Message-----
It is linked from the statistics page and other relevant places, not exactly
a secret.)

 


Re: Wikipedia tracks user behaviour via third party companies #2

Mark (Markie)
In reply to this post by Gergő Tisza
On Fri, Jun 5, 2009 at 9:49 PM, Tisza Gergő <[hidden email]> wrote:

> Bence Damokos <bdamokos@...> writes:
>
> > I'd like to note in the interest of facts that the Huwp stats have been
> > implemented (without complaint till now, June 2009) since October 2006; the
> > current version of the privacy policy has been available in English since
> > October 2008.
>
> It was implemented in October 2005, actually (not long after the knams stats
> stopped IIRC); MediaWiki:Lastmodifiedat replaced an earlier message in 2006,
> that is why the page history doesn't go back further.
>
> More importantly, the privacy policy explicitly states that developers might
> have access to the raw logs. The stat is thus in compliance with the letter of
> the privacy policy, and I don't see why it would be contrary to its spirit. (As
> stated, the only purpose is to provide statistics which include no personally
> identifiable information; the operator is one of the most trusted users of the
> hu.wp community, the founder of the community, the head of Wikimedia Hungary,
> admin, bureaucrat, checkuser, whatnot; and the stat server was operated with the
> knowledge and consent of the community. It is linked from the statistics page
> and other relevant places, not exactly a secret.)
>

There are a few issues with this.  Devs have access to logs on WMF servers,
not random external servers.  The community cannot decide that Random_user1
and Random_user2 etc will agree with the community's view on the stats being
passed to an external server.  Also there *may* be issues with the security
of that server that means it could be compromised and could probably be
accessed by the web hosting company if they so wished.

I still fail to see how, at this point (not before when there was no policy)
this can be considered to be acceptable.  IP information etc is still being
passed to an external server, regardless of who it is being operated by.  As
we can see at http://meta.wikimedia.org/wiki/Privacy and copied below I
don't see where this is acceptable.

Release: Policy on Release of Data

It is the policy of Wikimedia that personally identifiable data collected in
the server logs, or through records in the database via the CheckUser
feature, or through other non-publicly-available methods, may be released by
Wikimedia volunteers or staff, in any of the following situations:

   1. In response to a valid subpoena or other compulsory request from law
   enforcement,
   2. With permission of the affected user,
   3. When necessary for investigation of abuse complaints,
   4. Where the information pertains to page views generated by a spider or
   bot and its dissemination is necessary to illustrate or resolve technical
   issues,
   5. Where the user has been vandalizing articles or persistently behaving
   in a disruptive way, data may be released to a service provider, carrier, or
   other third-party entity to assist in the targeting of IP blocks, or to
   assist in the formulation of a complaint to relevant Internet Service
   Providers,
   6. Where it is reasonably necessary to protect the rights, property or
   safety of the Wikimedia Foundation, its users or the public.

Except as described above, Wikimedia policy does not permit distribution of
personally identifiable information under any circumstances.


Regards


Mark



Re: Wikipedia tracks user behaviour via third party companies #2

Michael Snow-3
Mark (Markie) wrote:
> I still fail to see how, at this point (not before when there was no policy)
> this can be considered to be acceptable.
As I understand it, nobody is arguing that it's considered acceptable at
this point. People involved in the Hungarian Wikipedia have been
explaining the background, trying to establish that they shouldn't be
blamed for having this in place. That's understandable as well, and I
have no interest in seeing blame attached to anyone here. Let's just
make sure these external trackers are removed, and that we work on our
internal resources to collect information in a way consistent with the
privacy policy.

--Michael Snow


Re: Wikipedia tracks user behaviour via third party companies #2

Gergő Tisza
In reply to this post by Nathan Awrich
Nathan <nawrich@...> writes:

> Others have since discussed more centralised and secure methods for
> providing these statistics via the WMF - this is the ideal outcome, and one
> that might have been achieved earlier had you proposed your method rather
> than simply going ahead alone.

Setting up an off-the-shelf awstats with an invisible pixel is web statistics
101, not something that needs to be invented. The reason nothing similar got
implemented is not that nobody thought of this method, but that it wouldn't work
with enwiki so nobody cared. Actually, the old knams stat (which also collected
referrers, so it was in some aspects superior) could have been easily kept
working by filtering out enwiki, and maybe the next few largest projects; again,
nobody cared. Features that only benefit the smaller projects rarely get enough
developer interest, which is understandable, but then it is only natural that
those smaller projects try to solve their issues for themselves. And we did it
with privacy in mind - we would have obviously preferred Google Analytics
ourselves, but we didn't switch because we didn't want the logs to leak to
servers not controlled by WM community, and because it shows data that can be
used to identify IP addresses.
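
For readers unfamiliar with the invisible-pixel approach, here is a minimal
sketch of the idea, not the actual huwp setup. It assumes each page embeds a
1x1 image served from a stats host, so every page view produces one image
request; the web server writes that request to its access log, and a log
analyser such as awstats summarises the log afterwards. The function name
`combined_log_line`, the `/pixel.gif` path, the response size and the sample
values are all hypothetical. The sketch only formats one entry in the common
Apache "combined" layout, which also shows why the raw log is sensitive:
every line starts with the visitor's IP address.

```python
from datetime import datetime, timezone


def combined_log_line(ip, when, path, referrer, user_agent,
                      status=200, size=43):
    """Format one access-log entry in the Apache "combined" layout.

    Hypothetical helper for illustration only. `when` must be a
    timezone-aware datetime; `size` is an arbitrary response size.
    """
    # Timestamp as it appears in access logs, e.g. 05/Jun/2009:20:15:00 +0000
    timestamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    return (
        f'{ip} - - [{timestamp}] "GET {path} HTTP/1.1" '
        f'{status} {size} "{referrer}" "{user_agent}"'
    )


# One page view: the browser fetches the pixel and normally sends the URL
# of the embedding page as the Referer header.
line = combined_log_line(
    ip="192.0.2.1",  # documentation-only address, not a real visitor
    when=datetime(2009, 6, 5, 20, 15, 0, tzinfo=timezone.utc),
    path="/pixel.gif",
    referrer="http://hu.wikipedia.org/wiki/Budapest",
    user_agent="ExampleBrowser/1.0",
)
print(line)
```

The published statistics can be aggregate counts only, but whoever can read
the log file sees lines like this one, which is what the privacy question in
this thread is about.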



Re: Wikipedia tracks user behaviour via third party companies #2

Mark (Markie)
In reply to this post by Michael Snow-3
Apologies for this, I'm getting confused between multiple threads on this.
Regards

Mark

On Fri, Jun 5, 2009 at 10:22 PM, Michael Snow <[hidden email]> wrote:

> Mark (Markie) wrote:
> > I still fail to see how, at this point (not before when there was no policy)
> > this can be considered to be acceptable.
> As I understand it, nobody is arguing that it's considered acceptable at
> this point. People involved in the Hungarian Wikipedia have been
> explaining the background, trying to establish that they shouldn't be
> blamed for having this in place. That's understandable as well, and I
> have no interest in seeing blame attached to anyone here. Let's just
> make sure these external trackers are removed, and that we work on our
> internal resources to collect information in a way consistent with the
> privacy policy.
>
> --Michael Snow

Re: Wikipedia tracks user behaviour via third party companies #2

Aryeh Gregor
In reply to this post by Michael Snow-3
On Fri, Jun 5, 2009 at 5:22 PM, Michael Snow<[hidden email]> wrote:
> As I understand it, nobody is arguing that it's considered acceptable at
> this point.

Peter Gervai seemed to argue exactly that, unless I badly misread him:

> someone from outside seriously interferes with another project
> based on, as it turns out, incorrect information. . . .
>
> . . . I do not believe the case actually breached any privacy . . .

And so did Tisza Gergő:

> More importantly, the privacy policy explicitly states that developers might
> have access to the raw logs. The stat is thus in compliance with the letter of
> the privacy policy, and I don't see why it would be contrary to its spirit.

The privacy policy clearly prohibits "release" of data to outside
sources for the purpose of statistical analysis, since that doesn't
fall within the six enumerated points under "Release: Policy on
Release of Data".  I suppose it's arguable by the letter of the policy
that sending the data to a server which only a single Wikipedian has
access to isn't "release".  However, I think it's clear that the
intent of the policy was otherwise, and Domas acted in accordance with
established policy and with full understanding of the nature of the
script he was removing.

It might be worth defining "release" more clearly to avoid any
confusion in the future.  Would it have been any different if it was
being sent to the toolserver instead of a totally third-party server,
for instance?  I'd think not, but it's not fully clear from reading
the policy.  How about a checkuser downloading some data to his
computer for analysis beyond that permitted by the web-based
interface?  Why is that not release if downloading it to a server is?
Does that depend on the amount, intent, or some other purpose?  (Or is
it release?  If so, why is it different from downloading web pages so
you can view them in your browser?)

Also, there are multiple places where the policy vaguely and
redundantly states that logs will not be publicized, in multiple ways:
"is not made public", "will not be published", "is not reproduced
publicly".  In general, there's a lot of repetition that makes the
policy hard to draw firm conclusions from.  If you just saw those
mentions, you might think it was just fine to reproduce it as long as
it wasn't actually *public*.  It could use more precise and condensed
wording.


Re: Wikipedia tracks user behaviour via third party companies #2

Gergő Tisza
In reply to this post by Mark (Markie)
Mark (Markie) <newsmarkie@...> writes:

> I still fail to see how, at this point (not before when there was no policy)
> this can be considered to be acceptable.  IP information etc is still being
> passed to an external server, regardless of who it is being operated by.  As
> we can see at http://meta.wikimedia.org/wiki/Privacy and copied below I
> don't see where this is acceptable.
>
> Release: Policy on Release of Data
>
> It is the policy of Wikimedia that personally identifiable data collected in
> the server logs, or through records in the database via the CheckUser
> feature, or through other non-publicly-available methods, may be released by
> Wikimedia volunteers or staff, in any of the following situations:
>
>    1. In response to a valid subpoena or other compulsory request from law
>    enforcement,
>    2. With permission of the affected user,
>    3. When necessary for investigation of abuse complaints,
>    4. Where the information pertains to page views generated by a spider or
>    bot and its dissemination is necessary to illustrate or resolve technical
>    issues,
>    5. Where the user has been vandalizing articles or persistently behaving
>    in a disruptive way, data may be released to a service provider, carrier, or
>    other third-party entity to assist in the targeting of IP blocks, or to
>    assist in the formulation of a complaint to relevant Internet Service
>    Providers,
>    6. Where it is reasonably necessary to protect the rights, property or
>    safety of the Wikimedia Foundation, its users or the public.
>
> Except as described above, Wikimedia policy does not permit distribution of
> personally identifiable information under any circumstances.

It also says, a few sentences earlier, that "Sharing information with other
privileged users is not considered distribution." And Peter has identified
himself to the foundation according to the access to nonpublic data policy, so
he is a privileged user. I still don't see any violation there - the point of
the privacy policy is to regulate release of personally identifiable information
from those who have access to those who have not, and in this case no such
release happened.

> Also there *may* be issues with the security
> of that server that means it could be compromised and could probably be
> accessed by the web hosting company if they so wished.

Peter is CTO of a Hungarian ISP; he is the one hosting the server, and he
certainly has the required expertise. Anyway, the privacy policy explicitly
disclaims any responsibility for unauthorized access; while the security of the
server is certainly a valid issue, it is not an issue with the privacy policy.



Re: Wikipedia tracks user behaviour via third party companies #2

Pedro Sanchez-2
On Fri, Jun 5, 2009 at 4:44 PM, Tisza Gergő <[hidden email]> wrote:

> Mark (Markie) <newsmarkie@...> writes:
>
> > I still fail to see how, at this point (not before when there was no policy)
> > this can be considered to be acceptable.  IP information etc is still being
> > passed to an external server, regardless of who it is being operated by.  As
> > we can see at http://meta.wikimedia.org/wiki/Privacy and copied below I
> > don't see where this is acceptable.
> >
> > Release: Policy on Release of Data
> >
> > It is the policy of Wikimedia that personally identifiable data collected in
> > the server logs, or through records in the database via the CheckUser
> > feature, or through other non-publicly-available methods, may be released by
> > Wikimedia volunteers or staff, in any of the following situations:
> >
> >    1. In response to a valid subpoena or other compulsory request from law
> >    enforcement,
> >    2. With permission of the affected user,
> >    3. When necessary for investigation of abuse complaints,
> >    4. Where the information pertains to page views generated by a spider or
> >    bot and its dissemination is necessary to illustrate or resolve technical
> >    issues,
> >    5. Where the user has been vandalizing articles or persistently behaving
> >    in a disruptive way, data may be released to a service provider, carrier, or
> >    other third-party entity to assist in the targeting of IP blocks, or to
> >    assist in the formulation of a complaint to relevant Internet Service
> >    Providers,
> >    6. Where it is reasonably necessary to protect the rights, property or
> >    safety of the Wikimedia Foundation, its users or the public.
> >
> > Except as described above, Wikimedia policy does not permit distribution of
> > personally identifiable information under any circumstances.
>
> It also says, a few sentences earlier, that "Sharing information with other
> privileged users is not considered distribution." And Peter has identified
> himself to the foundation according to the access to nonpublic data policy, so
> he is a privileged user. I still don't see any violation there - the point of
> the privacy policy is to regulate release of personally identifiable information
> from those who have access to those who have not, and in this case no such
> release happened.
>

Minor correction: privacy-related trusted users are required to be
identified to the foundation, yes.
But it doesn't work the other way: just sending ID to the foundation doesn't
automatically make you a trusted user for private data.

Peter may well be knowledgeable and trusted, but not because he has
identified to the foundation.

Re: Wikipedia tracks user behaviour via third party companies #2

Gergő Tisza
In reply to this post by Michael Snow-3
Michael Snow <wikipedia@...> writes:
> As I understand it, nobody is arguing that it's considered acceptable at
> this point. People involved in the Hungarian Wikipedia have been
> explaining the background, trying to establish that they shouldn't be
> blamed for having this in place. That's understandable as well, and I
> have no interest in seeing blame attached to anyone here. Let's just
> make sure these external trackers are removed, and that we work on our
> internal resources to collect information in a way consistent with the
> privacy policy.

I do argue that it is not in violation of the privacy policy (whether the people
here find it acceptable is another question). The privacy policy and the
nonpublic access policy together place a very clear limit on the distribution of
personally identifiable data: it can never be passed to anyone who has not
identified himself to the WMF. We respected that limit; and I don't see any
stricter one in the policy. (I don't think it would be even reasonable to have
one; Simetrical already gave the arguments I intended to use.)

At any rate, the tracker has already been disabled by Domas, and obviously we
don't intend to switch it back without reaching consensus here.



Re: Wikipedia tracks user behaviour via third party companies #2

Aryeh Gregor
On Fri, Jun 5, 2009 at 5:58 PM, Tisza Gergő<[hidden email]> wrote:
> I do argue that it is not in violation of the privacy policy (whether the people
> here find it acceptable is another question).

It may be within the letter of the privacy policy.  I think that's
entirely arguable, since the policy is so vague.  However, it's very
clearly against the *intent* of the privacy policy as dictated by the
Board.  Domas Mituzas and Michael Snow are both Board members and have
both made it clear that they think there's no question that the script
in question violated the privacy policy.

I believe the major problems with the script are

1) It sent data to a server not directly controlled by the Wikimedia
Foundation.  No personally identifiable information should be sent in
bulk to any non-Wikimedia server.  Operation of any server hosting
significant amounts of sensitive information must be directly and
immediately accountable to Wikimedia's normal chain of command.

2) This use of data was not specifically authorized by the Wikimedia
Foundation, via either the Board or appropriate officers.  Peter may
be a checkuser, but that gives him authorization only to use checkuser
functions, not to collect or harvest other types of data.  As has been
noted, the data collected includes much more than checkusers can
access in the course of using their checkuser rights.

Neither of these points is made clear in the written privacy policy,
however, if they are in fact intended.

Last I heard, Erik Zachte is working on improved statistics for all
Wikimedia projects.  These are running on Wikimedia servers and
specifically approved by Wikimedia.  It seems like the best course of
action would be for people to point out what they think is lacking in
his statistics, and perhaps offer to help improve them.


Re: Wikipedia tracks user behaviour via third party companies #2

Gergő Tisza
In reply to this post by Gergő Tisza
Tisza Gergő <gtisza@...> writes:
> I do argue that it is not in violation of the privacy policy (whether
> the people here find it acceptable is another question).

Just to make it clear, I don't think accordance with the privacy policy
automatically entitles one to do something. The PP is a minimum set of
requirements strong enough to assure users and weak enough to not hinder
ourselves (as it is difficult to change it); if something is permitted by the
policy, but the WMF or the developers or the relevant community is against it,
then it will not be done. So instead of talking about the privacy policy (which
would be routinely violated if the spread of IP data to non-WMF-owned servers were
indeed a violation - consider WikiMiniAtlas, for example) it would be more
productive to talk about whether such a use is acceptable, and if not, what can
be done to make it so. (For example, would it help if WM-HU took ownership? We
could also write a complementary privacy policy for it, stating that it will
never be used for any other reason than statistics, who has access, how long the
raw logs are kept etc.)



Re: Wikipedia tracks user behaviour via third party companies #2

Michael Snow-3
In reply to this post by Aryeh Gregor
Aryeh Gregor wrote:

> On Fri, Jun 5, 2009 at 5:22 PM, Michael Snow<[hidden email]> wrote:
>  
>> As I understand it, nobody is arguing that it's considered acceptable at
>> this point.
>>    
> Peter Gervai seemed to argue exactly that, unless I badly misread him:
>
>
> And so did Tisza Gergő:
>  
Maybe it's just the lawyer in me, but I read those comments primarily as
a defense against a perceived "prosecution" for allegedly violating the
privacy policy. Not, and this is the distinction I was trying to get at,
as positive arguments that this particular approach should be accepted
going forward.
> I suppose it's arguable by the letter of the policy
> that sending the data to a server which only a single Wikipedian has
> access to isn't "release".  However, I think it's clear that the
> intent of the policy was otherwise, and Domas acted in accordance with
> established policy and with full understanding of the nature of the
> script he was removing.
>  
I agree that regardless of whether there was a technical policy
violation, the setup was problematic, and I trust Domas's judgment in
addressing the situation.
> Also, there are multiple places where the policy vaguely and
> redundantly states that logs will not be publicized, in multiple ways:
> "is not made public", "will not be published", "is not reproduced
> publicly".  In general, there's a lot of repetition that makes the
> policy hard to draw firm conclusions from.  If you just saw those
> mentions, you might think it was just fine to reproduce it as long as
> it wasn't actually *public*.  It could use more precise and condensed
> wording.
>  
Policies being what they are, at some level they must state principles and
will not be able to anticipate every single case. Implementation then
depends on people exercising judgment when those cases arise. Some of
the redundancy is possibly for emphasis, or out of an abundance of
caution, so that people don't think an exception arises when something
is not explicitly stated. That being said, suggestions for particular
improvements are always welcome.

--Michael Snow



Re: Wikipedia tracks user behaviour via third party companies #2

Gergő Tisza
In reply to this post by Aryeh Gregor
Aryeh Gregor <Simetrical+wikilist@...> writes:
 
> I believe the major problems with the script are
>
> 1) It sent data to a server not directly controlled by the Wikimedia
> Foundation.  No personally identifiable information should be sent in
> bulk to any non-Wikimedia server.  Operation of any server hosting
> significant amounts of sensitive information must be directly and
> immediately accountable to Wikimedia's normal chain of command.

I don't think that's reasonable. WikiMiniAtlas, for example, is hosted by WM-DE,
thus every time it is used, IP data is sent to a non-WMF server. (Users have to
click to load it, but it is linked from every page that has coordinates, so it
can be considered bulk. And when it gets replaced with OSM, static map snippets
will be loaded by default from a WM-DE-owned cache server, if I understand the
setup correctly.)

Of course, there should be *some* limit on what servers can receive data. As I
said, the obvious choice for me would be to tie it to chapters (maybe it could
even be included in the chapter agreement?). That, and maybe WMF staff should
have root access for emergencies?
 
> 2) This use of data was not specifically authorized by the Wikimedia
> Foundation, via either the Board or appropriate officers.  Peter may
> be a checkuser, but that gives him authorization only to use checkuser
> functions, not to collect or harvest other types of data.  As has been
> noted, the data collected includes much more than checkusers can
> access in the course of using their checkuser rights.

Agreed. So consider this as a request for authorization :)

> Last I heard, Erik Zachte is working on improved statistics for all
> Wikimedia projects.  These are running on Wikimedia servers and
> specifically approved by Wikimedia.  It seems like the best course of
> action would be for people to point out what they think is lacking in
> his statistics, and perhaps offer to help improve them.

Certainly, but that in itself is no reason not to have another system for the
time being. It is not unheard of that development of new features gets delayed
by a few years :) We have a working system in place; I don't think it should be
removed just because there will be a better one at some indefinite point in
time. It can be removed at that time just as well.

As for statistics-related feature requests, I would have quite a few :) Unique
visits/visitors, referrer data, country/browser/OS distribution (I seem to
recall seeing something like this in Erik's stats, but I can't find it now),
breakdown by action and by user group, search term statistics (without the
wikistics.falsicon.de JS hack), gadget usage data. An API would also be nice (so
that for example a user script can query the data for all internal links on the
page, and show a colormap - it would be a nice tool for designing the layouts of
portals).

(It would be somewhat unfair to say Erik's stats are lacking these, since our
stat can't measure most of them either. What I would miss most would be visitor
counts and browser distribution. Also, I think stats.grok.se and
wikistics.falsicon.de give slightly incorrect page view results because they
don't take redirects and special pages into account.)

