[Wikimedia-l] Research showcase: Evolution of privacy loss in Wikipedia


Dario Taraborelli-3
This month, our research showcase
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#March_2016> hosts
Andrei Rizoiu (Australian National University) to talk about his work
<http://cm.cecs.anu.edu.au/post/wikiprivacy/> on *how private traits of
Wikipedia editors can be exposed from public data* (such as edit histories)
using off-the-shelf machine learning techniques. (abstract below)

If you're interested in learning what the combination of machine learning
and public data means for privacy and surveillance, come and join us
this *Wednesday, March 16* at *1pm Pacific Time*.

The event will be recorded and publicly streamed
<https://www.youtube.com/watch?v=Xle0oOFCNnk>. As usual, we will be hosting
the conversation with the speaker and Q&A on the #wikimedia-research
channel on IRC.

Looking forward to seeing you there,

Dario


Evolution of Privacy Loss in Wikipedia

The cumulative effect of collective
online participation has an important and adverse impact on individual
privacy. As an online system evolves over time, new digital traces of
individual behavior may uncover previously hidden statistical links between
an individual’s past actions and her private traits. To quantify this
effect, we analyze the evolution of individual privacy loss by studying the
edit history of Wikipedia over 13 years, including more than 117,523
different users performing 188,805,088 edits. We trace each Wikipedia
contributor using apparently harmless features, such as the number of edits
performed on predefined broad categories in a given time period (e.g.
Mathematics, Culture or Nature). We show that even at this unspecific level
of behavior description, it is possible to use off-the-shelf machine
learning algorithms to uncover usually undisclosed personal traits, such as
gender, religion or education. We provide empirical evidence that the
prediction accuracy for almost all private traits consistently improves
over time. Surprisingly, the prediction performance for users who stopped
editing after a given time still improves. The activities performed by new
users seem to have contributed more to this effect than additional
activities from existing (but still active) users. Insights from this work
should help users, system designers, and policy makers understand and make
long-term design choices in online content creation systems.


*Dario Taraborelli*, Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
<http://twitter.com/readermeter>
_______________________________________________
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
New messages to: [hidden email]
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, <mailto:[hidden email]?subject=unsubscribe>

Re: [Wikimedia-l] [Wiki-research-l] Research showcase: Evolution of privacy loss in Wikipedia

Aaron Halfaker-3
Reminder, this showcase is starting in 5 minutes.  See the stream here:
https://www.youtube.com/watch?v=Xle0oOFCNnk

Join us on Freenode at #wikimedia-research
<http://webchat.freenode.net/?channels=wikimedia-research> to ask Andrei
questions.

-Aaron


Re: [Wikimedia-l] [Wiki-research-l] Research showcase: Evolution of privacy loss in Wikipedia

SarahSV
Dario and Aaron, thanks for letting us know about this. Is the research
available in writing for people who don't want to sit through the video?

Sarah


Re: [Wikimedia-l] [Wiki-research-l] Research showcase: Evolution of privacy loss in Wikipedia

Dario Taraborelli-3
On Wed, Mar 16, 2016 at 7:53 PM, SarahSV <[hidden email]> wrote:

> Dario and Aaron, thanks for letting us know about this. Is the research
> available in writing for people who don't want to sit through the video?
>
> Sarah
>

Sarah – yes, see http://cm.cecs.anu.edu.au/post/wikiprivacy/
