[Input requested] Data Lake Edit release input request


Leila Zia
In a nutshell:
We are asking for your input to help us learn how to release the
historical edit data of Wikimedia projects in a more efficient way.
Please provide your feedback via
https://docs.google.com/forms/d/e/1FAIpQLScc15eSeFrVvAh_ydpX_1p0v6-WSx2qe3EcBHxIdshSd6omow/viewform
by 2019-09-03.

******
Dear researchers,

The Analytics team at the Wikimedia Foundation [1] has been working on
building a data lake [2] for Wikimedia edits [3] to enable more
efficient research and analysis of Wikimedia's edit data. This data is
a history of activity on Wikimedia projects that is as complete and
research-friendly as possible. Each edit carries its context, such as
whether it was reverted, on the same line as the edit itself, so you
can focus on what you want to find out instead of writing code to
wrestle the data. Each line of the released data will include the
following and more (see the full specification [3a], [3b], [3c]):
* editor edit count, groups, blocks, bot status, and name, both
current and historical (as of the time of the edit)
* seconds since this editor's last edit
* page context, current and historical (namespace, seconds since last
revision, etc.)
* seconds to identity revert or deletion, if applicable
* revision tags (mobile edit, VisualEditor edit, etc.)
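
To illustrate what having this context on each line makes possible,
here is a minimal Python sketch that computes the share of non-bot
edits reverted within a minute, without joining any other table. The
column names, the tab-separated layout, and the sample rows are
assumptions made for this example only; the actual fields are listed
in the specification [3a].

```python
import csv
import io

# Hypothetical column names, for illustration only.
# The real schema is defined in the specification, not here.
COLUMNS = [
    "wiki",
    "user_is_bot",
    "seconds_since_user_last_edit",
    "seconds_to_identity_revert",
    "tags",
]

# Made-up sample rows, one edit per line. An empty
# seconds_to_identity_revert field means the edit was not reverted.
SAMPLE = (
    "examplewiki\tfalse\t120\t45\tmobile edit\n"
    "examplewiki\tfalse\t3600\t\t\n"
    "examplewiki\ttrue\t5\t\t\n"
    "examplewiki\tfalse\t86400\t900\tve edit\n"
)


def quick_revert_share(lines, max_seconds=60):
    """Return the share of non-bot edits reverted within max_seconds."""
    reader = csv.DictReader(lines, fieldnames=COLUMNS, delimiter="\t")
    human_edits = 0
    quick_reverts = 0
    for row in reader:
        if row["user_is_bot"] == "true":
            continue  # skip bot edits
        human_edits += 1
        seconds = row["seconds_to_identity_revert"]
        if seconds and int(seconds) <= max_seconds:
            quick_reverts += 1
    return quick_reverts / human_edits if human_edits else 0.0


print(f"{quick_revert_share(io.StringIO(SAMPLE)):.2f}")  # prints 0.33
```

Because the revert information sits on the same line as the edit, the
function only needs a single pass over the file.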

The first instance of this data will be released in the coming months.
To make this release as useful as possible for you, the users of the
data, the team needs to hear your thoughts on how to slice and dice
the data at publishing time. You can provide your input at
https://docs.google.com/forms/d/e/1FAIpQLScc15eSeFrVvAh_ydpX_1p0v6-WSx2qe3EcBHxIdshSd6omow/viewform

Please provide your input to this survey no later than 2019-09-03.

Best,
Leila

[1] https://wikitech.wikimedia.org/wiki/Analytics
[2] https://en.wikipedia.org/wiki/Data_lake
[3] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits
[3a] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history
[3b] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_user_history
[3c] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_page_history


--
Leila Zia
Principal Research Scientist, Head of Research
Wikimedia Foundation

_______________________________________________
Wiki-research-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Re: [Input requested] Data Lake Edit release input request

RhinosF1 Wikipedia
Hello,

I've just tried to use the form and got a "resource unavailable" error.

RhinosF1
Volunteer
Miraheze

On Tue, 20 Aug 2019 at 22:07, Leila Zia <[hidden email]> wrote:

> In a nutshell:
> We are asking for your input to help us learn how to release the
> historical edit data of Wikimedia projects in a more efficient way.
> Please provide your feedback via
>
> https://docs.google.com/forms/d/e/1FAIpQLScc15eSeFrVvAh_ydpX_1p0v6-WSx2qe3EcBHxIdshSd6omow/viewform
> by 2019-09-03.

Re: [Input requested] Data Lake Edit release input request

Leila Zia
I'm sorry. This is fixed now. Try again:
https://docs.google.com/forms/d/e/1FAIpQLScc15eSeFrVvAh_ydpX_1p0v6-WSx2qe3EcBHxIdshSd6omow/viewform


On Tue, Aug 20, 2019 at 2:22 PM RhinosF1 <[hidden email]> wrote:

>
> Hello,
>
> I've just tried to use the form and got resource unavailable.
>
> RhinosF1
> Volunteer
> Miraheze


Re: [Input requested] Data Lake Edit release input request

Leila Zia
A friendly reminder that you have until 2019-09-03 to let us know
your wishes and constraints for the new data release discussed below.
(Thanks to those of you who have already responded.)


On Tue, Aug 20, 2019 at 2:06 PM Leila Zia <[hidden email]> wrote:

>
> In a nutshell:
> We are asking for your input to help us learn how to release the
> historical edit data of Wikimedia projects in a more efficient way.
> Please provide your feedback via
> https://docs.google.com/forms/d/e/1FAIpQLScc15eSeFrVvAh_ydpX_1p0v6-WSx2qe3EcBHxIdshSd6omow/viewform
> by 2019-09-03.
