question about Pageviews dumps

Marc Miquel

Hello, 

I have a question for you regarding pageview data dumps.

I am considering studying reader engagement across article topics and languages. For this, I would like to know whether there is any plan to release pageview dumps with per-user, session-level activity logs, similar to those available for editor sessions.

Since this would be for a research project that I might seek funding for, I would like to know whether I could count on such data, what the nature of the available data is, what the procedure to obtain it would be, and whether privacy concerns would have any implications.

Thank you very much!

Best,

Marc Miquel


_______________________________________________
Wiki-research-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Re: question about Pageviews dumps

Leila Zia
+ Analytics



Re: [Analytics] question about Pageviews dumps

Marc Miquel
Hello!

I was thinking about user sessions, yes. That would mean aggregating the pageviews of a user within a short time window (I should check the cutoff, but it could be around an hour or less).

I am particularly interested in understanding the order in which pages are viewed (first, last), duration, etc.
I wouldn't need data from a long period either, but I think data from multiple languages would be helpful.

I imagined reader data could be privacy-sensitive, but would an NDA with my university and some form of data encoding help with this? As I said, it is for a scientific purpose.

Thanks,

Marc
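
For illustration, the session aggregation described in this message could look roughly like the following. This is only a minimal sketch, assuming a hypothetical list of (timestamp, page title) pageviews that already belong to a single reader; the one-hour cutoff is the tentative value mentioned above, and the names `split_into_sessions` and `summarize` are made up for this example.

```python
from datetime import datetime, timedelta

def split_into_sessions(pageviews, cutoff=timedelta(hours=1)):
    """Group (timestamp, page_title) pairs into sessions.

    A new session starts whenever the gap since the previous pageview
    exceeds `cutoff` (an assumed inactivity threshold).
    """
    sessions = []
    for ts, title in sorted(pageviews):
        if sessions and ts - sessions[-1][-1][0] <= cutoff:
            sessions[-1].append((ts, title))
        else:
            sessions.append([(ts, title)])
    return sessions

def summarize(session):
    """Return the first page, last page, and duration of one session."""
    start_ts, first_page = session[0]
    end_ts, last_page = session[-1]
    return {"first": first_page, "last": last_page, "duration": end_ts - start_ts}

# Hypothetical pageviews for a single reader.
views = [
    (datetime(2016, 6, 28, 10, 0), "Barcelona"),
    (datetime(2016, 6, 28, 10, 5), "Catalonia"),
    (datetime(2016, 6, 28, 14, 0), "Wikipedia"),
]
sessions = split_into_sessions(views)
summaries = [summarize(s) for s in sessions]
```

Note that the duration of a session computed this way ends at the last request, so the time spent on the final page of each session is unknown.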

On Tue, Jun 28, 2016 at 21:09, Nuria Ruiz (<[hidden email]>) wrote:

Hello!

> I am considering to study reader engagement for different article topics in different languages. Because of this, I would like to know if there is any plan to make available pageviews dumps detailing activity log at session level per user - in a similar way to editor sessions.

Are you thinking of "all-pageviews-visited-by-a-certain-user"? If so, no: we do not have any projects to provide that data because, due to privacy concerns, we neither have nor keep that information.

Thanks, 

Nuria




_______________________________________________
Analytics mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/analytics



Re: [Analytics] question about Pageviews dumps

Oliver Keyes-5
If historic data is okay, there's already a released dataset (https://figshare.com/articles/Activity_Sessions_datasets/1291033) that was designed specifically to answer questions about how best to calculate session length for Wikipedia (http://arxiv.org/abs/1411.2878).

Re: [Analytics] question about Pageviews dumps

Marc Miquel

Thanks for the answer, Oliver, but I am not sure it answers my questions. I'd like to study aspects such as how much time is spent on certain pages, as a proxy for how content is approached, read, and understood. The time of entering a page and the time of leaving it would be enough for me. This is not entirely centered on 'user activity'; I put it that way because I imagined the data would be stored similarly to editor sessions, or in a database, and that I would need to do the time calculations myself.

Cheers,

Marc



Re: [Analytics] question about Pageviews dumps

Marc Miquel
Hi Joseph. Perhaps these approximations could already give me valuable information. If it is possible to distinguish between mobile and PC visits, I could filter out the mobile ones and keep the more reliable PC-based data.

This is all I needed to know for now to prepare my project. If I move forward with it, I will contact you. Thank you very much for the answer.

Cheers,
Marc



On Wed, Jun 29, 2016 at 10:24, Joseph Allemandou (<[hidden email]>) wrote:
Hi Marc,

The information you're after is not available in the data we collect, for at least two reasons:
  • We don't collect data that would allow detecting user sessions (no id-cookie or identifier)
  • We don't collect time spent on page
Approximations could be made using fingerprinting techniques as a proxy for sessions (with substantial error on mobile due to IP pooling), and using successive events as boundaries for time spent on a page.
Any such approximation would in any case require an NDA.

Cheers
Joseph
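
As a rough illustration of the approximation described in this message: the following is a minimal sketch, assuming a hypothetical request log of (timestamp, IP, user agent, page title) tuples. The field layout, the SHA-256 fingerprint, and the function names are assumptions made for this example; they are not the Wikimedia webrequest schema or a method confirmed anywhere in this thread.

```python
import hashlib
from collections import defaultdict

def fingerprint(ip, user_agent):
    """Hash of IP and user agent, used as a rough stand-in for a reader id."""
    return hashlib.sha256(f"{ip}|{user_agent}".encode("utf-8")).hexdigest()

def estimate_time_on_page(requests, max_gap=3600):
    """Estimate seconds spent per page view from successive requests.

    `requests` is an iterable of (epoch_seconds, ip, user_agent, page_title).
    The time on a page is approximated as the gap until the next request
    with the same fingerprint; the last request of each group, and any gap
    longer than `max_gap` (an assumed threshold), gets no estimate (None).
    """
    by_reader = defaultdict(list)
    for ts, ip, ua, title in requests:
        by_reader[fingerprint(ip, ua)].append((ts, title))

    estimates = []
    for events in by_reader.values():
        events.sort()
        for (ts, title), (next_ts, _) in zip(events, events[1:]):
            gap = next_ts - ts
            estimates.append((title, gap if gap <= max_gap else None))
        estimates.append((events[-1][1], None))
    return estimates

# Hypothetical requests from two fingerprints.
requests = [
    (1000, "198.51.100.7", "UA-desktop", "Barcelona"),
    (1090, "198.51.100.7", "UA-desktop", "Catalonia"),
    (1030, "203.0.113.9", "UA-mobile", "Wikipedia"),
]
result = estimate_time_on_page(requests)
```

Readers who share an IP address and user agent would be merged into one fingerprint here, which is the kind of error on mobile that the message warns about.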

--
Joseph Allemandou
Data Engineer @ Wikimedia Foundation
IRC: joal
Re: [Analytics] question about Pageviews dumps

Oliver Keyes-5
In reply to this post by Marc Miquel
Aye, as Joseph says, the time-on-page or time-leaving is not collected, except as an extension of session reconstruction work. If you want a concrete time, you're not gonna get it.

While PC-based data is more reliable than mobile, that does not necessarily mean "reliable". I'm sort of confused, I guess, as to why the datasets I linked (unless I'm misremembering them?) don't help: you would have to do the calculation yourself but they should contain all the data necessary to make that calculation (unless you want to have the pageID or title associated with the time-on-page, in which case...yeah, that's an issue).

Re: [Analytics] question about Pageviews dumps

Marc Miquel
Yes, the whole thing is about page_title or page_id. I think Wikipedia as a project provides very different types of information, and it would be interesting to see how they are actually read, checked, etc. Likewise, I would need to see variations across language editions. But I don't need anything large-scale or covering long periods; this is why a sample of a few days would be valuable.

Anyway, thanks for the datasets link, Oliver. 

Marc

Re: question about Pageviews dumps

Leila Zia
In reply to this post by Marc Miquel
Hi Marc,

On Tue, Jun 28, 2016 at 6:36 AM, Marc Miquel <[hidden email]> wrote:

Since this would be for a research project I might ask funding for it, I would like to know if I could count on that, what is the nature of the available data, and what would be the procedure to obtain this data and if there would be any implication because of privacy concerns.

We grant access to webrequest log data and its non-public derivatives only infrequently. When we do, it is through formal collaborations with the researchers. What these collaborations are and how we set them up is explained at https://www.mediawiki.org/wiki/Wikimedia_Research/Formal_collaborations.

To provide more context:

Requiring formal collaborations as a necessary step for accessing the data means that we cannot scale rapidly, i.e., each researcher on our team can only be involved in so many of them. The practical cap is around 3 collaborations per researcher in my experience. We understand that this is a problem, as we would like more researchers to work with this data. We frequently reconsider ways of expanding our capacity to collaborate. We also always consider releasing more datasets publicly since, ultimately, that's one of the best ways for us to empower others to do what they want to work on and find value in.

Best,
Leila
 


Re: [Analytics] question about Pageviews dumps

Marc Miquel
Hi,

It is useful to know that there would be a way to do this using the Eventlog. Likewise, I fully understand that using mediawiki for this purpose would require a formal collaboration.

Since this is not an immediate project (I am applying for funding for it at the moment), there would be time to arrange it and find the best approach for both the technical and the formal parts.

For now I have the information I need. Thank you, everyone.

Best,

Marc


