Wikipedia search logs needed


Wikipedia search logs needed

Simon Givoli
Thanks Oliver,


Sorry if I wasn't clear enough.
My dissertation will involve consenting participants. Their search logs will be recorded while they search Wikipedia. The search logs will then be analyzed in order to find recurrent search patterns across participants.
Before beginning the experiment, I want to check that I can indeed find patterns in search logs, using several different algorithms. The idea is to check these algorithms on Wikipedia search logs already available. Hence my request.
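
For illustration only: the thread does not say which algorithms are meant, so the following is a minimal sketch of one simple possibility, counting consecutive query sequences (n-grams) that recur across participants. The function name, the log format (participant id mapped to an ordered list of queries), and the toy data are all assumptions.

```python
from collections import defaultdict


def recurrent_patterns(sessions, n=2, min_participants=2):
    """Return query n-grams seen in the logs of at least `min_participants` participants.

    sessions: dict mapping a participant id to that participant's ordered list of queries.
    The result maps each qualifying n-gram (a tuple of queries) to the number of
    distinct participants who produced it.
    """
    seen_by = defaultdict(set)  # n-gram -> set of participant ids
    for participant, queries in sessions.items():
        # Light normalization so "Paris" and "paris " count as the same query.
        normalized = [q.strip().lower() for q in queries]
        for i in range(len(normalized) - n + 1):
            seen_by[tuple(normalized[i:i + n])].add(participant)
    return {gram: len(who) for gram, who in seen_by.items()
            if len(who) >= min_participants}


# Hypothetical toy data, not real search logs.
toy_sessions = {
    "p1": ["Paris", "Eiffel Tower", "Gustave Eiffel"],
    "p2": ["paris", "eiffel tower", "Louvre"],
    "p3": ["Berlin", "Brandenburg Gate"],
}
patterns = recurrent_patterns(toy_sessions)
```

A check like this can be run on any sequential log before the experiment, which is why a non-Wikipedia dataset may be enough for a first test.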

Simon
Message: 5
Date: Fri, 3 Apr 2015 09:37:33 +0300
From: Simon Givoli <[hidden email]>
To: [hidden email]
Subject: [Wiki-research-l] Wikipedia search logs needed
Message-ID:
        <CAN=[hidden email]>
Content-Type: text/plain; charset="utf-8"

Hi,

I'm looking for a dump or db of Wikipedia users' search logs. I would like
it to contain recent data, but it doesn't have to be extensive; even a
small sample would be sufficient. I aim to use this db to test a new
research tool I'm developing for my dissertation.

Can anyone point me to a relevant source?

Thanks,
Simon

------------------------------

Message: 6
Date: Fri, 3 Apr 2015 02:47:54 -0400
From: Oliver Keyes <[hidden email]>
To: Research into Wikimedia content and communities
        <[hidden email]>
Subject: Re: [Wiki-research-l] Wikipedia search logs needed
Message-ID:
        <[hidden email]>
Content-Type: text/plain; charset=UTF-8

Simon,

We really don't release search logs at the moment, very deliberately -
they're incredibly difficult to sanitise well. Could you be more
precise about what you're looking to investigate/study and what you'd
need than "a new research tool"?




--
Oliver Keyes
Research Analyst
Wikimedia Foundation


_______________________________________________
Wiki-research-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Re: Wikipedia search logs needed

Pine W

Hi Oliver,

Do we even record search logs? It might be a good idea if we didn't.

Pine


Re: Wikipedia search logs needed

Valerio Schiavoni
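There has been at least one attempt to release such data:

http://blog.wikimedia.org/2012/09/19/what-are-readers-looking-for-wikipedia-search-data-now-available/

Maybe someone managed to grab those logs before they took them offline.

Similar but older logs are available here:
http://www.wikibench.eu/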

best,
valerio



Re: Wikipedia search logs needed

Aaron Halfaker-2
> Maybe someone managed to grab those logs before they took them offline.

If they did, I hope they won't share.  They were taken offline due to privacy issues.

From the blog post: "We’ve temporarily taken down this data to make additional improvements to the anonymization protocol related to the search queries."

-Aaron


Re: Wikipedia search logs needed

Valerio Schiavoni
Those logs could have been cleaned up further and re-released, especially since the privacy issues affected only a "small percentage of queries".
Frankly, it's a pity that they had to retract so quickly after the initial announcement.

Nonetheless, Wikimedia could release them for research purposes, asking interested users to sign an NDA or similar.
I would be very surprised to discover that in 2015 there are no means to properly anonymize datasets and release them to the public.


best,
Valerio


Re: Wikipedia search logs needed

Aaron Halfaker-2
It turns out that anonymization is hard (see [1,2,3]).  A quick web search would have made that clear.

We do sometimes provide researchers with NDAs for the purposes of anonymizing data.  Again, we have limited time and energy, so such NDAs have been (1) limited to work that is immediately relevant to our own and (2) intended for the purpose of anonymizing and making the data public -- so that everyone can benefit.

For example, see a project that aims to release anonymized view logs[4].  That proposal has been in process for more than a year though because legal agreements with national research labs are Hard.  It seems like search logs are a candidate for that process, but we'd need to see an anonymization proposal before moving forward.

1. https://en.wikipedia.org/wiki/K-anonymity
2. https://en.wikipedia.org/wiki/L-diversity
3. https://en.wikipedia.org/wiki/T-closeness
4. https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_pageviews
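
To make the difficulty concrete, here is a minimal sketch of a frequency threshold in the spirit of k-anonymity: a query is kept only if at least k distinct users issued it, and user ids are dropped from the output. This is an illustrative assumption, not the anonymization protocol used or proposed by anyone in this thread. The function name, the (user id, query) log format, and the toy data are hypothetical. A threshold like this does not by itself guarantee anonymity, and it discards the rare queries that pattern research may care about most.

```python
from collections import defaultdict


def k_anonymous_queries(log, k=3):
    """Keep only queries issued by at least `k` distinct users.

    log: iterable of (user_id, query) pairs.
    Returns a dict mapping each surviving (normalized) query to its distinct-user
    count; user ids themselves are not part of the output.
    """
    users_per_query = defaultdict(set)  # normalized query -> set of user ids
    for user_id, query in log:
        users_per_query[query.strip().lower()].add(user_id)
    return {q: len(users) for q, users in users_per_query.items()
            if len(users) >= k}


# Hypothetical toy log, not real data.
toy_log = [
    ("u1", "weather"), ("u2", "Weather"), ("u3", "weather "),
    ("u1", "john doe 12 elm street"),  # rare, potentially identifying query
    ("u2", "python"), ("u3", "python"),
]
released = k_anonymous_queries(toy_log, k=3)
```

With k=3 only "weather" survives in this toy log: the identifying query is suppressed, but so is "python", which only two users issued.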


-Aaron


Re: Wikipedia search logs needed

Oliver Keyes-4
+1. Valerio, I assume you're a researcher familiar with anonymisation;
you should cast your eye over the AOL search log debacle. The only way
to completely sanitise the logs is to remove all the query strings.

Simon, it sounds like performing this kind of sanitisation would
undermine the work you're doing, which is unfortunate :(. However, if
you can make a strong pitch for this being of value to the Wikimedia
communit(y|ies) I would encourage you to make that pitch; maybe we can
look into an NDA! You might want to check out
https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_edits
for an example of an anonymisation proposal Shilad submitted to
accompany a request for dataset releases (mind you, I generally think
everyone should read everything Shilad writes, but that's beside the
point ;p)

On 3 April 2015 at 11:03, Aaron Halfaker <[hidden email]> wrote:

> It turns out that anonymization is hard (see [1,2,3]).  A quick web search
> would have made that clear.
>
> We do sometimes provide researchers with NDAs for the purposes of
> anonymizing data.  Again, we have limited time and energy, so such NDAs have
> been (1) limited to work that is immediately relevant to our own and (2)
> intended for the purpose of anonymizing and making the data public -- so that
> everyone can benefit.
>
> For example, see a project that aims to release anonymized view logs[4].  That
> proposal has been in process for more than a year though because legal
> agreements with national research labs are Hard.  It seems like search logs
> are a candidate for that process, but we'd need to see an anonymization
> proposal before moving forward.
>
> 1. https://en.wikipedia.org/wiki/K-anonymity
> 2. https://en.wikipedia.org/wiki/L-diversity
> 3. https://en.wikipedia.org/wiki/T-closeness
> 4.
> https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_pageviews
>
> -Aaron
>
>>> On Fri, Apr 3, 2015 at 9:32 AM, Valerio Schiavoni
>>> <[hidden email]> wrote:
>>>>
>>>> There has been at least one attempt to release such data:
>>>>
>>>> http://blog.wikimedia.org/2012/09/19/what-are-readers-looking-for-wikipedia-search-data-now-available/
>>>>
>>>> Maybe someone managed to grab those logs before they took them offline.
>>>>
>>>> Similar but older logs are available here:
>>>> http://www.wikibench.eu/
>>>>
>>>> best,
>>>> valerio



--
Oliver Keyes
Research Analyst
Wikimedia Foundation


Re: Wikipedia search logs needed

Scott Hale
> The search logs will then be analyzed in order to find recurrent search patterns across participants.
> Before beginning the experiment, I want to check that I can indeed find patterns in search logs, using several different algorithms. The idea is to check these algorithms on Wikipedia search logs already available. Hence my request.

Perhaps look at the Click Dataset from Indiana University? I'm not super familiar with it, but it seems that it would have some search data fit for your purposes.
http://cnets.indiana.edu/groups/nan/webtraffic/click-dataset/

Cheers,
Scott






Re: Wikipedia search logs needed

Taha Yasseri
Hi Simon, 

You might also find this dataset useful: "Wikispeedia navigation paths"

Best,
Taha





--
.t
