Terms of use for wmf dump metadata?

Edward L Platt
Hi all,

We used the revision history metadata from the official wmf dumps site in a
forthcoming paper on coeditor network structure (linked below). We're
publishing our code and (hopefully) the derived networks on the UMich
DeepBlue archival server.

I can find information on the copyright/terms-of-use for text and image
data, but nothing explicit about the metadata. Anyone know if that exists?

Thanks!

E.L. Platt & D.M. Romero. "Network Structure, Efficiency, and Performance
in WikiProjects." ICWSM 2018.
https://arxiv.org/abs/1804.03763


--
Edward L. Platt
PhD Candidate, University of Michigan School of Information
he/him | https://elplatt.com | @elplatt | @[hidden email]

Tips for stopping email overload:
https://hbr.org/2012/02/stop-email-overload-1
_______________________________________________
Wiki-research-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Re: Terms of use for wmf dump metadata?

Federico Leva (Nemo)
Edward L Platt, 16/05/2018 18:57:
> I can find information on the copyright/terms-of-use for text and image
> data, but nothing explicit about the metadata.

Which metadata are you talking about? The copyright license applies to
the whole XML text.

Federico


Re: Terms of use for wmf dump metadata?

Edward L Platt
We're using the pages-meta-history XML files (user ids, timestamps, article
ids, etc). Everything I can find on the WMF site refers to "textual
content" which is a bit unclear about metadata. Our archival librarians
would be a lot more comfortable if I could point them to something very
explicit about terms of use for metadata and derivatives.


Re: Terms of use for wmf dump metadata?

Federico Leva (Nemo)
Edward L Platt, 16/05/2018 19:23:
> We're using the pages-meta-history XML files (user ids, timestamps,
> article ids, etc). Everything I can find on the WMF site refers to
> "textual content" which is a bit unclear about metadata.

The legal page has been added only recently and it's probably unclear,
but "textual content" just means everything that is not multimedia
files. The word is used in the sense of the terms of use:
<https://meta.wikimedia.org/wiki/Terms_of_use#7d>

> Our archival
> librarians would be a lot more comfortable if I could point them to
> something very explicit about terms of use for metadata

From our point of view, that's hardly even metadata. It's just
MediaWiki-internal material, which tells little if anything about the
data. It's also below the threshold of originality and produced
automatically by software, and therefore clearly ineligible for copyright.

If this is about problems in the EU, we can add a CC-0 note to waive any
hypothetical sui generis database rights on MediaWiki's internal
identifiers. But it's useless anyway, because those are generated,
stored and published in the USA.

> and derivatives.
What derivatives?

Federico


Re: Terms of use for wmf dump metadata?

Edward L Platt
The derivatives in this case are coeditor networks for each WikiProject,
based on which editors have edited the same articles.
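
For readers unfamiliar with the term, here is a minimal sketch of how a
coeditor network can be derived from revision metadata. It is an
illustration only, not the code from the paper: the function name
coeditor_edges, the (page_id, user_id) input format, and the choice to
weight each edge by the number of shared articles are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def coeditor_edges(revisions):
    """Build weighted coeditor edges from (page_id, user_id) pairs.

    Hypothetical helper: two editors are linked if they edited the same
    page, and the edge weight counts how many pages they share.
    """
    # Collect the distinct editors of each page.
    editors_by_page = defaultdict(set)
    for page_id, user_id in revisions:
        editors_by_page[page_id].add(user_id)

    # Every unordered pair of editors on a page adds 1 to their edge weight.
    weights = defaultdict(int)
    for editors in editors_by_page.values():
        for a, b in combinations(sorted(editors), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Made-up sample data, standing in for rows parsed from a dump.
sample = [
    (1, "alice"), (1, "bob"), (1, "alice"),
    (2, "bob"), (2, "carol"),
    (3, "alice"), (3, "bob"),
]
edges = coeditor_edges(sample)
print(edges)
```

Only user ids and page ids are needed for this, which is why the question
is about the metadata rather than the article text.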


Re: Terms of use for wmf dump metadata?

Federico Leva (Nemo)
Edward L Platt, 16/05/2018 23:16:
> The derivatives in this case are coeditor networks for each WikiProject,
> based on which editors have edited the same articles.

Is this something you produce yourself? I cannot find such a dataset in
<https://dumps.wikimedia.org/other/>.

Are you in the EU?

Federico


Re: Terms of use for wmf dump metadata?

Edward L Platt
We're in the US. We created the dataset ourselves, based on the info in the
pages-meta-history XML files (combined with the WikiProject info from the
articles themselves).
