[Wikimedia-l] BabelNet is remixing Wikimedia content without following CC-By-SA terms

Rob Speer
BabelNet (http://babelnet.org) is a multilingual knowledge resource that
defines words and phrases in many languages. I've noticed that it copies
large amounts of content from Wikimedia projects, including Wikipedia,
Wiktionary, and Wikiquote, while violating Wikimedia's CC-By-SA license by
placing the content under an incompatible CC-By-NC-SA license.

As one example, I can search BabelNet for "Timsort", a Wikipedia article
whose first sentence is one I wrote:
http://live.babelnet.org/synset?word=Timsort&lang=EN&details=1&orig=Timsort

The sentence I wrote appears at the top of the page (with credit to
Wikipedia). The rest of the page is also content remixed from Wikipedia,
including a gallery of images that are presented without credit. A scrolly
box in the footer of the page says the content is under the CC-By-NC-SA 3.0
license. Other pages, such as http://babelnet.org/synset?word=bn:00852566n,
combine data from multiple different resources.

The BabelNet creators are aware of the CC-By-SA licenses of the resources
they use (see http://babelnet.org/licenses/). In addition to the
non-commercial license they offer, their company, Babelscape
(http://babelscape.com/), sells commercial licenses to BabelNet.

I reached out to Roberto Navigli, who runs BabelNet and Babelscape, over
e-mail on March 23. I asked if the non-commercial license clause was simply
a mistake. In his reply, Navigli stated that BabelNet is not a derived
work, but is a CC-By-NC-SA-licensed collection made of several different
works. I responded that BabelNet doesn't meet the Creative Commons
definition of a "Collective Work", which would be necessary for it to not
be a derived work. Navigli responded:

"actually it is a collection of derivative work of several resources with
heterogeneous licenses, each of which clearly separated with separate
licenses and bundles. By transitivity derivative work is work with a
certain license, so it is work. Therefore, it is a collection of works with
different licenses and it can keep a separate license."

I believe this is nonsense on multiple levels. BabelNet is a derived work,
and if someone could disregard their obligation to share-alike their
derived work simply because they derived it from multiple resources, there
would be no point to putting ShareAlike clauses on data resources at all.

As a Wikipedia contributor (and a lapsed admin), I am sad to see BabelNet
appropriating the hard work of Wikimedians and others, placing a more
restrictive license on it, and selling it. This is also relevant for me
because I run ConceptNet (http://www.conceptnet.io/), a similar knowledge
resource, and I have made sure to follow Creative Commons license
requirements and to release all its data as CC-By-SA.

In a way I see BabelNet as a competitor, but ConceptNet is an open data
project and this space shouldn't have "competitors". If the Creative
Commons license were being used appropriately, then all of us working with
this kind of data would be collaborators in the world of Linked Open Data.
My preferred outcome would be to get BabelNet to change the copyright
notices and Creative Commons links on their site to remove the
"non-commercial" requirement, and to be able to download and use their data
under the CC-By-SA license that it should be under.

I'm sure Wikimedia has dealt with similar situations to this. What would be
the most effective next step to ensure that BabelNet follows the CC-By-SA
license?

-- Rob Speer
_______________________________________________
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: [hidden email]
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, <mailto:[hidden email]?subject=unsubscribe>

Re: [Wikimedia-l] BabelNet is remixing Wikimedia content without following CC-By-SA terms

Mike Peel
They also appear to be using photos from Wikimedia Commons without paying attention to the license. I can find photos of mine that are CC-BY-SA-4.0 licensed that are being used without any metadata at all, let alone attribution and the correct CC license info…

The same is also true for Everipedia, BTW.

Thanks,
Mike

> On 10 Apr 2018, at 14:43, Rob Speer <[hidden email]> wrote:
> [...]



Re: [Wikimedia-l] BabelNet is remixing Wikimedia content without following CC-By-SA terms

Rob Speer
Everipedia sounds even worse, because they sound like the kind of
move-fast-and-break-laws blockchain startup that thinks the legal system is
something that happens to other people. But Roberto Navigli is a respected
academic and presumably has some interest in following the law, if he can
be convinced that his self-serving interpretation of the law will not hold
up.

Again, there has to be a process that's been followed before, right?
BabelNet and Everipedia can't be the first instances of people dumping all
the data from Wikimedia projects into their own projects without following
the license.

Another interesting twist: the CC-By-NC-SA download they offered to "people
wanting to use BabelNet for research purposes" has been taken offline "for
the Easter holiday", which approximately coincides with when Navigli
responded to my e-mail, but unless Easter is a very long holiday in Italy I
suspect that it's gone for the indefinite future. So they aren't sharing
_anything_ anymore.

I believe that what BabelNet needs to do is:

- Change the license of BabelNet from CC-By-NC-SA 3.0 to CC-By-SA 4.0
- Add attribution and license information to their images (or remove the
  image galleries)
- Relicense or remove the dependencies of BabelNet that have non-commercial
  licenses (they use a toolkit called JLTUtils that is developed at the same
  university, under a CC-By-NC-SA license, which is strange because it
  appears to be software and not content)
- Reinstate the downloadable version of the data, with no academic-only
  restrictions

I don't want to end up issuing some sort of copyright takedown against
BabelNet. It's a project that should keep existing, but under the correct
license.


On Wed, 11 Apr 2018 at 09:49 Michael Peel <[hidden email]> wrote:

> [...]