Re: [Textbook-l] License information (was: PDF/Collection feature live on de.wikibooks)

Brianna Laugher
2008/10/14 Johannes Beigel <[hidden email]>:

>>> Secondly, the current version of the tool commits plagiarism, because
>>> it does not mention image authors and does not provide any means
>>> (such as making images clickable) to check these authors.
>>
>> Ouch, thanks for pointing that out. Tricky to do this automatically
>> since it's all wiki-text with templates, but we'll investigate a
>> solution here.
>
> We'd highly appreciate input from the community regarding this topic!
>
> The printed books from PediaPress contain a list of figures where the
> license of each image is listed, together with the URL to the image
> description page. As some kind of "hotfix" this solution could be
> implemented in the PDF export of the Collection extension, too. But
> this doesn't really solve the problem.
>
> We think it's more of a technical/software thing, so I cross-posted
> (and set Reply-To) to Wikitech-l.
>
> In our opinion, license management/handling must be a core feature of
> MediaWiki, because the software is explicitly developed for the
> collaborative distribution of free content. Licenses of the contained
> articles and images should not be represented via some agreed-upon
> convention but via structured (and machine-readable) information,
> available for each relevant object in the wiki.
>
> Some information that would be desired:
>
> - Full (official) name of the license(s).
> - Whether the full text of the license has to be included or a
> reference is sufficient.
> - Reference to the full text of the license(s) (in some rigidly
> defined format like wikitext).
> - Whether attribution is required. If so: The list of required
> attributions.
>
> So, basically all the information that's required to check if it's
> possible to take some part of the MediaWiki and use it somewhere else
> and all the information that has to be included in that other place.
> This information could be made accessible via MediaWiki API, but
> ideally it's contained in the wikitext and/or XHTML, too.

Because different wikis implement licenses in different ways (i.e. there
are no naming conventions for license templates), I am not sure this
license information would belong in MediaWiki core. But I think that
definitely Wikimedia Commons, and perhaps other Wikimedia wikis that
accept freely licensed uploads, should work on providing a "community
API" layer. My thinking behind this is that the communities build a
lot of structure into their content via templates or categories or
whatever. It makes sense to provide an API so that third-party users
do not each have to reinvent the wheel.

On Wikimedia Commons a little bit of work has been done to this end:
<http://commons.wikimedia.org/wiki/Commons:Commons_API>

In particular, this contains some of the license info you mentioned.
For example, below is the info for the GFDL.

GFDL

full_name
    GNU Free Documentation License
attach_full_license_text
    1
attribute_author
    1
keep_under_same_license
    1
keep_under_similar_license
    0
license_logo_url
    http://upload.wikimedia.org/wikipedia/commons/thumb/2/22/Heckert_GNU_white.svg/64px-Heckert_GNU_white.svg.png
license_info_url
    http://www.gnu.org/copyleft/fdl.html
license_text_url
    http://www.gnu.org/licenses/fdl.txt
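
A record like the one above maps naturally onto a small typed structure. The following is a minimal sketch in Python, assuming the field names shown in the listing and that flags only ever take the values "1" and "0". The names `LicenseInfo`, `parse_license` and `reuse_obligations` are illustrative and not part of the Commons API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LicenseInfo:
    """One license record; field names follow the GFDL listing above."""
    full_name: str
    attach_full_license_text: bool
    attribute_author: bool
    keep_under_same_license: bool
    keep_under_similar_license: bool
    license_info_url: str
    license_text_url: str


def parse_license(fields: Dict[str, str]) -> LicenseInfo:
    """Build a LicenseInfo from raw string fields ("1" means true)."""
    def flag(name: str) -> bool:
        # Assumption: a missing flag is treated like "0".
        return fields.get(name, "0").strip() == "1"

    return LicenseInfo(
        full_name=fields["full_name"],
        attach_full_license_text=flag("attach_full_license_text"),
        attribute_author=flag("attribute_author"),
        keep_under_same_license=flag("keep_under_same_license"),
        keep_under_similar_license=flag("keep_under_similar_license"),
        license_info_url=fields.get("license_info_url", ""),
        license_text_url=fields.get("license_text_url", ""),
    )


def reuse_obligations(lic: LicenseInfo) -> List[str]:
    """List, in plain words, what a reuser has to do under this license."""
    duties = []
    if lic.attribute_author:
        duties.append("credit the author")
    if lic.attach_full_license_text:
        duties.append("include the full license text")
    if lic.keep_under_same_license:
        duties.append("keep the same license")
    elif lic.keep_under_similar_license:
        duties.append("keep a similar license")
    return duties


# The GFDL values exactly as listed above.
gfdl = parse_license({
    "full_name": "GNU Free Documentation License",
    "attach_full_license_text": "1",
    "attribute_author": "1",
    "keep_under_same_license": "1",
    "keep_under_similar_license": "0",
    "license_info_url": "http://www.gnu.org/copyleft/fdl.html",
    "license_text_url": "http://www.gnu.org/licenses/fdl.txt",
})
print(reuse_obligations(gfdl))
```

A third-party tool could then decide what to print next to an image from the booleans alone, without knowing anything about the license templates behind them.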

The "Commons API" also has an author field.
<http://toolserver.org/~magnus/commonsapi.php?image=Sa-warthog.jpg&meta>
I think at the moment this is being taken from the {{information}}
template. You can see in this example it includes a wiki link; it
should have already been resolved to a full URL, so there is
definitely still work to be done.
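
Resolving such a wiki link could look roughly like the sketch below. It assumes the author field holds plain `[[Target|Label]]` wikitext and that pages live under `http://commons.wikimedia.org/wiki/`. The sample input is hypothetical, and `resolve_wikilinks` is an illustrative name, not something the Commons API provides.

```python
import html
import re
from urllib.parse import quote

# Assumption: article path of Wikimedia Commons.
WIKI_BASE = "http://commons.wikimedia.org/wiki/"

# Matches [[Target]] and [[Target|Label]].
_WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]*))?\]\]")


def resolve_wikilinks(text: str, base: str = WIKI_BASE) -> str:
    """Replace each wiki link in `text` with an HTML link to a full URL."""
    def to_anchor(match: "re.Match") -> str:
        target = match.group(1).strip()
        label = (match.group(2) or target).strip()
        # MediaWiki page URLs use underscores instead of spaces.
        url = base + quote(target.replace(" ", "_"), safe=":/")
        return '<a href="{}">{}</a>'.format(html.escape(url), html.escape(label))

    return _WIKILINK.sub(to_anchor, text)


# Hypothetical author field as it might come out of the template.
print(resolve_wikilinks("[[User:Sanjay ach|Sanjay ach]]"))
```

Nested templates, interwiki prefixes and external links are not handled here; a real implementation would need to cover them.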

I would be interested to know if further development of the Commons
API would be "heading in the right direction" for PediaPress.

cheers,
Brianna

--
They've just been waiting in a mountain for the right moment:
http://modernthings.org/

_______________________________________________
Commons-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/commons-l

Re: [Textbook-l] License information (was: PDF/Collection feature live on de.wikibooks)

Plyd-2


On Thu, Jan 29, 2009 at 1:48 PM, Brianna Laugher <[hidden email]> wrote:
> I would be interested to know if further development of the Commons
> API would be "heading in the right direction" for PediaPress.

Hello,

I'm speaking for the Poster Project of Fr-Wikipedia, but its needs are very similar to those of PediaPress.

We need to answer this question:
"What is the minimum credit line to provide when distributing the file?"

We currently parse/provide the document, which is why such a Commons API would help a lot.

But even with this API, we still have to answer some questions:
- Do we have to provide the author, origin, uploader or Commons URL?
- With the API, how can we get the shortest text to provide (if possible without even checking the license)?

Example of the functionality we would need:
GetMinimumCreditLine("Sa-warthog.jpg", "printable", "en")
 -> ("From Sanjay ach, under GFDL", FlagProvideGFDL)

or
GetMinimumCreditLine("Sa-warthog.jpg", "web", "en")
 -> ("From Sanjay ach, under <a href='urlgfdl'>GFDL</a>")
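
The two calls above could be backed by something like the following Python sketch. It assumes the author name and license data have already been fetched, leaves out the language argument, and assumes that a link is enough on the web while print needs the license text wherever the license demands it. Whether that is legally sufficient is a question this sketch does not answer. `CreditLine` and `minimum_credit_line` are illustrative names.

```python
from typing import NamedTuple


class CreditLine(NamedTuple):
    text: str
    # True when the full license text must accompany the work
    # (the role of FlagProvideGFDL in the example above).
    must_attach_license_text: bool


def minimum_credit_line(author: str, license_short: str, license_url: str,
                        attach_full_text: bool, medium: str) -> CreditLine:
    """Return the shortest credit line for the given medium."""
    if medium == "web":
        # Assumption: on the web, linking to the license is sufficient.
        text = "From {}, under <a href='{}'>{}</a>".format(
            author, license_url, license_short)
        return CreditLine(text, False)
    if medium == "printable":
        text = "From {}, under {}".format(author, license_short)
        return CreditLine(text, attach_full_text)
    raise ValueError("unknown medium: {!r}".format(medium))


print(minimum_credit_line("Sanjay ach", "GFDL",
                          "http://www.gnu.org/copyleft/fdl.html",
                          True, "printable"))
```

Returning the flag alongside the text lets the caller decide where to place the license text, for example once at the end of a poster or book.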

Cheers,
Plyd


Re: License information

Platonides
In reply to this post by Brianna Laugher
Implementing it would make it possible to fix bugs 3361, 9294 and 9616 (with
a 'copyrighted' license). It would also affect bug 14048.

Also note that we will need some licensing system if the wikis move to
CC-BY-SA, to differentiate GFDL+CC-BY-SA content from CC-BY-SA-only content.




Re: [Wikitech-l] License information (was: PDF/Collection feature live on de.wikibooks)

Brianna Laugher
In reply to this post by Brianna Laugher
2009/1/30 Johannes Beigel <[hidden email]>:

> On 29.01.2009, at 13:48, Brianna Laugher wrote:
>  > On Wikimedia Commons a little bit of work has been done to this end:
>  > <http://commons.wikimedia.org/wiki/Commons:Commons_API>
>
> We've been aware of this page and Magnus' implementation, and we think
> it looks really good!
>
> The information is (AFAIK) scraped from the rendered XHTML of
> articles. This could be done in a less error-prone way (and more
> efficiently) if the data were stored and accessed via database in
> some way. Of course this would require some discussion, formal
> decisions and code changes. But as I stated in an earlier post: I
> think MediaWiki is so widely used by people who want to share and
> collaborate on free content, that it's not too farfetched to build
> some "license infrastructure" into the software itself.

I agree that it makes a lot of sense. But because it would be a big
change, I fear that unless the lead developers show great enthusiasm
for the idea, it will take a very long time to be accepted and
completed. Building an "add-on" tool, by contrast, can get to the point
of functionality faster.

It may be a good idea to try and build the Commons API to mimic the
MediaWiki API, imagining that in the future such information will be
available via that. So then hopefully for now people could use the
Commons API, and in the future switch to the MediaWiki API by just
changing the API URL, and all their queries could stay the same.
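
In client code, that idea amounts to keeping the endpoint configurable and the query fixed. A rough Python sketch follows: `action`, `titles` and `format` are existing MediaWiki API parameters, but `prop=licenseinfo` is purely hypothetical, and the Commons API tool linked earlier does not currently accept queries in this form.

```python
from urllib.parse import urlencode

# Endpoint of the toolserver tool mentioned earlier in this thread.
COMMONS_API = "http://toolserver.org/~magnus/commonsapi.php"
# MediaWiki API endpoint of Wikimedia Commons.
MEDIAWIKI_API = "http://commons.wikimedia.org/w/api.php"


class LicenseClient:
    """Builds license queries; only the endpoint differs between backends."""

    def __init__(self, api_url: str) -> None:
        self.api_url = api_url

    def query_url(self, filename: str) -> str:
        params = {
            "action": "query",
            "prop": "licenseinfo",  # hypothetical module, does not exist today
            "titles": "File:" + filename,
            "format": "json",
        }
        return self.api_url + "?" + urlencode(params)


today = LicenseClient(COMMONS_API)
later = LicenseClient(MEDIAWIKI_API)
print(today.query_url("Sa-warthog.jpg"))
print(later.query_url("Sa-warthog.jpg"))
```

If both services accepted the same parameters, migrating would be a one-line change of the constructor argument.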

How does that sound? Other ideas about how to approach it are welcome...

cheers
Brianna

--
They've just been waiting in a mountain for the right moment:
http://modernthings.org/


Re: [Wikitech-l] License information

Daniel Kinzler
Brianna Laugher wrote:
>
> I agree that it makes a lot of sense. But because it would be a big
> change, I fear that unless the lead developers show great enthusiasm
> for the idea, it will take a very long time to be accepted and
> completed. Building an "add-on" tool, by contrast, can get to the point
> of functionality faster.

Guys, before re-inventing several wheels, please look at what we already have.

Please have a look at
<http://commons.wikimedia.org/wiki/Commons:Tag_categories>, which defines a way
to make license tags machine readable. Using that scheme, it would be easy to
build a script on the toolserver that delivers metadata in a machine readable
form. No need for screen scraping.

Also, please consider <http://www.mediawiki.org/wiki/Extension:RDF>, which
provides a way for MediaWiki to serve machine readable metadata about anything
and everything. It would be easy to integrate it into license tags. It has been
around for years; all it needs is a little push from the community and some code
review.

-- daniel


Re: [Wikitech-l] License information (was: PDF/Collection feature live on de.wikibooks)

Magnus Manske-2
In reply to this post by Brianna Laugher
On Fri, Jan 30, 2009 at 12:55 AM, Brianna Laugher
<[hidden email]> wrote:

> 2009/1/30 Johannes Beigel <[hidden email]>:
>> On 29.01.2009, at 13:48, Brianna Laugher wrote:
>>  > On Wikimedia Commons a little bit of work has been done to this end:
>>  > <http://commons.wikimedia.org/wiki/Commons:Commons_API>
>>
>> We've been aware of this page and Magnus' implementation, and we think
>> it looks really good!
>>
>> The information is (AFAIK) scraped from the rendered XHTML of
>> articles. This could be done in a less error-prone way (and more
>> efficiently) if the data were stored and accessed via database in
>> some way. Of course this would require some discussion, formal
>> decisions and code changes. But as I stated in an earlier post: I
>> think MediaWiki is so widely used by people who want to share and
>> collaborate on free content, that it's not too farfetched to build
>> some "license infrastructure" into the software itself.
>
> I agree that it makes a lot of sense. But because it would be a big
> change, I fear that unless the lead developers show great enthusiasm
> for the idea, it will take a very long time to be accepted and
> completed. Building an "add-on" tool, by contrast, can get to the point
> of functionality faster.
>
> It may be a good idea to try and build the Commons API to mimic the
> MediaWiki API, imagining that in the future such information will be
> available via that. So then hopefully for now people could use the
> Commons API, and in the future switch to the MediaWiki API by just
> changing the API URL, and all their queries could stay the same.

There is a big conceptual difference between the two APIs, IMHO. The
MediaWiki API can be used to query technically defined things: link
lists, categories, template usage and so on. A Commons API (mine or
someone else's) parses the content itself for data and relations that
are not technically defined.

One way would be to add some kind of license metadata per page into
the database. This is possible, but rather specific; also, it would
likely mean creating a separate interface just for that.

The better way (IMHO) is to store all used
"page:template:parameter:value" tuples in a wiki in a separate
database table, which could be queried by the MediaWiki API. This has
been suggested time and again by me and others. It would then be much
easier for a third-party API to get the relevant data for a page. The
functionality is part of Semantic MediaWiki, but would actually scale
as a project on its own ;-)

This approach would also allow for the integration of tools like
TemplateTiger [1] directly into Wikipedia.
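
To make the tuple idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table name, column names and sample rows are made up for illustration; nothing like this exists in MediaWiki's schema.

```python
import sqlite3
from typing import Iterable, List, Tuple

Row = Tuple[str, str, str, str]  # (page, template, parameter, value)


def build_index(rows: Iterable[Row]) -> sqlite3.Connection:
    """Store page:template:parameter:value tuples in an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE template_params ("
        " page TEXT NOT NULL, template TEXT NOT NULL,"
        " parameter TEXT NOT NULL, value TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE INDEX idx_lookup ON template_params (page, template, parameter)"
    )
    conn.executemany("INSERT INTO template_params VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def lookup(conn: sqlite3.Connection, page: str, template: str,
           parameter: str) -> List[str]:
    """Return every value stored for one parameter of one template on a page."""
    cur = conn.execute(
        "SELECT value FROM template_params"
        " WHERE page = ? AND template = ? AND parameter = ?",
        (page, template, parameter),
    )
    return [row[0] for row in cur.fetchall()]


# Hypothetical rows, as a parser hook might record them on page save.
SAMPLE_ROWS = [
    ("File:Sa-warthog.jpg", "Information", "Author", "Sanjay ach"),
    ("File:Sa-warthog.jpg", "Information", "Description", "A warthog"),
]

conn = build_index(SAMPLE_ROWS)
print(lookup(conn, "File:Sa-warthog.jpg", "Information", "Author"))
```

With such a table, a third-party API would read the author directly instead of parsing wikitext or rendered HTML.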

Magnus

[1] http://toolserver.org/~kolossos/templatetiger/tt-table4.php?template=Persondata&lang=en&where=&is=


Re: [Wikitech-l] License information

Magnus Manske-2
In reply to this post by Daniel Kinzler
On Fri, Jan 30, 2009 at 8:24 AM, Daniel Kinzler <[hidden email]> wrote:

> Brianna Laugher wrote:
>>
>> I agree that it makes a lot of sense. But because it would be a big
>> change, I fear that unless the lead developers show great enthusiasm
>> for the idea, it will take a very long time to be accepted and
>> completed. Building an "add-on" tool, by contrast, can get to the point
>> of functionality faster.
>
> Guys, before re-inventing several wheels, please look at what we already have.
>
> Please have a look at
> <http://commons.wikimedia.org/wiki/Commons:Tag_categories>, which defines a way
> to make license tags machine readable. Using that scheme, it would be easy to
> build a script on the toolserver that delivers metadata in a machine readable
> form. No need for screen scraping.

Yes there is. Not for the license name (which I get using categories
in my experimental API), but for things like name of author etc. These
are only available as either HTML tag IDs (which I use) or raw
wikitext.
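
For readers unfamiliar with the HTML-ID approach, the sketch below shows the general technique with Python's standard `html.parser`. The element ID `file-author` and the sample markup are hypothetical, and this is not the code of the experimental API.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TextById(HTMLParser):
    """Collects the text inside the element carrying a given id."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self._depth = 0  # nesting depth inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self._depth = 1

    def handle_endtag(self, tag):
        if self._depth and tag not in VOID_TAGS:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.chunks.append(data)


def text_by_id(page_html: str, target_id: str) -> str:
    """Return the whitespace-normalised text of the element with this id."""
    parser = TextById(target_id)
    parser.feed(page_html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


# Hypothetical fragment of a rendered image description page.
SAMPLE = (
    '<table><tr><td id="file-author">'
    '<a href="/wiki/User:Sanjay_ach">Sanjay ach</a>'
    "</td><td>other</td></tr></table>"
)
print(text_by_id(SAMPLE, "file-author"))
```

The approach depends entirely on templates emitting stable IDs, which is why it is more fragile than reading stored template parameters.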

Magnus


Re: [Wikitech-l] License information

Daniel Kinzler
In reply to this post by Magnus Manske-2
Magnus Manske wrote:
> The better way (IMHO) is to store all used
> "page:template:parameter:value" tuples in a wiki in a separate
> database table, which could be queried by the MediaWiki API. This has
> been suggested time and again by me and others. It would then be much
> easier for a third-party API to get the relevant data for a page. The
> functionality is part of Semantic MediaWiki, but would actually scale
> as a project on its own ;-)

Indeed. Here's my take on it: <http://brightbyte.de/page/WikiData_light>.
I have proposed this as a project to the German chapter; maybe it'll actually be
taken on...

>> No need for screen scraping.
>
> Yes there is. Not for the license name (which I get using categories
> in my experimental API), but for things like name of author etc. These
> are only available as either HTML tag IDs (which I use) or raw
> wikitext.

Yes, you are right; I was only thinking of the meta-info about licenses
themselves. For authorship info, you'd need screen scraping -- or stored
page:template:parameter:value tuples.

I REALLY want that. It would be extremely useful for a LOT of things. And it's
not hard to do. CC-ing mediawiki-l.

-- daniel
