Making it easier for non-Commons users to describe minimally described images

Gaurav Vaidya
Hi there!

Over the next year, the Missouri Botanical Garden plans to identify
and extract illustrations from the BHL's 39.3 million scanned pages as
part of the Art of Life project [1], and then to publish those
illustrations to the Wikimedia Commons [2] (as well as to Flickr [3]
and ArtStor). My colleagues and I have spent the last few months
developing a metadata schema to provide structured information
describing an image -- subjects, "agents" (i.e. publishers, painters,
engravers and writers) and inscriptions. Within the Commons, we've
created a template to handle this structured data, which we call
"Information Art of Life" (based on the ubiquitous Information
template): http://commons.wikimedia.org/wiki/Template:Information_Art_of_Life

Since the BHL doesn't have the resources to comprehensively describe
all the images itself, our plan is for BHL staff members to minimally
describe the illustrations and then to rely on the Commons community
to improve metadata, descriptions and categorization. So when images
are uploaded to the Commons from the BHL, they will have basic
metadata in their "Information Art of Life" templates and basic
categorization, and nothing else. We hope to encourage users of BHL
illustrations (artists, biologists, humanities scholars, library staff
and educators, among others) to take it from there, improving the
metadata, descriptions and categorization on the uploaded images.
However, as many of them would not have much experience with
Wikipedia, we fear that the learning curve in understanding the
Commons' template-based metadata system might turn away potential
contributors.

To make it easier for non-Wikimedians to contribute, we have been
considering developing tools to simplify updating these templates,
such as by creating user scripts [5] to provide a form-based interface
to our template; maybe something visually similar to the Index page
form that the ProofreadPage extension creates on Wikisource [6]. Do
such tools already exist for the Commons somewhere? What do you think
would be the easiest way to simplify the ways in which non-Wikimedians
can use the Commons' cataloging system?

Thanks so much for your attention!

cheers,
Gaurav
http://commons.wikimedia.org/wiki/User:Gaurav

[1] http://biodivlib.wikispaces.com/Art+of+Life
[2] http://commons.wikimedia.org/wiki/Category:Files_from_the_Biodiversity_Heritage_Library
[3] http://www.flickr.com/photos/biodivlibrary
[4] Based on an external links search, see:
http://commons.wikimedia.org/w/index.php?title=Special:LinkSearch&limit=500&offset=4050&target=http%3A%2F%2Fwww.biodiversitylibrary.org
[5] http://commons.wikimedia.org/wiki/Commons:User_scripts
[6] An example of an index page form created by the ProofreadPage
extension on Wikisource:
http://en.wikisource.org/w/index.php?title=Index:Field_Notes_of_Junius_Henderson,_Notebook_1.djvu&action=edit

_______________________________________________
Commons-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/commons-l

Re: Making it easier for non-Commons users to describe minimally described images

Federico Leva (Nemo)
What sort of information are you looking for? The few files I checked on
[[Category:Files from the Biodiversity Heritage Library]] (butterflies)
seem to be described in detail (mention in description + category for
each species), is this what you're aiming at?

Managing the information templates seems a nightmare; perhaps you should
aim at categories. HotCat works well enough per se, but you still need
to know the category guidelines (or better, the precise name of the
category). It would be great if the autocompletion could be fixed so
that 1) you don't need to know in advance whether the category you need
is e.g. "Churches of Finland" vs. "Finnish churches", 2) redirects and
soft-redirects are followed, e.g. from plural to singular and vice versa.
If such a feature existed, maybe even files uploaded with the
UploadWizard may at some point have categories.

Nemo

Re: Making it easier for non-Commons users to describe minimally described images

Gaurav Vaidya
Heya,

A quick note for metadata fans: since my last e-mail, the BHL has
released the first version of the BHL illustration schema for feedback
at http://blog.biodiversitylibrary.org/2012/08/interested-in-improving-access-to.html
-- we'd love your feedback on what information you would like
associated with BHL illustrations which would make it easy for you to
find images you could use on Wikimedia projects, and then to reuse
those images on Wikimedia projects. Do we have adequate copyright
information, for instance? Please have a look at our schema and let us
know!

On 25 August 2012 00:08, Federico Leva (Nemo) <[hidden email]> wrote:
> What sort of information are you looking for? The few files I checked on
> [[Category:Files from the Biodiversity Heritage Library]] (butterflies) seem
> to be described in detail (mention in description + category for each
> species), is this what you're aiming at?
Nemo: SO sorry for the late reply! We're aiming for something like the
metadata on the following Commons images:
 - http://commons.wikimedia.org/wiki/File:Greenwaxbill.jpg
 - http://commons.wikimedia.org/wiki/File:Simonkai.jpg
 - http://commons.wikimedia.org/wiki/File:PasserMoabiticusWolf.jpg
 (other examples available at
http://commons.wikimedia.org/wiki/Template:Information_Art_of_Life/Gallery)

These images have textual descriptions in the {{Information Art of
Life}} template as well as categories corresponding to their subjects;
we also use the {{inscription}} and {{Creator}} templates to provide
more information about what is actually in the image. The {{Creator}}
template automatically adds the images to the appropriate creator
category.

> Managing the information templates seems a nightmare, perhaps you should aim
> at categories.
I'm hopeful that eventually we'll be able to use software to smooth
this process: an {{Information Art of Life}} record would be
automatically generated from the basic metadata available at the BHL
when the image is uploaded to the Commons; a script could then
re-extract the metadata via the MediaWiki API or by reading hidden
"span" or "div" tags, for use in moving fully annotated images into
other image repositories, such as ArtStor. Until then, I hope the
Information Art of Life template will provide a way for Commons
editors to structure information about the illustration, especially as
pertains to biological species and other subjects.
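
As a rough illustration of the "re-extract the metadata" step, here is a
minimal Python sketch that pulls the top-level parameters out of a
template call in a page's wikitext. It assumes the wikitext has already
been fetched (for instance through the MediaWiki API's
action=query&prop=revisions&rvprop=content). The function name and the
parameter names in the sample (title, artist, source) are made up for
illustration and are not the template's actual fields.

```python
def extract_template_params(wikitext, template_name):
    """Return the top-level |key=value parameters of the first matching template."""
    start = wikitext.find("{{" + template_name)
    if start == -1:
        return {}
    # Track nesting of {{...}} and [[...]] so that pipes inside nested
    # templates or links are not mistaken for parameter separators.
    depth = 0
    i = start
    parts = []
    current = []
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            if depth > 1:
                current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            if depth == 0:
                parts.append("".join(current))
                break
            current.append(pair)
            i += 2
        elif wikitext[i] == "|" and depth == 1:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(wikitext[i])
            i += 1
    params = {}
    for part in parts[1:]:  # parts[0] is the template name itself
        key, sep, value = part.partition("=")
        if sep:
            params[key.strip()] = value.strip()
    return params


# Hypothetical sample record; the field names are illustrative only.
sample = """{{Information Art of Life
|title = Green waxbill
|artist = {{Creator:Frederick William Frohawk}}
|source = [[:Category:Files from the Biodiversity Heritage Library|BHL]]
}}"""

params = extract_template_params(sample, "Information Art of Life")
print(params["title"])
```

A real script would probably rely on an existing wikitext parser rather
than hand-rolled brace counting; the sketch only shows that the
structured fields are recoverable from the template call.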

One thing that would help is more templates that help categorize
images. I recently wrote the {{Agent}} template (see
http://commons.wikimedia.org/wiki/Template:Agent), which uses
{{#ifexist}} to test whether a Creator template exists for the given
creator name. If it exists, {{Agent}} transcludes it into the page,
adding the file to the correct creator category in the process. If it
doesn't exist, {{Agent}} instead creates a red link to where the
Creator page should be.
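
The branch that {{Agent}} takes can be modelled in a few lines of
Python. This is only a sketch of the decision described above, not the
template's source: render_agent and existing_pages are hypothetical
names, and the set of page titles stands in for the wiki's own
existence check.

```python
def render_agent(name, existing_pages):
    """Mimic the branch the {{Agent}} template takes for a creator name."""
    creator_page = "Creator:" + name
    if creator_page in existing_pages:
        # Page exists: transclude it, which also categorizes the file.
        return "{{" + creator_page + "}}"
    # Page missing: emit a plain link, which the wiki renders as a red link.
    return "[[" + creator_page + "|" + name + "]]"


# Hypothetical set of Creator pages that already exist on the wiki.
existing_pages = {"Creator:Frederick William Frohawk"}

print(render_agent("Frederick William Frohawk", existing_pages))
print(render_agent("Unknown Engraver", existing_pages))
```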

> HotCat works well enough per se, but you still need to know
> the category guidelines (or better, the precise name of the category). It
> would be great if the autocompletion could be fixed so that 1) you don't
> need to know in advance whether the category you need is e.g. "Churches of
> Finland" vs. "Finnish churches", 2) redirects and soft-redirects are
> followed, e.g. from plural to singular and viceversa.
> If such a feature existed, maybe even files uploaded with the UploadWizard
> may at some point have categories.
That would be awesome to have! As something completely unrelated to
everything else, has anybody worked on extracting the Commons
categories as a Web Ontology Language (OWL) file? It would be interesting
to use OWL inferencing to "check" that categories are organized
consistently, although it would be a *huge* project to work on.

cheers,
Gaurav

Re: Making it easier for non-Commons users to describe minimally described images

Toby Hudson
> Do we have adequate copyright information, for instance?

I've only looked at one file:
http://commons.wikimedia.org/wiki/File:Greenwaxbill.jpg

And it looks like you could improve the copyright info:

Here the copyright claim is that the author died more than 70 years ago, but there is no illustrator death date listed.  So to verify the claim, we would need to do some research.  So if you have the date of death, and if the book was published outside the US (here it was apparently London, UK), please provide it.

Also, note that the current copyright template says (after a big warning sign): "You must also include a United States public domain tag to indicate why this work is in the public domain in the United States."  In this case you should use http://commons.wikimedia.org/wiki/Template:PD-1923.

Toby / User:99of9

Re: Making it easier for non-Commons users to describe minimally described images

Gaurav Vaidya
Hi Toby,

On 4 September 2012 22:06, Toby Hudson <[hidden email]> wrote:

>> Do we have adequate copyright information, for instance?
>
> I've only looked at one file:
> http://commons.wikimedia.org/wiki/File:Greenwaxbill.jpg
>
> And it looks like you could improve the copyright info:
>
> Here the copyright claim is that the author died more than 70 years ago, but
> there is no illustrator death date listed.  So to verify the claim, we would
> need to do some research.  So if you have the date of death, and if the book
> was published outside the US (here it was apparently London, UK), please
> provide it.
>
> Also, note that the current copyright template says (after a big warning
> sign): "You must also include a United States public domain tag to indicate
> why this work is in the public domain in the United States."  In this case
> you should use http://commons.wikimedia.org/wiki/Template:PD-1923.
>
> Toby / User:99of9

Ugh, good catch. It looks like
http://commons.wikimedia.org/wiki/File:Greenwaxbill.jpg might not
actually be out of copyright -- it was first published in the UK (not
the US as I thought) in 1899, so it remains in copyright for "70 years
from the end of the calendar year in which the last remaining author
of the work dies" (as per
http://www.copyrightservice.co.uk/copyright/p01_uk_copyright_law). F.
W. Frohawk, the illustrator, died in 1946 as per
http://en.wikipedia.org/wiki/Frederick_William_Frohawk, so none of his
works will enter the public domain until 1946+70+1 = 2017.
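
The date arithmetic above can be written out as a small function. This
is only a sketch of the "life plus 70 years" rule as quoted in this
message; the function name and the term_years parameter are made up for
illustration, and it says nothing about US copyright status or other
jurisdiction-specific rules.

```python
def uk_public_domain_year(author_death_year, term_years=70):
    """First calendar year in which the work is in the public domain."""
    # Protection lasts until the end of the calendar year that falls
    # term_years after the author's death, so the work becomes free on
    # 1 January of the following year.
    return author_death_year + term_years + 1


# F. W. Frohawk died in 1946.
print(uk_public_domain_year(1946))  # prints 2017
```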

I've tagged it for deletion, thanks! We'd still love your feedback on
the other images!

cheers,
Gaurav

Re: Making it easier for non-Commons users to describe minimally described images

Toby Hudson
Hi Gaurav,

Glad to be of help.  I don't have time to go through all your uploads, but here are some notes on a few random samples:

The general point I made in my other email still applies to some of your files.

Here are some not currently tagged with a US PD notice (e.g. PD-1923 or PD-old-100):
http://commons.wikimedia.org/wiki/File:Atlides_halesus_CramerStoll.png
http://commons.wikimedia.org/wiki/File:Scotopelia_peliIbisV001P015AA.jpg

Or in some cases they are not tagged with a PD notice applicable to their country of publication:
http://commons.wikimedia.org/wiki/File:Bassin_Houiller_Du_Gard.jpg
http://commons.wikimedia.org/wiki/File:Die_Gattung_Nepenthes_illustration2.jpg
http://commons.wikimedia.org/wiki/File:Simonkai.jpg

Also, some have a license (implying a copyright claim) imported from Flickr.  Switching to PD would be more accurate:
http://commons.wikimedia.org/wiki/File:Aesclepius,_Flora,_Ceres_and_Cupid_honouring_the_Bust_of_Linnaeus.jpg
http://commons.wikimedia.org/wiki/File:Flickr_-_BioDivLibrary_-_n102_w1150.jpg

Best regards, and thanks for the hard work you're doing.
Toby / User:99of9





Re: Making it easier for non-Commons users to describe minimally described images

Gaurav Vaidya
Hi Toby,

On 05-Sep-2012, at 9:23 PM, Toby Hudson wrote:

> Glad to be of help.  I don't have time to go through all your uploads, but here are some notes on a few random samples:
>
> The general point I made in my other email still applies to some of your files
>
> Here are some not currently tagged with a US PD notice (e.g. PD-1923 or PD-old-100):
> http://commons.wikimedia.org/wiki/File:Atlides_halesus_CramerStoll.png
> http://commons.wikimedia.org/wiki/File:Scotopelia_peliIbisV001P015AA.jpg
>
> Or in some cases they are not tagged with a PD notice applicable to their country of publication:
> http://commons.wikimedia.org/wiki/File:Bassin_Houiller_Du_Gard.jpg
> http://commons.wikimedia.org/wiki/File:Die_Gattung_Nepenthes_illustration2.jpg
> http://commons.wikimedia.org/wiki/File:Simonkai.jpg
>
> Also, some have a license (implying a copyright claim) imported from Flickr.  Switching to PD would be more accurate
> http://commons.wikimedia.org/wiki/File:Aesclepius,_Flora,_Ceres_and_Cupid_honouring_the_Bust_of_Linnaeus.jpg
> http://commons.wikimedia.org/wiki/File:Flickr_-_BioDivLibrary_-_n102_w1150.jpg
Thanks so much again for taking the time to check these images out! I've fixed all the images you mentioned, apart from http://commons.wikimedia.org/wiki/File:Simonkai.jpg, which was published in 1910 and whose authorship is unclear. Hopefully the (Hungarian) journal that image is from can tell us who the author is; I've added this request to the Hungarian Village Pump on the Commons [1].

There are still some images that the Art of Life project is actively working on which need double-checking (see http://commons.wikimedia.org/wiki/Template:Information_Art_of_Life/Gallery), but the bigger task will be to sort out (1) the hundreds of images which I recently helped bulk-upload into the Commons [2], and (2) the thousands of BHL images already in the Commons [3], uploaded by different uploaders at different times. Fun!

I've added some information about dealing with UK/EU copyrights amongst BHL images to the BHL project page, emphasizing that both US and EU copyright tags are necessary for content published in the EU (as a lot of BHL's content is). Hopefully, that will help things a bit! It's at: http://commons.wikimedia.org/wiki/Commons:BHL#Copyrights

> Best regards, and thanks for the hard work you're doing.
Thanks for the encouragement -- it's much appreciated! :)

cheers,
Gaurav

[1] http://commons.wikimedia.org/wiki/Commons:Kocsmafal#Help_needed_to_check_authorship_on_a_photograph_from_a_Hungarian_journal
[2] http://commons.wikimedia.org/wiki/Commons:Flickr_batch_uploading/BHL_Art_of_Life_test_images
[3] http://commons.wikimedia.org/w/index.php?title=Special%3ALinkSearch&target=http%3A%2F%2Fwww.biodiversitylibrary.org%2F