[ANN] Experimental Wikimedia Commons RDF extraction with DBpedia

[ANN] Experimental Wikimedia Commons RDF extraction with DBpedia

Gaurav Vaidya
Hi everybody,

We are happy to announce an experimental RDF dump of the Wikimedia Commons. A complete first draft is now available online at http://nl.dbpedia.org/downloads/commonswiki/20140705/, and will eventually be accessible from http://commons.dbpedia.org. A small sample dataset, which may be easier to browse, is available on GitHub at https://github.com/gaurav/commons-extraction/tree/master/commonswiki/20140101

The following datasets showcase some of the improvements that we’ve been working on over the last two months:
 - File information (*-file-information.*) is a completely new dataset that contains information on the files in the Commons, including file and thumbnail URLs, file extensions, file type classes and MIME types.
 - DBpedia’s Mappings Extractor (*-mappingbased-properties.*) uses templates stored on the Mapping server (http://mappings.dbpedia.org/) to create RDF for information-rich templates. This system still has some important limitations, such as not being able to process embedded templates (e.g. license templates inside {{Information}}), but top-level templates are completely configurable. The existing mappings are available at http://mappings.dbpedia.org/index.php/Mapping_commons
 - This includes 363 license templates that indicate licensing for Commons files under public domain, Creative Commons and other open access licenses. These were created by bots and still require verification before use. They are listed at http://mappings.dbpedia.org/index.php/Category:Commons_media_license
 - The DBpedia Geoextractor (*-geo-coordinates.*) now extracts geographical coordinates from Commons files using the {{Location}} template.
 - The DBpedia SKOS Extractor (*-skos-categories.*) now identifies relationships between Commons categories, building a SKOS-based description of the entire Commons category tree.
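
As a rough illustration of how the SKOS category dataset might be consumed, here is a minimal Python sketch. It assumes the *-skos-categories.* dump is available as N-Triples and that parent categories are expressed with skos:broader; the example.org IRIs, the sample lines and the helper names are all hypothetical and not taken from the dump itself.

```python
import re
from collections import defaultdict

SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"

# Matches N-Triples lines whose subject, predicate and object are all IRIs.
# Lines with literal objects (labels etc.) are deliberately not matched.
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$')


def parse_iri_triples(lines):
    """Yield (subject, predicate, object) for IRI-only triples."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = TRIPLE_RE.match(line)
        if match:
            yield match.groups()


def broader_index(triples):
    """Map each category IRI to the set of its direct parent categories."""
    parents = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == SKOS_BROADER:
            parents[subject].add(obj)
    return parents


def ancestors(category, parents):
    """Collect all transitive parents; the seen set guards against cycles."""
    seen = set()
    stack = [category]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical sample data, for illustration only.
SAMPLE = [
    "<http://example.org/Category:Kittens> <%s> <http://example.org/Category:Cats> ." % SKOS_BROADER,
    "<http://example.org/Category:Cats> <%s> <http://example.org/Category:Animals> ." % SKOS_BROADER,
    '<http://example.org/Category:Cats> <http://www.w3.org/2000/01/rdf-schema#label> "Cats"@en .',
]

index = broader_index(parse_iri_triples(SAMPLE))
kitten_ancestors = ancestors("http://example.org/Category:Kittens", index)
```

For a real dump you would pass an open file handle instead of SAMPLE. A proper RDF library would be more robust than this regular expression, which only handles the simplest IRI-only lines.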

Please have a look and let us know what you think. We’ll be working on a number of open tasks over the next three weeks, listed at https://github.com/gaurav/extraction-framework/issues?state=open -- if you see something wrong with what we’ve done above, or have an issue you’d particularly like us to tackle, please report it there or drop me an e-mail!

This work is sponsored by the Google Summer of Code program
(https://www.google-melange.com/gsoc/project/details/google/gsoc2014/gaurav/5676830073815040).

Thanks!

cheers,
The DBpedia Commons extraction team:
Gaurav Vaidya
Dimitris Kontokostas
Andrea Di Menna
Jimmy O’Regan
_______________________________________________
Commons-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/commons-l

Re: [ANN] Experimental Wikimedia Commons RDF extraction with DBpedia

Luis Villa

On Tue, Jul 29, 2014 at 7:32 PM, Gaurav Vaidya <[hidden email]> wrote:
 - This includes 363 license templates that indicate licensing for Commons files under public domain, Creative Commons and other open access licenses. These were created by bots and still require verification before use. They are listed at http://mappings.dbpedia.org/index.php/Category:Commons_media_license

Interesting! Is there documentation somewhere on how you ended up with those particular 363 licenses? Failing that, a pointer at the relevant code would be welcome :) 

Thanks-
Luis


--
Luis Villa
Deputy General Counsel
Wikimedia Foundation
415.839.6885 ext. 6810

This message may be confidential or legally privileged. If you have received it by accident, please delete it and let us know about the mistake. As an attorney for the Wikimedia Foundation, for legal/ethical reasons I cannot give legal advice to, or serve as a lawyer for, community members, volunteers, or staff members in their personal capacity. For more on what this means, please see our legal disclaimer.


Fwd: [Wikidata-l] [ANN] Experimental Wikimedia Commons RDF extraction with DBpedia

Dimitris Kontokostas-2

On Wed, Jul 30, 2014 at 9:11 AM, Luis Villa <[hidden email]> wrote:

On Tue, Jul 29, 2014 at 7:32 PM, Gaurav Vaidya <[hidden email]> wrote:
 - This includes 363 license templates that indicate licensing for Commons files under public domain, Creative Commons and other open access licenses. These were created by bots and still require verification before use. They are listed at http://mappings.dbpedia.org/index.php/Category:Commons_media_license

Interesting!

Good to hear that :)
 
Is there documentation somewhere on how you ended up with those particular 363 licenses? Failing that, a pointer at the relevant code would be welcome :) 

This involved some manual work to gather the related templates and a bot to import them in the DBpedia mappings wiki. See the following links for details


The way Gaurav and I designed it, there is no need to write any code to change an existing licence mapping or add a new one;
you just need to request editor rights for the DBpedia mappings wiki (http://mappings.dbpedia.org).
Hard-coding the mappings could probably give us more fine-grained control, but it would be much harder to adjust.

Best,
Dimitris

--
Kontokostas Dimitris


