
Some help needed


Some help needed

Hugo Manguinhas
Hi everyone,

I am new to the Commons API and would like to know how to get (in a machine readable way) the metadata found within the Summary section of a page.

In particular, given a File page like this one: https://commons.wikimedia.org/wiki/File:African_Dusky_Nightjar_(Caprimulgus_pectoralis)_(W1CDR0000386_BD28).ogg

I would like to get the "Europeana link" part... it is enough for me to get the data as Wiki markup, but parsing the whole HTML would be too much :S

... btw, is there any way to query for such data? I have been using the API Sandbox (https://en.wikipedia.org/wiki/Special:ApiSandbox ) but could not find a method that could do this...

Your help is really appreciated! Thank you in advance!

Best regards,
Hugo
_______________________________________________
Commons-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/commons-l

Re: Some help needed

Magnus Manske-2
One option (old, unmaintained code, no support, no warranty, good luck) would be my attempt at parsing this:
https://tools.wmflabs.org/magnustools/commonsapi.php



Re: Some help needed

Gaurav Vaidya
If you know what the external link looks like (does it always start with "http://www.europeana.eu/"?) and the page(s) you're interested in, you can use 'extlinks' to find all external links on a set of pages:

 - https://commons.wikimedia.org/w/api.php?action=query&titles=File:African%20Dusky%20Nightjar%20(Caprimulgus%20pectoralis)%20(W1CDR0000386%20BD28).ogg&prop=extlinks
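A minimal sketch of that extlinks query in Python, assuming only the standard library; the helper functions and the filtering on "europeana.eu" are illustrative, not part of any existing tool:

```python
# Build and (optionally) run the prop=extlinks query from the URL above.
import json
import urllib.parse
import urllib.request

API = "https://commons.wikimedia.org/w/api.php"

def build_extlinks_url(title, limit=500):
    """Build an API URL listing the external links on one page."""
    params = {
        "action": "query",
        "format": "json",
        "titles": title,
        "prop": "extlinks",
        "ellimit": str(limit),
    }
    return API + "?" + urllib.parse.urlencode(params)

def europeana_links(urls):
    """Keep only links that point at europeana.eu."""
    return [u for u in urls if "europeana.eu" in u]

def main():
    title = ("File:African Dusky Nightjar (Caprimulgus pectoralis) "
             "(W1CDR0000386 BD28).ogg")
    with urllib.request.urlopen(build_extlinks_url(title)) as resp:
        data = json.load(resp)
    for page in data["query"]["pages"].values():
        urls = [link["*"] for link in page.get("extlinks", [])]
        print(europeana_links(urls))

# Running the query needs network access; uncomment to try it:
# main()
```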

You can also get a list of every page on the Commons that has a URL containing "europeana.eu/portal/record", like in Special:Linksearch:

 - https://commons.wikimedia.org/w/api.php?action=query&list=exturlusage&euquery=europeana.eu/portal/record&eulimit=500
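The same Special:Linksearch-style query, sketched with the standard library (parameter names taken from the URL above; the helper is illustrative):

```python
# Build and (optionally) run the list=exturlusage query from the URL above.
import json
import urllib.parse
import urllib.request

API = "https://commons.wikimedia.org/w/api.php"

def build_exturlusage_url(query, limit=500):
    """Build an API URL listing pages whose external links match `query`."""
    params = {
        "action": "query",
        "format": "json",
        "list": "exturlusage",
        "euquery": query,
        "eulimit": str(limit),
    }
    return API + "?" + urllib.parse.urlencode(params)

def main():
    url = build_exturlusage_url("europeana.eu/portal/record")
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    for hit in data["query"]["exturlusage"]:
        print(hit["title"])

# Running the query needs network access; uncomment to try it:
# main()
```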

I don't think there's an API to parse the Information template yet. DBpedia tries to do this (e.g. http://commons.dbpedia.org/page/File:These_three_geese.jpg), but I couldn't find the file you were interested in on their website.

Hope that helps!

cheers,
Gaurav

> On 25 Nov 2016, at 9:21 AM, Magnus Manske <[hidden email]> wrote:
>
> One option (old, unmaintained code, no support, no warranty, good luck) would be my attempt at parsing this:
> https://tools.wmflabs.org/magnustools/commonsapi.php



Re: Some help needed

Gergo Tisza
On Fri, Nov 25, 2016 at 6:11 AM, Hugo Manguinhas <[hidden email]> wrote:
> ... is there any way to query for such data? I have been using the API Sandbox (https://en.wikipedia.org/wiki/Special:ApiSandbox) but could not find a method that could do this...

I don't think it's possible. You can query the main fields of the information table (author, source, etc.) via prop=imageinfo&iiprop=extmetadata, but the Europeana link is just marked as a miscellaneous info field, so there isn't really any way to pick it out specifically.
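For reference, the imageinfo/extmetadata query Gergo mentions can be sketched like this with the standard library (the helper is illustrative; which metadata fields come back depends on the file):

```python
# Build and (optionally) run a prop=imageinfo&iiprop=extmetadata query,
# which returns the structured fields of the Information template
# (Artist, ImageDescription, LicenseShortName, ...).
import json
import urllib.parse
import urllib.request

API = "https://commons.wikimedia.org/w/api.php"

def build_extmetadata_url(title):
    """Build an API URL for the parsed metadata of one file page."""
    params = {
        "action": "query",
        "format": "json",
        "titles": title,
        "prop": "imageinfo",
        "iiprop": "extmetadata",
    }
    return API + "?" + urllib.parse.urlencode(params)

def main():
    title = ("File:African Dusky Nightjar (Caprimulgus pectoralis) "
             "(W1CDR0000386 BD28).ogg")
    with urllib.request.urlopen(build_extmetadata_url(title)) as resp:
        data = json.load(resp)
    for page in data["query"]["pages"].values():
        meta = page["imageinfo"][0]["extmetadata"]
        for name, field in sorted(meta.items()):
            print(name, "=", field.get("value"))

# Running the query needs network access; uncomment to try it:
# main()
```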


Re: Some help needed

Maarten Dammers
For the other people reading this: I got the same question, and solved it with a database query; see
https://quarry.wmflabs.org/query/14350

Parsing wikitext is generally messy. Quite a few identifier templates on
Commons (like https://commons.wikimedia.org/wiki/Template:Rijksmonument )
set a tracker category and use the identifier as the sort key. This way
it's possible to keep track of which identifier is used on which page
(see https://www.mediawiki.org/wiki/Manual:Categorylinks_table for the
database layout). In this case no tracker category was set, so the
externallinks table was used as a fallback
( https://www.mediawiki.org/wiki/Manual:Externallinks_table ).
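The tracker-category scheme Maarten describes can also be read back through the API rather than the database: list=categorymembers can return the sort key prefix for each member. A sketch with the standard library; the helper and the category name in main() are made-up examples, not real tracker categories:

```python
# Build and (optionally) run a list=categorymembers query that returns
# each member's sort key prefix, where identifier templates store the ID.
import json
import urllib.parse
import urllib.request

API = "https://commons.wikimedia.org/w/api.php"

def build_categorymembers_url(category, limit=500):
    """Build an API URL listing a category's members with sort keys."""
    params = {
        "action": "query",
        "format": "json",
        "list": "categorymembers",
        "cmtitle": category,
        "cmprop": "title|sortkeyprefix",
        "cmlimit": str(limit),
    }
    return API + "?" + urllib.parse.urlencode(params)

def main():
    # Hypothetical tracker category; substitute a real one.
    url = build_categorymembers_url("Category:Rijksmonumenten with known IDs")
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    for member in data["query"]["categorymembers"]:
        print(member["sortkeyprefix"], member["title"])

# Running the query needs network access; uncomment to try it:
# main()
```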

Maarten



