
How to query Wikipedia for a list of people who died in a given year

Alek Tarkowski-3
Hello everyone,

Until now I've been a lurker on this list, so let me introduce myself:
I'm a sociologist studying digital technologies and an activist (I run
Creative Commons Poland), and I run a digital think tank / NGO in Poland.

I'm hoping someone on this list might be able to help me. I'm involved
in the celebrations of Public Domain Day: on the 1st of January each
year, the works of authors who died 70 years earlier pass into the
public domain (at least in Poland and in most countries, though it may
differ in some jurisdictions).

I'm looking for a good way to determine who died in 1941, and thought
that Wikipedia would be a good place to find this out. I know there are
lists of people who died in a given year, but they are not complete. Is
there any way to automatically query Wikipedia for such information? I
know that it's structured to some extent, as this information is
provided in templates for biographical articles, but I don't know
whether there is any mechanism for querying it.

Any advice will be much appreciated.

All the best,

Alek

--
dyrektor, Centrum Cyfrowe Projekt: Polska
www: http://centrumcyfrowe.pl
identi.ca / twitter: @centrumcyfrowe

_______________________________________________
Wiki-research-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
FT2
There's a category for that sort of thing:
 
 
Other wikis might have similar categories.
 
FT2

Jérémie Roquet
In reply to this post by Alek Tarkowski-3
Hi Alek,

2011/12/23 Alek Tarkowski <[hidden email]>:
> I'm looking for a good way to determine, who died in 1941 - and thought
> that Wikipedia will be a good place to find this out. I know there are
> lists of people who died in a given year, but they are not complete. Is
> there any way to automatically query Wikipedia for such information? I
> know that it's to some extent structured, as this information is
> provided in templates for biographical articles, but I don't know
> whether there is any mechanism for querying?

It's likely that most of those people are in a category¹. Since no
category is ever complete, merging the content of that category with
its equivalents on the biggest Wikipedias (German, French…²) could help.
Finding the equivalent categories is easy: start at ¹, then use the
interwiki links under “languages” in the left column. Also, don't
forget Commons³.

Best regards,

¹ https://en.wikipedia.org/wiki/Category:1941_deaths
² https://www.wikipedia.org/
³ https://commons.wikimedia.org/wiki/Category:1941_deaths
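
As a rough sketch of how the category listing at ¹ could be fetched
programmatically: the MediaWiki web API offers a list=categorymembers
query. The Python below is only an illustration under assumptions. The
function names and the User-Agent string are made up for this example,
and the language code is simply substituted into the host name.

```python
import json
import urllib.parse
import urllib.request


def category_members_url(lang, category, cont=None, limit=500):
    """Build a MediaWiki API URL listing the pages in one category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "page",
        "cmlimit": str(limit),
        "format": "json",
    }
    if cont:
        params["cmcontinue"] = cont  # continuation token from the previous batch
    return f"https://{lang}.wikipedia.org/w/api.php?" + urllib.parse.urlencode(params)


def fetch_category_titles(lang, category):
    """Yield every page title in the category, following continuation tokens."""
    cont = None
    while True:
        # The User-Agent value is a made-up placeholder for this sketch.
        req = urllib.request.Request(
            category_members_url(lang, category, cont),
            headers={"User-Agent": "pd-day-sketch/0.1 (example)"},
        )
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        cont = data.get("continue", {}).get("cmcontinue")
        if cont is None:
            break
```

Calling fetch_category_titles("en", "Category:1941_deaths") would walk
the English category; the same function could be pointed at the
equivalent category on another language edition.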

--
Jérémie

WereSpielChequers-2
Hi Alek,

Not every language version of Wikipedia has such categories, but at least 80 do. You can find a list of the eighty or so "dead people" categories at http://meta.wikimedia.org/wiki/Death_anomalies_table (the 1941 deaths category will be a subcategory of each). Someone with toolserver access could probably extract a list for you of all the people we have, minus duplicates across languages. You could file a request for such a report at http://en.wikipedia.org/wiki/Wikipedia_talk:Database_reports

But this will only give you notable people who died in a particular year. If you want a list of everyone who died in a particular year, you are better off looking at genealogy sites and, for 1941, military war grave sites. Tens of millions of people died that year and I doubt that we have even 0.1% of them.
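
The "minus duplicates across languages" step might look roughly like this in Python. This is a sketch, not what a toolserver report would actually run. It assumes you already have, for each language, a mapping from local article title to the matching English title (or None if there is no English article), for example from the API's langlinks property. The function name and data layout are hypothetical.

```python
def merge_across_languages(per_language):
    """Merge {lang: {local_title: english_title_or_None}} into unique people.

    Articles that share an English title are treated as the same person;
    articles with no English counterpart are keyed by (lang, local_title).
    """
    merged = {}
    for lang, titles in per_language.items():
        for local_title, english_title in titles.items():
            key = ("en", english_title) if english_title else (lang, local_title)
            merged.setdefault(key, []).append((lang, local_title))
    return merged
```

Each key in the result stands for one person, and the value lists every language edition that has an article on them. Matching on the English title alone will miss people who are covered in two non-English editions but not in English, so the count is an upper bound on distinct people.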

Hope that helps

WereSpielChequers

FT2
I think he's after writers and artists, and wants to identify people whose copyrights may have passed into the public domain because they died that year.

For that purpose, "known" authors and artists are the relevant ones, so Wikipedia should work well. It won't give a complete list of everyone who died after creating something copyrighted, but the better known their works are (and hence the more useful or interesting it is to know they are PD), the more likely we are to have coverage.
 
FT2
 

 
On Fri, Dec 23, 2011 at 2:57 PM, WereSpielChequers <[hidden email]> wrote:
But this will only give you notable people who died in a particular year, if you want a list of people who died in a particular year you are better off looking at genealogy sites, and for 1941 military war grave sites. Tens of millions of people died that year and I doubt that we have even 0.1% of them. 

Alek Tarkowski-3
In reply to this post by Jérémie Roquet
Jeremie, FT2,

thank you very much for your advice.

Do you have any idea how complete these lists are? Are they done by
hand, or is there a bot compiling these lists? And in any case, is there
any way to estimate how completely they cover a given category?

All the best,

Alek

--
dyrektor, Centrum Cyfrowe Projekt: Polska
www: http://centrumcyfrowe.pl
identi.ca / twitter: @centrumcyfrowe

FT2
Categories are added by hand. At most, one could write a bot that looked for infobox or introduction text containing a date of birth/death and automatically added the category if it was missing. As a rule, though, it seems that if someone has died, the date of death is usually there, and usually so are the categories you'd need.

The easiest and most exact way would be a database query, which could look for "born * died * 1941" or just "died * 1941" in the first paragraph, and also require that at least one word like wrote / author / poet / painter, or {{infobox person}}, appears in the text, or that "novelists | writers | painters | authors..." appears in at least one category. That should do exactly what you need, but you'll need to find someone to set up and run the query for you.
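
To make the matching idea concrete, here is a minimal Python sketch of the text test such a query might apply to each article's first paragraph. It is only a heuristic under assumptions: the word list and function name are invented for illustration, and it checks plain text only (no infobox or category lookup).

```python
import re

# Matches lead sentences such as
# "NAME (born 18 May 1862, died 17 June 1941, Sweden) was a..."
# The gap between "died" and the year may not cross a full stop or bracket.
DIED_IN_YEAR = r"\bdied\b[^.()]*?\b{year}\b"
CREATOR_WORDS = re.compile(
    r"\b(wrote|author|poet|painter|novelist|writer)\b", re.IGNORECASE
)


def looks_like_creator_who_died_in(lead_paragraph, year):
    """Heuristic: the lead mentions 'died ... <year>' and a creator-type word."""
    died = re.search(DIED_IN_YEAR.format(year=year), lead_paragraph, re.IGNORECASE)
    return bool(died) and bool(CREATOR_WORDS.search(lead_paragraph))
```

A lead written as "(1862-1941)" without the word "died" would be missed, so this would undercount; it is meant to show the shape of the filter rather than a complete rule.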
 
If not, then these other options might help somewhat......
  
(1)  Biographies will often start like this:  NAME (born 18 May 1862, died 17 June 1941, Sweden) was a.....
 
So you could search for articles with the words died 1941 in them. The trouble is that there are many reasons an article could contain those words. Limiting it to biographical articles might help. Some search engines let you search for pages where specific words appear close together, but Wikipedia's search doesn't have that feature, or not yet. Even so, this search does turn up useful results, especially combined with the incategory: operator. You can also narrow down by adding words that articles about copyright creators are likely to contain, such as "author", "playwright", "poet", "artist" etc. Try these searches:
 
born died 1941  (biographies with "died" will usually also have "born", use this to narrow down)
died 1941 author   (not so helpful)
born died 1941 wrote  (adding one "copyright-creator" word seems to work, just. Adding more seems to confuse things)
died 1941 incategory:"Polish writers"  (but doesn't pick up articles nested in subcategories)
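
The searches above can also be sent through the MediaWiki web API (list=search), which should accept the same query syntax as the search box, including incategory:. A minimal Python sketch that only builds the request URL; the helper name and its defaults are illustrative assumptions:

```python
import urllib.parse


def search_url(terms, category=None, lang="en", limit=50):
    """Build a MediaWiki full-text search URL, optionally limited to one category."""
    query = terms
    if category:
        # incategory: matches direct members only, not nested subcategories
        query += f' incategory:"{category}"'
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": str(limit),
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?" + urllib.parse.urlencode(params)
```

For example, search_url("died 1941", category="Polish writers") gives a URL whose JSON response lists matching article titles.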
 
(2)  Google supports proximity searching and can be told to list content from just one site. All Wikipedia articles are indexed on Google. But it's very limited in what it will show you and can't detect the other things needed to narrow the results down. Try this in Google search:
 
 
(3)  A third option which will pick up names of articles (but no further details) is this category search tool:
http://toolserver.org/~magnus/catscan_rewrite.php which lets you enter a category and search several layers deep. So nested categories will show up. Try entering "Novelists" under "categories" and "3" under "depth".
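
For anyone curious what "several layers deep" amounts to, here is a small Python sketch of a depth-limited walk over the category tree. It is not how CatScan itself is implemented. The function name is hypothetical, get_members is a caller-supplied function (for instance one backed by the API's categorymembers list), and treating depth 0 as "the category itself only" is an assumption.

```python
from collections import deque


def pages_within_depth(root, max_depth, get_members):
    """Collect article titles in `root` and its subcategories down to `max_depth`.

    `get_members(category)` must return (page_titles, subcategory_titles)
    for one category.
    """
    seen = {root}
    pages = set()
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        page_titles, subcats = get_members(category)
        pages.update(page_titles)
        if depth < max_depth:
            for sub in subcats:
                if sub not in seen:  # category graphs can contain cycles
                    seen.add(sub)
                    queue.append((sub, depth + 1))
    return pages
```

The seen set matters because a category can be reachable by more than one path, or even contain one of its own ancestors.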
 
There may be other ways, such as common terms that only appear in biographies.  Perhaps someone else will have ideas.
 
 
 
FT2
 
 
 
Kim Bruning
On Fri, Dec 23, 2011 at 06:02:37PM +0000, FT2 wrote:
> Categories are done by hand, at most one could write a bot that looked for
> infobox or introduction text containing date of birth/death and
> automatically add the category if it didn't exist, but as a rule it seems
> that if someone's died then a date of death is usually there and usually so
> are the categories you'd need.

This is why we need to roll out some sort of semantic engine. I understand that rolling out the existing Semantic MediaWiki (SMW) "will never happen"
on Wikipedia :-(, but IIRC people were working on more lightweight systems?

sincerely,
        Kim Bruning
--

WereSpielChequers-2
In reply to this post by Alek Tarkowski-3
Categorisation is done manually; completeness varies from project to project and by topic area within a project. On the English-language Wikipedia we probably do have most of our novelists categorised as such. Deaths are a very different matter, as many of our articles never pick up on the subject's death. However, the anomalies that I've found there tend to be among people whose notable careers started and ended in their youth, sportspeople particularly. So I would suggest that a query for "1941 deaths" and "writers" might well give you what you want.
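
Once you have the two listings (the year's deaths and the writers), combining them is a plain set intersection. A tiny Python sketch, assuming both listings are lists of article titles from the same language edition; the function name is made up for illustration:

```python
def in_both(deaths_in_year, creators):
    """Return titles present in both listings, sorted for stable output."""
    return sorted(set(deaths_in_year) & set(creators))
```

With deaths containing "James Joyce", "Virginia Woolf" and "Lou Gehrig", and writers containing "Franz Kafka", "Virginia Woolf" and "James Joyce", only Joyce and Woolf come back.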

WSC

Dan Bolser-3
In reply to this post by Kim Bruning
On 23 December 2011 19:59, Kim Bruning <[hidden email]> wrote:
> On Fri, Dec 23, 2011 at 06:02:37PM +0000, FT2 wrote:
>> Categories are done by hand, at most one could write a bot that looked for
>> infobox or introduction text containing date of birth/death and
>> automatically add the category if it didn't exist, but as a rule it seems
>> that if someone's died then a date of death is usually there and usually so
>> are the categories you'd need.
>
> This is why we need rollout of some sort of semantic engine. I understand that rolling out the existing SMW "will never happen"

Why d'you say that so categorically?

Theoretically, I don't see "why this shouldn't happen at some point".



> on wikipedia :-(, but IIRC people were working on more lightweight systems?
>
> sincerely,
>        Kim Bruning
> --
>
Sebastian Hellmann
In reply to this post by Alek Tarkowski-3
Dear Alek, dear list,
DBpedia (http://dbpedia.org ) was created for exactly this use case, so
that you can query Wikipedia like a database. DBpedia already provides
such a "semantic engine", which you can query.
Below I drafted some queries. The first counts all the persons in
Wikipedia that have a "deathDate". In total there are 187739, which
should be the most complete list you will find.
The queries are then refined to all persons who died in 1941 (yielding
1318 persons), then all artists who died in 1941, and then all artists
and their works!

Note that there is a static database which uses the latest dump:
http://dbpedia.org/sparql
http://dbpedia.org/snorql
as well as a live version, which is synchronized directly (each edit is
loaded into the engine)
http://live.dbpedia.org/sparql
Also for some of the other Wikipedias besides the English one, language
specific versions exist:
Polish: http://pl.dbpedia.org/
German: http://de.dbpedia.org
Greek: http://el.dbpedia.org

DBpedia has quite a large community. I would estimate that over 1000
volunteers from the areas of computer science and the Semantic Web have
worked on or with it since 2006. (This does not count industry partners
and the like.)

@Alek I drafted some queries for you. There are a total of 5 result
formats to choose from. Maybe json, plain or html are the ones you are
looking for. Here are links to some user interfaces:
http://wiki.dbpedia.org/OnlineAccess
http://wiki.dbpedia.org/Applications

Feel free to improve the data directly in Wikipedia (and use the live
endpoint 5 minutes later) or tailor the data how you like it at the
mappings wiki: http://mappings.dbpedia.org . Actually, the information
contained could help to clean up the infoboxes, which would also help
the start of WikiData.
There is one catch, though: the more precise the queries get, the worse
recall will be, as minor errors in the data add up with each constraint.

Hope I could help,
Sebastian

Queries below:
*******************
A count of all persons that have a deathDate [1]:
SELECT (COUNT(*) AS ?count) WHERE {
?person <http://dbpedia.org/ontology/deathDate> ?deathDate .
?person <http://xmlns.com/foaf/0.1/page> ?page .
}


All persons that died in 1941 [2]. Note that on http://dbpedia.org there is a
limit of 1000 results per query, so you need to page through with OFFSET:

SELECT * WHERE {
?person <http://dbpedia.org/ontology/deathDate> ?deathDate .
?person <http://xmlns.com/foaf/0.1/page> ?page .
FILTER(?deathDate >= "1941-01-01"^^xsd:date)
# strict upper bound, so deaths on 1 January 1942 are excluded
FILTER(?deathDate < "1942-01-01"^^xsd:date)
} ORDER BY ?deathDate
LIMIT 1000
OFFSET 0

The next page of results:

SELECT * WHERE {
?person <http://dbpedia.org/ontology/deathDate> ?deathDate .
?person <http://xmlns.com/foaf/0.1/page> ?page .
FILTER(?deathDate >= "1941-01-01"^^xsd:date)
FILTER(?deathDate < "1942-01-01"^^xsd:date)
} ORDER BY ?deathDate
LIMIT 1000
OFFSET 1000

All artists that died in 1941 [3]:
SELECT * WHERE {
?person <http://dbpedia.org/ontology/deathDate> ?deathDate .
?person <http://xmlns.com/foaf/0.1/page> ?page .
?person rdf:type <http://dbpedia.org/ontology/Artist> .
FILTER(?deathDate >= "1941-01-01"^^xsd:date)
# strict upper bound, so deaths on 1 January 1942 are excluded
FILTER(?deathDate < "1942-01-01"^^xsd:date)
}

All artists that died in 1941 and their works [4]:
SELECT * WHERE {
?person <http://dbpedia.org/ontology/deathDate> ?deathDate .
?person rdf:type <http://dbpedia.org/ontology/Artist> .
?person <http://xmlns.com/foaf/0.1/page> ?page .

# works are optional, so artists without listed works are still returned
OPTIONAL {
   ?person ?works ?work .
   FILTER (?works IN (<http://dbpedia.org/property/works>,
      <http://dbpedia.org/property/notableworks>,
      <http://dbpedia.org/property/writer>) )
}
FILTER(?deathDate >= "1941-01-01"^^xsd:date)
# strict upper bound, so deaths on 1 January 1942 are excluded
FILTER(?deathDate < "1942-01-01"^^xsd:date)
}
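
If you would rather not paste each page by hand, the LIMIT/OFFSET
paging described above could be scripted. The Python below is a sketch
under assumptions, not an official DBpedia client: the function names
are made up, the "format" request parameter is assumed to be accepted
by the endpoint, and the upper date bound is exclusive so that deaths
on 1 January of the following year are left out. It also sorts by
?person as a tie-breaker so that pages do not overlap.

```python
import json
import urllib.parse
import urllib.request

XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"

# Doubled braces are literal braces after str.format().
QUERY_TEMPLATE = """
SELECT ?person ?deathDate ?page WHERE {{
  ?person <http://dbpedia.org/ontology/deathDate> ?deathDate .
  ?person <http://xmlns.com/foaf/0.1/page> ?page .
  FILTER(?deathDate >= "{year}-01-01"^^<{xsd_date}>)
  FILTER(?deathDate < "{next_year}-01-01"^^<{xsd_date}>)
}}
ORDER BY ?deathDate ?person
LIMIT {limit}
OFFSET {offset}
"""


def build_query(year, limit=1000, offset=0):
    """Return the SPARQL text for one page of people who died in `year`."""
    return QUERY_TEMPLATE.format(
        year=year, next_year=year + 1, xsd_date=XSD_DATE,
        limit=limit, offset=offset,
    )


def fetch_all(year, endpoint="http://dbpedia.org/sparql", page_size=1000):
    """Yield result rows page by page until a short page signals the end."""
    offset = 0
    while True:
        params = urllib.parse.urlencode({
            "query": build_query(year, page_size, offset),
            # assumed: the endpoint honours this parameter for JSON output
            "format": "application/sparql-results+json",
        })
        with urllib.request.urlopen(f"{endpoint}?{params}") as resp:
            rows = json.load(resp)["results"]["bindings"]
        yield from rows
        if len(rows) < page_size:
            break
        offset += page_size
```

Each yielded row is a dictionary keyed by variable name, so
row["person"]["value"] would be the DBpedia resource URI.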



[1]
http://dbpedia.org/snorql/?query=SELECT+count+%28*%29+WHERE+{%0D%0A%3Fperson+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2FdeathDate%3E+%3FdeathDate+.%0D%0A%3Fperson+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2Fpage%3E+%3Fpage+.%0D%0A}

[2]
http://dbpedia.org/snorql/?query=SELECT+*+WHERE+{%0D%0A%3Fperson+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2FdeathDate%3E+%3FdeathDate+.%0D%0A%3Fperson+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2Fpage%3E+%3Fpage+.%0D%0AFILTER%28%3FdeathDate+%3E%3D+%221941-01-01%22^^xsd%3Adate%29+%0D%0AFILTER%28%3FdeathDate+%3C%3D+%221942-01-01%22^^xsd%3Adate%29+%0D%0A}+order+by+%3FdeathDate%0D%0ALimit+1000+%0D%0AOFFSET+0

[3]
http://dbpedia.org/snorql/?query=SELECT+*+WHERE+{%0D%0A%3Fperson+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2FdeathDate%3E+%3FdeathDate+.%0D%0A%3Fperson+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2Fpage%3E+%3Fpage+.%0D%0A%3Fperson+rdf%3Atype+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2FArtist%3E+%0D%0AFILTER%28%3FdeathDate+%3E%3D+%221941-01-01%22^^xsd%3Adate%29+%0D%0AFILTER%28%3FdeathDate+%3C%3D+%221942-01-01%22^^xsd%3Adate%29+%0D%0A}+

[4]
http://dbpedia.org/snorql/?query=SELECT+*+WHERE+{%0D%0A%3Fperson+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2FdeathDate%3E+%3FdeathDate+.%0D%0A%3Fperson+rdf%3Atype+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2FArtist%3E+.%0D%0A%3Fperson+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2Fpage%3E+%3Fpage+.%0D%0A%0D%0AOPTIONAL+{%0D%0A++%3Fperson+%3Fworks+%3Fwork+.%0D%0A++FILTER+%28%3Fworks+in+%28%3Chttp%3A%2F%2Fdbpedia.org%2Fproperty%2Fworks%3E%2C+%3Chttp%3A%2F%2Fdbpedia.org%2Fproperty%2Fnotableworks%3E%2C+%3Chttp%3A%2F%2Fdbpedia.org%2Fproperty%2Fwriter%3E%29+%29%0D%0A}%0D%0AFILTER%28%3FdeathDate+%3E%3D+%221941-01-01%22^^xsd%3Adate%29+.%0D%0AFILTER%28%3FdeathDate+%3C%3D+%221942-01-01%22^^xsd%3Adate%29+.%0D%0A}+%0D%0A




--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org


Piotr Konieczny-4
In reply to this post by FT2
On 12/23/2011 7:02 PM, FT2 wrote:

 
> (3)  A third option which will pick up names of articles (but no further details) is this category search tool:
> http://toolserver.org/~magnus/catscan_rewrite.php which lets you enter a category and search several layers deep. So nested categories will show up. Try entering "Novelists" under "categories" and "3" under "depth".

The new CatScan is (IMHO) very unfriendly; you may find the old one more usable. Main page for all versions: http://en.wikipedia.org/wiki/Wikipedia:CatScan

Either is a nice tool for getting a list of content creators who died in a given year.

I wonder if there would be a way to set up a ping that would let you know whenever a biography is written or has a year of death added. For new articles, you could consider some form of new article report. This is currently done by TedderBot; see how it works in practice for us at http://en.wikipedia.org/wiki/Wikipedia:POLAND#New_articles_announcements

I am not very familiar with the semantic side of things, though it would be nice to have, and perhaps http://en.wikipedia.org/wiki/Template:Persondata could be of use.

-- 
Piotr Konieczny
PhD Candidate
Dept of Sociology
Uni of Pittsburgh

http://pittsburgh.academia.edu/PiotrKonieczny/
http://en.wikipedia.org/wiki/User:Piotrus
