|
Hello everyone,
I've been a lurker on this list until now, so let me introduce myself: I'm a sociologist studying digital technologies, an activist (I run Creative Commons Poland), and I run a digital think tank / NGO in Poland.

I'm hoping someone on this list might be able to help me. I'm involved in the celebrations of Public Domain Day: on the 1st of January each year, the works of authors who died 70 years earlier pass into the public domain (at least in Poland and in most countries, though it may differ in some jurisdictions).

I'm looking for a good way to determine who died in 1941, and thought that Wikipedia would be a good place to find this out. I know there are lists of people who died in a given year, but they are not complete. Is there any way to automatically query Wikipedia for such information? I know that it's structured to some extent, as this information is provided in templates for biographical articles, but I don't know whether there is any mechanism for querying it.

Any advice will be much appreciated.

All the best,

Alek

--
Director, Centrum Cyfrowe Projekt: Polska
www: http://centrumcyfrowe.pl
identi.ca / twitter: @centrumcyfrowe
_______________________________________________
Wiki-research-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l |
|
There's a category for that sort of thing: Other wikis might have similar categories.
FT2 |
|
In reply to this post by Alek Tarkowski-3
Hi Alek,
2011/12/23 Alek Tarkowski <[hidden email]>:
> I'm looking for a good way to determine, who died in 1941 - and thought
> that Wikipedia will be a good place to find this out. I know there are
> lists of people who died in a given year, but they are not complete. Is
> there any way to automatically query Wikipedia for such information? I
> know that it's to some extent structured, as this information is
> provided in templates for biographical articles, but I don't know
> whether there is any mechanism for querying?

It's likely that most of those people are in a category¹. Since no category is ever complete, combining the contents of that category with its equivalents on the other large Wikipedias (German, French…²) could help. Finding the equivalent categories is easy: start at ¹, then follow the interwiki links under "Languages" in the left column. Also, don't forget Commons³.

Best regards,

¹ https://en.wikipedia.org/wiki/Category:1941_deaths
² https://www.wikipedia.org/
³ https://commons.wikimedia.org/wiki/Category:1941_deaths

--
Jérémie |
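A category like the one above can also be read programmatically. The following is only a minimal sketch, assuming the present-day MediaWiki Action API (`list=categorymembers` with `cmcontinue` continuation); the function names, the User-Agent string and the choice of `cmlimit` are illustrative assumptions, not something proposed in the thread.

```python
import json
import urllib.parse
import urllib.request

# Assumption: swap "en" for "de", "fr", "pl", ... to read the equivalent category elsewhere.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_params(category, cmcontinue=None):
    """Query parameters for one page of list=categorymembers results."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmnamespace": "0",  # articles only, no subcategories or talk pages
        "cmlimit": "500",
        "format": "json",
    }
    if cmcontinue:
        params["cmcontinue"] = cmcontinue
    return params


def parse_page(response):
    """Return (titles, continuation token or None) from one decoded JSON response."""
    titles = [member["title"] for member in response["query"]["categorymembers"]]
    token = response.get("continue", {}).get("cmcontinue")
    return titles, token


def fetch_category(category, api_url=API_URL):
    """Follow continuation tokens until the whole category has been listed."""
    titles, token = [], None
    while True:
        url = api_url + "?" + urllib.parse.urlencode(build_params(category, token))
        request = urllib.request.Request(url, headers={"User-Agent": "pd-day-sketch/0.1"})
        with urllib.request.urlopen(request) as handle:
            page, token = parse_page(json.load(handle))
        titles.extend(page)
        if token is None:
            return titles
```

Calling `fetch_category("Category:1941 deaths")` would then return article titles only; whether a given person was a writer or artist still has to be checked separately.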
|
Hi Alek,
Not every language version of Wikipedia has such categories, but at least 80 do. You can find a list of the eighty or so dead people categories at http://meta.wikimedia.org/wiki/Death_anomalies_table; "died in 1941" will be a subcategory of each. Someone with toolserver access could probably extract a list for you of all the people we have, minus duplicates across languages. You could file a request for such a report at http://en.wikipedia.org/wiki/Wikipedia_talk:Database_reports
But this will only give you notable people who died in a particular year. If you want a list of everyone who died in a particular year, you are better off looking at genealogy sites and, for 1941, military war grave sites. Tens of millions of people died that year and I doubt that we have even 0.1% of them.

Hope that helps,
WereSpielChequers
|
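The "minus duplicates across languages" step could be sketched roughly as follows. This is an illustration under assumptions: the per-wiki title lists and the `to_english` mapping are hypothetical inputs (such a mapping might be built from interlanguage links), and `merge_across_wikis` is an invented name, not an existing tool.

```python
from collections import defaultdict


def merge_across_wikis(per_wiki, to_english):
    """Group titles from several wikis so each person appears once.

    per_wiki:   {"en": [title, ...], "de": [title, ...], ...}
    to_english: {(lang, title): english_title} for articles that have an
                English counterpart (hypothetical input, e.g. from interlanguage links).
    Titles without an English counterpart are kept under a "lang:title" key.
    """
    merged = defaultdict(list)
    for lang, titles in per_wiki.items():
        for title in titles:
            if lang == "en":
                key = title
            else:
                key = to_english.get((lang, title), f"{lang}:{title}")
            merged[key].append((lang, title))
    return dict(merged)
```

The entries that end up under a `lang:title` key are the interesting ones for completeness checks: they are people covered on another wiki but missing (or unlinked) on the English one.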
|
I think he's after writers and artists, and wants to identify people whose copyrights may have passed into the public domain because they died that year. "Known" authors and artists are the most relevant ones, so Wikipedia's coverage should work well. It won't give a complete list of everyone who died after creating something copyrighted, but the better known their works are (and hence the more useful it is to know they are PD), the more likely we are to have coverage.
FT2 |
|
In reply to this post by Jérémie Roquet
Jeremie, FT2,
thank you very much for your advice. Do you have any idea how complete these lists are? Are they compiled by hand, or is there a bot compiling them? And in any case, is there any way to estimate how completely they cover a given category?

All the best,

Alek |
|
Categories are done by hand. At most, one could write a bot that looks for infobox or introduction text containing a date of birth/death and automatically adds the category if it is missing. As a rule, though, if someone has died then a date of death is usually there, and usually so are the categories you'd need.

The easiest and most exact way would be a database query that looks for "born * died * 1941" or just "died * 1941" in the first paragraph, and also requires either that at least one word like wrote / author / poet / painter, or {{infobox person}}, appears in the text, or that "novelists | writers | painters | authors..." appears in at least one category. That should do exactly what you need, but you'll need to find someone to set up and run the query for you.
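The text-matching part of such a query might look roughly like this. It is a minimal sketch, assuming the article text is already available as a plain string; the patterns, the 40-character window and the list of "creator" words are illustrative guesses, and the infobox and category checks described above are left out.

```python
import re

# "died", then at most 40 characters without a full stop or parenthesis, then "1941".
DIED_1941 = re.compile(r"\bdied\b[^.()]{0,40}?\b1941\b", re.IGNORECASE)

# Illustrative word list; extend as needed.
CREATOR_WORDS = re.compile(
    r"\b(wrote|author|poet|painter|novelist|playwright|composer)\b", re.IGNORECASE
)


def first_paragraph(text):
    """Everything up to the first blank line."""
    return text.strip().split("\n\n", 1)[0]


def looks_like_1941_creator(text):
    """True if the lead says 'died ... 1941' and a creator word appears anywhere in the text."""
    lead = first_paragraph(text)
    return bool(DIED_1941.search(lead)) and bool(CREATOR_WORDS.search(text))
```

A filter like this will miss biographies that phrase the dates differently (for example "1862 – 1941" with no "died"), so it is better treated as a first pass than as an exact answer.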
If not, then these other options might help somewhat.

(1) Biographies will often start like this:

NAME (born 18 May 1862, died 17 June 1941, Sweden) was a.....
So you could search for articles with the words died 1941 in them. Trouble is there are many reasons an article could have those words. Limiting it to biographical articles might help. Some search engines allow you to search for pages where the specific words appear close together but Wikipedia's search doesn't have that feature, or not yet. Even so this search does turn up useful results, especially combined with the incategory: operator. You can also narrow down by adding words that copyright creators are likely to have, such as "author" "playwright", "poet" "artist" etc. Try these searches:
born died 1941 (biographies with "died" will usually also have "born", use this to narrow down)
died 1941 author (not so helpful)
born died 1941 wrote (adding one "copyright-creator" word seems to work, just. Adding more seems to confuse things)
died 1941 incategory:"Polish writers" (but doesn't pick up articles nested in subcategories)
(2) Google has proximate word searching and can be told to list content from just one site. All Wikipedia articles are indexed on Google. But it's very limited in what it will show you and can't detect other things needed to narrow it down. Try this in Google search:
(3) A third option which will pick up names of articles (but no further details) is this category search tool:
http://toolserver.org/~magnus/catscan_rewrite.php

which lets you enter a category and search several layers deep, so nested categories will show up. Try entering "Novelists" under "categories" and "3" under "depth".

There may be other ways, such as common terms that only appear in biographies. Perhaps someone else will have ideas.

FT2 |
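The "several layers deep" category search can be pictured as a depth-limited walk over the category tree. The sketch below is not CatScan's implementation; `get_members` is a hypothetical callback returning the pages and subcategories of one category (it could wrap an API or database lookup), and treating depth 0 as "the root category only" is an assumption.

```python
from collections import deque


def pages_within_depth(root, depth, get_members):
    """Collect pages in `root` and its subcategories down to `depth` levels.

    get_members(category) -> (page_titles, subcategory_titles)   # hypothetical callback
    depth 0 means the root category only.
    """
    seen = {root}  # guards against category loops
    queue = deque([(root, 0)])
    pages = set()
    while queue:
        category, level = queue.popleft()
        page_titles, subcategories = get_members(category)
        pages.update(page_titles)
        if level < depth:
            for sub in subcategories:
                if sub not in seen:
                    seen.add(sub)
                    queue.append((sub, level + 1))
    return pages
```

Intersecting the result with the members of a "1941 deaths" category, for example `pages_within_depth("Novelists", 3, get_members) & deaths_1941`, would then give novelists who died in that year, which also works around the plain search not picking up nested subcategories.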
|
On Fri, Dec 23, 2011 at 06:02:37PM +0000, FT2 wrote:
> Categories are done by hand, at most one could write a bot that looked for
> infobox or introduction text containing date of birth/death and
> automatically add the category if it didn't exist, but as a rule it seems
> that if someone's died then a date of death is usually there and usually so
> are the categories you'd need.

This is why we need rollout of some sort of semantic engine. I understand that rolling out the existing SMW "will never happen" on Wikipedia :-(, but IIRC people were working on more lightweight systems?

sincerely,
Kim Bruning |
|
In reply to this post by Alek Tarkowski-3
Categorisation is done manually; completeness varies from project to project and by topic area within a project. On the English-language Wikipedia we probably do have most of our novelists categorised as such. Deaths are a very different matter, as many of our articles never pick up on the subject's death. However, the anomalies that I've found there tend to be among people whose notable careers started and ended in their youth, sportspeople particularly. So I would suggest that a query for "1941 deaths" and "writers" might well give you what you want.
WSC
|
|
In reply to this post by Kim Bruning
On 23 December 2011 19:59, Kim Bruning <[hidden email]> wrote:
> On Fri, Dec 23, 2011 at 06:02:37PM +0000, FT2 wrote:
>> Categories are done by hand, at most one could write a bot that looked for
>> infobox or introduction text containing date of birth/death and
>> automatically add the category if it didn't exist, but as a rule it seems
>> that if someone's died then a date of death is usually there and usually so
>> are the categories you'd need.
>
> This is why we need rollout of some sort of semantic engine. I understand that rolling out the existing SMW "will never happen"

Why d'you say that so categorically? Theoretically, I don't see "why this shouldn't happen at some point".

> on wikipedia :-(, but IIRC people were working on more lightweight systems?
>
> sincerely,
> Kim Bruning |
|
In reply to this post by Alek Tarkowski-3
Dear Alek, dear list,
DBpedia (http://dbpedia.org) was created for exactly this use case, so that you can query Wikipedia like a database. DBpedia already does the rollout to a "semantic engine" that you can query.

Below I drafted some queries. The first counts all the persons in Wikipedia that have a "deathDate". In total there are 187739, which should be the most complete list you will find. The queries are then refined to all persons who died in 1941 (yielding 1318 persons), then all artists who died in 1941, and then all artists and their works!

Note that there is a static database which uses the latest dump:
http://dbpedia.org/sparql
http://dbpedia.org/snorql
as well as a live version, which is synchronized directly (each edit is loaded into the engine):
http://live.dbpedia.org/sparql

Language-specific versions also exist for some of the Wikipedias besides the English one:
Polish: http://pl.dbpedia.org/
German: http://de.dbpedia.org
Greek: http://el.dbpedia.org

DBpedia has quite a large community; I would estimate that over 1000 volunteers from the areas of computer science and the Semantic Web have worked on or with it since 2006. (This does not account for industry partners or the like.)

@Alek: there are a total of 5 result formats to choose from; JSON, plain or HTML may be the ones you are looking for. Here are links to some user interfaces:
http://wiki.dbpedia.org/OnlineAccess
http://wiki.dbpedia.org/Applications

Feel free to improve the data directly in Wikipedia (and use the live endpoint 5 minutes later) or tailor the data how you like it at the mappings wiki: http://mappings.dbpedia.org . Actually, the information contained there could help to clean up the infoboxes, which would also help the start of WikiData.

There is one catch, though: the more precise the queries get, the worse recall will be, as minor errors in the data add up with each constraint.
Hope I could help,
Sebastian

Queries below:
*******************

A count of all persons that have a deathDate [1]:

    SELECT count(*) WHERE {
      ?person <http://dbpedia.org/ontology/deathDate> ?deathDate .
      ?person <http://xmlns.com/foaf/0.1/page> ?page .
    }

All persons that died in 1941 [2]. Note that on http://dbpedia.org there is a limit of 1000 results per query, so you need to use OFFSET:

    SELECT * WHERE {
      ?person <http://dbpedia.org/ontology/deathDate> ?deathDate .
      ?person <http://xmlns.com/foaf/0.1/page> ?page .
      FILTER(?deathDate >= "1941-01-01"^^xsd:date)
      FILTER(?deathDate < "1942-01-01"^^xsd:date)
    }
    ORDER BY ?deathDate
    LIMIT 1000
    OFFSET 0

Run the same query again with OFFSET 1000 to get the next page of results.

All artists that died in 1941 [3]:

    SELECT * WHERE {
      ?person <http://dbpedia.org/ontology/deathDate> ?deathDate .
      ?person <http://xmlns.com/foaf/0.1/page> ?page .
      ?person rdf:type <http://dbpedia.org/ontology/Artist> .
      FILTER(?deathDate >= "1941-01-01"^^xsd:date)
      FILTER(?deathDate < "1942-01-01"^^xsd:date)
    }

All artists and their work [4]:

    SELECT * WHERE {
      ?person <http://dbpedia.org/ontology/deathDate> ?deathDate .
      ?person rdf:type <http://dbpedia.org/ontology/Artist> .
      ?person <http://xmlns.com/foaf/0.1/page> ?page .
      OPTIONAL {
        ?person ?works ?work .
        FILTER (?works in (<http://dbpedia.org/property/works>,
                           <http://dbpedia.org/property/notableworks>,
                           <http://dbpedia.org/property/writer>))
      }
      FILTER(?deathDate >= "1941-01-01"^^xsd:date) .
      FILTER(?deathDate < "1942-01-01"^^xsd:date) .
    }

[1] http://dbpedia.org/snorql/?query=SELECT+count+%28*%29+WHERE+{%0D%0A%3Fperson+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2FdeathDate%3E+%3FdeathDate+.%0D%0A%3Fperson+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2Fpage%3E+%3Fpage+.%0D%0A}
[2] http://dbpedia.org/snorql/?query=SELECT+*+WHERE+{%0D%0A%3Fperson+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2FdeathDate%3E+%3FdeathDate+.%0D%0A%3Fperson+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2Fpage%3E+%3Fpage+.%0D%0AFILTER%28%3FdeathDate+%3E%3D+%221941-01-01%22^^xsd%3Adate%29+%0D%0AFILTER%28%3FdeathDate+%3C%3D+%221942-01-01%22^^xsd%3Adate%29+%0D%0A}+order+by+%3FdeathDate%0D%0ALimit+1000+%0D%0AOFFSET+0
[3] http://dbpedia.org/snorql/?query=SELECT+*+WHERE+{%0D%0A%3Fperson+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2FdeathDate%3E+%3FdeathDate+.%0D%0A%3Fperson+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2Fpage%3E+%3Fpage+.%0D%0A%3Fperson+rdf%3Atype+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2FArtist%3E+%0D%0AFILTER%28%3FdeathDate+%3E%3D+%221941-01-01%22^^xsd%3Adate%29+%0D%0AFILTER%28%3FdeathDate+%3C%3D+%221942-01-01%22^^xsd%3Adate%29+%0D%0A}+
[4] http://dbpedia.org/snorql/?query=SELECT+*+WHERE+{%0D%0A%3Fperson+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2FdeathDate%3E+%3FdeathDate+.%0D%0A%3Fperson+rdf%3Atype+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2FArtist%3E+.%0D%0A%3Fperson+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2Fpage%3E+%3Fpage+.%0D%0A%0D%0AOPTIONAL+{%0D%0A++%3Fperson+%3Fworks+%3Fwork+.%0D%0A++FILTER+%28%3Fworks+in+%28%3Chttp%3A%2F%2Fdbpedia.org%2Fproperty%2Fworks%3E%2C+%3Chttp%3A%2F%2Fdbpedia.org%2Fproperty%2Fnotableworks%3E%2C+%3Chttp%3A%2F%2Fdbpedia.org%2Fproperty%2Fwriter%3E%29+%29%0D%0A}%0D%0AFILTER%28%3FdeathDate+%3E%3D+%221941-01-01%22^^xsd%3Adate%29+.%0D%0AFILTER%28%3FdeathDate+%3C%3D+%221942-01-01%22^^xsd%3Adate%29+.%0D%0A}+%0D%0A

--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org |
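Paging through the 1000-row limit with OFFSET can be automated. The following is a rough sketch, assuming the endpoint speaks the standard SPARQL protocol over HTTP GET and returns SPARQL JSON results when asked via the Accept header; the function names and the stop condition (a page shorter than the limit) are assumptions, and the query is a trimmed variant of the one in the post.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"

QUERY_TEMPLATE = """
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?person ?deathDate WHERE {{
  ?person <http://dbpedia.org/ontology/deathDate> ?deathDate .
  FILTER(?deathDate >= "{year}-01-01"^^xsd:date)
  FILTER(?deathDate < "{next_year}-01-01"^^xsd:date)
}}
ORDER BY ?deathDate
LIMIT {limit} OFFSET {offset}
"""


def build_query(year, limit, offset):
    """Fill in the year window and the paging window."""
    return QUERY_TEMPLATE.format(year=year, next_year=year + 1, limit=limit, offset=offset)


def parse_bindings(result):
    """Extract (person URI, death date) pairs from SPARQL JSON results."""
    return [(b["person"]["value"], b["deathDate"]["value"])
            for b in result["results"]["bindings"]]


def run_page(query, endpoint=ENDPOINT):
    """Send one query and decode the JSON answer."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(request) as handle:
        return json.load(handle)


def fetch_all(year, limit=1000, run=run_page):
    """Request successive pages until one comes back shorter than `limit`."""
    rows, offset = [], 0
    while True:
        page = parse_bindings(run(build_query(year, limit, offset)))
        rows.extend(page)
        if len(page) < limit:
            return rows
        offset += limit
```

Keeping the ORDER BY in the query matters here: without a stable ordering, successive OFFSET pages are not guaranteed to be disjoint.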
|
In reply to this post by FT2
On 12/23/2011 7:02 PM, FT2 wrote:
The new CatScan is (IMHO) very unfriendly; you may find the old one more usable. Main page for all versions: http://en.wikipedia.org/wiki/Wikipedia:CatScan

Either is a nice tool to get a list of content creators who died in a given year.

I wonder if there would be a way to set up a ping that would let you know whenever a biography is written or has the year of death added. For new articles, you could consider some form of new article report. This is currently done by TedderBot; see how it works in practice for our http://en.wikipedia.org/wiki/Wikipedia:POLAND#New_articles_announcements

I am not very familiar with the semantic side of things. It would be nice to have, and perhaps http://en.wikipedia.org/wiki/Template:Persondata could be of use.

--
Piotr Konieczny
PhD Candidate
Dept of Sociology
Uni of Pittsburgh
http://pittsburgh.academia.edu/PiotrKonieczny/
http://en.wikipedia.org/wiki/User:Piotrus |
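To illustrate how the Persondata template mentioned above could be of use, here is a small sketch that pulls the year of death out of an article's wikitext. It assumes the template is written one field per line with a `DATE OF DEATH` field and contains no nested templates; the regular expressions and function names are illustrative only.

```python
import re

# Assumes no nested {{...}} inside the Persondata block.
PERSONDATA = re.compile(r"\{\{\s*Persondata(.*?)\}\}", re.IGNORECASE | re.DOTALL)
# One "| FIELD NAME = value" pair per line.
FIELD = re.compile(r"^[ \t]*\|[ \t]*([A-Z ]+?)[ \t]*=[ \t]*(.*?)[ \t]*$", re.MULTILINE)
YEAR = re.compile(r"\b(\d{4})\b")


def persondata_fields(wikitext):
    """Return the Persondata fields as a dict, or {} if the template is absent."""
    match = PERSONDATA.search(wikitext)
    if not match:
        return {}
    return {name: value for name, value in FIELD.findall(match.group(1))}


def death_year(wikitext):
    """First four-digit year in the DATE OF DEATH field, or None."""
    value = persondata_fields(wikitext).get("DATE OF DEATH", "")
    found = YEAR.search(value)
    return int(found.group(1)) if found else None
```

Run over a dump or over the articles in a creators category, `death_year(text) == 1941` would flag candidates even where the deaths category was never added.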
