WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

Brian J Mingus
I have been working with Sam and others for some time now on brainstorming a proposal for the Foundation to create a centralized wiki of citations, a WikiCite so to speak, if that is not the eventual name. My plan is to continue to discuss with folks who are knowledgeable and interested in such a project and to have the feedback I receive go into the proposal which I hope to write this summer. The proposal white paper will then be sent around to interested parties for corrections and feedback, including on-wiki and mailing lists, before eventually landing at the Foundation officially. As we know WMF has not started a new project in some years, so there is no official process. Thus I find it important to get it right.

The basic idea is a centralized wiki that contains citation information that other MediaWikis and WMF projects can then reference using something like a {{cite}} template or a simple link. The community can document the citation, the author, the book, etc., and, in one idealization, all citations across all wikis would point to the same article on WikiCite. Users can use this wiki as their personal bibliography as well, as collections of citations can be exported in arbitrary citation formats. This general plan would allow community aggregation of metadata and community documentation of sources along arbitrary dimensions (quality, trust, reliability, etc.). The hope is that such a resource would then expand on that wiki and across the projects into summarizations of collections of sources (lit reviews) that make navigating entire fields of literature easier and more reliable, getting you out of the trap of not being aware of the global context that a particular source sits in.

To give all a more concrete view, here is an example from some software that I have implemented in our lab called WikiPapers. Please take note that while this is a scientific literature example, the idea is general to *all publications ever*. Also, while I have implemented a full-featured version of a WikiCite, it's important to point out that for the WMF project we will need a new extension that handles the needs of the project exactly, and in PHP (I use Python :).

The name of the wiki article is a unique key that is a combination of the author names and the year, in the following format: Author1Author2Author3EtAl10b. This works for scientific articles, but we may find we need to modify the key for other kinds of sources. The content of the wiki article is composed of an infobox constructed via the Citation template, and any other text and media the community determines to be useful and legal to include in the article. Example article:


Title: KangHsuKrajbichEtAl09

{{Citation
|publisher=SAGE Publications
|dateadded=2010-07-17
|author=Kang M.J. and Hsu M. and Krajbich I.M. and Loewenstein G. and McClure S.M. and Wang J.T. and Camerer C.F.
|abstract=Curiosity has been described as a desire for learning and knowledge, but its underlying mechanisms are not well understood. We scanned subjects with functional magnetic resonance imaging while they read trivia questions. The level of curiosity when reading questions was correlated with activity in caudate regions previously suggested to be involved in anticipated reward. This finding led to a behavioral study, which showed that subjects spent more scarce resources (either limited tokens or waiting time) to find out answers when they were more curious. The functional imaging also showed that curiosity increased activity in memory areas when subjects guessed incorrectly, which suggests that curiosity may enhance memory for surprising new information. This prediction about memory enhancement was confirmed in a behavioral study: Higher curiosity in an initial session was correlated with better recall of surprising answers 1 to 2 weeks later. 
|title=The Wick in the Candle of Learning
|bibtex type=article
|number=8
|volume=20
|owner=Sethherd
|journal=Psychological Science
|year=2009
|cites=O'ReillyFrank06,Cowan95,Wise04,Fuster80,Panksepp98,KakadeDayan02b,DelgadoLockeStengerEtAl03,BrewerZhaoDesmondEtAl98,DelgadoNystromFiez00,Beatty82,Baddeley92,Waanabe96,Roland93lm,DelgadoNystromFissellEtAl00,WagnerSchacterRotteEtAl98,SeymourDawDayanEtAl07,ODoherty04,BandettiniMoonen99,ODohertyDayanFristonEtAl03,RogersOwenRobbins99,KnutsonWestdorpKaiserEtAl00,CircuitryMemory,OReillyFrank06,Watanabe96a,BrewerZhaoGabrieli98,WagnerSchacterBuckner98,RogersOwenMiddletonEtAl99,Baddeley86,Watanabe96,Rolls96a,PallerWagner02
|cited_by=Author1Author2Author3EtAl10,etc...
|pages=963
}}

Then, any other WMF wiki, or any other MediaWiki, could cite this universal entry by simply typing {{cite|KangHsuKrajbichEtAl09}}
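
As a rough illustration of how software might consume such an entry, here is a minimal Python sketch that splits a {{Citation}} block into field/value pairs. The function name parse_citation is hypothetical, and the sketch assumes one |field=value pair per line with no nested templates; real wikitext would need a proper parser, and this is not the WikiPapers implementation.

```python
def parse_citation(wikitext):
    """Parse a flat {{Citation ...}} block into a dict of fields.

    Assumes each field sits on its own line as '|name=value' and that
    values contain no nested templates. A sketch, not a wikitext parser.
    """
    body = wikitext.strip()
    if not (body.startswith("{{Citation") and body.endswith("}}")):
        raise ValueError("not a Citation template")
    body = body[len("{{Citation"):-2]
    fields = {}
    for line in body.splitlines():
        line = line.strip()
        if not line.startswith("|") or "=" not in line:
            continue
        name, _, value = line[1:].partition("=")
        fields[name.strip()] = value.strip()
    # In the example entry, 'cites' is a comma-separated list of keys.
    if "cites" in fields:
        fields["cites"] = [k for k in fields["cites"].split(",") if k]
    return fields


# Shortened copy of the example entry (only a few fields and cites kept).
example = """{{Citation
|publisher=SAGE Publications
|title=The Wick in the Candle of Learning
|journal=Psychological Science
|year=2009
|cites=Cowan95,Wise04,Fuster80
|pages=963
}}"""

record = parse_citation(example)
print(record["title"])
print(record["cites"])
```

Values are kept as strings (the year included), so any type conversion is left to the caller.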

Additionally, if a technology such as Semantic MediaWiki is used (as it is in WikiPapers), arbitrary lists of collections of literature can be generated by constructing simple queries that are boolean combinations of template properties. Given that SMW does not scale well, I have a plan that uses Lucene instead for fast, scalable dynamic generation of collections of citations. Imagine the possibilities...
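
To make "boolean combinations of template properties" a little more concrete, here is a small in-memory Python sketch. It uses neither SMW query syntax nor Lucene; the helper names (where, all_of, any_of, negate, select) are made up for illustration, and the two "Placeholder" records are invented test data.

```python
def where(field, value):
    """Predicate: the record's field equals the given value."""
    return lambda rec: rec.get(field) == value

def all_of(*preds):
    """Boolean AND over predicates."""
    return lambda rec: all(p(rec) for p in preds)

def any_of(*preds):
    """Boolean OR over predicates."""
    return lambda rec: any(p(rec) for p in preds)

def negate(pred):
    """Boolean NOT of a predicate."""
    return lambda rec: not pred(rec)

def select(records, pred):
    """Return the keys of all records matching the predicate."""
    return [rec["key"] for rec in records if pred(rec)]


records = [
    {"key": "KangHsuKrajbichEtAl09", "journal": "Psychological Science", "year": 2009},
    # The next two records are invented placeholders, not real citations.
    {"key": "PlaceholderA09", "journal": "Journal A", "year": 2009},
    {"key": "PlaceholderB05", "journal": "Psychological Science", "year": 2005},
]

# "published in 2009 AND NOT in Journal A"
query = all_of(where("year", 2009), negate(where("journal", "Journal A")))
matches = select(records, query)
print(matches)
```

A search index would evaluate the same kind of expression against stored fields instead of scanning a Python list.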

Feel free to provide your feedback on this idea, in addition to your own ideas, in this thread, or to me personally. I am especially interested in the potential benefits to the WMF projects that you see, and to hear your thoughts on the potential of this project on its own, as that will feature prominently in the proposal. Additionally, what do you think WikiCite would eventually be like, once it is fully matured?

Brian Mingus
Graduate Student
Computational Cognitive Neuroscience Lab
University of Colorado at Boulder


On Mon, Jul 19, 2010 at 11:22 AM, phoebe ayers <[hidden email]> wrote:
There have been a number of proposals floated in the Wikimedia
community over the years to build a wiki-based project for collecting
journal citation information. For those interested in that topic, you
might want to check out the University of Prince Edward Island's
"knowledge for all" project proposal -- it proposes to build an open
universal citation index (to serve as an alternative to the many
hundreds of proprietary citation index products that libraries
currently buy). This of course is not the first attempt at this
problem, but it's an interesting proposal that's getting a bit of buzz
in the library community.
http://library.upei.ca/k4all

-- phoebe

--
* I use this address for lists; send personal messages to phoebe.ayers
<at> gmail.com *

_______________________________________________
Wiki-research-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l



Re: [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

Federico Leva (Nemo)
Brian J Mingus, 19/07/2010 22:20:
> The basic idea is a centralized wiki that contains citation information that
> other MediaWikis and WMF projects can then reference using something like a
> {{cite}} template or a simple link. The community can document the citation,
> the author, the book etc.. and, in one idealization, all citations across
> all wikis would point to the same article on WikiCite. Users can use this
> wiki as their personal bibliography as well, as collections of citations can
> be exported in arbitrary citation formats.

I have already mentioned it before, but this description looks quite
similar to http://bibdex.org/ . Maybe we should join forces (i.e., send
your proposal also to Sunir Shah).

Nemo


Re: WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

fn
In reply to this post by Brian J Mingus


Hi Brian and others,


Interesting project. At WikiSym and Wikimania there were some discussions
on the issue of bibliographic databases - and more generally about
structured data in wikis and I mentioned your project briefly in my talk.
Daniel Kinzler (who might be on this mailing list) showed some initial
efforts for bibliographic databasing in Wikipedia. He did not reveal much
(I don't know if it is appropriate to tell about Daniel's project - but
now I have done it anyway...). I have started to build a bibliographic
wiki (Brede Wiki) that is entirely separate from Wikimedia. It is
available from here:

http://neuro.imm.dtu.dk/wiki/

My Wikimania talk about that wiki and related issues is available here
(the video may come later):

http://commons.wikimedia.org/wiki/File:Finn_%C3%85rup_Nielsen_-_Wikipedia_is_not_the_sum_of_all_human_knowledge_-_Wikimania_2010.pdf

I also think that some bibliographic support would be interesting,
for two-way citation tracking and commenting on articles (for example),
but I furthermore find that in scientific articles in particular we often
find data that is worth structuring and putting in a database or a
structured wiki, so that we can extract the data for meta-analysis and
specialized information retrieval. That is also what I do in the Brede
Wiki. I use the templates to store such data. So if a system such as yours
is implemented we should not just think of it as a bibliographic database
but in broader terms: a data wiki.
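
Two-way citation tracking can be derived rather than entered by hand: given each entry's outgoing "cites" list, the "cited_by" direction is just the inverted mapping. A minimal Python sketch follows; the function name build_cited_by and the entry "HypotheticalPaper10" are assumptions for illustration.

```python
from collections import defaultdict

def build_cited_by(cites):
    """Invert a {source: [targets]} mapping into {target: [sources]}."""
    cited_by = defaultdict(set)
    for source, targets in cites.items():
        for target in targets:
            cited_by[target].add(source)
    # Sort so the output is deterministic.
    return {key: sorted(sources) for key, sources in cited_by.items()}


cites = {
    "KangHsuKrajbichEtAl09": ["Cowan95", "Wise04", "Fuster80"],
    # Invented entry, only here to produce a backlink.
    "HypotheticalPaper10": ["KangHsuKrajbichEtAl09", "Cowan95"],
}

backlinks = build_cited_by(cites)
print(backlinks["Cowan95"])
```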

Your system and mine share some similarities. Here are some differences:

As the 'key' (the wiki page title) I use the (lowercase) title of the
article. That might be more reader friendly - but usually longer. I think
that KangHsuKrajbichEtAl09 is too camel-cased. Neither the title nor
author list + year will be unique, so we need some predictable disambig.

I have one field for each author so that I can automatically link authors.
I use author1, author2, etc. fields. Likewise for URLs: url1, url2, etc.
In this way I can also 'database' authors, i.e., I have a wiki page for
each author (regardless of notability). Journals, organizations and
events are also available in my wiki.

I do not include abstracts in my CC-by-sa'ed wiki, since I am not sure how
publishers regard the copyright for abstracts. Nor am I sure about the
forward cites. Most commercial publishers hide the cites from unpaid
viewing. Including cites in CC-by-sa material on a large scale may
infringe publishers' copyright. Perhaps it is possible to negotiate with
some publishers. We need to talk with 'closed access' publishers before
we add such data.

I am not sure what 'owner' is in your format. Surely you can't have owners
in a Wikimedia/MediaWiki wiki? And 'dateadded' would already be recorded in
the revision history.

We probably need to check the final format of the bibliographic
template to make sure it is easily translatable to the most common
bibliographic formats: bibtex, refman, Z3988 microformat, pubmed, etc.
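
As a sketch of what "easily translatable" could mean for one target format, the Python below turns a dict of template fields into a BibTeX entry. The function name to_bibtex and the set of skipped wiki-only fields are assumptions, and no escaping of special characters is attempted.

```python
# Fields that make sense on the wiki but are left out of the export here
# (this particular selection is an assumption for the sketch).
WIKI_ONLY = {"bibtex type", "owner", "dateadded", "cites", "cited_by", "abstract"}

def to_bibtex(key, fields):
    """Render template fields as a BibTeX entry (no escaping performed)."""
    entry_type = fields.get("bibtex type", "misc")
    lines = ["@%s{%s," % (entry_type, key)]
    for name in sorted(fields):
        if name in WIKI_ONLY:
            continue
        lines.append("  %s = {%s}," % (name, fields[name]))
    lines.append("}")
    return "\n".join(lines)


fields = {
    "bibtex type": "article",
    "title": "The Wick in the Candle of Learning",
    "journal": "Psychological Science",
    "year": "2009",
    "volume": "20",
    "number": "8",
    "pages": "963",
    "owner": "Sethherd",
}

bibtex = to_bibtex("KangHsuKrajbichEtAl09", fields)
print(bibtex)
```

Other targets (refman, the Z3988 microformat, pubmed) would each need their own field mapping along the same lines.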

As I understand it, there are issues with Semantic MediaWiki with respect to
performance and security that need to be resolved before a large-scale
deployment within Wikimedia Foundation projects. I heard that Markus
Krötzsch is going to Oxford to work on core SMW, so there might be some
changes to SMW in the future. A code audit of SMW is still lacking.

It is not 'necessarily necessary' to make a new Wikimedia project. There
has been a suggestion (in the meta or strategy wiki) just to use a
namespace in Wikipedia. You could then have a page called
http://en.wikipedia.org/wiki/Bib:The_wick_in_the_candle_of_learning

I would say that a page called:

http://en.wikipedia.org/wiki/The_wick_in_the_candle_of_learning

would be the way to do it. But that would never pass the deletionists. :-)


/Finn


On Mon, 19 Jul 2010, Brian J Mingus wrote:

> I have been working with Sam and others for some time now on brainstorming a
> proposal for the Foundation to create a centralized wiki of citations, a
> WikiCite so to speak, if that is not the eventual name. My plan is to
> continue to discuss with folks who are knowledgeable and interested in such
> a project and to have the feedback I receive go into the proposal which I
> hope to write this summer. The proposal white paper will then be sent around
> to interested parties for corrections and feedback, including on-wiki and
> mailing lists, before eventually landing at the Foundation officially. As we
> know WMF has not started a new project in some years, so there is no
> official process. Thus I find it important to get it right.
>
> The basic idea is a centralized wiki that contains citation information that
> other MediaWikis and WMF projects can then reference using something like a
> {{cite}} template or a simple link. The community can document the citation,
> the author, the book etc.. and, in one idealization, all citations across
> all wikis would point to the same article on WikiCite. Users can use this
> wiki as their personal bibliography as well, as collections of citations can
> be exported in arbitrary citation formats. This general plan would allow
> community aggregation of metadata and community documentation of sources
> along arbitrary dimensions (quality, trust, reliability, etc.). The hope is
> that such a resource would then expand on that wiki and across the projects
> into summarizations of collections of sources (lit reviews) that
> make navigating entire fields of literature easier and more
> reliable, getting you out of the trap of not being aware of the global
> context that a particular source sits in.
>
> To give all a more concrete view, here is an example from some software that
> I have implemented in our lab called WikiPapers. Please take note that while
> this is a scientific literature example, the idea is general to *all
> publications ever*. Also, while I have implemented a feature-full version of
> a WikiCite, it's important to point out that for the WMF project we will
> need a new extension that handles the needs of the project exactly, and in
> PHP (I use Python :).
>
> The name of the wiki article is a unique key that is a combination of the
> author names and the year, in the following format:
> Author1Author2Author3EtAl10b. This works for scientific articles, but we may
> find we need to modify the key for other kinds of sources. The content of
> the wiki article is composed of an infobox constructed via the Citation
> template, and any other text and media the community determines it is useful
> and legal to include in the article. Example article:
>
> Screenshot of how this infobox renders on our wiki:
> http://grey.colorado.edu/mediawiki/sites/mingus/images/0/0e/KangHsuKrajbichEtAl10_infobox.png
>
> Title: KangHsuKrajbichEtAl09
>
> {{Citation
> |publisher=SAGE Publications
> |dateadded=2010-07-17
> |author=Kang M.J. and Hsu M. and Krajbich I.M. and Loewenstein G. and
> McClure S.M. and Wang J.T. and Camerer C.F.
> |url=http://pss.sagepub.com/content/20/8/963.full
> |abstract=Curiosity has been described as a desire for learning and
> knowledge, but its underlying mechanisms are not well understood. We scanned
> subjects with functional magnetic resonance imaging while they read trivia
> questions. The level of curiosity when reading questions was correlated with
> activity in caudate regions previously suggested to be involved in
> anticipated reward. This finding led to a behavioral study, which showed
> that subjects spent more scarce resources (either limited tokens or waiting
> time) to find out answers when they were more curious. The functional
> imaging also showed that curiosity increased activity in memory areas when
> subjects guessed incorrectly, which suggests that curiosity may enhance
> memory for surprising new information. This prediction about memory
> enhancement was confirmed in a behavioral study: Higher curiosity in an
> initial session was correlated with better recall of surprising answers 1 to
> 2 weeks later.
> |title=The Wick in the Candle of Learning
> |bibtex type=article
> |number=8
> |volume=20
> |owner=Sethherd
> |journal=Psychological Science
> |year=2009
> |cites=O'ReillyFrank06,Cowan95,Wise04,Fuster80,Panksepp98,KakadeDayan02b,DelgadoLockeStengerEtAl03,BrewerZhaoDesmondEtAl98,DelgadoNystromFiez00,Beatty82,Baddeley92,Waanabe96,Roland93lm,DelgadoNystromFissellEtAl00,WagnerSchacterRotteEtAl98,SeymourDawDayanEtAl07,ODoherty04,BandettiniMoonen99,ODohertyDayanFristonEtAl03,RogersOwenRobbins99,KnutsonWestdorpKaiserEtAl00,CircuitryMemory,OReillyFrank06,Watanabe96a,BrewerZhaoGabrieli98,WagnerSchacterBuckner98,RogersOwenMiddletonEtAl99,Baddeley86,Watanabe96,Rolls96a,PallerWagner02
> |cited_by=Author1Author2Author3EtAl10,etc...
> |pages=963
> }}
>
> Then, any other WMF wiki, or any other MediaWiki, could cite this universal
> entry by simply typing {{cite|KangHsuKrajbichEtAl09}}
>
> Additionally, if a technology such as Semantic MediaWiki is used (as it is
> in WikiPapers), arbitrary lists of collections of literature can be
> generated by constructing simple queries that are boolean combinations of
> template properties. Given that SMW does not scale well, I have a plan that
> uses Lucene instead for fast, scalable dynamic generation of collections of
> citations. Imagine the possibilities..
>
> Feel free to provide your feedback on this idea, in addition to your own
> ideas, in this thread, or to me personally. I am especially interested in
> the potential benefits to the WMF projects that you see, and to hear your
> thoughts on the potential of this project on its own, as that will feature
> prominently in the proposal. Additionally, what do you think WikiCite would
> eventually be like, once it is fully matured?
>
> Brian Mingus
> Graduate Student
> Computational Cognitive Neuroscience Lab
> University of Colorado at Boulder
>
>
> On Mon, Jul 19, 2010 at 11:22 AM, phoebe ayers <[hidden email]>wrote:
>
>> There have been a number of proposals floated in the Wikimedia
>> community over the years to build a wiki-based project for collecting
>> journal citation information. For those interested in that topic, you
>> might want to check out the University of Prince Edward Island's
>> "knowledge for all" project proposal -- it proposes to build an open
>> universal citation index (to serve as an alternative to the many
>> hundreds of proprietary citation index products that libraries
>> currently buy). This of course is not the first attempt at this
>> problem, but it's an interesting proposal that's getting a bit of buzz
>> in the library community.
>> http://library.upei.ca/k4all
___________________________________________________________________

          Finn Aarup Nielsen, DTU Informatics, Denmark
  Lundbeck Foundation Center for Integrated Molecular Brain Imaging
    http://www.imm.dtu.dk/~fn/      http://nru.dk/staff/fnielsen/
___________________________________________________________________


Re: WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

John Mark Vandenberg
On Tue, Jul 20, 2010 at 8:06 AM, Finn Aarup Nielsen <[hidden email]> wrote:

>..
> It not 'necessarily necessary' to make a new Wikimedia project. There has
> been a suggestion (in the meta or strategy wiki) just to use a namespace in
> Wikipedia. You could then have a page called
> http://en.wikipedia.org/wiki/Bib:The_wick_in_the_candle_of_learning
>
> I would say that a page called:
>
> http://en.wikipedia.org/wiki/The_wick_in_the_candle_of_learning
>
> would be the way to do it. But that would never pass the deletionists. :-)

French Wikipedia already has a namespace dedicated to pages about references.

http://fr.wikipedia.org/wiki/R%C3%A9f%C3%A9rence:Index

There is quite a bit of activity in this namespace:

http://fr.wikipedia.org/w/index.php?namespace=104&tagfilter=&title=Sp%E9cial%3AModifications+r%E9centes

English Wikipedia has a few groups of citation pages with bots that
fill in the details.

http://en.wikipedia.org/wiki/Special:PrefixIndex/Template:cite_doi
http://en.wikipedia.org/wiki/Special:PrefixIndex/Template:cite_pmid

--
John Vandenberg


Re: WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

Rob Lanphier
In reply to this post by Brian J Mingus
On Mon, Jul 19, 2010 at 1:20 PM, Brian J Mingus
<[hidden email]> wrote:
> I have been working with Sam and others for some time now on brainstorming a
> proposal for the Foundation to create a centralized wiki of citations, a
> WikiCite so to speak, if that is not the eventual name. My plan is to
> continue to discuss with folks who are knowledgeable and interested in such
> a project and to have the feedback I receive go into the proposal which I
> hope to write this summer.

This sounds great.  Just speaking as a community member, I've been
thinking about this topic a long time myself, and have plenty to add
to the conversation.

> The proposal white paper will then be sent around
> to interested parties for corrections and feedback, including on-wiki and
> mailing lists, before eventually landing at the Foundation officially. As we
> know WMF has not started a new project in some years, so there is no
> official process. Thus I find it important to get it right.

I'd suggest finding an on-wiki spot to discuss this work.  Here's one
place this has been discussed in the past that may be a good place to
revive the conversation:
http://strategy.wikimedia.org/wiki/Proposal:Building_a_database_of_all_books_ever_published

Rather than commenting on list about the subject itself, I've
commented on the discussion page there:
http://strategy.wikimedia.org/wiki/Proposal_talk:Building_a_database_of_all_books_ever_published#Fact_database_6531

Rob


Re: [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

Sunir Shah
In reply to this post by Federico Leva (Nemo)
Hey folks,

I've been lurking on this list since the beginning of time and saw
this fly by. Thanks Nemo for the shout out. That is pretty much what
Bibdex is about. My inspiration was a Big Hairy Goal to provide a
central place where the body of academic knowledge can be curated by
the public in a wiki style. It's different than Wikipedia because
there is no NPOV and often research needs to be secret.

I originally tried this with both MeatballWiki and a similar service
called BibWiki. Bibdex is my latest adaptation based on what I learnt.
The current iteration embraces the fact that academia is built on
controversy. Different groups need to have space to express different
opinions apart from others. So, I rebuilt the software so that
research groups can create their own public annotated bibliographies
and control who has access to write to those bibliographies, much like
Google Groups has different levels of public and private access
control.

My understanding is that WikiCite is focused specifically on the needs
of the WMF projects. That has its own set of interesting use cases.

By the way, the http://www.openlibrary.org project is very inspiring
and in a similar vein, albeit restricted to books.

Cheers,
Sunir, Bibdex

On Mon, Jul 19, 2010 at 6:03 PM, Federico Leva (Nemo)
<[hidden email]> wrote:

> Brian J Mingus, 19/07/2010 22:20:
>> The basic idea is a centralized wiki that contains citation information that
>> other MediaWikis and WMF projects can then reference using something like a
>> {{cite}} template or a simple link. The community can document the citation,
>> the author, the book etc.. and, in one idealization, all citations across
>> all wikis would point to the same article on WikiCite. Users can use this
>> wiki as their personal bibliography as well, as collections of citations can
>> be exported in arbitrary citation formats.
>
> I have already mentioned it before, but this description looks quite
> similar to http://bibdex.org/ . Maybe we should join forces (i.e., send
> your proposal also to Sunir Shah).
>
> Nemo

Re: [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

metasj
In reply to this post by Federico Leva (Nemo)
Brian,

The meta process for new project proposals is still the cleanest one
for suggesting a specific Project and presenting it alongside similar
projects.

It would be helpful if you could update a related project proposal on
meta -- say, [[m:WikiBibliography]], if that seems relevant.  (I just
cleaned that page up and merged in an older proposal that had been
obfuscated.)

Or you can create a new project proposal...  WikiCite as a name can be
confusing, since it has been used to refer to this bibliographic idea,
but also to refer to the idea of citations for every statement or fact
- something closer to a blame or trust solution that includes
citations in its transactions.

We should figure out how this project would work with acawiki, and
possibly bibdex.  Bibdex doesn't aim to   And it would be helpful to
have a publicly-viewable demo to play with -- could you clone your
current wiki and populate the result with dummy data?

I love the idea of having a global place to discuss citations -- ALL
citations -- something that OpenLibrary, the arXiv, and anyone else
hosting cited documents could point to for every one of its works.

Sam.


On Mon, Jul 19, 2010 at 6:03 PM, Federico Leva (Nemo)
<[hidden email]> wrote:

> Brian J Mingus, 19/07/2010 22:20:
>> The basic idea is a centralized wiki that contains citation information that
>> other MediaWikis and WMF projects can then reference using something like a
>> {{cite}} template or a simple link. The community can document the citation,
>> the author, the book etc.. and, in one idealization, all citations across
>> all wikis would point to the same article on WikiCite. Users can use this
>> wiki as their personal bibliography as well, as collections of citations can
>> be exported in arbitrary citation formats.
>
> I have already mentioned it before, but this description looks quite
> similar to http://bibdex.org/ . Maybe we should join forces (i.e., send
> your proposal also to Sunir Shah).
>
> Nemo



--
Samuel Klein          identi.ca:sj           w:user:sj


Re: WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

Daniel Kinzler
In reply to this post by fn
Hi all

A central place for managing Bibliographic data for use with Citations is
something that has been discussed by the German community for a long time. To
me, it consists of two parts: a project for managing the structured data, and a
mechanism for using that data on the wikis.

I have been working on the latter recently, and there's a working prototype: on
 <http://prototype.wikimedia.org/wmde-sandbox-1/Wikipedia:DataTransclusion> you
can see how data records can be included from external sources. A demo for the
actual on-wiki use can be found at
<http://prototype.wikimedia.org/wmde-sandbox-1/Ameisenigel#Literatur>, where
{{ISBN|0868400467}} is used to show the bibliographic info for that book. (side
note: the prototype wikis are slow. sorry about that).

Fetching and showing the data is done using
<http://www.mediawiki.org/wiki/Extension:DataTransclusion>. Care has been taken
to make this secure and scalable.

For a first demo, I'm using the ISBN as the key, but any kind of key could be
used to reference resources other than books.
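
If the ISBN serves as the lookup key, a cheap sanity check before any fetch is the standard ISBN-10 check digit. The Python sketch below is a general illustration and not part of the DataTransclusion extension; the function name is_valid_isbn10 is an assumption.

```python
def is_valid_isbn10(isbn):
    """Check the ISBN-10 checksum: weights 10..1, total divisible by 11."""
    chars = isbn.replace("-", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char == "X" and position == 9:
            value = 10  # 'X' stands for 10, allowed only as the check digit
        elif char in "0123456789":
            value = int(char)
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


# The two ISBNs used in the prototype links in this message.
print(is_valid_isbn10("0868400467"))
print(is_valid_isbn10("0451526538"))
```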

For demoing managing the data by ourselves, I have set up an SMW instance. An
example bib record is at
<http://prototype.wikimedia.org/wmde-bib/ISBN:0451526538>, it's used across
wikis at
<http://prototype.wikimedia.org/wmde-sandbox-1/Wikipedia:DataTransclusion>. Note
that changes will show delayed, as the data is cached for a while.
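
The delay mentioned above is what any time-limited cache produces. The Python sketch below shows the general idea with a hypothetical CachedSource class and an injectable clock; it does not describe how the DataTransclusion extension actually caches records, and the backend dict stands in for an external data source.

```python
import time

class CachedSource:
    """Wrap a fetch function with a simple time-to-live cache."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # key -> (expiry_time, record)

    def get(self, key):
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: serve the cached copy
        record = self._fetch(key)
        self._cache[key] = (now + self._ttl, record)
        return record


# Stand-in backend and a fake clock so the behaviour is reproducible.
backend = {"ISBN:0451526538": {"title": "first version"}}
now = [0.0]
source = CachedSource(lambda k: dict(backend[k]), ttl_seconds=60,
                      clock=lambda: now[0])

first = source.get("ISBN:0451526538")
backend["ISBN:0451526538"]["title"] = "edited version"
stale = source.get("ISBN:0451526538")   # edit not visible yet
now[0] = 61.0
fresh = source.get("ISBN:0451526538")   # cache expired, edit visible
print(stale["title"], "->", fresh["title"])
```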


When discussing these things, please keep in mind that there are two components:
fetching and displaying external data records, and managing structured data in a
wiki style. The former is much simpler than the latter. I think we should really
aim at getting both, but we can start off with transcluding external data much
faster, if we allow not-so-wiki data sources. For ISBN-based queries, we could
simply fetch information from http://openlibrary.org - or the open knowledge
foundation's http://bibliographica.org, once it's working.

In the context of bibdex, I recommend to also have a look at
http://bibsonomy.org - it's a university research project, open source, and is
quite similar to bibdex (and to what citeulike used to be).

As to managing structured data ourselves: I have talked a lot with Erik Möller
and Markus Krötzsch about this, and I'm in touch with the people who make DBpedia
and OntoWiki. Everyone wants this. But it's not simple at all to get it right
(efficient versioning of multilingual data in a document oriented database,
anyone? want inference? reasoning, even? yay...). So the plan is currently to
hatch a concrete plan for this. And I imagine that bibliographical and
biographical info will be among the first use cases.

cheers,
daniel



Re: WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

Brian J Mingus
In reply to this post by fn
On Mon, Jul 19, 2010 at 4:06 PM, Finn Aarup Nielsen <[hidden email]> wrote:


> Hi Brian and others,
>
> I also think that some bibliographic support would be interesting, for two-way citation tracking and commenting on articles (for example), but I furthermore find that in scientific articles in particular we often find data that is worth structuring and putting in a database or a structured wiki, so that we can extract the data for meta-analysis and specialized information retrieval. That is also what I do in the Brede Wiki. I use the templates to store such data. So if a system such as yours is implemented we should not just think of it as a bibliographic database but in broader terms: a data wiki.

Although the technology required to make a WikiCite happen will be applicable to a more generalized wiki for storing data I think that is too broad for the current proposal. A WMF analogue to Google Base is an entirely new beast that has its own requirements. I certainly think it's an interesting and worthwhile idea, but I don't feel that we are there yet.

As the 'key' (the wiki page title) I use the (lowercase) title of the article. That might be more reader friendly - but usually longer. I think that KangHsuKrajbichEtAl09 is too camel-cased. Neither the title nor author list + year will be unique, so we need some predictable disambig.

I noticed that AcaWiki is using the title, but I am personally not a fan of it. The motivation for using a key comes from BibTeX. When you cite an entry in a publication in LaTeX, you type \cite{key}. Also, I think most bibliographic formats support such a key. The idea is that there is a universal token that you can type into Google that will lead you to the right item. The predictable disambig is in the format I sent out (which likely needs modification for other kinds of sources). The format is Author1Author2Author3EtAlYYb. Here is a real world example from a pair of very prolific scientists, Deco & Rolls, who published at least three papers together in 2005. In our lab we have really come to love these keys - they are very memorable tokens that you can verbally pass on to other scientists in the midst of a discussion. Eventually, if they enter the key you have given them into Google, they will get the right entry at "WikiCite".

DecoRolls05 - Synaptic and spiking dynamics underlying reward reversal in the orbitofrontal cortex.
DecoRolls05b - Sequential memory: a putative neural and synaptic dynamical mechanism.
DecoRolls05c - Attention, short-term memory, and action selection: a unifying theory.
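
As a rough illustration of the key format above, here is a minimal Python sketch. The function names `base_key` and `assign_keys` are hypothetical, and assigning the b/c suffixes in input order is an assumption; it is not the WikiPapers code.

```python
from collections import defaultdict
import string


def base_key(surnames, year):
    """Build the undisambiguated key: up to three surnames, EtAl, two-digit year."""
    parts = [name.replace(" ", "") for name in surnames[:3]]
    if len(surnames) > 3:
        parts.append("EtAl")
    return "".join(parts) + f"{year % 100:02d}"


def assign_keys(papers):
    """Map each title to a key; `papers` is a list of (surnames, year, title).

    Assumption: the first paper seen with a given base key gets no suffix,
    later ones get b, c, ... in input order (at most 26 per base key).
    """
    seen = defaultdict(int)
    keys = {}
    for surnames, year, title in papers:
        key = base_key(surnames, year)
        position = seen[key]
        seen[key] += 1
        suffix = "" if position == 0 else string.ascii_lowercase[position]
        keys[title] = key + suffix
    return keys
```

Feeding the three Deco & Rolls papers from 2005 through `assign_keys` in the order listed would yield the three keys shown above.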

I have one field to each author so that I can automatically link authors.

This is accomplished via Semantic Forms, using the arraymap parser function. You just provide a comma-separated list of authors, and they each get semantic property definitions and deep linking to all papers published by that author.
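
To make the idea concrete, here is a small Python sketch of what such a mapping does: split a delimited list and wrap each item in a semantic annotation. It only mimics the behaviour; the helper names and the property name `Author` are assumptions, not the actual form definition.

```python
def arraymap(value, delimiter, formula, new_delimiter=", "):
    """Split `value` on `delimiter`, apply `formula` to each non-empty item, rejoin."""
    items = [item.strip() for item in value.split(delimiter)]
    return new_delimiter.join(formula(item) for item in items if item)


def author_links(authors):
    """Turn 'A, B' into '[[Author::A]], [[Author::B]]' (property name is assumed)."""
    return arraymap(authors, ",", lambda name: f"[[Author::{name}]]")
```

Each `[[Author::...]]` annotation both sets the property and links to the author's page, which is what gives the per-author listing of papers.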

I do not include abstracts in my CC-by-sa'ed wiki, since I am not sure how publishers regard the copyright for abstracts. Nor am I sure about the forward cites. Most commercial publishers hide the cites from unpaid viewing. Including cites in CC-by-sa material on a large scale may infringe publishers' copyright. Perhaps it is possible to negotiate with some publishers. We need to talk with 'closed access' publishers before we add such data.

Yes, I have added many nice features to WikiPapers that can unfortunately not make it into the proposed WMF project. Some can, some can't. For example, adding papers to the wiki is via a one click bookmarklet. First, you highlight the title of a paper anywhere on the web, be it a webpage, e-mail, or journal site. Then, you click your "Add to wiki" bookmarklet. On my webserver I am running the citation scraping software from Connotea, CiteULike, and Zotero. I also have a Google Scholar scraper and PubMed importer. You can choose to use one of those sources, or you can choose to merge all of the metadata together. It's automatically added to the wiki for you. Additionally, I have written a bash script that is very adept at getting the pdfs from journals, so it automatically tries to download the pdf and upload it to the wiki for you. I have also implemented the ability to compute the articles that an article cites, and vice versa. With respect to abstracts these scrapers aren't that great. Abstracts usually come from PubMed, whose database you can license, but you cannot change their metadata IIRC. 

Ultimately, I think the community will have to take a very careful look at what data can be added to the wiki and design policies accordingly. On Wikipedia I believe copyright enforcement has largely been up to the community, and it takes a long time to converge on appropriate policies. Needless to say, many of the technologies I described in the last paragraph would not be considered legal on a public wiki.

I am not sure what 'owner' is in your format. Surely you can't have owners in a Wikimedia/MediaWiki wiki? And 'dateadded' would already be recorded in the revision history.

The 'owner' field is a misnomer, but in lieu of MySQL support it lets you know which individuals have that entry in their personal bibliographies. dateadded is needed due to what at least used to be a bug in Semantic MediaWiki.

We probably need to check the final format of the bibliographic template to make sure it is easily translatable to the most common bibliographic formats: BibTeX, RefMan, Z3988 microformat, PubMed, etc.

I have written extensive amounts of Python interchange code between wiki template syntax and BibTeX. I chose BibTeX because it is rather standard, our lab uses it, and it is very similar to template syntax. Also, I use Bibutils to convert from BibTeX to most popular formats, and vice versa for mass import of bibliographies: http://www.scripps.edu/~cdputnam/software/bibutils/
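
The similarity between the two syntaxes can be seen in a minimal sketch like the following, which renders one set of fields both ways. This is an illustration only: the function names and the template name `Paper` are assumptions, and real BibTeX needs escaping of special characters that is omitted here.

```python
def to_template(name, key, fields):
    """Render fields as a wiki template call, one |field=value per line."""
    lines = ["{{" + name, f"|key={key}"]
    lines += [f"|{field}={value}" for field, value in fields.items()]
    lines.append("}}")
    return "\n".join(lines)


def to_bibtex(entry_type, key, fields):
    """Render the same fields as a BibTeX entry (no escaping is performed)."""
    body = ",\n".join(f"  {field} = {{{value}}}" for field, value in fields.items())
    return f"@{entry_type}{{{key},\n{body}\n}}"


fields = {"author": "Deco, G. and Rolls, E. T.", "year": "2005"}
print(to_template("Paper", "DecoRolls05", fields))
print(to_bibtex("article", "DecoRolls05", fields))
```

Both outputs are flat lists of named fields attached to a key, which is why a round trip between them is mostly a matter of punctuation.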

As I understand it, there are issues with Semantic MediaWiki with respect to performance and security that need to be resolved before a large-scale deployment within Wikimedia Foundation projects. I heard that Markus Krötzsch is going to Oxford to work on core SMW, so there might be some changes to SMW in the future. A code audit of SMW is lacking.

As I was writing a custom Lucene search engine for WikiPapers I realized that it is a perfect replacement for Semantic MediaWiki. Lucene has fields, it supports boolean operators and you can format its output. All that is needed is to write the Lucene backend (perhaps just modifying MWLucene) and write a parser function that supports using templates for formatting of the output of queries. Lucene is extremely fast and can scale to whatever we can imagine doing. That's my proposed plan.
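
The three ingredients mentioned here (fields, boolean operators, template-formatted output) can be illustrated with a toy in-memory index in pure Python. This is not Lucene and says nothing about its API or performance; the class `FieldIndex` and its methods are hypothetical names for the sketch.

```python
from collections import defaultdict


class FieldIndex:
    """Toy fielded inverted index: (field, term) -> set of document ids."""

    def __init__(self):
        self._postings = defaultdict(set)
        self._docs = {}

    def add(self, doc_id, fields):
        """Store a document and index every whitespace-separated term per field."""
        self._docs[doc_id] = fields
        for field, text in fields.items():
            for term in str(text).lower().split():
                self._postings[(field, term)].add(doc_id)

    def term(self, field, word):
        """Return the ids of documents whose `field` contains `word`."""
        return set(self._postings.get((field, word.lower()), set()))

    def render(self, doc_ids, template):
        """Format each hit with a str.format template, sorted by id."""
        return [template.format(**self._docs[d]) for d in sorted(doc_ids)]


index = FieldIndex()
index.add("DecoRolls05", {"author": "Deco Rolls", "year": "2005"})
# Boolean AND is set intersection (&); OR would be set union (|).
hits = index.term("author", "rolls") & index.term("year", "2005")
print(index.render(hits, "{author} ({year})"))
```

In the proposed plan the formatting step would be a parser function applying a wiki template to each hit, rather than a Python format string.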

It is not 'necessarily necessary' to make a new Wikimedia project. There has been a suggestion (on the meta or strategy wiki) just to use a namespace in Wikipedia. You could then have a page called http://en.wikipedia.org/wiki/Bib:The_wick_in_the_candle_of_learning

I believe it is necessary. First, the idea is for any MediaWiki anywhere (and any software with appropriate extensions) to be able to cite the same source. Second, the project would be multilingual.

Cheers,

Brian Mingus
Graduate Student
Computational Cognitive Neuroscience Lab
University of Colorado at Boulder



Re: [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

Brian J Mingus
In reply to this post by Rob Lanphier


On Mon, Jul 19, 2010 at 8:08 PM, Rob Lanphier <[hidden email]> wrote:
On Mon, Jul 19, 2010 at 1:20 PM, Brian J Mingus
<[hidden email]> wrote:
> I have been working with Sam and others for some time now on brainstorming a
> proposal for the Foundation to create a centralized wiki of citations, a
> WikiCite so to speak, if that is not the eventual name. My plan is to
> continue to discuss with folks who are knowledgeable and interested in such
> a project and to have the feedback I receive go into the proposal which I
> hope to write this summer.

This sounds great.  Just speaking as a community member, I've been
thinking about this topic a long time myself, and have plenty to add
to the conversation.

> The proposal white paper will then be sent around
> to interested parties for corrections and feedback, including on-wiki and
> mailing lists, before eventually landing at the Foundation officially. As we
> know WMF has not started a new project in some years, so there is no
> official process. Thus I find it important to get it right.

I'd suggest finding an on-wiki spot to discuss this work.  Here's one
place this has been discussed in the past that may be a good place to
revive the conversation:
http://strategy.wikimedia.org/wiki/Proposal:Building_a_database_of_all_books_ever_published

Rather than commenting on list about the subject itself, I've
commented on the discussion page there:
http://strategy.wikimedia.org/wiki/Proposal_talk:Building_a_database_of_all_books_ever_published#Fact_database_6531

Rob

Rob,

Thanks for bringing my attention to this proposal. It certainly has some of the same ring as this project, with of course some important differences. Commonalities between the projects are that they are multilingual and require a powerful search engine. Differences are that this project is for all literary sources and that I believe it is best suited at the WMF. The widespread use of citations across the Wikipedias will drive user contributions towards adding richer metadata to those citations. And having a source of citations available will increase the quality of the Wikipedias as it becomes easier and easier to cite sources.

Brian


Re: [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

Brian J Mingus
In reply to this post by metasj


On Mon, Jul 19, 2010 at 9:37 PM, Samuel Klein <meta.sj@gmail.com> wrote:
Brian,

The meta process for new project proposals is still the cleanest one
for suggesting a specific Project and presenting it alongside similar
projects.

It would be helpful if you could update a related project proposal on
meta -- say, [[m:WikiBibliography]], if that seems relevant.  (I just
cleaned that page up and merged in an older proposal that had been
obfuscated.)


Thanks for your work on this - definitely in the right direction! I will consider whether I feel it's the right way for me to get started. One point is that I am pointing more in the direction of a long-form proposal, and I have more experience writing white-paper proposals for academia. I certainly want it to end up on wiki, but when TPTB finally read the proposal perhaps they will find it more persuasive if it is a professional looking document that lands in their inbox. 
 
Or you can create a new project proposal...  WikiCite as a name can be
confusing, since it has been used to refer to this bibliographic idea,
but also to refer to the idea of citations for every statement or fact
- something closer to a blame or trust solution that includes
citations in its transactions.


Another name that I have come up with is OpenScholar. I still rather like it, but suspect it has too much of a scientific ring to it? Names are certainly very important so we should do more work on this avenue. Including a list of names in the proposal would be a good idea, and perhaps the final name will be a combination of existing name proposals.
 
We should figure out how this project would work with acawiki, and
possibly bibdex.  Bibdex doesn't aim to   And it would be helpful to
have a publicly-viewable demo to play with -- could you clone your
current wiki and populate the result with dummy data?

The problem with WikiPapers is that it has too many features! A feature-thin version would be ideal for the proposal though, so I will plan to have some kind of a demo site available.
 
I love the idea of having a global place to discuss citations -- ALL
citations -- something that OpenLibrary, the arXiv, and anyone else
hosting cited documents could point to for every one of its works.

Exactly :)

Brian 
 
Sam.


On Mon, Jul 19, 2010 at 6:03 PM, Federico Leva (Nemo)
<[hidden email]> wrote:
> Brian J Mingus, 19/07/2010 22:20:
>> The basic idea is a centralized wiki that contains citation information that
>> other MediaWikis and WMF projects can then reference using something like a
>> {{cite}} template or a simple link. The community can document the citation,
>> the author, the book etc.. and, in one idealization, all citations across
>> all wikis would point to the same article on WikiCite. Users can use this
>> wiki as their personal bibliography as well, as collections of citations can
>> be exported in arbitrary citation formats.
>
> I have already mentioned it before, but this description looks quite
> similar to http://bibdex.org/ . Maybe we should join forces (i.e., send
> your proposal also to Sunir Shah).
>
> Nemo
>
> _______________________________________________
> Wiki-research-l mailing list
> [hidden email]
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>



--
Samuel Klein          identi.ca:sj           w:user:sj

_______________________________________________
foundation-l mailing list
[hidden email]
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l



Re: WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

Brian J Mingus
In reply to this post by Daniel Kinzler


On Tue, Jul 20, 2010 at 5:10 AM, Daniel Kinzler <[hidden email]> wrote:
Hi all

A central place for managing bibliographic data for use with citations is
something that has been discussed by the German community for a long time. To
me, it consists of two parts: a project for managing the structured data, and a
mechanism for using that data on the wikis.

I have been working on the latter recently, and there's a working prototype: on
 <http://prototype.wikimedia.org/wmde-sandbox-1/Wikipedia:DataTransclusion> you
can see how data records can be included from external sources. A demo for the
actual on-wiki use can be found at
<http://prototype.wikimedia.org/wmde-sandbox-1/Ameisenigel#Literatur>, where
{{ISBN|0868400467}} is used to show the bibliographic info for that book. (side
note: the prototype wikis are slow. sorry about that).

Fetching and showing the data is done using
<http://www.mediawiki.org/wiki/Extension:DataTransclusion>. Care has been taken
to make this secure and scalable.

For a first demo, I'm using the ISBN as the key, but any kind of key could be
used to reference resources other than books.

For demoing managing the data by ourselves, I have set up an SMW instance. An
example bib record is at
<http://prototype.wikimedia.org/wmde-bib/ISBN:0451526538>, it's used across
wikis at
<http://prototype.wikimedia.org/wmde-sandbox-1/Wikipedia:DataTransclusion>. Note
that changes will show up with a delay, as the data is cached for a while.


When discussing these things, please keep in mind that there are two components:
fetching and displaying external data records, and managing structured data in a
wiki style. The former is much simpler than the latter. I think we should really
aim at getting both, but we can start off with transcluding external data much
faster, if we allow not-so-wiki data sources. For ISBN-based queries, we could
simply fetch information from http://openlibrary.org - or the open knowledge
foundation's http://bibliographica.org, once it's working.
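
The simpler of the two components, fetching an external record by ISBN and caching it for a while, might be sketched as follows. This is not the DataTransclusion extension's code. The class and function names are hypothetical, the fetcher is injected so the sketch makes no network call itself, and the URL pattern is an assumption based on Open Library's Books API.

```python
import time


def openlibrary_url(isbn):
    """Build a lookup URL for one ISBN (URL pattern is an assumption)."""
    return ("https://openlibrary.org/api/books"
            f"?bibkeys=ISBN:{isbn}&format=json&jscmd=data")


class CachedRecords:
    """Fetch records through `fetch(url)` and reuse them until `ttl_seconds` pass."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # isbn -> (time fetched, record)

    def get(self, isbn):
        now = self._clock()
        hit = self._cache.get(isbn)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: changes at the source show up late
        record = self._fetch(openlibrary_url(isbn))
        self._cache[isbn] = (now, record)
        return record
```

The cache is why edits to a source record only appear on the citing wiki after a delay.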

In the context of bibdex, I recommend to also have a look at
http://bibsonomy.org - it's a university research project, open source, and is
quite similar to bibdex (and to what citeulike used to be).

As to managing structured data ourselves: I have talked a lot with Erik Möller
and Markus Krötzsch about this, and I'm in touch with the people who make DBpedia
and OntoWiki. Everyone wants this. But it's not simple at all to get it right
(efficient versioning of multilingual data in a document oriented database,
anyone? want inference? reasoning, even? yay...). So the plan is currently to
hatch a concrete plan for this. And I imagine that bibliographical and
biographical info will be among the first use cases.


Hi Daniel, 

Have you considered that Lucene is the perfect backend for this kind of project? What kinds of faults do you see with it? At least in my mind, we can mold it to our needs here. It has the core capabilities found in Semantic MediaWiki, and it is fast and scalable.

I say this as a serious user of Semantic MediaWiki. I have seen that it can't scale well without an alternate backend, and I wonder what kind of monumental effort will be required to make it scale to tens or hundreds of millions of documents, each of which contains 20-50 properties. Lucene can already do this; SMW, not so much ;-)

Brian

 
cheers,
daniel



Re: WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

Jodi Schneider-2
In reply to this post by Brian J Mingus
Hi Brian,

On 20 Jul 2010, at 18:02, Brian J Mingus wrote:

On Mon, Jul 19, 2010 at 4:06 PM, Finn Aarup Nielsen <[hidden email]> wrote:


Hi Brian and others,

I also think that some bibliographic support would be interesting, for two-way citation tracking and commenting on articles (for example). Furthermore, in science articles in particular we often find data that is worth structuring and putting in a database or a structured wiki, so that we can extract the data for meta-analysis and specialized information retrieval. That is what I do in the Brede Wiki, where I use templates to store such data. So if a system such as yours is implemented, we should not think of it as just a bibliographic database but in broader terms: a data wiki.

Although the technology required to make a WikiCite happen will be applicable to a more generalized wiki for storing data I think that is too broad for the current proposal. A WMF analogue to Google Base is an entirely new beast that has its own requirements. I certainly think it's an interesting and worthwhile idea, but I don't feel that we are there yet.

As the 'key' (the wiki page title) I use the (lowercase) title of the article. That might be more reader friendly - but usually longer. I think that KangHsuKrajbichEtAl09 is too camel-cased. Neither the title nor author list + year will be unique, so we need some predictable disambig.

I noticed that AcaWiki is using the title, but I am personally not a fan of it. The motivation for using a key comes from BibTeX. When you cite an entry in a publication in LaTeX, you type \cite{key}. Also, I think most bibliographic formats support such a key. The idea is that there is a universal token that you can type into Google that will lead you to the right item. The predictable disambig is in the format I sent out (which likely needs modification for other kinds of sources). The format is Author1Author2Author3EtAlYYb. Here is a real world example from a pair of very prolific scientists, Deco & Rolls, who published at least three papers together in 2005. In our lab we have really come to love these keys - they are very memorable tokens that you can verbally pass on to other scientists in the midst of a discussion. Eventually, if they enter the key you have given them into Google, they will get the right entry at "WikiCite".

DecoRolls05 - Synaptic and spiking dynamics underlying reward reversal in the orbitofrontal cortex.
DecoRolls05b - Sequential memory: a putative neural and synaptic dynamical mechanism.
DecoRolls05c - Attention, short-term memory, and action selection: a unifying theory.

Citation keys of this sort work, but they have to be decided on by some external system. Who decides which paper is -, b, and c? Publication order would be one way to do it -- but that's complicated, especially with online first publication, or overlapping conferences.

I think whether they're memorable tokens might vary by person... Sure, the author and year will be identifiable, even memorable. But the a, b, c?

If you want to support more than recent works, I'd urge YYYY instead of YY. Then we only have an issue for pre-0 stuff. :)

Also consider differentiating authors from title and year, perhaps with slashes. 
author1-author2-author3-etal/YYYY/b
I'm not convinced that -'s are better than capital letters (author last names can have both)...


I have one field to each author so that I can automatically link authors.

This is accomplished via Semantic Forms, using the arraymap parser function. You just provide a comma-separated list of authors, and they each get semantic property definitions and deep linking to all papers published by that author.

Sure -- unless authors have the same name, or use different forms of the name.

One of my coauthors goes by John G. Breslin for disambiguation since his name is common -- but on the institute website he's credited as John Breslin, since that's the only name the system recognizes.

In other words, some authority control will be needed. Libraries have a long history with this. Groups of booklovers do it, too. For instance, here's the LibraryThing page for John Smith:
Notice that you can split and join authors -- LibraryThing's way of giving users the ability to join and separate.
Or see
Sometimes there are difficult questions -- such as "Is Lewis Carroll the same as Charles Dodgson?" - which depends on what you mean by "same".

For the scope of the potential problem, look at highly published authors -- for instance the "alternative names" list for Dante:


I do not include abstracts in my CC-by-sa'ed wiki, since I am not sure how publishers regard the copyright for abstracts. Nor am I sure about the forward cites. Most commercial publishers hide the cites from unpaid viewing. Including cites in CC-by-sa material on a large scale may infringe publishers' copyright. Perhaps it is possible to negotiate with some publishers. We need to talk with 'closed access' publishers before we add such data.

Yes, I have added many nice features to WikiPapers that can unfortunately not make it into the proposed WMF project. Some can, some can't. For example, adding papers to the wiki is via a one click bookmarklet. First, you highlight the title of a paper anywhere on the web, be it a webpage, e-mail, or journal site. Then, you click your "Add to wiki" bookmarklet. On my webserver I am running the citation scraping software from Connotea, CiteULike, and Zotero. I also have a Google Scholar scraper and PubMed importer. You can choose to use one of those sources, or you can choose to merge all of the metadata together. It's automatically added to the wiki for you. Additionally, I have written a bash script that is very adept at getting the pdfs from journals, so it automatically tries to download the pdf and upload it to the wiki for you. I have also implemented the ability to compute the articles that an article cites, and vice versa. With respect to abstracts these scrapers aren't that great. Abstracts usually come from PubMed, whose database you can license, but you cannot change their metadata IIRC. 

Ultimately, I think the community will have to take a very careful look at what data can be added to the wiki and design policies accordingly. On Wikipedia I believe copyright enforcement has largely been up to the community, and it takes a long time to converge on appropriate policies. Needless to say, many of the technologies I described in the last paragraph would not be considered legal on a public wiki.

I am not sure what 'owner' is in your format. Surely you can't have owners in a Wikimedia/MediaWiki wiki? And 'dateadded' would already be recorded in the revision history.

The 'owner' field is a misnomer, but in lieu of MySQL support it lets you know which individuals have that entry in their personal bibliographies. dateadded is needed due to what at least used to be a bug in Semantic MediaWiki.

We probably need to check the final format of the bibliographic template to make sure it is easily translatable to the most common bibliographic formats: BibTeX, RefMan, Z3988 microformat, PubMed, etc.

I have written extensive amounts of Python interchange code between wiki template syntax and BibTeX. I chose BibTeX because it is rather standard, our lab uses it, and it is very similar to template syntax. Also, I use Bibutils to convert from BibTeX to most popular formats, and vice versa for mass import of bibliographies: http://www.scripps.edu/~cdputnam/software/bibutils/

BibTeX is good for backwards compatibility, but I'd urge a richer data format -- probably based on bibo RDF:
It's already widely used: http://bibliontology.com/projects


As I understand it, there are issues with Semantic MediaWiki with respect to performance and security that need to be resolved before a large-scale deployment within Wikimedia Foundation projects. I heard that Markus Krötzsch is going to Oxford to work on core SMW, so there might be some changes to SMW in the future. A code audit of SMW is lacking.

As I was writing a custom Lucene search engine for WikiPapers I realized that it is a perfect replacement for Semantic MediaWiki. Lucene has fields, it supports boolean operators and you can format its output. All that is needed is to write the Lucene backend (perhaps just modifying MWLucene) and write a parser function that supports using templates for formatting of the output of queries. Lucene is extremely fast and can scale to whatever we can imagine doing. That's my proposed plan.

It is not 'necessarily necessary' to make a new Wikimedia project. There has been a suggestion (on the meta or strategy wiki) just to use a namespace in Wikipedia. You could then have a page called http://en.wikipedia.org/wiki/Bib:The_wick_in_the_candle_of_learning

I believe it is necessary. First, the idea is for any MediaWiki anywhere (and any software with appropriate extensions) to be able to cite the same source. Second, the project would be multilingual.

I think somebody's mentioned OpenLibrary on this thread. In case not:
http://openlibrary.org/
Its scope is limited to books, but their interests are similar.

-Jodi



Cheers,

Brian Mingus
Graduate Student
Computational Cognitive Neuroscience Lab
University of Colorado at Boulder


Re: WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

Brian J Mingus


On Tue, Jul 20, 2010 at 11:56 AM, Jodi Schneider <[hidden email]> wrote:
Hi Brian,

On 20 Jul 2010, at 18:02, Brian J Mingus wrote:

On Mon, Jul 19, 2010 at 4:06 PM, Finn Aarup Nielsen <[hidden email]> wrote:


Hi Brian and others,

I also think that some bibliographic support would be interesting, for two-way citation tracking and commenting on articles (for example). Furthermore, in science articles in particular we often find data that is worth structuring and putting in a database or a structured wiki, so that we can extract the data for meta-analysis and specialized information retrieval. That is what I do in the Brede Wiki, where I use templates to store such data. So if a system such as yours is implemented, we should not think of it as just a bibliographic database but in broader terms: a data wiki.

Although the technology required to make a WikiCite happen will be applicable to a more generalized wiki for storing data I think that is too broad for the current proposal. A WMF analogue to Google Base is an entirely new beast that has its own requirements. I certainly think it's an interesting and worthwhile idea, but I don't feel that we are there yet.

As the 'key' (the wiki page title) I use the (lowercase) title of the article. That might be more reader friendly - but usually longer. I think that KangHsuKrajbichEtAl09 is too camel-cased. Neither the title nor author list + year will be unique, so we need some predictable disambig.

I noticed that AcaWiki is using the title, but I am personally not a fan of it. The motivation for using a key comes from BibTeX. When you cite an entry in a publication in LaTeX, you type \cite{key}. Also, I think most bibliographic formats support such a key. The idea is that there is a universal token that you can type into Google that will lead you to the right item. The predictable disambig is in the format I sent out (which likely needs modification for other kinds of sources). The format is Author1Author2Author3EtAlYYb. Here is a real world example from a pair of very prolific scientists, Deco & Rolls, who published at least three papers together in 2005. In our lab we have really come to love these keys - they are very memorable tokens that you can verbally pass on to other scientists in the midst of a discussion. Eventually, if they enter the key you have given them into Google, they will get the right entry at "WikiCite".

DecoRolls05 - Synaptic and spiking dynamics underlying reward reversal in the orbitofrontal cortex.
DecoRolls05b - Sequential memory: a putative neural and synaptic dynamical mechanism.
DecoRolls05c - Attention, short-term memory, and action selection: a unifying theory.

Citation keys of this sort work, but they have to be decided on by some external system. Who decides which paper is -, b, and c? Publication order would be one way to do it -- but that's complicated, especially with online first publication, or overlapping conferences.

I think whether they're memorable tokens might vary by person... Sure, the author and year will be identifiable, even memorable. But the a, b, c?

If you want to support more than recent works, I'd urge YYYY instead of YY. Then we only have an issue for pre-0 stuff. :)

Also consider differentiating authors from title and year, perhaps with slashes. 
author1-author2-author3-etal/YYYY/b
I'm not convinced that -'s are better than capital letters (author last names can have both)...

The key seems to be a very important point, so it's important that we get it right. My thinking is guided by several constraints. First, I strongly dislike the numeric keys used at sites such as CiteULike and most database sites (such as 7523225). To the greatest degree possible I believe the key should actually convey what is behind the link. On the other hand, the key should not be too long. Numeric keys maximize brevity while telling you nothing, whereas titles as keys are very long and don't give you some of the most important information - the authors and the year it was published. The key format I have suggested does seem to have a flaw: it easily becomes ambiguous, and you must then resort to a token that is not easily memorable. Then again, even though many authors and sets of authors will publish multiple items in a year, the vast majority of works have a unique set of authors for a given year.

I like your suggestion that the abc disambiguator be chosen based on the first date of publication, and I also like the prospect of using slashes since they can't be contained in names. Using the full year is a good idea too. We can combine these to come up with a key that, in principle, is guaranteed to be unique. This key would contain:

1) The first three author names separated by slashes
2) If there are more than three authors, an EtAl
3) Some or all of the date. For instance, if there is only one source by this set of authors that year, we can just use YYYY. However, once another source by that set of authors is added, the key should change to MMDDYYYY or similar. If there are multiple publications on the same day, we can resort to abc. Redirects and disambiguation pages can be set up when a key changes.

Since the slashes are somewhat cumbersome, perhaps we need not make them mandatory, but can similarly use them only when they are necessary in order to "escape" a name. In the case that none of the authors needs escaping - the dominant case - we can stick to the easily legible and nicely compact CamelCase format.

Example keys generated by this algorithm:

KangHsuKrajbichEtAl2009
Author1Author2/Author-Three/2009
Author1Author2AuthorThree10032009
Author1Author2AuthorThree12312009
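
The scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from WikiPapers: the function name make_key and its parameters are hypothetical, and the rule for when a name gets wrapped in slashes (any name that is not purely alphanumeric, such as a hyphenated one) is inferred from the second example key rather than specified anywhere.

```python
def make_key(surnames, year, month=None, day=None, suffix=""):
    """Build a citation key from author surnames and a publication date.

    Hypothetical helper: names and rules are assumptions for illustration.
    """
    parts = []
    for name in surnames[:3]:  # only the first three authors are used
        # Assumption: a name is "escaped" with slashes when it contains
        # anything other than letters and digits (e.g. a hyphen).
        parts.append(name if name.isalnum() else f"/{name}/")
    if len(surnames) > 3:
        parts.append("EtAl")
    if month is not None and day is not None:
        date = f"{month:02d}{day:02d}{year:04d}"  # MMDDYYYY, as proposed
    else:
        date = f"{year:04d}"
    # suffix carries the a/b/c disambiguator for same-day publications
    return "".join(parts) + date + suffix


# "AuthorFour" is a placeholder for any fourth author.
print(make_key(["Kang", "Hsu", "Krajbich", "AuthorFour"], 2009))
print(make_key(["Author1", "Author2", "Author-Three"], 2009))
print(make_key(["Author1", "Author2", "AuthorThree"], 2009, month=10, day=3))
```

Deciding when a key has to be lengthened, and creating the redirect from the old key, would need a lookup against the existing pages and is left out here.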
 

I have one field to each author so that I can automatically link authors.

This is accomplished via Semantic Forms, using the arraymap parser function. You just provide a comma-separated list of authors, and they each get semantic property definitions and deep linking to all papers published by that author.
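
What such a mapping does can be emulated in Python. This is a rough sketch of the idea only, not the Semantic Forms implementation: the function below is a stand-in written for illustration, and the property name "Author" is an assumption.

```python
def arraymap(value, delimiter, formula, new_delimiter=", "):
    """Split a delimited string and apply a formula to each item.

    Illustrative stand-in for the behaviour described above.
    """
    items = [item.strip() for item in value.split(delimiter) if item.strip()]
    return new_delimiter.join(formula(item) for item in items)


# Each author becomes a semantic annotation that also links to the author page.
annotated = arraymap("Author One, Author Two", ",",
                     lambda name: f"[[Author::{name}]]")
print(annotated)
```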

Sure -- unless authors have the same name, or use different forms of the name.

One of my coauthors goes by John G. Breslin for disambiguation since his name is common -- but on the institute website he's credited as John Breslin, since that's the only name the system recognizes.

In other words, some authority control will be needed. Libraries have a long history with this. Groups of booklovers do it, too. For instance, here's the LibraryThing page for John Smith:
Notice that you can split and join authors -- LibraryThing's way of giving users the ability to join and separate.
Or see
Sometimes there are difficult questions -- such as "Is Lewis Carroll the same as Charles Dodgson?" - which depends on what you mean by "same".

For the scope of the potential problem, look at highly published authors -- for instance the "alternative names" list for Dante:


LibraryThing is a great example of how to do disambiguation. We can only hope that we can likewise someday have a user community as pedantic and dedicated as theirs ;-) A big part of their success is in providing their users with straightforward tools for doing the disambig work. 
 

I do not include abstracts in my CC-by-sa'ed wiki, since I am not sure how publishers regard the copyright for abstracts. Neither am I sure about the forward cites. Most commercial publishers hide the cites from unpaid viewing. Including cites in CC-by-sa material on a large scale may infringe publishers' copyright. Perhaps it is possible to negotiate with some publishers. We need to talk with 'closed access' publishers before we add such data.

Yes, I have added many nice features to WikiPapers, not all of which can make it into the proposed WMF project. Some can, some can't. For example, adding papers to the wiki is done via a one-click bookmarklet. First, you highlight the title of a paper anywhere on the web, be it a webpage, e-mail, or journal site. Then, you click your "Add to wiki" bookmarklet. On my webserver I am running the citation scraping software from Connotea, CiteULike, and Zotero. I also have a Google Scholar scraper and PubMed importer. You can choose to use one of those sources, or you can choose to merge all of the metadata together. It's automatically added to the wiki for you. Additionally, I have written a bash script that is very adept at getting the PDFs from journals, so it automatically tries to download the PDF and upload it to the wiki for you. I have also implemented the ability to compute the articles that an article cites, and vice versa. With respect to abstracts these scrapers aren't that great. Abstracts usually come from PubMed, whose database you can license, but you cannot change their metadata IIRC.

Ultimately, I think the community will have to take a very careful look at what data can be added to the wiki and design policies accordingly. On Wikipedia I believe copyright enforcement has largely been up to the community, and it takes a long time to converge on appropriate policies. Needless to say, many of the technologies I described in the last paragraph would not be found legal on a public wiki.

I am not sure what 'owner' is in your format. Surely you can't have owners in a Wikimedia/MediaWiki wiki? And 'dateadded' would already be recorded in the revision history.

The 'owner' field is a misnomer, but in lieu of mysql support it lets you know which individuals have that entry in their personal bibliographies. dateadded is needed due to what at least used to be a bug in Semantic MediaWiki.

We probably need to check the final format of the bibliographic template to make sure it is easily translatable to the most common bibliographic formats: bibtex, refman, Z3988 microformat, pubmed, etc.

I have written extensive amounts of Python interchange code between wiki template syntax and BibTeX. I chose BibTeX because it is rather standard, our lab uses it, and it is very similar to template syntax. Also, I use Bibutils to convert from BibTeX to most popular formats, and vice versa for mass import of bibliographies: http://www.scripps.edu/~cdputnam/software/bibutils/
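
To show why the two syntaxes map onto each other so directly, here is a minimal Python sketch that renders the same record both ways. It is not the interchange code mentioned above; the function names, the template name "cite" and the parameter names are all hypothetical, and real BibTeX needs escaping and parsing that this ignores.

```python
def to_template(entry_type, key, fields, template="cite"):
    """Render a record as MediaWiki template syntax (hypothetical layout)."""
    lines = ["{{" + template, f"|type={entry_type}", f"|key={key}"]
    lines += [f"|{name}={value}" for name, value in fields.items()]
    lines.append("}}")
    return "\n".join(lines)


def to_bibtex(entry_type, key, fields):
    """Render the same record as a BibTeX entry (no escaping is attempted)."""
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@{entry_type}{{{key},\n{body}\n}}"


record = {"author": "Author One and Author Two", "year": "2009"}
print(to_template("article", "OneTwo2009", record))
print(to_bibtex("article", "OneTwo2009", record))
```

Both formats are a type, a key and a flat list of name=value pairs, which is what makes the round trip straightforward.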

BibTeX is good for backwards compatibility, but I'd urge a richer data format -- probably based on bibo RDF:
It's already widely used: http://bibliontology.com/projects


It was probably a mistake for me to describe WikiPapers as designed around BibTeX. In fact, it's designed around MediaWiki templates. With templates as your starting point, you can support any other format for both import and export.

As I understand it, there are issues with Semantic MediaWiki with respect to performance and security that need to be resolved before a large-scale deployment within Wikimedia Foundation projects. I heard that Markus Krötzsch is going to Oxford to work on core SMW, so there might be some changes to SMW in the future. A code audit of SMW is lacking.

As I was writing a custom Lucene search engine for WikiPapers I realized that it is a perfect replacement for Semantic MediaWiki. Lucene has fields, it supports boolean operators and you can format its output. All that is needed is to write the Lucene backend (perhaps just modifying MWLucene) and write a parser function that supports using templates for formatting of the output of queries. Lucene is extremely fast and can scale to whatever we can imagine doing. That's my proposed plan.

It is not 'necessarily necessary' to make a new Wikimedia project. There has been a suggestion (in the meta or strategy wiki) just to use a namespace in Wikipedia. You could then have a page called http://en.wikipedia.org/wiki/Bib:The_wick_in_the_candle_of_learning

I believe it is necessary. First, the idea is for any MediaWiki anywhere (and any software with appropriate extensions) to be able to cite the same source. Second, the project would be multilingual.

I think somebody's mentioned OpenLibrary on this thread. In case not:
http://openlibrary.org/
Its scope is limited to books, but their interests are similar.

-Jodi



Cheers,

Brian Mingus
Graduate Student
Computational Cognitive Neuroscience Lab
University of Colorado at Boulder

_______________________________________________
Wiki-research-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l



Re: WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

phoebe ayers-3
Hi guys! I'm glad my little post helped re-start such a productive
conversation.

Since some people are replying only to the research-l list and some to
both research-l and foundation-l (my fault for cc'ing both) maybe we
should centralize this discussion (at least of the nitty gritty
metadata issues) on the research list for now? thread here:
http://lists.wikimedia.org/pipermail/wiki-research-l/2010-July/thread.html

Of course the perennial issue of how to propose a new WMF project is
very much a foundation-l topic.

regards,
phoebe

On Tue, Jul 20, 2010 at 12:26 PM, Brian J Mingus
<[hidden email]> wrote:

>
>
> On Tue, Jul 20, 2010 at 11:56 AM, Jodi Schneider <[hidden email]>
> wrote:
>>
>> Hi Brian,
>> On 20 Jul 2010, at 18:02, Brian J Mingus wrote:
>>
>> On Mon, Jul 19, 2010 at 4:06 PM, Finn Aarup Nielsen <[hidden email]> wrote:
>>>
>>>
>>> Hi Brian and others,
>>>
>>> I also think that some bibliographic support would be interesting, for
>>> two-way citation tracking and commenting on articles (for example), but
>>> I furthermore find that, particularly in science articles, we often
>>> find data that is worth structuring and putting in a database or a structured
>>> wiki, so that we can extract the data for meta-analysis and specialized
>>> information retrieval. That is what I also do in the Brede Wiki. I use the
>>> templates to store such data. So if such a system as yours is implemented we
>>> should not just think of it as a bibliographic database but in broader
>>> terms: A data wiki.
>>


Re: [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

Daniel Mietchen
In reply to this post by Brian J Mingus
On Tue, Jul 20, 2010 at 9:26 PM, Brian J Mingus
<[hidden email]> wrote:
> I like your suggestion that the abc disambiguator be chosen based on the
> first date of publication, and I also like the prospect of using slashes
> since they can't be contained in names. Using the full year is a good idea
> too. We can combine these to come up with a key that, in principle, is
> guaranteed to be unique. This key would contain:
>
> 1) The first three author names separated by slashes
why not separate by pluses? they don't form part of names either, and
don't cause problems with wiki page titles.

> 2) If there are more than three authors, an EtAl
don't think that's necessary if we get the abc part right.

> 3) Some or all of the date. For instance, if there is only one source by
> this set of authors that year, we can just use YYYY. However, once another
> source by those set of authors is added, the key should change to MMDDYYYY
> or similar.
I don't think it is a good idea to change one key as a function of
updates on another, except for a generic disambiguation tag.

> If there are multiple publications on the same day, we can
> resort to abc. Redirects and disambiguation pages can be set up when a key
> changes.
As Jodi pointed out already, the exact date is often not clearly
identifiable, so I would go simply for the year.
Instead of an alphabetic abc, one could use some function of the
article title (e.g. the first three words thereof, or the initials of
the first three words), always in lower case.

An even less ambiguous abc would be starting page (for printed stuff)
or article number (for online only) but this brings us back to the
7523225 problem you mentioned above.

> Since the slashes are somewhat cumbersome, perhaps we can not make them
> mandatory, but similarly use them only when they are necessary in order to
> "escape" a name. In the case that one of the authors does not have a slash
> in their name - the dominant case - we can stick to the easily legible and
> nicely compact CamelCase format.
>
> Example keys generated by this algorithm:
>
> KangHsuKrajbichEtAl2009
Kang+Hsu+Krajbich+2009+the+wick+in
or
Kang+Hsu+Krajbich+2009+twi

Also note that the CamelCase key does not yield results in a Google
search, whereas the first plus-separated variant brings up the right work,
while the plus-separated one with title initials tends to bring up at
least something written by or cited from these authors.

> Author1Author2/Author-Three/2009
Author1+Author2+Author-Three+2009+just+another+article
or
Author1+Author2+Author-Three+2009+jat

Of course, it does not have to be _exactly_ three authors, nor three
words from the title, and it does not solve the John Smith (or Zheng
Wang) problem.

Daniel

--
http://www.google.com/profiles/daniel.mietchen


Re: [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

Daniel Kinzler
>> 1) The first three author names separated by slashes
> why not separate by pluses? they don't form part of names either, and
> don't cause problems with wiki page titles.

I like this... however, how would you represent this in a URL? Also note that
using plusses in page names doesn't work with all server configurations, since
plus has a special meaning in URLs.
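
The special meaning is easy to demonstrate with Python's standard library: in form-style (query string) decoding a plus stands for a space, while plain percent-decoding leaves it alone. Which of the two a given server applies to page names is exactly the configuration question; the key below is just the example from this thread.

```python
from urllib.parse import unquote, unquote_plus

raw = "Kang+Hsu+Krajbich+2009"

# Plain percent-decoding keeps the plus signs.
print(unquote(raw))

# Form-style decoding turns every plus into a space,
# so the key that arrives is no longer the key that was sent.
print(unquote_plus(raw))
```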

>> 3) Some or all of the date. For instance, if there is only one source by
>> this set of authors that year, we can just use YYYY. However, once another
>> source by those set of authors is added, the key should change to MMDDYYYY
>> or similar.
> I don't think it is a good idea to change one key as a function of
> updates on another, except for a generic disambiguation tag.

I agree. And if you *have* to use the full date, use YYYYMMDD, not the other way
around, please.
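
One practical argument for YYYYMMDD, presumably the one intended here although it is not spelled out, is that such strings sort chronologically as plain text. A small Python illustration (the two 2009 dates come from the example keys in this thread, the 2010 date is made up):

```python
from datetime import date

dates = [date(2009, 12, 31), date(2010, 1, 15), date(2009, 10, 3)]

# Year first: alphabetical order equals chronological order.
ymd = sorted(d.strftime("%Y%m%d") for d in dates)

# Month first: the 2010 date sorts before both 2009 dates.
mdy = sorted(d.strftime("%m%d%Y") for d in dates)

print(ymd)
print(mdy)
```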

>> Since the slashes are somewhat cumbersome, perhaps we can not make them
>> mandatory, but similarly use them only when they are necessary in order to
>> "escape" a name. In the case that one of the authors does not have a slash
>> in their name - the dominant case - we can stick to the easily legible and
>> nicely compact CamelCase format.
>>
>> Example keys generated by this algorithm:
>>
>> KangHsuKrajbichEtAl2009
> Kang+Hsu+Krajbich+2009+the+wick+in
> or
> Kang+Hsu+Krajbich+2009+twi

Both seem good, though I would suggest adopting a convention of ignoring any
leading "the" and "a", to get a more distinctive three-word suffix.

> Of course, it does not have to be _exactly_ three authors, nor three
> words from the title, and it does not solve the John Smith (or Zheng
> Wang) problem.

It also doesn't solve issues with transliteration: Merik Möller may become
"Moeller" or "Moller", Jakob Voß may become "Voss" or "Vosz" or even "VoB",
etc. In the case of Chinese names, it's often not easy to decide which part is the
last name.

To avoid this kind of ambiguity, I suggest automatically applying some type of
normalization and/or hashing. There is quite a bit of research about this kind
of normalization out there, generally with the aim of detecting duplicates.
Perhaps we can learn from bibsonomy.org; have a look at how they do it:
<http://www.bibsonomy.org/help/doc/inside.html>.
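
As a very rough sketch of what such a normalization step could look like, here is one possibility using only the Python standard library. It is not how bibsonomy does it; the function name and the particular rules (case folding, dropping diacritics, keeping only ASCII letters and digits) are assumptions chosen for illustration.

```python
import unicodedata


def normalize_name(name):
    """Reduce a surname to a crude ASCII comparison form (illustrative only)."""
    folded = name.casefold()  # lower-cases and expands "ß" to "ss"
    decomposed = unicodedata.normalize("NFKD", folded)  # "ö" -> "o" + combining mark
    return "".join(c for c in decomposed if c.isascii() and c.isalnum())


print(normalize_name("Möller"), normalize_name("Moller"))
print(normalize_name("Voß"), normalize_name("Voss"))
print(normalize_name("Moeller"))
```

This merges "Möller" with "Moller" and "Voß" with "Voss", but "Moeller" and "Vosz" still come out different, so language-specific transliteration rules or fuzzy matching would be needed on top.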

Gotta love open source university research projects :)

-- daniel




Re: [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

Jodi Schneider-2

On 21 Jul 2010, at 09:42, Daniel Kinzler wrote:
>> Kang+Hsu+Krajbich+2009+the+wick+in

This seems best to me of what's proposed so far.
> Both seem good, though i would suggest to form a convention to ignore any
> leading "the" and "a", to a more distinctive 3 word suffix.

While that's a good idea, we'd then have to know all "indistinctive" words in all languages. (Die, Der, La, L', ...)

There are still going to be duplicates, alas...

>
>> Of course, it does not have to be _exactly_ three authors, nor three
>> words from the title, and it does not solve the John Smith (or Zheng
>> Wang) problem.
>
> It also doesn't solve issues with transliteration: Merik Möller may become
> "Moeller" or "Moller", Jakob Voß may become "Voss" or "Vosz"  or even "VoB",
> etc. In case of chinese names, it's often not easy to decide which part is the
> last name.
>
> To avoid this kind of ambiguity, i suggest to automatically apply some type of
> normalization and/or hashing. There is quite a bit of research about this kind
> of normalisation out there, generally with the aim of detecting duplicates.
> Perhaps we can learn from bibsonomy.org, have a look how they do it:
> <http://www.bibsonomy.org/help/doc/inside.html>.

Good idea!

-Jodi

Re: [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

Daniel Kinzler
Jodi Schneider schrieb:
> On 21 Jul 2010, at 09:42, Daniel Kinzler wrote:
>>> Kang+Hsu+Krajbich+2009+the+wick+in
>
> This seems best to me of what's proposed so far.
>> Both seem good, though i would suggest to form a convention to ignore any
>> leading "the" and "a", to a more distinctive 3 word suffix.
>
> While that's a good idea, then we'd have to know all "indistinctive" words in all languages. (Die, Der, La, L', ...)

Stopword lists for major languages exist, and where they don't, they are easily
created, even automatically. Word frequency analysis on a few megabytes of text
is cheap these days :)
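
A minimal sketch of that idea in Python, assuming the most frequent words of a corpus are taken as the stopword list and only leading stopwords are dropped from a title. The function names, the cut-off and the toy corpus are all made up for illustration; a real list would come from far more text.

```python
import re
from collections import Counter


def frequent_words(text, top_n):
    """Return the top_n most frequent words of a corpus as a candidate stopword set."""
    words = re.findall(r"\w+", text.lower())
    return {word for word, _ in Counter(words).most_common(top_n)}


def distinctive_suffix(title, stopwords, n_words=3):
    """Skip leading stopwords, then take the first n_words of the title."""
    words = re.findall(r"\w+", title.lower())
    while words and words[0] in stopwords:
        words.pop(0)
    return words[:n_words]


corpus = "the cat and the dog and the bird"  # toy corpus, far too small in practice
stopwords = frequent_words(corpus, 2)
print(distinctive_suffix("The wick in the candle of learning", stopwords))
```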

-- daniel



Re: [Foundation-l] WikiCite - new WMF project? Was: UPEI's proposal for a "universal citation index"

Daniel Mietchen
In reply to this post by Daniel Kinzler
On Wed, Jul 21, 2010 at 10:42 AM, Daniel Kinzler <[hidden email]> wrote:
>>> 1) The first three author names separated by slashes
>> why not separate by pluses? they don't form part of names either, and
>> don't cause problems with wiki page titles.
>
> I like this... however, how would you represent this in a URL?
%2B would seem to be the obvious choice to me.
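
For what it's worth, this is what percent-encoding the plus looks like with Python's standard library. The key is the example from this thread; the query parameter name "title" is only an assumption for the demonstration.

```python
from urllib.parse import parse_qs, quote, unquote

key = "Kang+Hsu+Krajbich+2009+twi"

# safe="" makes sure nothing except unreserved characters is left unescaped.
encoded = quote(key, safe="")
print(encoded)

# The encoded form survives both plain and form-style decoding.
print(unquote(encoded))
print(parse_qs("title=" + encoded)["title"][0])
```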

> Also note that
> using plusses in page names don't work with all server configurations, since
> plus has a special meaning in URLs.

Don't know too much about the double escaping business to comment on that, but
if pluses are not acceptable, we still have equal signs (possibly with
similar problems, but still useful for direct web search) and underscores
(which would turn the whole key into one string for search engines).

Daniel
