WikiTrust and authorship


Daniel Kinzler
Hi all

Most of you probably have heard of WikiTrust [1], a tool that colors parts of
MediaWiki pages based upon a calculated trust value. The demo [2] is quite
impressive. I think this would especially help us to spot "subtle" vandalism
more easily.

But WikiTrust could also solve another problem that has been coming up time and
time again, and has been discussed again recently in the German community: how
to determine the main authors of an article, and how to find out who put a
specific statement into an article. Tracking and assessing authorship is
something many people are interested in, and I think I can speak for a lot of
people in saying that we would really love to have that on the German language
Wikipedia. It would be particularly helpful for print versions: the method
currently used by PediaPress is more than doubtful, and is currently getting
ripped apart on the Verein's mailing list.

WikiTrust is getting more and more mature, and Luca de Alfaro and his team have
been working hard on making it a lot more efficient. The one thing that still
worries me is the fact that it would require quite a bit of storage space.
Anyway, Luca really wants to integrate it into Wikipedia and other WMF wikis --
and so do I. I think that, besides being a useful tool to the community, it
could also boost our credibility in academia, because authorship becomes much
more transparent. Compare what WikiGenes [3] does [4]. I want that for
Wikipedia. Not to make authors more prominent, but to make authorship more
transparent.

So, what would it take? Where could we try it? What are the concerns?

-- Daniel

PS: I can try to supply some technical details if required; I hope Luca will
save me from getting stuff wrong :)

[1] http://trust.cse.ucsc.edu/
[2] http://wiki-trust.cse.ucsc.edu/index.php/Main_Page
[3] http://www.wikigenes.org/
[4] http://www.mememoir.org/

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: WikiTrust and authorship

Tei-2
On Sat, Oct 18, 2008 at 2:57 PM, Daniel Kinzler <[hidden email]> wrote:
...
> The one thing that still worries me is the fact that it would require quite a bit of storage space.

Maybe you could automatically mark pages for credibility based on some criteria
(is the page a stub? how old is it? how many edits does it have? etc.)
before printing, and delete those pages.


> Tracking and assessing authorship is
> something many people are interested in, and I think I can speak for a lot of
> people in saying that we would really love to have that on the German language
> Wikipedia. It would be particularly helpful for print version, the method
> currently used by PediaPress is more than doubtful, and is getting ripped apart
> on the Verein's mailing list currently.

I can't help you here *scratches head*. Anyway, Wikipedia is a wiki; it is
designed to make anonymous edits easy so that everyone can edit. The
other option is a different type of pedia, an expert-pedia where only
credentialed academic experts could add their opinions.

That was... Nupedia?
http://en.wikipedia.org/wiki/Nupedia
http://nupedia.8media.org/  (oops... 404)

> Most of you probably have heard of WikiTrust [1], a tool that colors parts of
> MediaWiki pages based upon a calculated trust value. The demo [2] is quite
> impressive. I think this would especially help us to spot "subtle" vandalism
> more easily.

Nifty tools :-)

--
--
ℱin del ℳensaje.

Re: WikiTrust and authorship

Daniel Kinzler
Tei wrote:
> On Sat, Oct 18, 2008 at 2:57 PM, Daniel Kinzler <[hidden email]> wrote:
> ...
>> The one thing that still worries me is the fact that it would require quite a bit of storage space.
>
> Maybe you can automark pages for credibility based on some subject
> like ( is the page a stub?, how old/how much edits/... etc.. )..
> before printing, and delete that pages,

This is not about printing. WikiTrust determines the trust level for every *word*
of every page on the wiki. To do this, more storage space is required.

>> Tracking and assessing authorship is
>> something many people are interested in, and I think I can speak for a lot of
>> people in saying that we would really love to have that on the German language
>> Wikipedia. It would be particularly helpful for print version, the method
>> currently used by PediaPress is more than doubtful, and is getting ripped apart
>> on the Verein's mailing list currently.
>
> I can't help you here *scratch head*  Anyway the wikipedia is a wiki,
> Is designed to make anonymous edits easy so everyone could edit.  The
> other option, is a different type of pedia, a expert-pedia where only
> credited academia experts could add his opinions.

Anonymous edits are not a problem. The problem is that the GFDL requires me to
credit at least the 5 "main" authors, so the question is, how to determine them.
Similarly, academic citing practices call for the 3 main authors. WikiTrust
would allow us to easily determine who has contributed how much to a given
version of a page. Which would be quite useful.

-- daniel



Re: WikiTrust and authorship

Tim Starling-2
In reply to this post by Daniel Kinzler
Daniel Kinzler wrote:

> Hi all
>
> Most of you probably have heard of WikiTrust [1], a tool that colors parts of
> MediaWiki pages based upon a calculated trust value. The demo [2] is quite
> impressive. I think this would especially help us to spot "subtle" vandalism
> more easily.
>
> But WikiTrust could also solve another problem that has been coming time and
> time again, and has been discussed again recently in the German community: how
> to determine the main authors of an article, and how to find out who put a
> specific statement into an article.

De Alfaro deliberately left that feature out of the demo that he showed me
in 2007; I don't know if it's been added since. I'd rather see an
annotation feature showing author names than reputation colouring. The
reputation metric is the novel part of de Alfaro's work, hence his
emphasis on it. But I think author annotation is a more serious and useful
application for the software.

Someone might have to write a user interface for it.

-- Tim Starling



Re: WikiTrust and authorship

Tei-2
In reply to this post by Daniel Kinzler
On Sat, Oct 18, 2008 at 10:28 PM, Daniel Kinzler <[hidden email]> wrote:
> Tei wrote:
...
>
> Anonymous edits are not a problem. The problem is that the GFDL requires me to
> credit at least the 5 "main" authors, so the question is, how to determine them.
> Similarly, academic citing practices call for the 3 main authors. WikiTrust
> would allow us to easily determine who has contributed how much to a given
> version of a page. Which would be quite useful.

It seems that tools to list the contributors to a page already exist.

Creators of Dios:
http://toolserver.org/~escaladix/cgi-bin/auteurs.tcl?title=Dios&lang=es
http://toolserver.org/~daniel/WikiSense/Contributors.php?wikifam=.wikipedia.org&wikilang=es&page=Dios&max=200&grouped=on&order=first_edit

It is normal for a wiki page to have 80 authors (that's a fact).
If you want to choose only 3, you have to ignore some authors based on
some subjective criterion, like strlen(concat(modifications)),
COUNT(edits), ... or by using WikiGenes (I guess WikiGenes works almost
like the "blame" feature of a CVS system). I feel like you would be
lying to satisfy some external limitation :/
Who is the author of
http://en.wikipedia.org/wiki/One_Thousand_and_One_Nights ?
Maybe you should ask for an exception to the GFDL, or get the authors
of the GFDL to make a new version of the license that supports how a wiki
works, to avoid reporting 3 authors for a text that (in fact) has 80
authors.



--
--
ℱin del ℳensaje.

Re: WikiTrust and authorship

Gerard Meijssen-3
Hoi,
The GFDL is intended for the documentation of software. The WMF is finding
a route towards a more appropriate license. Getting an exception to make
our lives easier is not really realistic, I would say. The current practice
is that people refer to the Wikipedia article, and this is where you find all
the authors: a really pragmatic approach to something that would otherwise
be unwieldy and hinder the freedom of using this material.
Thanks,
      GerardM

On Sun, Oct 19, 2008 at 10:04 AM, Tei <[hidden email]> wrote:

> On Sat, Oct 18, 2008 at 10:28 PM, Daniel Kinzler <[hidden email]> wrote:
> > Tei wrote:
> ...
> >
> > Anonymous edits are not a problem. The problem is that the GFDL requires me to
> > credit at least the 5 "main" authors, so the question is, how to determine them.
> > Similarly, academic citing practices call for the 3 main authors. WikiTrust
> > would allow us to easily determine who has contributed how much to a given
> > version of a page. Which would be quite useful.
>
> Seems there exist already tools to list contributors to a page.
>
> Creators of Dios:
> http://toolserver.org/~escaladix/cgi-bin/auteurs.tcl?title=Dios&lang=es
> http://toolserver.org/~daniel/WikiSense/Contributors.php?wikifam=.wikipedia.org&wikilang=es&page=Dios&max=200&grouped=on&order=first_edit
>
> It will be normal for a wiki page to have 80 authors (thats a fact).
> If you want to chose only 3, you have to  ignoring some authors for
> some subjective bias, like... strlen(concat(modifications)) ,
> COUNT(edits),... o using  WikiGenes (I guest, wikigenes work almost
> like that "Blame" feature of a CVS system).  I feel like you will be
> lying to support some external limitation :/
> Who is the author of
> http://en.wikipedia.org/wiki/One_Thousand_and_One_Nights ?
> Maybe you sould ask for a exception on the GFDL, make so the authors
> of GFDL make a new version of the license that support how a wiki
> work, to avoid report 3 authors for a text that (by fact) has 80
> authors.

Re: WikiTrust and authorship

Daniel Kinzler
In reply to this post by Tei-2
Tei wrote:
> Seems there exist already tools to list contributors to a page.
>
> Creators of Dios:
> http://toolserver.org/~escaladix/cgi-bin/auteurs.tcl?title=Dios&lang=es
> http://toolserver.org/~daniel/WikiSense/Contributors.php?wikifam=.wikipedia.org&wikilang=es&page=Dios&max=200&grouped=on&order=first_edit

I know, I wrote the second one.

> It will be normal for a wiki page to have 80 authors (thats a fact).
> If you want to chose only 3, you have to  ignoring some authors for
> some subjective bias, like... strlen(concat(modifications)) ,
> COUNT(edits),... o using  WikiGenes (I guest, wikigenes work almost
> like that "Blame" feature of a CVS system).  I feel like you will be
> lying to support some external limitation :/

Metrics like number of edits, or difference in size, etc, are trivial and useless.

The metric that makes most sense to me is "number of words contributed to the
current version". To get that number, you have to track text contributed by each
edit across all following edits, considering reverts, moving paragraphs, etc --
like blame, but a bit more advanced even. This is a complex task -- and it's
exactly what WikiTrust does. Which is why I'm writing about it.
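
A minimal sketch of the "words contributed to the current version" metric described above. It is not WikiTrust's algorithm, which is not shown in this thread; it uses Python's difflib as a stand-in, and the function name and the (author, text) input format are assumptions made for illustration. Words that survive an edit keep their earlier owner, and new words are credited to the editing author. Moved paragraphs and reverts are not handled, so whoever restores old text would be credited with it.

```python
from collections import Counter
from difflib import SequenceMatcher


def words_by_author(revisions):
    """Count the words in the latest revision attributed to each author.

    `revisions` is a list of (author, text) pairs, oldest first.
    """
    tokens = []   # words of the revision processed so far
    owners = []   # owners[i] is the author credited with tokens[i]
    for author, text in revisions:
        new_tokens = text.split()
        # Start by crediting every word to the current editor ...
        new_owners = [author] * len(new_tokens)
        matcher = SequenceMatcher(a=tokens, b=new_tokens, autojunk=False)
        for block in matcher.get_matching_blocks():
            # ... then give surviving words back to their earlier owner.
            for k in range(block.size):
                new_owners[block.b + k] = owners[block.a + k]
        tokens, owners = new_tokens, new_owners
    return Counter(owners)


history = [
    ("alice", "the quick brown fox"),
    ("bob", "the quick red fox jumps"),
]
print(words_by_author(history))
```

In this example "red" and "jumps" are credited to bob, and the three words that survived his edit stay with alice.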

> Who is the author of
> http://en.wikipedia.org/wiki/One_Thousand_and_One_Nights ?
> Maybe you sould ask for a exception on the GFDL, make so the authors
> of GFDL make a new version of the license that support how a wiki
> work, to avoid report 3 authors for a text that (by fact) has 80
> authors.

This is not an option for PediaPress, which is a service that lets users pick a
set of pages from Wikibooks (and soon also Wikipedia) and make a print version
from that. You can of course always list all authors, but even then, you may
want to rank them by the amount they contributed. And if you are able to do
that, the GFDL allows you to only name the top 5, which makes things a bit less
confusing, especially in print.
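
Once per-author word counts exist, picking the authors to name is a small ranking step. The sketch below assumes a hypothetical mapping from author name to words contributed; the function name and the alphabetical tie-break are illustrative choices, not anything prescribed by the GFDL or by WikiTrust.

```python
def principal_authors(word_counts, limit=5):
    """Return up to `limit` authors, largest contribution first.

    `word_counts` maps author name -> number of words contributed.
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(word_counts.items(), key=lambda item: (-item[1], item[0]))
    return [author for author, _ in ranked[:limit]]


counts = {"a": 10, "b": 50, "c": 50, "d": 1, "e": 3, "f": 7}
print(principal_authors(counts))
```

If the page has fewer than `limit` authors, all of them are returned.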


Anyway, you seem to miss the point. I'm not looking for ways to track
authorship, I already know the solution. I want to discuss the technical aspects
of implementing it on Wikimedia servers.

-- daniel


Re: WikiTrust and authorship

Daniel Kinzler
In reply to this post by Gerard Meijssen-3
Gerard Meijssen wrote:
> Hoi,
> The GFDL is intended for the documentation of software.. The WMF is finding
> a route towards a more appropriate license.. Getting an exception to make
> our life more easy is not really realistic I would say. The current practice
> is that people refer to the Wikipedia article and this is where you find all
> the authors.. a really pragmatic approach to something that would otherwise
> be unwieldy and hinder the freedom of using this material.
> Thanks,
>       GerardM

For online re-use, I'd say that is OK. Not in print, however. And Wikipedians
appear to feel the same way. There's a hell of a brouhaha about the way
Bertelsmann handled attribution in their "best of Wikipedia" book.

-- daniel


Re: WikiTrust and authorship

Daniel Kinzler
In reply to this post by Tim Starling-2
Tim Starling wrote:
> Daniel Kinzler wrote:
...

>> But WikiTrust could also solve another problem that has been coming time and
>> time again, and has been discussed again recently in the German community: how
>> to determine the main authors of an article, and how to find out who put a
>> specific statement into an article.
>
> de Alfaro deliberately left that feature out of the demo that he showed me
> in 2007, I don't know if it's been added since. I'd rather see an
> annotation feature showing author names than reputation colouring. The
> reputation metric is the novel part of de Alfaro's work, hence his
> emphasis on it. But I think author annotation is a more serious and useful
> application for the software.
>
> Someone might have to write a user interface for it.

I'm in close contact with de Alfaro (met him at WikiSym), and told him that the
authorship aspect is the feature most wanted by Wikipedias. He promised to
fast-track implementation, and said he'd be working on it himself this weekend.
He *really* wants to get this out there. So I'm confident :)

So, again: what would have to be done to get this live? Do you think it would be
best to first run it on a not-so-big Wikipedia (maybe we should ask NL)? How
soon could we try it on a test or lab wiki? What's the procedure? Who needs to
approve?

-- daniel


Re: WikiTrust and authorship

Nikola Smolenski
In reply to this post by Daniel Kinzler
On Saturday 18 October 2008 14:57:59 Daniel Kinzler wrote:
> So, what would it take? Where could we try it? what are the concerns?

FWIW, copying my email to M. Schneider:
   
IIRC, at Wikimania you talked about the problem of how to identify the primary
authors of articles, so I wanted to share my thoughts on this.

The obvious first step is to go through all the revisions and get MD5 of each;
then, use MD5s to isolate and disregard edits that have been reverted.
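
The MD5 step above could be sketched as follows. This is one possible reading of it, not code from the thread: the function name and the (author, text) input format are assumptions, and it only catches reverts that restore byte-identical text. When a revision has the same digest as an earlier surviving one, everything after that earlier revision is dropped, including the revert itself, since it adds no new text.

```python
import hashlib


def drop_reverted(revisions):
    """Return the revisions not undone by a later identical-text revert.

    `revisions` is a list of (author, text) pairs, oldest first.
    """
    kept = []    # (digest, author, text) for surviving revisions
    index = {}   # digest -> position of that revision in `kept`
    for author, text in revisions:
        # MD5 is used only as a fingerprint here, not for security.
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        if digest in index:
            # Same text as an earlier revision: everything after it was undone.
            cut = index[digest] + 1
            for old_digest, _, _ in kept[cut:]:
                del index[old_digest]
            del kept[cut:]
        else:
            index[digest] = len(kept)
            kept.append((digest, author, text))
    return [(author, text) for _, author, text in kept]


history = [
    ("alice", "A"),
    ("vandal", "X"),
    ("bob", "A"),       # bob restores alice's text
    ("carol", "A B"),
]
print(drop_reverted(history))
```

Here the vandal's edit and bob's revert are both discarded, leaving alice's and carol's revisions.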

To measure the difference between two edits, I mentioned to you that wdiff (
http://www.gnu.org/software/wdiff/ ) could be used: simply count the number of
changed words in the article. Wdiff could give false positives (an author
that merely switches two paragraphs will appear to be a major author), but
could not give false negatives (an author who changes a single word really
did just change a single word; of course, such a change may be very
important, but isn't major, or, IMO, copyrightable).
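
To make the false-positive case concrete, here is a small sketch of counting changed words. It does not call wdiff; it uses Python's difflib as a stand-in, so the exact counts may differ from what wdiff would report, and the function name is made up for illustration.

```python
from difflib import SequenceMatcher


def changed_words(old_text, new_text):
    """Count words deleted from old_text plus words inserted into new_text."""
    old, new = old_text.split(), new_text.split()
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changed += (i2 - i1) + (j2 - j1)
    return changed


# One word replaced: counted as one deletion plus one insertion.
print(changed_words("the cat sat", "the dog sat"))

# Two paragraphs swapped: no word is new, yet the change looks large.
before = "one two three\n\nfour five six"
after = "four five six\n\none two three"
print(changed_words(before, after))
```

The swapped-paragraph case reports a sizeable change even though the author wrote nothing new, which is the false positive described above.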

More sophisticated diffs could also be introduced. For example, it would be
relatively simple to make a program that tries to find if an author has
switched two (or more) paragraphs, then apply a diff program as if they
haven't been switched.
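
One way such a move-aware diff might look, as a rough sketch under stated assumptions: paragraphs are separated by blank lines, a paragraph counts as "switched" only if it reappears unchanged, and difflib again stands in for a real diff program. The function names are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def _paragraphs(text):
    # Normalise whitespace inside each paragraph so rewrapping is not a change.
    return [" ".join(p.split()) for p in text.split("\n\n") if p.strip()]


def changed_words_ignoring_moves(old_text, new_text):
    """Count changed words after setting aside paragraphs that merely moved."""
    old_paras, new_paras = _paragraphs(old_text), _paragraphs(new_text)
    # Paragraphs present on both sides (as multisets) are treated as moved.
    common = Counter(old_paras) & Counter(new_paras)

    def leftover_words(paras):
        budget = common.copy()
        words = []
        for para in paras:
            if budget[para] > 0:
                budget[para] -= 1      # unchanged paragraph, skip it
            else:
                words.extend(para.split())
        return words

    old, new = leftover_words(old_paras), leftover_words(new_paras)
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return sum((i2 - i1) + (j2 - j1)
               for tag, i1, i2, j1, j2 in matcher.get_opcodes()
               if tag != "equal")


before = "one two three\n\nfour five six"
print(changed_words_ignoring_moves(before, "four five six\n\none two three"))
print(changed_words_ignoring_moves(before, "four five six\n\none two THREE"))
```

A pure swap now counts as no change, while a swap combined with an edit only counts the edited words. A paragraph that is moved and edited at the same time is still diffed in full.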

Finally, disregard bots, as they can claim no copyright :) (More
realistically, this should be checked on a per-bot basis.)


Re: WikiTrust and authorship

Tei-2
On Tue, Oct 21, 2008 at 12:33 AM, Nikola Smolenski <[hidden email]> wrote:

> On Saturday 18 October 2008 14:57:59 Daniel Kinzler wrote:
>> So, what would it take? Where could we try it? what are the concerns?
>
> FWIW, copying my email to M. Schneider:
>
> IIRC, on Wikimania you talked about the problem of how to identify primary
> authors of articles, so I wanted to share my thoughts on this.
>
> The obvious first step is to go through all the revisions and get MD5 of each;
> then, use MD5s to isolate and disregard edits that have been reverted.
>
> To measure difference between two edits, I mentioned you that wdiff (
> http://www.gnu.org/software/wdiff/ ) could be used: simply count number of
> changed words in the article. Wdiff could give false positives (an author
> that merely switches two paragraphs will appear to be a major author), but
> could not give false negatives (an author who changes a single word really
> did just change a single word; of course, such a change may be very
> important, but isn't major, or, IMO, copyrightable).
>
> More sophisticated diffs could also be introduced. For example, it would be
> relatively simple to make a program that tries to find if an author has
> switched two (or more) paragraphs, then apply a diff program as if they
> haven't been switched.

Or totally disregard order:
 tr -s ' \t' '\n' < article | sort





--
--
ℱin del ℳensaje.

Re: WikiTrust and authorship

Nikola Smolenski
On Tuesday 21 October 2008 08:59:06 Tei wrote:
> On Tue, Oct 21, 2008 at 12:33 AM, Nikola Smolenski <[hidden email]> wrote:

> > On Saturday 18 October 2008 14:57:59 Daniel Kinzler wrote:
> >> So, what would it take? Where could we try it? what are the concerns?
> >
> > To measure difference between two edits, I mentioned you that wdiff (
> > http://www.gnu.org/software/wdiff/ ) could be used: simply count number
> > of changed words in the article. Wdiff could give false positives (an
> > author that merely switches two paragraphs will appear to be a major
> > author), but could not give false negatives (an author who changes a
> > single word really did just change a single word; of course, such a
> > change may be very important, but isn't major, or, IMO, copyrightable).
> >
> > More sophisticated diffs could also be introduced. For example, it would
> > be relatively simple to make a program that tries to find if an author
> > has switched two (or more) paragraphs, then apply a diff program as if
> > they haven't been switched.
>
> or totally disregard order
>  tr -s ' \t' '\n' < article | sort

That's an excellent idea! It loses some things, but for measuring size of a
change it's simple and it works.
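
The order-free comparison could be sketched like this, treating each revision as a bag of words. The function name and return shape are assumptions for illustration. What it loses is position: a word deleted in one place and re-added elsewhere cancels out.

```python
from collections import Counter


def bag_of_words_change(old_text, new_text):
    """Return (words_added, words_removed), ignoring word order entirely."""
    old, new = Counter(old_text.split()), Counter(new_text.split())
    added = sum((new - old).values())      # words with more copies in new
    removed = sum((old - new).values())    # words with more copies in old
    return added, removed


# Swapping paragraphs changes nothing under this measure.
print(bag_of_words_change("one two three\n\nfour five six",
                          "four five six\n\none two three"))
print(bag_of_words_change("a b c", "a c d d"))
```

Counter subtraction keeps only positive counts, so each direction of the change is measured separately.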


Re: WikiTrust and authorship

Brion Vibber-3
In reply to this post by Daniel Kinzler
Daniel Kinzler wrote:
> So, again: what would have to be done to get this live? Do you think it would be
> best to first run it on a not-so-big Wikipedia (maybe we should ask NL)? How
> soon could we try it on a test or lab wiki? What'S the procedure, who needs to
> approve?

I chatted a little with Luca a while ago about deployment requirements;
basically we need to make sure the software architecture can be set up
and run relatively hands-off, and in a way that won't impact primary
operations much.

Hopefully we'll continue working out such details and start getting some
demos up!

-- brion


Re: WikiTrust and authorship

Daniel Kinzler
Brion Vibber wrote:
> I chatted a little with Luca a while ago about deployment requirements;
> basically we need to make sure the software architecture can be set up
> and run relatively hands-off, and in a way that won't impact primary
> operations much.
>
> Hopefully we'll continue working out such details and start getting some
> demos up!

Indeed :) I'm glad you are also interested in getting this out. I'm trying to
keep this project in people's minds. I hope we will have a demo that includes
authorship highlighting -- I want to showcase that to the dewp people, and
hopefully get some pressure behind the project.

-- daniel
