Demo: coloring the text of the Wikipedia according to its trust


Luca de Alfaro
Dear All:

I would like to tell you about a demo we set up, where we color the text of Wikipedia articles according to a computed value of trust.  The demo is available at http://trust.cse.ucsc.edu/

The trust value of each word of each revision is computed according to the reputation of the original author of the text, as well as the reputation of all authors that subsequently revised the text.

We have uploaded a few hundred pages; for each page, we display the most recent 50 revisions (we analyzed them all, but we just uploaded the most recent 50 to the server).

Of course, there are many other uses of text trust (for example, one could have the option of viewing a "recent high-trust version" of each page upon request), but I believe that this coloring gives an intuitive idea of how it could work.
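
As a rough illustration of what a "recent high-trust version" selector could look like (purely a sketch with assumed data structures and a made-up threshold, not something the demo implements):

# Hypothetical sketch: given a page's revisions, newest first, where each
# revision is a list of (word, trust) pairs, return the most recent
# revision whose least-trusted word still clears a threshold.
def recent_high_trust_revision(revisions, threshold=7.0):
    for rev in revisions:                      # newest first
        if rev and min(trust for _word, trust in rev) >= threshold:
            return rev
    return None                                # no revision qualifies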

I will talk about this at Wikimania, for those of you who will be there.  I am looking forward to Wikimania!

Details:

We first analyze the whole English Wikipedia, computing the reputation of each author at every point in time, so that we can answer questions like "what was the reputation of the author with id 453 at 5:32 pm on March 14, 2006?".  The reputation is computed according to the idea of content-driven reputation (http://www.soe.ucsc.edu/%7Eluca/papers/07/wikiwww2007.html).
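
For concreteness, here is a minimal sketch (with assumed data structures, not the code behind the demo) of how such time-indexed reputation queries could be answered:

from bisect import bisect_right

class ReputationHistory:
    """Sketch of a per-author reputation history indexed by time."""

    def __init__(self):
        # author_id -> list of (timestamp, reputation), appended in time order
        self.history = {}

    def record(self, author_id, timestamp, reputation):
        self.history.setdefault(author_id, []).append((timestamp, reputation))

    def reputation_at(self, author_id, timestamp, default=0.0):
        """Reputation of the author at (or just before) the given time."""
        entries = self.history.get(author_id, [])
        i = bisect_right(entries, (timestamp, float("inf")))
        return entries[i - 1][1] if i > 0 else default

# e.g. h.reputation_at(453, t) answers "what was the reputation of
# author 453 at time t" for any t covered by the recorded history.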

For new portions of text, the trust is equal to (a scaling function of) the reputation of the text's author.
Portions of text that were already present in the previous revision can gain trust when the page is revised by higher-reputation authors, especially if those authors edit in the proximity of that text.
Portions of text can also lose trust if low-reputation authors edit in their proximity.
All the algorithms are still very preliminary, and I must still apply a rigorous learning approach to optimize the computation.
Please see the demo page for more details.
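
To make the rules above concrete, here is a minimal sketch; the scaling factor, the proximity weighting, and the update rate are assumed values for illustration, not the coefficients used in the demo:

def scale(reputation, factor=0.6):
    """Assumed scaling function: new text starts below its author's reputation."""
    return factor * reputation

def revise_trust(word_trust, editor_reputation, proximity, rate=0.3):
    """Update the trust of a word that survived a revision.

    proximity is in [0, 1]: 1.0 for text right next to the edit, falling
    toward 0 far away.  Trust drifts toward the revising author's
    reputation, so it rises under high-reputation editors and can fall
    under low-reputation ones, more strongly for nearby text.
    """
    return word_trust + rate * proximity * (editor_reputation - word_trust)

print(scale(10.0))                                                # 6.0: new text by a reputation-10 author
print(revise_trust(8.0, editor_reputation=10.0, proximity=1.0))   # 8.6: nearby edit by a high-reputation author
print(revise_trust(8.0, editor_reputation=2.0, proximity=1.0))    # 6.2: nearby edit by a low-reputation author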

All the best,

Luca de Alfaro
http://www.soe.ucsc.edu/~luca


_______________________________________________
Wiki-research-l mailing list
[hidden email]
http://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Re: Demo: coloring the text of the Wikipedia according to its trust

Ben Yates
Awesome!  I posted a long response on my blog --
http://wikip.blogspot.com/2007/07/alright-this-is-most-awesome-thing-i.html


--
Ben Yates
Wikipedia blog - http://wikip.blogspot.com

_______________________________________________
Wiki-research-l mailing list
[hidden email]
http://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Re: Demo: coloring the text of the Wikipedia according to its trust

Andre Engels
In reply to this post by Luca de Alfaro

One thing I find peculiar is that adding text somewhere can lower
the trust of the surrounding text while at the same time raising
that of far-away text. Why is that? See for example
http://enwiki-trust.cse.ucsc.edu/index.php?title=Collation&diff=prev&oldid=102784135
- trust:6 text is added between trust:8 text, causing the surrounding
text to go down to trust:6 or even trust:5, but at the same time
improving text elsewhere on the page from trust:8 to trust:9. Why
would the author count as low-reputation for the direct environment,
but high-reputation farther away?

--
Andre Engels, [hidden email]
ICQ: 6260644  --  Skype: a_engels

_______________________________________________
Wiki-research-l mailing list
[hidden email]
http://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Re: Demo: coloring the text of the Wikipedia according to its trust

Luca de Alfaro
Dear Andre,

Let me say that the algorithms need tuning, so we are not sure we are doing this in the best way, but here is the idea:

When a user of reputation 10 (for example) edits the page, the text that is added only gets trust 6 or so.  It is not immediately considered high trust, because others have not yet had a chance to vet it.

When a user of reputation 10 edits the page, the trust of the text already on the page rises a bit (over several edits, it would approach 10).  This models the fact that the user, by leaving the text there, gave an implicit vote of assent.

The combination of the two effects explains what you are seeing.
The goal is that even high-reputation authors can only lend part of their reputation to the text they create; community vetting is still needed to achieve high trust.

Now, as I say, we must still tune the various coefficients in the algorithms via a learning approach, and there is a bit more to the algorithm than I describe above, but that's the rough idea.
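
To show the shape of that second effect, here is a toy calculation with assumed numbers (a simple drift of trust toward the revising author's reputation, not the actual update rule):

trust, reputation, rate = 6.0, 10.0, 0.3      # assumed starting values
for edit in range(1, 6):
    # implicit vote of assent: trust moves a step toward the editor's reputation
    trust += rate * (reputation - trust)
    print(f"after edit {edit}: trust = {trust:.2f}")
# after edit 1: trust = 7.20
# after edit 2: trust = 8.04
# after edit 3: trust = 8.63
# after edit 4: trust = 9.04
# after edit 5: trust = 9.33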

Another thing I am pondering is how much a reputation change should spill over paragraph or bullet-point breaks.  I could easily change what I do, but I will first set up the optimization/learning - I want to have some quantitative measure of how well the trust algorithm behaves.

Thanks for your careful analysis of the results!

Luca


_______________________________________________
Wiki-research-l mailing list
[hidden email]
http://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Re: Demo: coloring the text of the Wikipedia according to its trust

Sunir Shah
I really like this project. It's hard to understand the history of a text.
It reminds me of manuscript analysis, like looking at palimpsests, etc.

However, listening to your goals, it sounds like a difficult task. It will
be very hard to stabilize your model of trustworthiness, since it is based
on some assumptions (e.g. what does someone read when they edit a page?)
that are hard to nail down.

As you are tuning your algorithms, you might take another approach to
simplify matters. It's easier, and could be more useful, to visualize
behaviour without trying to draw conclusions about what that behaviour
might indicate. This is more powerful in many ways, since your algorithm
will never possess knowledge of the full social context that a given user
will have (e.g. maybe a trusted user has gotten into a heated dispute,
become erratic, and is no longer trustworthy).

A good summary of how to do social visualizations well is Erickson, 2003
(cf. http://www.bibwiki.com/wiki/design?Erickson,+2003)

These practices are for building a tool that can be used amongst the
entire social group. If you're after a particular research question (e.g.
how influential are trusted authors?), they don't apply as well.

Cheers,
Sunir

_______________________________________________
Wiki-research-l mailing list
[hidden email]
http://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Re: Demo: coloring the text of the Wikipedia according to its trust

Ben Yates
I'm waiting for someone to develop a Wikipedia client for OS X that
uses Core Animation (and other frameworks) to animate changes in an
article over time.  There's a huge amount of potential on the client
side because Wikipedia, on the server side, is always concentrating on
how to make sure the site doesn't go down; there's not enough money for frills.


--
Ben Yates
Wikipedia blog - http://wikip.blogspot.com

_______________________________________________
Wiki-research-l mailing list
[hidden email]
http://lists.wikimedia.org/mailman/listinfo/wiki-research-l