Suggestion for solving the disambiguation problem

Suggestion for solving the disambiguation problem

Jon Robson
I understand there is an issue that needs solving where various pages
link to disambiguation pages. These need fixing to point at the
appropriate thing.

I had a thought on how this might be done using a variant of EventLogging...

When a user clicks on a link that leads to a disambiguation page and then
clicks on a link on that page, we log an event that contains:

* page user was on before
* page user is on now

If we were to collect this data, it would allow us to statistically
suggest which page the original link should point at.

To take a more concrete theoretical example:
* If I am on the Wiki page for William Blake and click on London I am
taken to https://en.wikipedia.org/wiki/London_(disambiguation)
* I look through and see London (poem) and click on it
* An event is fired that links London (poem) to William Blake.

Obviously this won't always be accurate, but I'd expect it to work in
general (we'd need to filter out bots, of course).

Then, when I am editing William Blake, suppose its links to disambiguation
pages are surfaced. If I go to fix one, it might tell me that 80% of
visitors go from William Blake to London (poem).
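
As a rough illustration of the aggregation described above, here is a minimal Python sketch. It assumes each logged event is simply a pair (page the reader was on before, page chosen on the disambiguation page); the function name and the sample events are hypothetical, not part of EventLogging.

```python
from collections import Counter, defaultdict

def suggest_targets(events):
    """Group (origin, target) click events and return per-origin target shares."""
    counts = defaultdict(Counter)
    for origin, target in events:
        counts[origin][target] += 1
    suggestions = {}
    for origin, targets in counts.items():
        total = sum(targets.values())
        # most_common() orders targets from most to least clicked
        suggestions[origin] = [(t, n / total) for t, n in targets.most_common()]
    return suggestions

# Hypothetical events, invented for illustration only
events = [("William Blake", "London (poem)")] * 8 + [("William Blake", "London")] * 2
for target, share in suggest_targets(events)["William Blake"]:
    print(f"{target}: {share:.0%}")
# London (poem): 80%
# London: 20%
```

An editor-facing prompt could then show the top entry for the article being edited.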


Have we done anything like this in the past? (Collecting data from
readers and informing editors)

I can imagine applying this sort of pattern could have various other uses...




--
Jon Robson
http://jonrobson.me.uk
@rakugojon

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: Suggestion for solving the disambiguation problem

Nicolas Vervelle-4
Interesting idea...


On Mon, Jul 15, 2013 at 11:41 PM, Jon Robson <[hidden email]> wrote:

> I understand there is an issue that needs solving where various pages
> link to disambiguation pages. These need fixing to point at the
> appropriate thing.

Re: Suggestion for solving the disambiguation problem

David Cuenca Tudela
Good idea. It could also tell us which links on a disambiguation page are
used most, so that they can be sorted by importance.

Micru

On Tue, Jul 16, 2013 at 2:03 PM, Nicolas Vervelle <[hidden email]>wrote:

> Interesting idea...



--
Etiamsi omnes, ego non

Re: Suggestion for solving the disambiguation problem

lee worden
In reply to this post by Jon Robson
Maybe it could be done with just the Referer field on the second
request, without needing to log two different page requests and
correlate them.
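
For reference, a small Python sketch of reading a page title out of a Referer value. It assumes article URLs use the /wiki/ path prefix seen in the URLs in this thread; the helper name is made up. Note that the Referer on the second request names only the page the reader was on immediately before, which here is the disambiguation page.

```python
from urllib.parse import urlsplit, unquote

def title_from_referer(referer, wiki_prefix="/wiki/"):
    """Return the page title named by a Referer URL, or None if it is not an article URL."""
    path = urlsplit(referer).path
    if not path.startswith(wiki_prefix):
        return None
    # Undo percent-encoding and the underscore-for-space convention
    return unquote(path[len(wiki_prefix):]).replace("_", " ")

# Referer value that might accompany a request for London_(poem)
print(title_from_referer("https://en.wikipedia.org/wiki/London_(disambiguation)"))
# London (disambiguation)
```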

> Date: Tue, 16 Jul 2013 14:14:42 -0400
> From: David Cuenca<[hidden email]>
>
> Good idea, it could also help to know which are the links more used in a
> disambiguation page to sort them by importance.
>
> Micru

Re: Suggestion for solving the disambiguation problem

John Doe-27
Without the origin page, making the connection wouldn't be possible
(you would just end up suggesting the most common result instead of the
most accurate one).
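
A tiny worked example of that difference, in Python. The click data is invented, and "Thames" is just a hypothetical second article that links to the same disambiguation page.

```python
from collections import Counter

# Hypothetical clicks on London (disambiguation): (origin article, chosen target)
clicks = (
    [("William Blake", "London (poem)")] * 8
    + [("William Blake", "London")] * 2
    + [("Thames", "London")] * 30
)

def most_common_overall(clicks):
    """Without the origin, there is only one answer per disambiguation page."""
    return Counter(target for _, target in clicks).most_common(1)[0][0]

def most_common_for(origin, clicks):
    """With the origin, the answer can differ for each linking article."""
    return Counter(t for o, t in clicks if o == origin).most_common(1)[0][0]

print(most_common_overall(clicks))               # London
print(most_common_for("William Blake", clicks))  # London (poem)
```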

On Tue, Jul 16, 2013 at 10:37 PM, Lee Worden <[hidden email]> wrote:

> Maybe it could be done with just the Referer field on the second request,
> without needing to log two different page requests and correlate them.

Re: Suggestion for solving the disambiguation problem

Tyler Romeo
In reply to this post by lee worden
There's one issue with this. This assumes that links to disambiguated pages
are the only types of links on a disambiguation page. What if somebody
clicks a category link at the bottom of the page? Or what if they click
some other, unrelated link?

You'd need a way to distinguish exactly what articles are being
disambiguated on the page.

*-- *
*Tyler Romeo*
Stevens Institute of Technology, Class of 2016
Major in Computer Science
www.whizkidztech.com | [hidden email]

Re: Suggestion for solving the disambiguation problem

Jon Robson
On Tue, Jul 16, 2013 at 7:45 PM, Tyler Romeo <[hidden email]> wrote:
> There's one issue with this. This assumes that links to disambiguated pages
> are the only types of links on a disambiguation page. What if somebody
> clicks a category link at the bottom of the page? Or what if there's just
> another different link?

I don't suspect this is much of an issue if we constrain tracking to the
content element. If it were, I imagine these links would be relatively easy
to filter out by ignoring any link with a ':' in it using a regex or,
worst case, a soundex algorithm. I'd still expect the
disambiguation links to be the most-clicked links...
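
A possible shape for the regex filter, sketched in Python. This is a variation on the idea above rather than the literal rule: rejecting every title that contains ':' would also drop article titles that happen to include a colon, so the sketch matches a leading namespace prefix instead. The prefix list is an assumption and is not exhaustive.

```python
import re

# Assumed, non-exhaustive list of namespace prefixes; a real wiki defines its own.
NAMESPACE_LINK = re.compile(
    r"^(Category|File|Template|Help|Wikipedia|Portal|Special|Talk):"
)

def is_candidate(title):
    """Keep titles that do not start with a known namespace prefix."""
    return NAMESPACE_LINK.match(title) is None

links = [
    "London (poem)",
    "Category:Disambiguation pages",
    "Help:Disambiguation",
    "Star Wars: The Clone Wars",
]
print([t for t in links if is_candidate(t)])
# ['London (poem)', 'Star Wars: The Clone Wars']
```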


Re: Suggestion for solving the disambiguation problem

Tyler Romeo
On Wed, Jul 17, 2013 at 2:35 AM, Jon Robson <[hidden email]> wrote:

> I don't suspect this is much of an issue if constrained to the content
> element but if it was I imagine these links would be relatively easy
> to distinguish via ignoring any links with a ':' in it using regex or
> worst case scenario a soundex algorithm. I'd still suspect the
> disambiguation links would be the most popular clicked links...
>

Even if you restrict it like that, it's still an issue. You have pages like
http://en.wikipedia.org/wiki/007_(disambiguation) or
http://en.wikipedia.org/wiki/11_Squadron, which have a See also section
that is usually unrelated to the disambiguated topic, but still may be
clicked often.

Better yet, all disambiguation pages have the disambiguation template on
them, and in that template are links and image links you can click on.


Re: Suggestion for solving the disambiguation problem

John Erling Blad
In reply to this post by lee worden
Send out a "mw-previous-referrer" value on the disambiguation page and have
the browser echo it back. It could be done through a cookie. On the next
page it must be removed, either on the server or in the browser. The
server can simply strip any incoming cookie, but I'm not sure whether this
will work with the squids or whether it is simple to implement. The echoed
mw-previous-referrer can then be logged somehow for the landing
page. Analysis of the log will then identify missing or failed
linkage.

The same could be done for search pages, as much the same problem
exists there.

Instead of using cookies, JavaScript can do this by remembering
specific pages in session storage. That could imply a
logging facility with some kind of API access.
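
To make the bookkeeping concrete, here is a plain Python model of the one-shot value. It is not MediaWiki or browser code: the dict stands in for a cookie or session storage, and the function and field names are hypothetical. The model does not check how the reader reached the landing page.

```python
def record_visit(storage, page, referrer, is_disambiguation, log):
    """Model one page view; `storage` stands in for a cookie or sessionStorage."""
    # The stored value is always consumed on the very next page view
    previous = storage.pop("mw-previous-referrer", None)
    if previous is not None:
        log.append({"origin": previous, "landing": page})
    # Only disambiguation pages set the value for the following page view
    if is_disambiguation and referrer:
        storage["mw-previous-referrer"] = referrer

storage, log = {}, []
record_visit(storage, "London (disambiguation)", "William Blake", True, log)
record_visit(storage, "London (poem)", "London (disambiguation)", False, log)
record_visit(storage, "Songs of Experience", "London (poem)", False, log)
print(log)  # [{'origin': 'William Blake', 'landing': 'London (poem)'}]
```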


On Wed, Jul 17, 2013 at 4:37 AM, Lee Worden <[hidden email]> wrote:

> Maybe it could be done with just the Referer field on the second request,
> without needing to log two different page requests and correlate them.

Re: Suggestion for solving the disambiguation problem

Tyler Romeo
On Wed, Jul 17, 2013 at 4:26 AM, John Erling Blad <[hidden email]> wrote:

> Send out a "mw-previous-referrer" on the disambiguation page and echo
> it back from the browser. It could be done through a cookie. On next
> page it must be removed, either in the server or in the browser. The
> server can simply rip off any incoming cookie, but not sure if this
> will work in the squids or if it is simple to implement. The echoed
> back mw-previous-referrer can then be logged somehow for the landing
> page. Analysis of the log will then identify missing or failed
> linkage.
>
> The same could be done for search pages, as much of the same problem
> exist there.
>
> Instead of using cookies javascript can do this by remembering
> specific pages by using the session storage. That could imply a
> logging facility with some kind of api access.
>

This is an even worse solution. Not only does it have the same problem I
mentioned, but also what if the person just browses to another page by URL?
Then the server thinks the user got there from the disambiguation page.


Re: Suggestion for solving the disambiguation problem

John Erling Blad
It doesn't matter because the correct behavior will accumulate over
time. You don't try to "fix" linkage just because you have one single
observed behavior, you collect and correlate behavior over time and
use several, perhaps hundreds of observations.

Even more interesting than disambiguation pages are search pages. A
user tries to find something, lands on some slightly related page, but
must use another search to find the correct one. Observing only a few
users will give a very confusing usage pattern, but observing thousands
of users over a year or more will create distinct patterns.

There is a lot of published work on why and how, if anyone bothers to dig
it up. Short story: it is only a matter of the number of observations.
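
One way to put a number on "enough observations" is a confidence bound on the observed share. The Python sketch below uses the Wilson score interval; that choice is an assumption made for illustration, not something proposed in this thread, and the click counts are invented.

```python
import math

def wilson_lower_bound(successes, total, z=1.96):
    """Lower end of the Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if total == 0:
        return 0.0
    p = successes / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom

# The same 80% observed share, backed by more and more observations
for clicks, total in [(4, 5), (80, 100), (800, 1000)]:
    print(total, round(wilson_lower_bound(clicks, total), 2))
```

With the same 80% share, the lower bound rises toward 0.8 as the number of observations grows, so a suggestion could be held back until the bound clears some chosen threshold.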

On Wed, Jul 17, 2013 at 10:30 AM, Tyler Romeo <[hidden email]> wrote:

> This is an even worse solution. Not only does it have the same problem I
> mentioned, but also what if the person just browses to another page by URL?
> Then the server thinks the user got there from the disambiguation page.

Re: Suggestion for solving the disambiguation problem

Tyler Romeo
On Wed, Jul 17, 2013 at 4:42 AM, John Erling Blad <[hidden email]> wrote:

> It doesn't matter because the correct behavior will accumulate over
> time. You don't try to "fix" linkage just because you have one single
> observed behavior, you collect and correlate behavior over time and
> use several, perhaps hundreds of observations.
>

I strongly doubt that the correct behavior will be prevalent enough to
warrant using such an automatic system over just manually fixing
disambiguation links, which can be done quite easily using automatic wiki
browsers and the like.


Re: Suggestion for solving the disambiguation problem

C. Scott Ananian
Sounds like a disagreement that can be settled quantitatively. ;)
  --scott
On Jul 17, 2013 5:03 AM, "Tyler Romeo" <[hidden email]> wrote:

> I strongly doubt that the correct behavior will be prevalent enough to
> warrant using such an automatic system over just manually fixing
> disambiguation links, which can be done quite easily using automatic wiki
> browsers and the like.

Re: Suggestion for solving the disambiguation problem

Jon Robson
Agreed. As a first step, if someone is interested in this and it
doesn't go against our privacy policy, it would be good to collect some
link-click data for various disambiguation pages to get an idea of
whether the data is meaningful and useful. Tyler's concerns
are valid, but we should check them against some data rather than
speculate about whether they are indeed things we need to worry about.
In my opinion EventLogging [1] could be used for this,
with some simple JavaScript that hijacks links on the disambiguation
page and records the referrer and the next page.

In terms of analyzing the data you could then simply look at a sample
of disambiguation pages and manually determine the accuracy of users
picking the correct link.
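
A sketch of how that manual check could be scored, in Python. Everything here is hypothetical: the data layout (clicks keyed by linking article and disambiguation page), the function name, and the sample values.

```python
from collections import Counter

def majority_accuracy(clicks_by_link, manual_labels):
    """Share of reviewed links whose most-clicked target matches the reviewer's choice."""
    reviewed = [k for k in manual_labels if clicks_by_link.get(k)]
    if not reviewed:
        return None
    hits = sum(
        1 for k in reviewed
        if Counter(clicks_by_link[k]).most_common(1)[0][0] == manual_labels[k]
    )
    return hits / len(reviewed)

# Hypothetical sample: key = (linking article, disambiguation page)
clicks_by_link = {
    ("William Blake", "London (disambiguation)"): ["London (poem)"] * 8 + ["London"] * 2,
    ("Jack London", "London (disambiguation)"): ["London"] * 3 + ["London (name)"] * 1,
}
manual_labels = {
    ("William Blake", "London (disambiguation)"): "London (poem)",
    ("Jack London", "London (disambiguation)"): "London (name)",
}
print(majority_accuracy(clicks_by_link, manual_labels))  # 0.5
```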

If the data does show promise, it would then be an easy enough job to
create a UI that uses it, so that editors can correct the links.

I don't currently have time to explore this, but would like to in
future. If anyone is interested, please dive in...

[1] https://mediawiki.org/wiki/Extension:EventLogging

On Wed, Jul 17, 2013 at 5:14 AM, C. Scott Ananian
<[hidden email]> wrote:

> Sounds like a disagreement that can be settled quantitatively. ;)
>   --scott

Re: Suggestion for solving the disambiguation problem

John Erling Blad
I'm not sure the analysis has to expose any private data at all. You would
show only the result of the analysis, which would be aggregated over weeks
or months, perhaps after filtering out random noise. Would that be a
privacy problem?

One of the tricky things is that the disambiguation or search page is
a signal that the referrer, or some other previous page in the user's
history, is difficult to connect to some later page. As the number of
steps between the pages increases, the problem of detecting the
relation grows exponentially. It is also worth noting that by
using only click events on the disambiguation page you will discover only
connections that are already present as links on the disambiguation
page.

On Wed, Jul 17, 2013 at 6:49 PM, Jon Robson <[hidden email]> wrote:

> Agreed. As a first step, if someone is interested in this and this
> doesn't go against our privacy policy it would be good to collect some
> link clicking data for various disambiguation pages to get an idea of
> whether the data created is meaningful and useful. Tyler's concerns
> are valid but we should clarify with some data rather than speculate
> to whether these are indeed concerns we need to worry about and
> whether this. EventLogging [1] could be used for this in my opinion
> using some simple javascript that hijacks links on the disambiguation
> page - looking at referrer and next page.