Reader use of Wikipedia and Commons categories


Reader use of Wikipedia and Commons categories

Pine W
Hi Research-l,

My impression is that volunteers on Commons and ENWP spend a lot of time on categorization. I have seen references to analyses of how categorization is done, but I can't recall seeing an analysis of how much use readers make of categories on Commons and ENWP. My guess is that readers often use categories on Commons for media searches, but that ENWP categories are rarely used by readers, although maybe WMF Discovery uses categories to inform search results. Is there data that shows how extensively readers on ENWP and Commons use categories?

Thanks,
Pine
( https://meta.wikimedia.org/wiki/User:Pine )
_______________________________________________
Wiki-research-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Re: Reader use of Wikipedia and Commons categories

Leila Zia
Hi Pine,

On Wed, May 23, 2018 at 9:46 PM, Pine W <[hidden email]> wrote:
> Hi Research-l,
>
> My impression is that volunteers on Commons and ENWP spend a lot of time on categorization. I have seen references to analyses of how categorization is done,  but I can't recall seeing an analysis of how much use readers make of categories on Commons and ENWP. My guess is that readers often use categories on Commons for media searches, but that ENWP categories are rarely used by readers, although maybe WMF Discovery uses categories to inform search results. Is there data that shows how extensively readers on ENWP and Commons use categories?

I don't know of recent (or old) studies on this topic, but there are
at least a few other things we know that can help you think about
whether it's useful to work on the category network in different
projects.

Categories are used by (at least) three different groups:
* Editors
* Readers
* Machines

We don't know all the use-cases that categories have for these groups.
It seems that generally editors use them to organize their work and
make the article space more navigable, readers use them to explore
content (in a more serendipitous way), and machines use them
extensively for a variety of applications. [We do lack published work
about what I just said, btw, and I really hope we or someone else
writes more about it in the coming year or two. :)]

While we're trying to figure out what the exact answers for the first
two groups are, it's helpful to think about the last group:

The Wikipedia category network, with its known caveats, has been used
extensively by researchers to build new insights and technologies. A
lot of research on alignment of text across languages (which is in
turn used in building dictionaries and automatic translation tools)
takes advantage of this (for the most part) human-curated
categorization of articles. It's an important by-product of building
the encyclopedia (and other projects). I'll give you a few
examples (non-comprehensive); feel free to dig into the literature
reviews of these papers for more:

* The use of the Wikipedia category network for telling apart classes
from instances: https://dl.acm.org/authorize.cfm?key=N655914 (a
necessary step in knowledge base creation)

* In building YAGO: http://www2007.wwwconference.org/papers/paper391.pdf

* Using the Wikipedia category network for building section recommendation
systems for Wikipedia: https://arxiv.org/pdf/1804.05995.pdf . See,
for example, http://gapfinder.wmflabs.org/en.wikipedia.org/v1/section/article/Barack_Obama

There is significant value in the Wikipedia category network, and I would
not discourage editors from building it. I do hope they know what value
this work brings to, at least, the research and scientific community.

Best,
Leila


Re: Reader use of Wikipedia and Commons categories

Ziko van Dijk-3
Hello,

A very interesting question. From my experience and talks with readers, I
have the impression that readers usually take no notice of the categories.
I could not find out why, because the category system may indeed be useful
for at least some use cases.

When it comes to Commons, I would be very interested to learn how many
readers (or recipients) are actually non Wikipedia editors.

Kind regards
Ziko




Re: Reader use of Wikipedia and Commons categories

Federico Leva (Nemo)
Ziko van Dijk, 24/05/2018 23:08:
> When it comes to Commons, I would be very interested to learn how many
> readers (or recipients) are actually non Wikipedia editors.

It would be useful to consider less common but high value usage, for
instance people looking for illustrations for a publication. Such
searches could be substitutes for specialised (and expensive) databases,
so the value provided by Commons categories may be higher than the mere
usage numbers suggest. (It should be measured in hours saved or
something like that.)

Federico


Re: Reader use of Wikipedia and Commons categories

Kerry Raymond
I do outreach including training. From that, I am inclined to agree that readers don’t use categories. People who come to edit training are (unsurprisingly) generally already keen readers of Wikipedia, but categories seem to be something they first learn about in edit training. Indeed, one of my outreach offerings is just a talk about Wikipedia, which includes tips for getting more out of the reader experience, like categories, What Links Here, and lots of things that are in plain view on the standard desktop interface but where people aren't looking.

Also many categories exist in parallel with List-of articles and navboxes, which do more-or-less-but-not-exactly the same thing. It may be that readers are more likely to stumble on the lists or see the navbox entries (particularly if the navbox renders open). But all in all, I still think most readers enter Wikipedia via search engines and then progress further through Wikipedia by link clicking and using the Wikipedia search box as their principal navigation tools.

Editors use categories principally to increase their edit count (cynical, but it's hard to think otherwise given what I see on my watchlist); there's an awful lot of messing about with categories for what seems to be very little benefit to the reader (especially as readers don't seem to use them). And the lack of obvious ways to intersect categories (petscan is wonderful, but neither readers nor most editors know about it) leads to the never-ending creation of cross-categorisations like

https://en.wikipedia.org/wiki/Category:19th-century_Australian_women_writers

which is pretty clearly the intersection of 4 category trees that probably should be independent: nationality, sex, occupation, time frame. Sooner or later it will inevitably be further subcategorised into

1870s British-born-Australian cis-women poets

First-Monday-in-the-month Indian-born Far-North-Queensland cis-women-with-male-pseudonym romantic-sonnet-poets :-)
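
To make the intersection idea concrete, here is a minimal sketch of what "intersecting independent category trees" amounts to. It is only an illustration: the category names and article titles are hypothetical, the membership is held in memory rather than fetched from a wiki, and this is not how petscan itself is implemented.

```python
# Hypothetical membership data: each independent category tree maps to
# the set of article titles it contains. All names are illustrative.
categories = {
    "Australian people": {"Writer A", "Writer B", "Painter C"},
    "Women": {"Writer A", "Painter C", "Writer D"},
    "Writers": {"Writer A", "Writer B", "Writer D"},
    "19th-century people": {"Writer A", "Writer B", "Painter C"},
}


def intersect(category_names, membership):
    """Return the articles that belong to every named category."""
    sets = [membership[name] for name in category_names]
    if not sets:
        return set()
    return set.intersection(*sets)


# Nationality, sex, occupation and time frame stay independent;
# the cross-category is computed on demand instead of being curated.
result = intersect(
    ["Australian people", "Women", "Writers", "19th-century people"],
    categories,
)
print(sorted(result))
```

With four independent trees and an intersection query, a pre-built category such as "19th-century Australian women writers" would not need to be maintained by hand.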

Obviously categories do have some uses to editors. If you have a source that provides you with some information about some aspect of a group of topics, it can be useful to work your way through each of the entries in the category updating it accordingly.

Machines. Yes, absolutely. I use AWB, and doing things across a category (and the recursive closure of a category) is my primary use-case for AWB. My second use-case for AWB is template use (template/infobox use is a de-facto category, and indeed is a third thing that often parallels a category, but unlike lists and navboxes, this form is invisible to the reader).
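
For readers unfamiliar with the term, the "recursive closure" of a category is the category plus all of its subcategories, their subcategories, and so on. A rough sketch follows, assuming a hypothetical in-memory subcategory graph (the names `subcategories` and `pages` and all titles are made up for illustration; this is not AWB's code). Category graphs can contain loops, so the traversal tracks what it has already visited.

```python
from collections import deque

# Hypothetical subcategory graph and page membership; names are illustrative.
subcategories = {
    "Writers": ["Poets", "Novelists"],
    "Poets": ["Sonnet poets"],
    "Novelists": [],
    "Sonnet poets": ["Poets"],  # deliberate cycle, to show it is handled
}
pages = {
    "Writers": ["Page W"],
    "Poets": ["Page P"],
    "Novelists": ["Page N"],
    "Sonnet poets": ["Page S", "Page P"],
}


def category_closure(root, subcats):
    """Breadth-first walk returning root and every category reachable from it."""
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in subcats.get(current, []):
            if child not in seen:  # skip already-visited categories (cycles)
                seen.add(child)
                queue.append(child)
    return seen


def pages_in_closure(root, subcats, members):
    """Collect the distinct pages found anywhere in the closure of root."""
    result = set()
    for category in category_closure(root, subcats):
        result.update(members.get(category, []))
    return result


print(sorted(category_closure("Writers", subcategories)))
print(sorted(pages_in_closure("Writers", subcategories, pages)))
```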

With Commons, again, I don't think readers go there, most haven't even heard of it. It's mainly editors at work there and I think they do use categories. The category structure seems to grow there more organically. There is not the constant "let's rename this category worldwide" or the same level of cross-categorisation on Commons that I see on en.Wikipedia.

I note that while we cannot know who is using categories, we can still get page view stats for the category itself. These tend to be close to 0-per-day for a lot of categories (e.g. Town halls in Queensland). Even a category that one might think has much greater interest gets relatively low numbers, e.g. "Presidents of the United States" gets 26-per-day views on average. This compares with a 37K daily average for the Donald Trump article, 19K for Barack Obama, and 16K for George Washington. So this definitely suggests that the readers who presumably make up the bulk of the views on the presidential articles are not looking at the obvious category for such folk (although they might be moving between presidential articles using navboxes, succession boxes, lists or other links). Having said that, the Donald Trump article has *53* categories, of which Presidents of the United States is number 39 (they appear to be alphabetically ordered), so it is possible that the reader never found the presidential category, which is lost in a sea of categories like "21st century Presbyterians" and "Critics of the European Union". I would really have thought that being in the category Presidents of the USA was slightly more important to the topic of the article than his apparent conversion to Presbyterianism in the 21st century (given he's not categorised as a 20th century Presbyterian).
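
As a rough sketch of how such a comparison could be reproduced, the snippet below builds a request URL for the Wikimedia Pageviews REST API (per-article endpoint, as I understand its path layout), averages the `views` field of a response, and expresses the category's views as a share of each article's views. The function names and the date range are hypothetical, no network call is made here, and the figures are the approximate ones quoted above, not fresh measurements.

```python
from urllib.parse import quote

# Per-article endpoint of the Wikimedia Pageviews REST API (path layout
# assumed here: project/access/agent/article/granularity/start/end).
API_ROOT = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(title, start, end, project="en.wikipedia.org"):
    """Build the daily per-article URL; dates are YYYYMMDD strings."""
    article = quote(title.replace(" ", "_"), safe="")
    return f"{API_ROOT}/{project}/all-access/user/{article}/daily/{start}/{end}"


def daily_average(response):
    """Average the 'views' field over the 'items' of a decoded JSON response."""
    items = response.get("items", [])
    if not items:
        return 0.0
    return sum(item["views"] for item in items) / len(items)


def share_of(category_views, article_views):
    """Category page views as a fraction of an article's page views."""
    return category_views / article_views


# Hypothetical date range, for illustration only.
print(pageviews_url("Category:Presidents of the United States", "20180501", "20180531"))

# Approximate daily averages quoted in the message above.
category_daily = 26
for name, views in [
    ("Donald Trump", 37_000),
    ("Barack Obama", 19_000),
    ("George Washington", 16_000),
]:
    print(f"{name}: category gets {share_of(category_daily, views):.2%} of article views")
```

On those quoted figures the category page receives well under 1% of the views of any of the three articles.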

And, somewhat amazingly, there is no apparent category for "Critics of Donald Trump". I must propose it, along with a fully diffused sub-cat system of Critics of Donald Trump's immigration policies, Critics of Donald Trump's hair, etc. By the time I've added all the relevant articles to those categories, I should have at least another 100K edits to my name!

Kerry




 



Re: Reader use of Wikipedia and Commons categories

metasj
Use of interlanguage links and categories varies strongly with placement on
the page. We used to be able to see this clearly with the multiple
popular skins. Today we can perhaps see this best with the multiple apps
and viewers. On WP mobile, surprisingly, readers don't use categories at
all!

More seriously: this is a tremendously useful and underutilized slice of
wiki knowledge, like the quality and completeness categories, which deserve
to be made more visible.

@kerry I expect it isn't for edit count; it is for fixing a facet of
knowledge that those editors find critically important (as I do!). Yes, we
need something like petscan and intersection to be a standard aspect of
on-wiki search: this is precisely the sort of use that good clean
categorisation is good for!

categorically yours,
sj

