gender balance of wikipedia citations

Re: gender balance of Wikipedia citations

Kerry Raymond
FWIW, I think there would be pushback against a quality tag that highlighted little/no citation of women's work (whether we are talking first author or not) in an article. There are two reasons for this. One is the misogyny that really does exist within the English Wikipedia "community" (those who do most of the shouting and hence decision making); they will argue firstly that gender balance of citations doesn't matter, secondly that it is a reflection of the real world, and thirdly that Wikipedia has a policy that it is not there to Right Great Wrongs.

More practically, we know that whole-of-article quality tagging doesn't tend to have a lot of impact in terms of getting people to fix anything, compared to more specific tags like "citation needed", "dubious", "says who" and so on placed on specific pieces of text. People are much more likely to fix a specific problem and then remove the specific tag. Even when a person does respond to a generic tag like "more references needed" and adds some more references, they rarely remove the generic tag, thinking "well, there's still plenty of scope here to add more references". Who among us is willing to declare "that article is 100% fully referenced by reliable sources"? Nobody, it seems; it's a tag that lingers forever ...

So I think a specific tag to encourage the expansion of "Bloggs et al" citations to full author listings might work. It's a somewhat boring and mechanical task to expand "et al" but we do have people who are happy to contribute in that way. It might even be possible to build a tool to assist them which looks up the paper in WikiCite or Google Scholar etc to extract the full author list as published (just as we have tools to make it easier to fix typos and spelling, disambiguate links and so forth). That would address the problem of women authors not being first cited and lost in the mists of "et al". However, it is unlikely to be obvious to the average contributor whether the paper with the full author list of A.B. Brown, C.D. Jones, E.F. Smith and G.H. Walker does or doesn't have any female authors, so I can't see that it's going to be easy to motivate people to try to find additional citations which do have more female authors.
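
As a rough illustration of the first step such a tool would need (finding the citations worth expanding), here is a minimal Python sketch. It assumes citations are written with "cite ..." or "citation" templates and that the author appears in a parameter named something like author, last or vauthors; those names, the regular expressions and the function name are my assumptions for illustration, not an existing tool, and the lookup of the full author list in WikiCite or elsewhere is left out entirely.

```python
import re

# Assumption: a citation is a {{cite ...}} or {{citation ...}} template
# that does not contain nested templates.
CITE_TEMPLATE = re.compile(r"\{\{\s*cit(?:e|ation)\b[^{}]*\}\}", re.IGNORECASE)

# Assumption: author names live in parameters such as author, authors,
# author1, last, last2 or vauthors. We flag values containing "et al".
ET_AL_PARAM = re.compile(
    r"\|\s*(?:authors?|last|vauthors)\d*\s*=\s*[^|}]*\bet\s+al\b",
    re.IGNORECASE,
)


def find_et_al_citations(wikitext):
    """Return the citation templates whose author field contains 'et al'."""
    return [
        match.group(0)
        for match in CITE_TEMPLATE.finditer(wikitext)
        if ET_AL_PARAM.search(match.group(0))
    ]


if __name__ == "__main__":
    sample = (
        "Claim one.<ref>{{cite journal |author=Bloggs et al. "
        "|title=A study |year=2001}}</ref> "
        "Claim two.<ref>{{cite book |last1=Brown |first1=A. B. "
        "|last2=Jones |first2=C. D. |title=Petal shapes}}</ref>"
    )
    for template in find_et_al_citations(sample):
        print(template)
```

A real tool would want a proper wikitext parser rather than regular expressions, since templates can be nested and "et al" can also be expressed in other ways.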

And, as much as gender equity is a wrong I'd like to see righted, I don't want to see campaigns just to "add in more female authored citations" (I call this "citation sprinkling") on Wikipedia. A citation has to be there because it verifies the information in the article and not to meet a gender quota. Remember that for a lot of Wikipedia contributors, academic literature is mostly behind a paywall so they can't actually read more than the title and abstract at best. A "sprinkling" campaign is likely to see citations based only on title and abstract ("well, it sounds like this paper which includes a woman author is talking about this topic") but the paper may not support the specific claim made in the text (indeed, it might say the exact opposite). A sprinkling campaign should only target the Further Reading section whose role is:

"The Further reading section of an article contains a bulleted list of a reasonable number of works which a reader may consult for additional and more detailed coverage of the subject of the article. In articles with numerous footnotes, it probably is not obvious which ones are suitable for further reading. The "Further reading" section can help the readers by listing selected titles without worrying about duplications."

which would avoid the risk of adding a citation that doesn't support the specific claims being made in the article. So maybe it would be possible to add a "skewed gender balance" tag onto the Further reading section and/or External links section whose role is

"Some acceptable links include those that contain further research that is accurate and on-topic, information that could not be added to the article for reasons such as copyright or amount of detail, or other meaningful, relevant content that is not suitable for inclusion in an article for reasons unrelated to its accuracy."

The downside of this idea of adding female authors to the Further Reading and External Links sections is whether anyone ever looks at them. Over 50% of Wikipedia hits are now via mobile device. The mobile render of a Wikipedia article is not the whole article as you see on desktop and laptop; rather, you select the sections you want to read. So for mobile readers we do know precisely what sections they are opening, from which we have learned that people in developed countries are not generally reading whole articles but specific sections (suggesting they are seeking answers to a specific need rather than wanting to fully appreciate the topic), and they don't tend to open anything after the References as a rule, so they aren't looking at Further Reading and External Links anyway. Are desktop/laptop readers looking at them either? We don't really know, as they get the whole article rendered as a single result, and it would really only be eye-tracking studies (an expensive type of experiment) that would give us this insight with the same accuracy as our mobile data.

As an aside, in less developed countries, readers are more likely to read whole articles on a mobile device. While the reasons for this difference are not proven, I'd be prepared to guess at two interlinked hypotheses. Firstly, such countries have poorer standards of education so people may be using Wikipedia to supplement their limited formal education. Also such countries are more likely to be using rote learning in their education system (valuing the ability to memorise and reproduce) rather than the more problem-solving learning approaches increasingly in use in the education systems of more developed countries. That would also explain whole-of-article viewing rather than selecting specific sub-sections.

In some ways, I think a better solution might be to try to get Google Scholar interested in the issue of gender. What if articles listed on Google Scholar came with a little gender balance score (a bit like hotel ratings)? One blue star (or some other symbol) for one male author, two blue stars (two male authors), one pink and one blue star (first author female, second author male), etc. Why I like the idea is that it is a simple-to-understand visual aid to draw attention to gender imbalance more widely but without a specific call to action (which, as I outline above, may backfire if citations get added for gender balance rather than content). It potentially helps address the real world problem, which would hopefully flow through to Wikipedia. Also Google Scholar is probably a lot better resourced to build the tools to do the legwork of determining gender (I guess a white star is used when it can't). The risk, though, as Leia has previously mentioned, is that automated tools don't do a great job of getting gender correct, particularly as the tools are often trained on limited data sets (such as mostly white people), making the automated gender guessing of non-white people more likely to be incorrect. However, authors can establish their own Google Scholar profile (if the author's name is underlined, it's a link to their profile); that's a place where they could disclose their gender if they desired, or correct Google Scholar's mistaken guess, or demand that Google Scholar not show their gender (whatever they choose). Hmm, might it lead to catfishing? Authors passing themselves off as a different gender? Hmm ...
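
To make the star idea concrete, here is a tiny Python sketch of the mapping described above: one star per author in author order, pink for female, blue for male, white when the gender is unknown or undisclosed. The function name and the "female"/"male" labels are made up for illustration; how the gender would be determined (or self-declared) is exactly the hard part and is not addressed here.

```python
# Colour scheme from the description above; the label strings are assumptions.
STAR_COLOURS = {"female": "pink", "male": "blue"}


def gender_stars(author_genders):
    """Return one star colour per author, in author order.

    Any value that is not a recognised label (including None for
    'unknown' or 'prefers not to say') is shown as a white star.
    """
    return [STAR_COLOURS.get(gender, "white") for gender in author_genders]


if __name__ == "__main__":
    # First author female, second author male, third author unknown.
    print(gender_stars(["female", "male", None]))
```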

Another place where we might explore marking gender in some easily visible way is WikiCite, but frankly I know little about that project so cannot comment on it nor on the merits of doing it there rather than on Google Scholar. I don't think traditional journal publishers are likely to be keen to show gender balance on their own websites, as I think they would realise it would enable webscraping to reveal their overall gender balance profile, leading to some adverse headlines about "Brandname journals worst for gender equity". Google Scholar has less to fear, unless it was demonstrated that they exhibited stronger gender bias than the journals themselves. I would think that Google Scholar aggregates papers without any regard to the gender of the authors, but I guess it might not aggregate all topic areas equally. For example, if they didn't make much effort to include (say) nursing publications (a more female academic discipline) but went hard on engineering publications (a more male academic discipline), I guess it would skew their author gender balance towards men.
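
The coverage effect in that last example is just a weighted average, which a few lines of Python can show. All the numbers below are invented purely for illustration (I have no data on Google Scholar's coverage or on author gender in either discipline), and the function name is hypothetical.

```python
def overall_female_share(fields):
    """fields maps a discipline to (female_authors, total_authors)
    counted over the papers the index actually includes."""
    female = sum(f for f, _ in fields.values())
    total = sum(t for _, t in fields.values())
    return female / total if total else 0.0


# Invented numbers: within each discipline the gender mix is identical in
# both scenarios (nursing 80% female, engineering 20% female); only the
# amount of each discipline that gets indexed changes.
even_coverage = {"nursing": (400, 500), "engineering": (100, 500)}
skewed_coverage = {"nursing": (80, 100), "engineering": (180, 900)}

if __name__ == "__main__":
    print(overall_female_share(even_coverage))
    print(overall_female_share(skewed_coverage))
```

With these made-up inputs the aggregate female share drops from a half to about a quarter, even though nothing changed within either discipline, which is the skew I am guessing at.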

Kerry

-----Original Message-----
From: Wiki-research-l [mailto:[hidden email]] On Behalf Of Greg
Sent: Thursday, 29 August 2019 4:06 AM
To: Research into Wikimedia content and communities <[hidden email]>; [hidden email]
Subject: Re: [Wiki-research-l] gender balance of Wikipedia citations

Hi Jane,

Thanks for the link. It's clear that there is a lot of work being done, and even more left to do.

I've been thinking about what you said about second authors and was wondering if instead of fixing it (or in addition to fixing it), it would make sense to put some sort of tag on the page itself (like the ones I see questioning notability or requests for additional citations). Something along the lines of authors missing from a particular citation and how to fix that, or no work by women cited in this article (if this is the case).
It strikes me that by fixing it yourself, you are doing great work, but that maybe it also makes sense to spread awareness about these issues to the broader editing community so more people are thinking about it/doing it. At any rate, I thought I'd float the idea. Such a tag/the response (if any), could also be interesting to study, though perhaps something like this already exists and I'm just not aware of it, or perhaps there is good reason not to do it.

All best,
Greg

On Tue, Aug 27, 2019 at 5:00 AM <[hidden email]>
wrote:

> Send Wiki-research-l mailing list submissions to
>         [hidden email]
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> or, via email, send a message with subject or body 'help' to
>         [hidden email]
>
> You can reach the person managing the list at
>         [hidden email]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wiki-research-l digest..."
>
>
> Today's Topics:
>
>    1. Re: gender balance of wikipedia citations (Greg)
>    2. Re: gender balance of Wikipedia citations (Jane Darnell)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 26 Aug 2019 18:56:12 -0700
> From: Greg <[hidden email]>
> To: Isaac Johnson <[hidden email]>
> Cc: Research into Wikimedia content and communities
>         <[hidden email]>
> Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
> Message-ID:
>         <
> [hidden email]>
> Content-Type: text/plain; charset="UTF-8"
>
> Thanks, Isaac and Federico. These notes and links are very
> helpful--and will require some time to process. As for how many years
> I have to work on this, I'm retired! In truth, I keep hoping that
> someone on this list will express interest in working on these
> matters. The questions are all very interesting and quite relevant.
> The idea of studying removed citations is both complex and compelling.
>
> Greg
>
> On Mon, Aug 26, 2019 at 6:49 AM Isaac Johnson <[hidden email]> wrote:
>
> > Regarding data, I have not been a part of these projects but I think
> > that I can help a bit with working links:
> > * The (I believe) original dataset can also be found here:
> > https://analytics.wikimedia.org/datasets/archive/public-datasets/all/mwrefs/
> > * A newer version of this dataset was produced that also included
> > information about whether the source was openly available and its topic:
> > ** Meta page:
> > https://meta.wikimedia.org/wiki/Research:Towards_Modeling_Citation_Quality
> > ** Figshare:
> > https://figshare.com/articles/Accessibility_and_topics_of_citations_with_identifiers_in_Wikipedia/6819710
> >
> > On Mon, Aug 26, 2019 at 3:53 AM Federico Leva (Nemo) <[hidden email]> wrote:
> >
> >> Greg, 22/08/19 06:19:
> >> > I do not know the current status of wikicite or if/when this
> >> > could be used for this inquiry--either to examine all, or a
> >> > sensible subset of the citations.
> >>
> >> If I see correctly, you still did not receive an answer on the data
> >> available.
> >>
> >> It's true that the Figshare item for
> >> <https://meta.wikimedia.org/wiki/Research:Scholarly_article_citations_in_Wikipedia>
> >> was deleted (I've asked about it on the talk page), but it's
> >> trivial to run https://pypi.org/project/mwcites/ and extract the
> >> data yourself, at least for citations which use an identifier.
> >>
> >> Some example datasets produced this way:
> >> https://zenodo.org/record/15871
> >> https://zenodo.org/record/55004
> >> https://zenodo.org/record/54799
> >>
> >> Once you extract the list of works, the fun begins. You'll need to
> >> intersect with other data sources (Wikidata, ORCID, other?) and
> >> account for a number of factors until you manage to find a subset
> >> of the data which has a sufficiently high signal:noise ratio. For
> >> instance you might need to filter or normalise by
> >> * year of publication (some year recent enough to have good data
> >> but old enough to allow the work to be cited elsewhere, be archived
> >> after embargos);
> >> * country or institution (some probably have better ORCID
> >> coverage);
> >> * field/discipline and language;
> >> * open access status (per Unpaywall);
> >> * number of expected pageviews and clicks (for instance using
> >> <https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews> and
> >> <https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream#Releases>;
> >> a link from 10k articles on asteroids or proteins is not the same
> >> as being the lone link from a popular article which is not the same
> >> as a link buried among a thousand others on a big article);
> >> * time or duration of the addition (with one of the various diff
> >> extraction libraries, content persistence data or possibly
> >> historical eventstream if such a thing is available).
> >>
> >> To avoid having to invent everything yourself, maybe you can reuse
> >> the method of some similar study, for instance the one on the open
> >> access citation advantage or one of the many which studied the
> >> gender imbalance of citations and peer review in journals.
> >>
> >> However, it's very possible that the noise is just too much for a
> >> general computational method. You might consider a more manual
> >> approach on a sample of relevant events, for instance the *removal*
> >> of citations, which is in my opinion more significant than the
> >> addition.* You might extract all the diffs which removed a citation
> >> from an article in the last N years (probably they'll be in the
> >> order of 10^5 rather than 10^6), remove some massive events or
> >> outliers, sample 500-1000 of them randomly and verify the required data manually.
> >>
> >> As usual it will be impossible to have an objective assessment of
> >> whether that citation was really (in)appropriate in that context
> >> according to the (English or whatever) Wikipedia guidelines. To
> >> test that too, you should replicate one of the various studies of
> >> the gender imbalance of peer review, perhaps one of those which
> >> tried to assess the impact of a double blind peer review system on the gender imbalance.
> >> However, because the sources are already published, you'd need to
> >> provide the agendered information yourself and make sure the
> >> participants perform their assessment in some controlled
> >> environment where they don't have access to any gendered
> >> information (i.e. where you cut them off the internet).
> >>
> >> How many years do you have to work on this project? :-)
> >>
> >> Federico
> >>
> >> (*) I might add a citation just because it's the first result a
> >> popular search engine gives me, after glancing at the abstract and
> >> maybe the journal home page; but if I remove an existing citation,
> >> hopefully I've at least assessed its content and made a judgement
> >> about it, apart from cases of mass removals for specific problems
> >> with certain articles or publication venues.
> >>
> >> _______________________________________________
> >> Wiki-research-l mailing list
> >> [hidden email]
> >> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> >>
> >
> >
> > --
> > Isaac Johnson -- Research Scientist -- Wikimedia Foundation
> >
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 27 Aug 2019 08:00:45 +0200
> From: Jane Darnell <[hidden email]>
> To: Research into Wikimedia content and communities
>         <[hidden email]>
> Subject: Re: [Wiki-research-l] gender balance of Wikipedia citations
> Message-ID:
>         <CAFVcA-HqVicR0k65J4iox0PD=
> [hidden email]>
> Content-Type: text/plain; charset="UTF-8"
>
> Greg,
> Yes that's what I meant. On Wikipedia you get what you measure, so
> many Wikipedians are page-creators and page-hit junkies because we can
> measure that. The trick to motivating editors is giving them other
> measurements for progress. Here is the link to the Women writers
> Wikiproject and as you scroll down you can see what is measured.
> https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Women_writers
> Jane
>
> On Tue, Aug 27, 2019 at 3:39 AM Greg <[hidden email]> wrote:
>
> > Thanks for sharing your experience and thoughts, Jane. I did not
> > know this was happening--I'm hardly an expert, so that's not
> > surprising, and yet it's still very troubling to hear. I'm not sure
> > what you mean by setting up a Wikiproject. Do you mean ways to study
> > this gap--i.e., the ideas that have been floated in this thread to
> > this point? Or are you thinking of something else?
> >
> > Greg
> >
> > On Mon, Aug 26, 2019 at 5:00 AM <
> > [hidden email]>
> > wrote:
> >
> > >
> > > Today's Topics:
> > >
> > >    1. Re: gender balance of Wikipedia citations (WereSpielChequers)
> > >    2. Re: gender balance of Wikipedia citations (Greg)
> > >    3. Re: sockpuppets and how to find them sooner (Federico Leva (Nemo))
> > >    4. Re: gender balance of Wikipedia citations (Jane Darnell)
> > >    5. Re: gender balance of wikipedia citations (Federico Leva (Nemo))
> > >
> > >
> > > ----------------------------------------------------------------------
> > >
> > > Message: 1
> > > Date: Sun, 25 Aug 2019 14:28:25 +0100
> > > From: WereSpielChequers <[hidden email]>
> > > To: Research into Wikimedia content and communities
> > >         <[hidden email]>
> > > Subject: Re: [Wiki-research-l] gender balance of Wikipedia
> > > citations
> > > Message-ID:
> > >         <CAAanWP3qJnMpLB4tr9Eqt4EJLg2kCihkb50UY-d8=
> > > [hidden email]>
> > > Content-Type: text/plain; charset="UTF-8"
> > >
> > > Hi Greg,
> > >
> > > One of the major step changes in the early growth of the English
> > > Wikipedia was when a bot called RamBot created stub articles on US
> > > places. I think they were cited to the census. Others have created
> > > articles on rivers in countries and various other topics by similar
> > > programmatic means. Nowadays such article creation is unlikely to get
> > > consensus on the English Wikipedia, but there are some languages which
> > > are very open to such creations and have them by the million.
> > >
> > > I'm not sure if the fastest updating of existing articles is automated
> > > or just semiautomated. But looking at the bot requests page, it
> > > certainly looks like some people are running such maintenance bots;
> > > "updating GDP by country" is a current bot request.
> > > https://en.wikipedia.org/wiki/Wikipedia:Bot_requests
> > >
> > > I'm not sure how "the ease of a source for purposes of converting into
> > > a table and generating a separate article for each row" relates to
> > > gender. But I suspect "number of times cited in Wikipedia" deserves
> > > less kudos than "number of times cited in academia".
> > >
> > > WSC
> > >
> > > On Sun, 25 Aug 2019 at 05:22, Greg <[hidden email]> wrote:
> > >
> > > > Thanks again, Kerry. I am hoping that someone with access to more
> > > > resources (knowledge, support, etc) than I have will look into this.
> > > >
> > > > A few more thoughts/questions:
> > > >
> > > > 1. The link to the citation dataset from the Medium article ("What
> > > > are the ten most cited sources on Wikipedia? Let’s ask the data.")
> > > > is broken.
> > > > 2. As far as I can tell, every named author in the top ten most
> > > > cited sources on Wikipedia is male. One piece is by a working group.
> > > > 3. This line from the Medium piece struck me: "Many of these
> > > > publications have been cited by Wikipedians across large series of
> > > > articles using powerful bots and automated tools."
> > > >
> > > > Are citations being added by bots? I'm not sure that I understand
> > > > that line correctly.
> > > >
> > > > Greg
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> > > ------------------------------
> > >
> > > Message: 2
> > > Date: Sun, 25 Aug 2019 21:16:25 -0700
> > > From: Greg <[hidden email]>
> > > To: [hidden email]
> > > Subject: Re: [Wiki-research-l] gender balance of Wikipedia
> > > citations
> > > Message-ID:
> > >         <CAOO9DNvGyfvJkzyRq60cSQi-T80mAkUa=
> > > [hidden email]>
> > > Content-Type: text/plain; charset="UTF-8"
> > >
> > > Thanks, WSC. All very interesting.
> > >
> > > I've been thinking about Wikipedia citations less in terms of kudos
> > > and more in terms of a feedback loop. The cited sources get a
> > > significant amount of attention (1 click per 200 pageviews is the
> > > number I saw recently). When I imagine total Wikipedia traffic,
> > > that's huge. How many students are finding sources this way? How many
> > > academics? And how many of these citations are finding their way back
> > > into academic publications via this mechanism?
> > >
> > > Assuming this is happening to some degree, the gender imbalance of the
> > > citations is also reflected. If the Wikipedia imbalance is the same as
> > > the one in academia, that's one thing; if it is better on Wikipedia
> > > than it is in academia, that's reason to celebrate; if the balance is
> > > worse, that's concerning. In fact, if the gender imbalance conforms to
> > > my fears instead of my hopes, and is magnified by the massive website
> > > traffic, I imagine it could even explain the growth in the citation
> > > disparity researchers note in their study of political science texts.
> > > (I link to that study in a previous post; it was mentioned in the
> > > Washington Post recently.)
> > >
> > > There is a very real possibility that Wikipedia is making the citation
> > > gender gap worse. I think we need to understand what is happening and
> > > take immediate action if the news is not good.
> > >
> > > Greg
> > >
> > > >
> > > >
> > > >
> > >
> > >
> > > ------------------------------
> > >
> > > Message: 3
> > > Date: Mon, 26 Aug 2019 10:59:07 +0300
> > > From: "Federico Leva (Nemo)" <[hidden email]>
> > > To: Research into Wikimedia content and communities
> > >         <[hidden email]>, Aaron Halfaker
> > >         <[hidden email]>, Kerry Raymond <[hidden email]>
> > > Subject: Re: [Wiki-research-l] sockpuppets and how to find them sooner
> > > Message-ID: <[hidden email]>
> > > Content-Type: text/plain; charset=utf-8; format=flowed
> > >
> > > Please everyone avoid using jargon specific to the English
> > > Wikipedia on this cross-language and cross-wiki mailing list.
> > >
> > > Aaron Halfaker, 23/08/19 17:36:
> > > > I think embeddings[1] would be a nice way to create a signature.
> > >
> > > There is some discussion of acceptable user fingerprinting
> > > (presumably to be available to CheckUsers only), other than the
> > > usual over-reliance on IP addresses, in particular at
> > > <https://meta.wikimedia.org/wiki/Talk:IP_Editing:_Privacy_Enhancement_and_Abuse_Mitigation>.
> > >
> > > Federico
> > >
> > >
> > >
> > > ------------------------------
> > >
> > > Message: 4
> > > Date: Mon, 26 Aug 2019 10:17:46 +0200
> > > From: Jane Darnell <[hidden email]>
> > > To: Research into Wikimedia content and communities
> > >         <[hidden email]>
> > > Subject: Re: [Wiki-research-l] gender balance of Wikipedia citations
> > > Message-ID:
> > >         <CAFVcA-G87k26nBMr=-e-+C8o6eG0KQvVihH=
> > > [hidden email]>
> > > Content-Type: text/plain; charset="UTF-8"
> > >
> > > Greg,
> > > Thanks for worrying. This is a known problem and yes, Wikipedia
> > > contributes to the Gendergap in citations and no, it's not an easy
> > > fix, since it is the fault of systemic bias in academia. So fewer
> > > women are head author on scientific publications, and it is generally
> > > only the head author that gets cited on Wikipedia. This is not just a
> > > problem with written works in the field of politics. I spend most of
> > > my time working on paintings and their documented catalogs, so
> > > generally I only notice and fix this problem in art catalogs. Women
> > > rarely appear as lead author mentioned. I will always add them in to
> > > descriptions when I add items for their works on Wikidata, but I can
> > > not always find them! Sometimes I can't even create items for them
> > > because all I have is a name and a work and nothing else available
> > > online anywhere. You see this most often with women who spent entire
> > > careers working at a single institution and the institution doesn't
> > > bother to promote their work or even list them in exhibition
> > > catalogs. With luck there might be a local obituary, but not always.
> > > If you have suggestions how to set up a Wikiproject to tackle this it
> > > would be a good idea. In my onwiki experience the Women-in-Red
> > > community can be very positive in their response to gendergap-related
> > > issues for women writers.
> > > Jane
> > >
> > >
> > >
> > > ------------------------------
> > >
> > > Message: 5
> > > Date: Mon, 26 Aug 2019 11:45:09 +0300
> > > From: "Federico Leva (Nemo)" <[hidden email]>
> > > To: Research into Wikimedia content and communities
> > >         <[hidden email]>, Greg
> > >         <[hidden email]>
> > > Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
> > > Message-ID: <[hidden email]>
> > > Content-Type: text/plain; charset=utf-8; format=flowed
> > >
> > > Greg, 22/08/19 06:19:
> > > > I do not know the current status of wikicite or if/when this
> > > > could be used for this inquiry--either to examine all, or a sensible
> > > subset
> > > > of the citations.
> > >
> > > If I see correctly, you still did not receive an answer on the data
> > > available.
> > >
> > > It's true that the Figshare item for
> > > <
> > >
> >
> https://meta.wikimedia.org/wiki/Research:Scholarly_article_citations_in_Wikipedia
> > >
> > >
> > > was deleted (I've asked about it on the talk page), but it's trivial to
> > > run https://pypi.org/project/mwcites/ and extract the data yourself,
> at
> > > least for citations which use an identifier.
> > >
> > > Some example datasets produced this way:
> > > https://zenodo.org/record/15871
> > > https://zenodo.org/record/55004
> > > https://zenodo.org/record/54799
> > >
> > > Once you extract the list of works, the fun begins. You'll need to
> > > intersect with other data sources (Wikidata, ORCID, other?) and account
> > > for a number of factors until you manage to find a subset of the data
> > > which has a sufficiently high signal:noise ratio. For instance you
> might
> > > need to filter or normalise by
> > > * year of publication (some year recent enough to have good data but
> old
> > > enough to allow the work to be cited elsewhere, be archived after
> > > embargos);
> > > * country or institution (some probably have better ORCID coverage);
> > > * field/discipline and language;
> > > * open access status (per Unpaywall);
> > > * number of expected pageviews and clicks (for instance using
> > > <https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews> and
> > > <
> https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream#Releases
> > >;
> > >
> > > a link from 10k articles on asteroids or proteins is not the same as
> > > being the lone link from a popular article which is not the same as a
> > > link buried among a thousand others on a big article);
> > > * time or duration of the addition (with one of the various diff
> > > extraction libraries, content persistence data or possibly historical
> > > eventstream if such a thing is available).
> > >
> > > To avoid having to invent everything yourself, maybe you can reuse the
> > > method of some similar study, for instance the one on the open access
> > > citation advantage or one of the many which studied the gender imbalance
> > > of citations and peer review in journals.
> > >
> > > However, it's very possible that the noise is just too much for a
> > > general computational method. You might consider a more manual approach
> > > on a sample of relevant events, for instance the *removal* of citations,
> > > which is in my opinion more significant than the addition.* You might
> > > extract all the diffs which removed a citation from an article in the
> > > last N years (probably they'll be in the order of 10^5 rather than
> > > 10^6), remove some massive events or outliers, sample 500-1000 of them
> > > randomly and verify the required data manually.
> > >
> > > As usual it will be impossible to have an objective assessment of
> > > whether that citation was really (in)appropriate in that context
> > > according to the (English or whatever) Wikipedia guidelines. To test
> > > that too, you should replicate one of the various studies of the gender
> > > imbalance of peer review, perhaps one of those which tried to assess the
> > > impact of a double blind peer review system on the gender imbalance.
> > > However, because the sources are already published, you'd need to
> > > provide the agendered information yourself and make sure the
> > > participants perform their assessment in some controlled environment
> > > where they don't have access to any gendered information (i.e. where you
> > > cut them off the internet).
> > >
> > > How many years do you have to work on this project? :-)
> > >
> > > Federico
> > >
> > > (*) I might add a citation just because it's the first result a popular
> > > search engine gives me, after glancing at the abstract and maybe the
> > > journal home page; but if I remove an existing citation, hopefully I've
> > > at least assessed its content and made a judgement about it, apart from
> > > cases of mass removals for specific problems with certain articles or
> > > publication venues.
> > >
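A minimal sketch of the manual sampling approach described above: drop removals that belong to mass events, then draw a random sample for hand verification. The record layout (`rev_id`, `citation_id`), the `max_per_revision` threshold and the function name are assumptions made for illustration only; nothing here comes from mwcites or any other tool mentioned in the thread, and the diff extraction step itself is not shown.

```python
import random
from collections import Counter


def sample_removals(events, max_per_revision=20, sample_size=500, seed=0):
    """Pick a random subset of citation-removal events for manual review.

    events: list of dicts with the (hypothetical) keys 'rev_id' and
    'citation_id', one dict per citation removed by a revision.
    """
    # Count how many citations each revision removed, so that mass
    # removals (e.g. clean-ups of one publication venue) can be spotted.
    removed_per_revision = Counter(e["rev_id"] for e in events)

    # Keep only events from revisions at or below the threshold.
    kept = [
        e for e in events
        if removed_per_revision[e["rev_id"]] <= max_per_revision
    ]

    # Sample without replacement; a fixed seed keeps the sample reproducible.
    rng = random.Random(seed)
    return rng.sample(kept, min(sample_size, len(kept)))
```

The threshold for what counts as a "massive event" is a judgement call and would probably need tuning against the real data; a sample size anywhere in the 500-1000 range suggested above can be passed as `sample_size`.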
> > >
> > >
> > > ------------------------------
> > >
> > > Subject: Digest Footer
> > >
> > > _______________________________________________
> > > Wiki-research-l mailing list
> > > [hidden email]
> > > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> > >
> > >
> > > ------------------------------
> > >
> > > End of Wiki-research-l Digest, Vol 168, Issue 20
> > > ************************************************
> > >
>
> ------------------------------
>
> End of Wiki-research-l Digest, Vol 168, Issue 22
> ************************************************
>
_______________________________________________
Wiki-research-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: gender balance of Wikipedia citations

Greg-2
In reply to this post by Greg-2
Thanks once again for writing up your thoughts, Kerry. All very interesting.

Your comment about 'reflection of the real world' caught my eye. I believe
that the real world is moving towards acknowledging that bias exists and
that it won't just go away on its own. I see web-based tools for assessing
the gender balance of citations; I see people studying bias in things like
hiring and promotion, as well as different strategies for addressing it; I
see organizations like VIDA (https://www.vidaweb.org/) counting the number
of female writers in different journals, and journals responding because
the imbalance is known and public. I think the real world is moving towards
acknowledging and proactively addressing inequity. If the Wikipedia
community is not studying its biases and designing tools and strategies for
addressing them, it is not reflecting the world, but lagging behind it.

Frankly, if a few shouting misogynists have a problem with such initiatives,
I don't mind :) It sounds like my citation-tag idea doesn't make too much
sense, but I'd love to hear any other thoughts.

Greg

On Wed, Aug 28, 2019 at 3:27 PM <[hidden email]>
wrote:

> Send Wiki-research-l mailing list submissions to
>         [hidden email]
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> or, via email, send a message with subject or body 'help' to
>         [hidden email]
>
> You can reach the person managing the list at
>         [hidden email]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wiki-research-l digest..."
>
>
> Today's Topics:
>
>    1. Re: gender balance of Wikipedia citations (Kerry Raymond)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 29 Aug 2019 08:26:45 +1000
> From: "Kerry Raymond" <[hidden email]>
> To: "'Research into Wikimedia content and communities'"
>         <[hidden email]>, <[hidden email]>
> Subject: Re: [Wiki-research-l] gender balance of Wikipedia citations
> Message-ID: <006701d55def$aea84d50$0bf8e7f0$@gmail.com>
> Content-Type: text/plain;       charset="UTF-8"
>
> FWIW, I think there would be pushback against a quality tag that
> highlighted little/no citation of women's work (whether we are talking
> first author or not) in an article. There's two reasons for this. One is
> the misogyny that really does exist within the English Wikipedia
> "community" (those who do most of the shouting and hence decision making);
> they will argue that firstly gender balance of citations doesn't matter,
> secondly it is a reflection of the real world and thirdly that Wikipedia
> has a policy that it is not there to Right Great Wrongs.
>
> More practically, we know that whole-of-article quality tagging doesn't
> tend to have a lot of impact in terms of getting people to fix anything,
> compared to more specific tags like "citation needed", "dubious", "says
> who" and so on placed on specific pieces of text. People are much more
> likely to fix a specific problem and then remove the specific tag. Even
> when a person does respond to a generic tag like "more references needed"
> and add in some more references, they rarely remove the generic tag
> thinking "well, there's still plenty of scope here to add more references".
> Who among us is willing to declare "that article is 100% fully referenced
> by reliable sources"? Nobody it seems, it's a tag that lingers forever ...
>
> So I think a specific tag to encourage the expansion of "Bloggs et al"
> citations to full author listings might work. It's a somewhat boring and
> mechanical task to expand "et al" but we do have people who are happy to
> contribute in that way. It might even be possible to build a tool to assist
> them which looks up the paper in WikiCite or Google Scholar etc to extract
> the full author list as published (just as we have tools to make it easier
> to fix typos and spelling, disambiguate links and so forth). That would
> address the problem of women authors not being first cited and lost in the
> mists of "et al". However, it is unlikely to be obvious to the average
> contributor whether the paper with the full author list of A.B. Brown, C.D.
> Jones, E.F. Smith and G.H. Walker has any female authors,
> so I can't see that it's going to be easy to motivate people to try to find
> additional citations which do have more female authors.
>
> And, as much as gender inequity is a wrong I'd like to see righted, I
> don't want to see campaigns just to "add in more female authored citations"
> (I call this "citation sprinkling") on Wikipedia. A citation has to be
> there because it verifies the information in the article and not to meet a
> gender quota. Remember that for a lot of Wikipedia contributors, academic
> literature is mostly behind a paywall so they can't actually read more than
> the title and abstract at best. A "sprinkling" campaign is likely to see
> citations based only on title and abstract ("well, it sounds like this
> paper which includes a woman author is talking about this topic") but the
> paper may not support the specific claim made in the text (indeed, it might
> say the exact opposite). A sprinkling campaign should only target the
> Further Reading section whose role is:
>
> "The Further reading section of an article contains a bulleted list of a
> reasonable number of works which a reader may consult for additional and
> more detailed coverage of the subject of the article. In articles with
> numerous footnotes, it probably is not obvious which ones are suitable for
> further reading. The "Further reading" section can help the readers by
> listing selected titles without worrying about duplications."
>
> which would avoid the risk of adding a citation that doesn't support the
> specific claims being made in the article. So maybe it would be possible to
> add a "skewed gender balance" tag onto the Further reading section and/or
> External links section whose role is
>
> "Some acceptable links include those that contain further research that is
> accurate and on-topic, information that could not be added to the article
> for reasons such as copyright or amount of detail, or other meaningful,
> relevant content that is not suitable for inclusion in an article for
> reasons unrelated to its accuracy."
>
> The downside of this idea of adding female authors to the Further
> Reading and External Links sections is that it is unclear whether anyone
> ever looks at them. Over 50% of Wikipedia hits are now via mobile device. The mobile
> render of a Wikipedia article is not the whole article as you see on
> desktop and laptop but rather you select the sections you want to read, so
> for mobile readers we do know precisely what sections they are opening from
> which we have learned that people in developed countries are not generally
> reading whole articles but specific sections (suggesting seeking answers to
> a specific need rather than a desire to fully appreciate the topic), and
> they don't tend to open anything after the References as a rule, so they
> aren't looking at Further Reading and External links anyway. Are
> desktop/laptop readers looking at them either? We don't really know as they
> get the whole article rendered as a single result and it would really only
> be eye-tracking studies (an expensive type of experiment) that would give
> us this insight with the same accuracy as our mobile data.
>
> Aside, in less developed countries, readers are more likely to read whole
> articles on a mobile device. While the reasons for this difference are not
> proven, I'd be prepared to guess at two interlinked hypotheses. Firstly,
> such countries have poorer standards of education so people may be using
> Wikipedia to supplement their limited formal education. Also such countries
> are more likely to be using rote learning in their education system
> (valuing the ability to memorise and reproduce) rather than the more
> problem-solving learning approaches increasingly in use in the education
> systems of more developed countries. That would also explain
> whole-of-article viewing rather than selecting specific sub-sections.
>
> In some ways, I think a better solution might be to try to get Google
> scholar interested in the issue of gender. What if articles listed on
> Google scholar came with a little gender balance score (a bit like hotel
> ratings). One blue star (or some other symbol) for one male author, two
> blue stars (two male authors), one pink and one blue star (first author
> female, second author male), etc. Why I like the idea is that it is a
> simple-to-understand visual aid to draw attention to gender imbalance more
> widely but without a specific call to action (which as I outline above may
> backfire if citations get added for gender balance rather than content). It
> potentially helps address the real world problem which would hopefully flow
> through to Wikipedia. Also Google Scholar is probably a lot better
> resourced to build the tools to do the legwork of determining gender (I
> guess a white star is used when it can't). The risk, though, as Leia has
> previously mentioned, is that automated tools don't do a great job of
> getting gender correct, particularly as the tools are often trained on
> limited data sets such as mostly white people, making the automated gender
> guessing of non-white people more likely to be incorrect. However,
> authors can establish their own Google Scholar profile (if the author's
> name is underlined, it's a link to their profile); that's a place where they
> could disclose their gender if they desired, or correct Google Scholar's
> mistaken guess, or demand that Google Scholar not show their gender
> (whatever should be their choice). Hmm, might it lead to catfishing?
> Authors passing themselves off as a different gender? Hmm ...
>
> Another place where we might explore marking gender in some easily visible
> way is WikiCite, but frankly I know little about that project so cannot
> comment on it nor on the merits of doing it there rather than on Google
> scholar. I don't think traditional journal publishers are likely to be keen
> to show gender balance on their own websites as I think they would realise
> it would enable webscraping to reveal their overall gender balance profile,
> leading to some adverse headlines about "Brandname journals worst for
> gender equity". But Google Scholar has less to fear unless it was
> demonstrated that they exhibited stronger gender bias than the journals
> themselves but I would think that Google Scholar aggregates papers without
> any regard to the gender of the authors, but I guess it might not aggregate
> all topic areas equally. For example, if they didn't make much effort to
> include (say) nursing publications (a more female academic discipline) but
> went hard on engineering publications (a more male academic discipline), I
> guess it would skew their author gender balance towards men.
>
> Kerry
>
> -----Original Message-----
> From: Wiki-research-l [mailto:[hidden email]]
> On Behalf Of Greg
> Sent: Thursday, 29 August 2019 4:06 AM
> To: Research into Wikimedia content and communities <
> [hidden email]>; [hidden email]
> Subject: Re: [Wiki-research-l] gender balance of Wikipedia citations
>
> Hi Jane,
>
> Thanks for the link. It's clear that there is a lot of work being done,
> and even more left to do.
>
> I've been thinking about what you said about second authors and was
> wondering if instead of fixing it (or in addition to fixing it), it would
> make sense to put some sort of tag on the page itself (like the ones I see
> questioning notability or requests for additional citations). Something
> along the lines of authors missing from a particular citation and how to
> fix that, or no work by women cited in this article (if this is the case).
> It strikes me that by fixing it yourself, you are doing great work, but
> that maybe it also makes sense to spread awareness about these issues to
> the broader editing community so more people are thinking about it/doing
> it. At any rate, I thought I'd float the idea. Such a tag/the response (if
> any), could also be interesting to study, though perhaps something like
> this already exists and I'm just not aware of it, or perhaps there is good
> reason not to do it.
>
> All best,
> Greg
>
> On Tue, Aug 27, 2019 at 5:00 AM <
> [hidden email]>
> wrote:
>
> > Message: 1
> > Date: Mon, 26 Aug 2019 18:56:12 -0700
> > From: Greg <[hidden email]>
> > To: Isaac Johnson <[hidden email]>
> > Cc: Research into Wikimedia content and communities
> >         <[hidden email]>
> > Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
> > Message-ID:
> >         <
> > [hidden email]>
> > Content-Type: text/plain; charset="UTF-8"
> >
> > Thanks, Isaac and Federico. These notes and links are very
> > helpful--and will require some time to process. As for how many years
> > I have to work on this, I'm retired! In truth, I keep hoping that
> > someone on this list will express interest in working on these
> > matters. The questions are all very interesting and quite relevant.
> > The idea of studying removed citations is both complex and compelling.
> >
> > Greg
> >
> > On Mon, Aug 26, 2019 at 6:49 AM Isaac Johnson <[hidden email]> wrote:
> >
> > > Regarding data, I have not been a part of these projects but I think
> > > that I can help a bit with working links:
> > > * The (I believe) original dataset can also be found here:
> > > https://analytics.wikimedia.org/datasets/archive/public-datasets/all/mwrefs/
> > > * A newer version of this dataset was produced that also included
> > > information about whether the source was openly available and its topic:
> > > ** Meta page:
> > > https://meta.wikimedia.org/wiki/Research:Towards_Modeling_Citation_Quality
> > > ** Figshare:
> > > https://figshare.com/articles/Accessibility_and_topics_of_citations_with_identifiers_in_Wikipedia/6819710
> > >
> > > On Mon, Aug 26, 2019 at 3:53 AM Federico Leva (Nemo) <[hidden email]> wrote:
> > >
> > >> Greg, 22/08/19 06:19:
> > >> > I do not know the current status of wikicite or if/when this
> > >> > could be used for this inquiry--either to examine all, or a
> > >> > sensible
> > >> subset
> > >> > of the citations.
> > >>
> > >> If I see correctly, you still did not receive an answer on the data
> > >> available.
> > >>
> > >> It's true that the Figshare item for
> > >> <https://meta.wikimedia.org/wiki/Research:Scholarly_article_citations_in_Wikipedia>
> > >> was deleted (I've asked about it on the talk page), but it's
> > >> trivial to run https://pypi.org/project/mwcites/ and extract the
> > >> data yourself, at least for citations which use an identifier.
> > >>
> > >> Some example datasets produced this way:
> > >> https://zenodo.org/record/15871
> > >> https://zenodo.org/record/55004
> > >> https://zenodo.org/record/54799
> > >>
> > >> Once you extract the list of works, the fun begins. You'll need to
> > >> intersect with other data sources (Wikidata, ORCID, other?) and
> > >> account for a number of factors until you manage to find a subset
> > >> of the data which has a sufficiently high signal:noise ratio. For
> > >> instance you might need to filter or normalise by
> > >> * year of publication (some year recent enough to have good data
> > >> but old enough to allow the work to be cited elsewhere, be archived
> > >> after embargos);
> > >> * country or institution (some probably have better ORCID
> > >> coverage);
> > >> * field/discipline and language;
> > >> * open access status (per Unpaywall);
> > >> * number of expected pageviews and clicks (for instance using
> > >> <https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews> and
> > >> <https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream#Releases>;
> > >>
> > >> a link from 10k articles on asteroids or proteins is not the same
> > >> as being the lone link from a popular article which is not the same
> > >> as a link buried among a thousand others on a big article);
> > >> * time or duration of the addition (with one of the various diff
> > >> extraction libraries, content persistence data or possibly
> > >> historical eventstream if such a thing is available).
> > >>
> > >> To avoid having to invent everything yourself, maybe you can reuse
> > >> the method of some similar study, for instance the one on the open
> > >> access citation advantage or one of the many which studied the
> > >> gender imbalance of citations and peer review in journals.
> > >>
> > >> However, it's very possible that the noise is just too much for a
> > >> general computational method. You might consider a more manual
> > >> approach on a sample of relevant events, for instance the *removal*
> > >> of citations, which is in my opinion more significant than the
> > >> addition.* You might extract all the diffs which removed a citation
> > >> from an article in the last N years (probably they'll be in the
> > >> order of 10^5 rather than 10^6), remove some massive events or
> > >> outliers, sample 500-1000 of them randomly and verify the required
> > >> data manually.
> > >>
> > >> As usual it will be impossible to have an objective assessment of
> > >> whether that citation was really (in)appropriate in that context
> > >> according to the (English or whatever) Wikipedia guidelines. To
> > >> test that too, you should replicate one of the various studies of
> > >> the gender imbalance of peer review, perhaps one of those which
> > >> tried to assess the impact of a double blind peer review system on
> > >> the gender imbalance.
> > >> However, because the sources are already published, you'd need to
> > >> provide the agendered information yourself and make sure the
> > >> participants perform their assessment in some controlled
> > >> environment where they don't have access to any gendered
> > >> information (i.e. where you cut them off the internet).
> > >>
> > >> How many years do you have to work on this project? :-)
> > >>
> > >> Federico
> > >>
> > >> (*) I might add a citation just because it's the first result a
> > >> popular search engine gives me, after glancing at the abstract and
> > >> maybe the journal home page; but if I remove an existing citation,
> > >> hopefully I've at least assessed its content and made a judgement
> > >> about it, apart from cases of mass removals for specific problems
> > >> with certain articles or publication venues.
> > >>
> > >>
> > >
> > >
> > > --
> > > Isaac Johnson -- Research Scientist -- Wikimedia Foundation
> > >
> >
> >
> > ------------------------------
> >
> > Message: 2
> > Date: Tue, 27 Aug 2019 08:00:45 +0200
> > From: Jane Darnell <[hidden email]>
> > To: Research into Wikimedia content and communities
> >         <[hidden email]>
> > Subject: Re: [Wiki-research-l] gender balance of Wikipedia citations
> > Message-ID:
> >         <CAFVcA-HqVicR0k65J4iox0PD=
> > [hidden email]>
> > Content-Type: text/plain; charset="UTF-8"
> >
> > Greg,
> > Yes that's what I meant. On Wikipedia you get what you measure, so
> > many Wikipedians are page-creators and page-hit junkies because we can
> > measure that. The trick to motivating editors is giving them other
> > measurements for progress. Here is the link to the Women writers
> > Wikiproject and as you scroll down you can see what is measured.
> > https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Women_writers
> > Jane
> >
> > On Tue, Aug 27, 2019 at 3:39 AM Greg <[hidden email]> wrote:
> >
> > > Thanks for sharing your experience and thoughts, Jane. I did not
> > > know this was happening--I'm hardly an expert, so that's not
> > > surprising, and yet it's still very troubling to hear. I'm not sure
> > > what you mean by setting up a Wikiproject. Do you mean ways to study
> > > this gap--i.e., the ideas that have been floated in this thread to
> > > this point? Or are you thinking of something else?
> > >
> > > Greg
> > >
> > > On Mon, Aug 26, 2019 at 5:00 AM <
> > > [hidden email]>
> > > wrote:
> > >
> > > > Message: 1
> > > > Date: Sun, 25 Aug 2019 14:28:25 +0100
> > > > From: WereSpielChequers <[hidden email]>
> > > > To: Research into Wikimedia content and communities
> > > >         <[hidden email]>
> > > > Subject: Re: [Wiki-research-l] gender balance of Wikipedia
> > > > citations
> > > > Message-ID:
> > > >         <CAAanWP3qJnMpLB4tr9Eqt4EJLg2kCihkb50UY-d8=
> > > > [hidden email]>
> > > > Content-Type: text/plain; charset="UTF-8"
> > > >
> > > > Hi Greg,
> > > >
> > > > One of the major step changes in the early growth of the English
> > > > Wikipedia was when a bot called RamBot created stub articles on US
> > > > places. I think they were cited to the census. Others have created
> > > > articles on rivers in countries and various other topics by similar
> > > > programmatic means. Nowadays such article creation is unlikely to get
> > > > consensus on the English Wikipedia, but there are some languages
> > > > which are very open to such creations and have them by the million.
> > > >
> > > > I'm not sure if the fastest updating of existing articles is
> > > > automated or just semiautomated. But looking at the bot requests
> > > > page, it certainly looks like some people are running such
> > > > maintenance bots: "updating GDP by country" is a current bot request.
> > > > https://en.wikipedia.org/wiki/Wikipedia:Bot_requests
> > > >
> > > > I'm not sure how "the ease of a source for purposes of converting
> > > > into a table and generating a separate article for each row" relates
> > > > to gender. But I suspect "number of times cited in wikipedia"
> > > > deserves less kudos than "number of times cited in academia".
> > > >
> > > > WSC
> > > >
> > > > On Sun, 25 Aug 2019 at 05:22, Greg <[hidden email]> wrote:
> > > >
> > > > > Thanks again, Kerry. I am hoping that someone with access to more
> > > > > resources (knowledge, support, etc) than I have will look into this.
> > > > >
> > > > > A few more thoughts/questions:
> > > > >
> > > > > 1. The link to the citation dataset from the Medium article ("What
> > > > > are the ten most cited sources on Wikipedia? Let’s ask the data.")
> > > > > is broken.
> > > > > 2. As far as I can tell, every named author in the top ten most
> > > > > cited sources on Wikipedia is male. One piece is by a working group.
> > > > > 3. This line from the Medium piece struck me: "Many of these
> > > > > publications have been cited by Wikipedians across large series of
> > > > > articles using powerful bots and automated tools."
> > > > >
> > > > > Are citations being added by bots? I'm not sure that I understand
> > > > > that line correctly.
> > > > >
> > > > > Greg
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > > ------------------------------
> > > >
> > > > Message: 2
> > > > Date: Sun, 25 Aug 2019 21:16:25 -0700
> > > > From: Greg <[hidden email]>
> > > > To: [hidden email]
> > > > Subject: Re: [Wiki-research-l] gender balance of Wikipedia
> > > > citations
> > > > Message-ID:
> > > >         <CAOO9DNvGyfvJkzyRq60cSQi-T80mAkUa=
> > > > [hidden email]>
> > > > Content-Type: text/plain; charset="UTF-8"
> > > >
> > > > Thanks, WSC. All very interesting.
> > > >
> > > > I've been thinking about Wikipedia citations less in terms of
> > > > kudos and more in terms of a feedback loop. The cited sources get
> > > > a significant amount of attention (1 click per 200 pageviews is
> > > > the number I saw recently). When I imagine total Wikipedia
> > > > traffic, that's huge. How many students are finding sources this
> > > > way? How many academics? And how many of these citations are
> > > > finding their way back into academic publications via this
> > > > mechanism?
> > > >
> > > > Assuming this is happening to some degree, the gender imbalance of
> > > > the citations is also reflected. If the Wikipedia imbalance is the
> > > > same as the one in academia, that's one thing; if it is better on
> > > > Wikipedia than it is in academia, that's reason to celebrate; if the
> > > > balance is worse, that's concerning. In fact, if the gender
> > > > imbalance conforms to my fears instead of my hopes, and is magnified
> > > > by the massive website traffic, I imagine it could even explain the
> > > > growth in the citation disparity researchers note in their study of
> > > > political science texts. (I link to that study in a previous post;
> > > > it was mentioned in the Washington Post recently.)
> > > >
> > > > There is a very real possibility that Wikipedia is making the
> > > > citation gender gap worse. I think we need to understand what is
> > > > happening and take immediate action if the news is not good.
> > > >
> > > > Greg
> > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > > ------------------------------
> > > >
> > > > Message: 3
> > > > Date: Mon, 26 Aug 2019 10:59:07 +0300
> > > > From: "Federico Leva (Nemo)" <[hidden email]>
> > > > To: Research into Wikimedia content and communities
> > > >         <[hidden email]>, Aaron Halfaker
> > > >         <[hidden email]>, Kerry Raymond <
> > > [hidden email]>
> > > > Subject: Re: [Wiki-research-l] sockpuppets and how to find them
> > > > sooner
> > > > Message-ID: <[hidden email]>
> > > > Content-Type: text/plain; charset=utf-8; format=flowed
> > > >
> > > > Please everyone avoid using jargon specific to the English
> > > > Wikipedia on this cross-language and cross-wiki mailing list.
> > > >
> > > > Aaron Halfaker, 23/08/19 17:36:
> > > > > I think embeddings[1] would be a nice way to create a signature.
> > > >
> > > > There is some discussion of acceptable user fingerprinting
> > > > (presumably to be available to CheckUsers only), other than the
> > > > usual over-reliance on IP addresses, in particular at
> > > > <https://meta.wikimedia.org/wiki/Talk:IP_Editing:_Privacy_Enhancement_and_Abuse_Mitigation>.
> > > >
> > > > Federico
> > > >
> > > >
> > > >
> > > > ------------------------------
> > > >
> > > > Message: 4
> > > > Date: Mon, 26 Aug 2019 10:17:46 +0200
> > > > From: Jane Darnell <[hidden email]>
> > > > To: Research into Wikimedia content and communities
> > > >         <[hidden email]>
> > > > Subject: Re: [Wiki-research-l] gender balance of Wikipedia citations
> > > > Message-ID:
> > > >         <CAFVcA-G87k26nBMr=-e-+C8o6eG0KQvVihH=
> > > > [hidden email]>
> > > > Content-Type: text/plain; charset="UTF-8"
> > > >
> > > > Greg,
> > > > Thanks for worrying. This is a known problem and yes, Wikipedia
> > > > contributes to the Gendergap in citations and no, it's not an easy
> > > > fix, since it is the fault of systemic bias in academia. So fewer
> > > > women are head author on scientific publications, and it is generally
> > > > only the head author that gets cited on Wikipedia. This is not just a
> > > > problem with written works in the field of politics. I spend most of
> > > > my time working on paintings and their documented catalogs, so
> > > > generally I only notice and fix this problem in art catalogs. Women
> > > > rarely appear as lead author mentioned. I will always add them in to
> > > > descriptions when I add items for their works on Wikidata, but I
> > > > can not always find them! Sometimes I can't even create items for them
> > > > because all I have is a name and a work and nothing else available
> > > > online anywhere. You see this most often with women who spent entire
> > > > careers working at a single institution and the institution doesn't
> > > > bother to promote their work or even list them in exhibition catalogs.
> > > > With luck there might be a local obituary, but not always. If you have
> > > > suggestions how to set up a Wikiproject to tackle this it would be a
> > > > good idea. In my onwiki experience the Women-in-Red community can be
> > > > very positive in their response to gendergap-related issues for women
> > > > writers.
> > > > Jane
> > > >
> > > > On Mon, Aug 26, 2019 at 6:17 AM Greg <[hidden email]> wrote:
> > > >
> > > > > Thanks, WSC. All very interesting.
> > > > >
> > > > > I've been thinking about Wikipedia citations less in terms of kudos
> > > > > and more in terms of a feedback loop. The cited sources get a
> > > > > significant amount of attention (1 click per 200 pageviews is the
> > > > > number I saw recently). When I imagine total Wikipedia traffic,
> > > > > that's huge. How many students are finding sources this way? How
> > > > > many academics? And how many of these citations are finding their
> > > > > way back into academic publications via this mechanism?
> > > > >
> > > > > Assuming this is happening to some degree, the gender imbalance of
> > > > > the citations is also reflected. If the Wikipedia imbalance is the
> > > > > same as the one in academia, that's one thing; if it is better on
> > > > > Wikipedia than it is in academia, that's reason to celebrate; if the
> > > > > balance is worse, that's concerning. In fact, if the gender
> > > > > imbalance conforms to my fears instead of my hopes, and is magnified
> > > > > by the massive website traffic, I imagine it could even explain the
> > > > > growth in the citation disparity researchers note in their study of
> > > > > political science texts. (I link to that study in a previous post;
> > > > > it was mentioned in the Washington Post recently)
> > > > >
> > > > > There is a very real possibility that Wikipedia is making the
> > > > > citation gender gap worse. I think we need to understand what is
> > > > > happening and take immediate action if the news is not good.
> > > > >
> > > > > Greg
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > _______________________________________________
> > > > > Wiki-research-l mailing list
> > > > > [hidden email]
> > > > > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> > > > >
> > > >
> > > >
> > > > ------------------------------
> > > >
> > > > Message: 5
> > > > Date: Mon, 26 Aug 2019 11:45:09 +0300
> > > > From: "Federico Leva (Nemo)" <[hidden email]>
> > > > To: Research into Wikimedia content and communities
> > > >         <[hidden email]>, Greg
> > > >         <[hidden email]>
> > > > Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
> > > > Message-ID: <[hidden email]>
> > > > Content-Type: text/plain; charset=utf-8; format=flowed
> > > >
> > > > Greg, 22/08/19 06:19:
> > > > > I do not know the current status of wikicite or if/when this
> > > > > could be used for this inquiry--either to examine all, or a
> > > > > sensible subset of the citations.
> > > >
> > > > If I see correctly, you still did not receive an answer on the data
> > > > available.
> > > >
> > > > It's true that the Figshare item for
> > > > <https://meta.wikimedia.org/wiki/Research:Scholarly_article_citations_in_Wikipedia>
> > > > was deleted (I've asked about it on the talk page), but it's trivial
> > > > to run https://pypi.org/project/mwcites/ and extract the data
> > > > yourself, at least for citations which use an identifier.
> > > >
> > > > Some example datasets produced this way:
> > > > https://zenodo.org/record/15871
> > > > https://zenodo.org/record/55004
> > > > https://zenodo.org/record/54799
> > > >
> > > > Once you extract the list of works, the fun begins. You'll need to
> > > > intersect with other data sources (Wikidata, ORCID, other?) and
> > > > account for a number of factors until you manage to find a subset of
> > > > the data which has a sufficiently high signal:noise ratio. For
> > > > instance you might need to filter or normalise by
> > > > * year of publication (some year recent enough to have good data but
> > > > old enough to allow the work to be cited elsewhere, be archived
> > > > after embargos);
> > > > * country or institution (some probably have better ORCID coverage);
> > > > * field/discipline and language;
> > > > * open access status (per Unpaywall);
> > > > * number of expected pageviews and clicks (for instance using
> > > > <https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews> and
> > > > <https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream#Releases>;
> > > > a link from 10k articles on asteroids or proteins is not the same as
> > > > being the lone link from a popular article which is not the same as a
> > > > link buried among a thousand others on a big article);
> > > > * time or duration of the addition (with one of the various diff
> > > > extraction libraries, content persistence data or possibly historical
> > > > eventstream if such a thing is available).
> > > >
> > > > To avoid having to invent everything yourself, maybe you can reuse
> > > > the method of some similar study, for instance the one on the open
> > > > access citation advantage or one of the many which studied the gender
> > > > imbalance of citations and peer review in journals.
> > > >
> > > > However, it's very possible that the noise is just too much for a
> > > > general computational method. You might consider a more manual
> > > > approach on a sample of relevant events, for instance the *removal*
> > > > of citations, which is in my opinion more significant than the
> > > > addition.* You might extract all the diffs which removed a citation
> > > > from an article in the last N years (probably they'll be in the order
> > > > of 10^5 rather than 10^6), remove some massive events or outliers,
> > > > sample 500-1000 of them randomly and verify the required data
> > > > manually.
> > > >
> > > > As usual it will be impossible to have an objective assessment of
> > > > whether that citation was really (in)appropriate in that context
> > > > according to the (English or whatever) Wikipedia guidelines. To test
> > > > that too, you should replicate one of the various studies of the
> > > > gender imbalance of peer review, perhaps one of those which tried to
> > > > assess the impact of a double blind peer review system on the gender
> > > > imbalance. However, because the sources are already published, you'd
> > > > need to provide the agendered information yourself and make sure the
> > > > participants perform their assessment in some controlled environment
> > > > where they don't have access to any gendered information (i.e. where
> > > > you cut them off the internet).
> > > >
> > > > How many years do you have to work on this project? :-)
> > > >
> > > > Federico
> > > >
> > > > (*) I might add a citation just because it's the first result a
> > > > popular search engine gives me, after glancing at the abstract and
> > > > maybe the journal home page; but if I remove an existing citation,
> > > > hopefully I've at least assessed its content and made a judgement
> > > > about it, apart from cases of mass removals for specific problems
> > > > with certain articles or publication venues.
> > > >
> > > >
> > > >
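The manual-sampling approach described in Message 5 above (collect citation removals, drop mass events, then draw 500-1000 at random for hand verification) could be sketched roughly as follows. This is only an illustration under stated assumptions: the record fields `rev_id` and `citation_id`, the function name, and the `max_per_revision` cutoff used to define a "massive event" are all hypothetical and do not come from mwcites or any other tool mentioned in the thread.

```python
import random
from collections import Counter


def sample_removals(removals, max_per_revision=50, sample_size=500, seed=0):
    """Draw a reproducible random sample of citation removals.

    `removals` is a list of dicts with hypothetical keys 'rev_id' and
    'citation_id'. Revisions that removed more than `max_per_revision`
    citations are treated as mass events and excluded before sampling.
    """
    # Count how many citations each revision removed.
    per_revision = Counter(r["rev_id"] for r in removals)

    # Keep only removals that were not part of a mass event.
    kept = [r for r in removals if per_revision[r["rev_id"]] <= max_per_revision]

    # A seeded generator keeps the sample reproducible for later auditing.
    rng = random.Random(seed)
    if len(kept) <= sample_size:
        return kept
    return rng.sample(kept, sample_size)
```

The sampled records would still need to be checked by hand for author gender and for the reason the citation was removed; the code only narrows the pool.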

Re: gender balance of Wikipedia citations

Federico Leva (Nemo)
Kerry Raymond, 29/08/19 01:26:
 > So I think a specific tag to encourage the expansion of "Bloggs et al"
 > citations to full author listings might work.

But it's easier to fix it yourself, using the citation bot:
https://en.wikipedia.org/wiki/WP:UCB

Greg, 30/08/19 07:48:
> If the Wikipedia
> community is not studying its biases and designing tools and strategies for
> addressing them, it is not reflecting the world, but lagging behind it.

However, going back to Kerry:

 > In some ways, I think a better solution might be to try to get Google
 > scholar interested in the issue of gender.

I'm not aware of studies of gender bias in Google Scholar search results
themselves, yet we'd really need such basic information before going
into specifics of how the research is consumed and redistributed. There
is a mention of gender in https://oadoi.org/10.1017/S104909651800094 
which states

 > Moreover, because a GS pro-
 > file is a public signal, it can have a disproportionate effect on
 > opinions because a person seeing it knows that others also see
 > it (Chwe 2016).

Which seems to me an argument very similar to yours on Wikipedia.

Federico


Re: gender balance of Wikipedia citations

Gerard Meijssen-3
Hoi,
In Wikidata we import many papers and link authors to these papers. In
order to have sensible gender information, we add gender info on a big
scale. The absolute numbers are less interesting (statistically they are
hard to move); it is more interesting to know the gender ratio among the
co-authors of an author, or for a particular profession, e.g. historians
or chemists. The great thing about using Wikidata is that we can query
any which way.

While Google Scholar is nice, we have our own environment that we will use
for our citations anyway so why not use it?
Thanks,
     GerardM
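
As a rough illustration of the kind of query this makes possible, the sketch below builds a SPARQL query for the Wikidata Query Service that lists the authors of a work (looked up by DOI) with their recorded gender, and tallies a result set. It relies on the Wikidata properties P356 (DOI), P50 (author) and P21 (sex or gender); the function names and the example DOI are hypothetical. Authors recorded only as a name string (P2093) have no item and therefore no gender statement, so they would not appear in the results at all.

```python
from collections import Counter


def build_author_gender_query(doi):
    """Return a SPARQL query listing a work's authors and their gender."""
    doi = doi.strip().upper()  # Wikidata stores DOIs in upper case
    if '"' in doi or "\\" in doi:
        raise ValueError("unexpected character in DOI")
    return f'''
SELECT ?author ?authorLabel ?genderLabel WHERE {{
  ?work wdt:P356 "{doi}" ;
        wdt:P50 ?author .
  OPTIONAL {{ ?author wdt:P21 ?gender . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
'''


def tally_genders(bindings):
    """Count gender labels in the 'results.bindings' list of a SPARQL JSON
    response; authors without a gender statement count as 'unknown'."""
    counts = Counter()
    for row in bindings:
        label = row.get("genderLabel", {}).get("value", "unknown")
        counts[label] += 1
    return counts
```

The query string can be pasted into https://query.wikidata.org/ or sent to its SPARQL endpoint; `tally_genders` then works on the JSON bindings that come back. The share of "unknown" is worth reporting alongside any ratio, since it shows how much of the author list the data actually covers.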

On Fri, 30 Aug 2019 at 07:35, Federico Leva (Nemo) <[hidden email]>
wrote:

> Kerry Raymond, 29/08/19 01:26:
>  > So I think a specific tag to encourage the expansion of "Bloggs et al"
>  > citations to full author listings might work.
>
> But it's easier to fix it yourself, using the citation bot:
> https://en.wikipedia.org/wiki/WP:UCB
>
> Greg, 30/08/19 07:48:
> > If the Wikipedia
> > community is not studying its biases and designing tools and strategies
> for
> > addressing them, it is not reflecting the world, but lagging behind it.
>
> However, going back to Kerry:
>
>  > In some ways, I think a better solution might be to try to get Google
>  > scholar interested in the issue of gender.
>
> I'm not aware of studies of gender bias in Google Scholar search results
> themselves, yet we'd really need such basic information before going
> into specifics of how the research is consumed and redistributed. There
> is a mention of gender in https://oadoi.org/10.1017/S104909651800094
> which states
>
>  > Moreover, because a GS pro-
>  > file is a public signal, it can have a disproportionate effect on
>  > opinions because a person seeing it knows that others also see
>  > it (Chwe 2016).
>
> Which seems to me an argument very similar to yours on Wikipedia.
>
> Federico

Re: gender balance of Wikipedia citations

Greg-2
In reply to this post by Federico Leva (Nemo)
Thanks, Federico. Do you mean that examining gender bias is more relevant
to google than wikipedia? Or necessary before any work can be done here?
I'm not sure that I fully understand what you are saying, but I would like
to.

In a cursory look at the top 10 Wikipedia citations (
https://medium.com/freely-sharing-the-sum-of-all-knowledge/what-are-the-ten-most-cited-sources-on-wikipedia-lets-ask-the-data-34071478785a),
I noticed that the bulk of the occurrences of three of the ten (#4-6)
appear almost exclusively on bswiki. From a few observations, it also
seems possible that a bot has surfaced these three texts on many (perhaps
even thousands of) pages in a "Literatura" section. I do not know what the
effect of such a surfacing would be, whether through human or tech/search
discovery; perhaps it is small. But when I think of Jane's story, that she
hand-fixes missing second authors, while these male authors are pounded
into pages with such ease, I feel heartbroken. These three books may be
wonderful, but I strongly suspect there are other books that are also
wonderful, with no bot behind them.

In other news, it has been brought to my attention that responding to the
digest version of the list is problematic for a number of reasons. My
apologies! I did not realize this. I have adjusted my settings.

Greg


Re: gender balance of Wikipedia citations

Federico Leva (Nemo)
Greg, 31/08/19 05:17:
> Thanks, Federico. Do you mean that examining gender bias is more
> relevant to google than wikipedia? Or necessary before any work can be
> done here?

I'm saying that any gender bias of citations on Wikipedia articles will
compound a number of factors, including the underlying bias in the
literature, bias in how it's presented in discovery tools, etc. As long
as we don't know the size of such underlying biases, I suspect an
attempt to measure Wikipedia's specific contribution would be futile.

It's also a standard research practice to break down a problem into
smaller, more manageable parts. Google Scholar or similar tools are
already a large enough subject to study; the millions of Wikipedia
authors, with all their backgrounds and methods, are a significantly
larger one.

Federico


Re: gender balance of Wikipedia citations

Kerry Raymond
In reply to this post by Federico Leva (Nemo)
Does it expand an existing citation that someone else has created with "et al", which is the scenario here? In my experience I can use it to expand a pre-existing naked URL citation (in some cases, exceptions being PDFs), but I've never seen a way to use it to expand a partial citation into a fuller one.

Kerry
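
Whatever the citation bot does or does not handle, finding the candidate citations is a separate step that could be sketched as below. This is a minimal illustration, not a description of any existing tool: it scans wikitext for "cite ..." templates that contain "et al" (or the "etal" value that can be given to display-authors) so that someone could expand the author list. The function name is hypothetical, and the simple regular expression deliberately skips citation templates that contain nested templates.

```python
import re

# Matches {{cite ...}} templates that contain no nested braces.
CITE_RE = re.compile(r"\{\{\s*cite\b[^{}]*\}\}", re.IGNORECASE)
# Matches "et al", "et al." and "etal" as whole words.
ET_AL_RE = re.compile(r"\bet\s*al\b", re.IGNORECASE)


def find_et_al_citations(wikitext):
    """Return the citation templates in `wikitext` that mention 'et al'."""
    return [
        match.group(0)
        for match in CITE_RE.finditer(wikitext)
        if ET_AL_RE.search(match.group(0))
    ]
```

Each hit would still need a lookup of the full author list from the publisher, WikiCite or a similar source before the citation could be expanded.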

Re: gender balance of Wikipedia citations

Greg-2
In reply to this post by Federico Leva (Nemo)
Hi Federico,
Thanks for the clarification! I also think that it is very difficult to
understand bias--where it is coming from and what is contributing to
it--when it has not been measured. I originally came here looking for
information about the existing gender balance of citations on Wikipedia so
that I could begin to understand what is happening. My concerns have
unfolded over the course of this conversation.

I am cc'ing Gerard here because I received his note via digest but wanted
to say thank you. I am curious about how best to approach using wikidata to
generate useful information about gender balance and if there are any
issues around doing this.

Thanks all,
Greg

Re: gender balance of wikipedia citations

WereSpielChequers-2
In reply to this post by Greg-2
Dear Greg,

The Wikimedia Foundation and various chapters have microgrant programs and
an online library. In part this is to counter the project's known bias
towards free online sources. If you can come up with software that
identifies sources that we aren't using but should, then that would make for
some interesting reports on Wikiprojects, or an interesting opportunity for
the wiki library.

I'd particularly like to see something along the lines of a bot that sends
messages to active Wikipedia editors "As an editor who has been active in
topic zzzzzz we would like to send you a free copy of the new book xxxxxx
by yyyyy. Click here to arrange your free copy"

There might be a little grumbling if there was something in the algorithm
that meant that half of such gifts happened to have female authors, but I
suspect only a little. As long as the books were being offered to currently
active editors who actually write content, and there was an option for them
to say "actually that's not relevant to my current interests, can I have a
copy of x instead?", even the more cynical and jaded members of the
community would accept that as the foundation trying to do something useful
for once.

My experience is that Wikipedians are most likely to grumble about gender
balancing that reduces their personal chances, either through a gender
balanced recruitment of staff from a predominantly male pool of
volunteers, or gender balanced trips to Wikimania from that same
predominantly male pool of volunteers. But gender balancing by author of
purchases of reference books doesn't disadvantage many Wikipedians.
Though a skew towards more academic topics and away from military history
might get a few grumbles.

WSC


On Fri, 23 Aug 2019 at 08:01, Greg <[hidden email]> wrote:

> Wow, Kerry! Thank you for taking the time to write all these thoughts out.
>
> I'm asking the question because I'm concerned that the gender balance of
> the authors being cited on wikipedia is different from the already quite
> bad patterns in academia. My fear is that the citation gender imbalance on
> Wikipedia is more pronounced. If so, it is not just perpetuating the
> problem, but making it worse by surfacing certain authors and ideas even
> more frequently, or hardly at all. I would like to know if this is the
> case, and if so, how big the effect is.
>
> In my last message, I mention a study about a set of award-winning
> political science books (the researchers study the citation gender
> imbalance for that set). I just saw this study today, but I began to think
> that it/the set of works--or some similar set of titles--could possibly be
> a good place to begin, especially if the original researchers were willing
> to share the list of titles/authors/gender/etc that they put
> together/worked with. Then it seems it would mostly be a matter of figuring
> out how to understand how those titles are cited on Wikipedia--through
> either the citation dataset or wikicite--to see if/how the citation
> patterns differ (i.e., if the works by women/men are cited more
> frequently/at the same rate/less frequently on Wikipedia than what the
> researchers found in the original study).
>
> This seems like it would be easier to do than what you propose, but perhaps
> the idea is not sound. Until very recently, I thought I could find the
> answer in an existing paper! I honestly don't know the best way to get the
> answer, but I would like to know the answer and think it's important to
> look at.
>
> All of the things you bring up--from the gender of the editor, to the type
> of editing being done, to the issues around multiple authors/paywalls/year
> of publication/field--complicate the inquiry, and in particular a larger
> one. I agree with what you say about doing something small first to see
> what's there.
>
> Thanks again for all your thoughts.
> Greg
>
>
>
> On Thu, Aug 22, 2019 at 9:41 PM <
> [hidden email]>
> wrote:
>
> > Send Wiki-research-l mailing list submissions to
> >         [hidden email]
> >
> > To subscribe or unsubscribe via the World Wide Web, visit
> >         https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> > or, via email, send a message with subject or body 'help' to
> >         [hidden email]
> >
> > You can reach the person managing the list at
> >         [hidden email]
> >
> > When replying, please edit your Subject line so it is more specific
> > than "Re: Contents of Wiki-research-l digest..."
> >
> >
> > Today's Topics:
> >
> >    1. Re: gender balance of wikipedia citations (Greg)
> >    2. Re: gender balance of wikipedia citations (Kerry Raymond)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Thu, 22 Aug 2019 18:47:48 -0700
> > From: Greg <[hidden email]>
> > To: [hidden email]
> > Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
> > Message-ID:
> >         <
> > [hidden email]>
> > Content-Type: text/plain; charset="UTF-8"
> >
> > Hi Leila,
> >
> > Thanks for your thoughts.
> >
> > Having just read Troy Vettese's very powerful essay, Sexism in the
> > Academy (https://nplusonemag.com/issue-34/essays/sexism-in-the-academy/),
> > I wish this were a top priority.
> >
> > I stumbled upon a study today--it came up in the Washington Post's
> > excellent series on gender bias in political science. The authors look
> > at a set of award winning political science books and the gender
> > imbalance in the citations drawn from google scholar. I'm linking the
> > piece here in case anyone on this list is interested now, or in the
> > future, in how the patterns on Wikipedia compare.
> >
> > Washington Post piece: "There’s a gender gap in who wins political
> > science book awards – and in how widely they’re cited"
> > https://www.washingtonpost.com/politics/2019/08/22/theres-gender-gap-who-wins-political-science-book-awards-how-widely-theyre-cited/
> >
> > "Just as significantly, women’s award-winning books receive fewer
> > scholarly citations than men’s award-winning volumes — and this
> > disparity has grown, rather than shrunk, in recent years. Over the
> > entire period, APSA award-winning volumes by women averaged 43 percent
> > fewer citations per year than those by male authors."
> >
> > Paper: "Winning awards and gaining recognition: An impact analysis of
> > APSA section book prizes"
> > https://www.sciencedirect.com/science/article/abs/pii/S0362331918300867
> >
> >
> > Best,
> > Greg
> >
> > On Thu, Aug 22, 2019 at 3:44 PM <
> > [hidden email]>
> > wrote:
> >
> > >
> > >
> > > Today's Topics:
> > >
> > >    1. Re: gender balance of wikipedia citations (Greg)
> > >    2. Re: gender balance of wikipedia citations (Leila Zia)
> > >    3. Wikimania 2019 disinformation meetup follow-up (Leila Zia)
> > >    4. Upcoming Research Newsletter (special issue on gender gap
> > >       research): New papers open for review (Mohammed Sadat Abdulai)
> > >
> > >
> > > ----------------------------------------------------------------------
> > >
> > > Message: 1
> > > Date: Thu, 22 Aug 2019 09:57:15 -0700
> > > From: Greg <[hidden email]>
> > > To: [hidden email]
> > > Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
> > > Message-ID:
> > >         <CAOO9DNuSYzzaVwcdqiWA7pj671z3N43XOSwv6DtW0SxWg=
> > > [hidden email]>
> > > Content-Type: text/plain; charset="UTF-8"
> > >
> > > Hi Kerry,
> > > Those are all very interesting ways to look at this. I was thinking
> > > mostly along the lines of your first bullet point, but I'd be
> > > interested in research in any of those areas.
> > >
> > > Thanks,
> > > Greg
> > >
> > > On Thu, Aug 22, 2019 at 5:00 AM <
> > > [hidden email]>
> > > wrote:
> > >
> > > >
> > > >
> > > > Today's Topics:
> > > >
> > > >    1. gender balance of wikipedia citations (Greg)
> > > >    2. Re: gender balance of wikipedia citations (Kerry Raymond)
> > > >
> > > >
> > > >
> ----------------------------------------------------------------------
> > > >
> > > > Message: 1
> > > > Date: Wed, 21 Aug 2019 20:19:18 -0700
> > > > From: Greg <[hidden email]>
> > > > To: [hidden email]
> > > > Subject: [Wiki-research-l] gender balance of wikipedia citations
> > > > Message-ID:
> > > >         <
> > > > [hidden email]>
> > > > Content-Type: text/plain; charset="UTF-8"
> > > >
> > > > Greetings!
> > > >
> > > > I was looking for information about the gender balance of Wikipedia
> > > > citations and no one I've asked knows of any work on this topic. Do you?
> > > >
> > > > I think this is an important question.
> > > >
> > > > Here's what I've learned so far:
> > > >
> > > > Wikipedia citations are currently in the form of text strings. There is
> > > > also an initiative to place citations in an annotated structured repository
> > > > (wikicite). I do not know the current status of wikicite or if/when this
> > > > could be used for this inquiry--either to examine all, or a sensible subset
> > > > of the citations.
> > > >
> > > > My perspective is that understanding the gender balance is necessary and
> > > > urgent. The balance could be better, the same, or worse than the citation
> > > > balances we already know, and the scale of the effect is quite large.
> > > >
> > > > Is this a line of inquiry that the wikimedia/wikicite community is
> > > > interested in pursuing? If so, what is the best way to get started? Does
> > > > the WMF have the resources and interest to look into this matter in-house?
> > > >
> > > > Thanks for your thoughts.
> > > >
> > > > Greg
> > > >
> > > >
> > > > ------------------------------
> > > >
> > > > Message: 2
> > > > Date: Thu, 22 Aug 2019 13:53:45 +1000
> > > > From: "Kerry Raymond" <[hidden email]>
> > > > To: "'Research into Wikimedia content and communities'"
> > > >         <[hidden email]>
> > > > Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
> > > > Message-ID: <00ed01d5589d$33e31ed0$9ba95c70$@gmail.com>
> > > > Content-Type: text/plain;       charset="UTF-8"
> > > >
> > > > Could you elaborate a bit more on what you mean by the gender balance of
> > > > citations?
> > > >
> > > > Are you talking about:
> > > >
> > > > * the proportion of male vs female authors of the source material used as
> > > >   citations in arbitrary articles?
> > > > * the quality/quantity of citations in biography articles of men vs women?
> > > > * the quality/quantity of citations in articles that are gendered by some
> > > >   other criteria (e.g. reader interest, romantic comedy vs action film)?
> > > >
> > > > Kerry
> > > >
> > > > _______________________________________________
> > > > Wiki-research-l mailing list
> > > > [hidden email]
> > > > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> > > >
> > > > ------------------------------
> > > >
> > > > End of Wiki-research-l Digest, Vol 168, Issue 11
> > > > ************************************************
> > > >
> > >
> > >
> > > ------------------------------
> > >
> > > Message: 2
> > > Date: Thu, 22 Aug 2019 10:43:51 -0700
> > > From: Leila Zia <[hidden email]>
> > > To: Research into Wikimedia content and communities
> > >         <[hidden email]>
> > > Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
> > > Message-ID:
> > >         <CAK0Oe2uCo70_=ma2b=2d+fvr4GseEVxOP0sh=
> > > [hidden email]>
> > > Content-Type: text/plain; charset="UTF-8"
> > >
> > > Hi Greg,
> > >
> > > A few comments if you're going to go with "proportion of male vs
> > > female authors of the source material used as citations in arbitrary
> > > articles":
> > >
> > > * Please differentiate between sex (female, male, ...) and gender
> > > (woman, man, ...). My understanding from your initial email is that
> > > you want to stay focused on gender, not sex.
> > >
> > > * Unless you have reliable sources about the gender of an author, I
> > > would not recommend trying to predict what the gender is. (As you may
> > > know, this is not uncommon in social media studies, for example, to
> > > predict the gender of the author based on their image or their name.
> > > These approaches introduce biases and social challenges.)
> > >
> > > * Re your question about whether WMF has resources to look into this
> > > question in-house: I can't speak for the whole of WMF, however, I can
> > > share more about the Research team's direction. As part of our future
> > > work, we would like to "help contributors monitor violations of core
> > > content policies and assess information reliability and bias both
> > > granularly and at scale". [1] The question you proposed can fall under
> > > assessing bias in content (considering citations as part of the
> > > content). I expect us to focus first on the piece about violations of
> > > core content policies and information reliability and come back to the
> > > bias question later. As a result, we won't have bandwidth to do your
> > > proposal in-house at the moment. Sorry about that.
> > >
> > > I hope this helps.
> > >
> > > Best,
> > > Leila
> > >
> > > [1] Section 2 of our Knowledge Integrity whitepaper:
> > > https://upload.wikimedia.org/wikipedia/commons/9/9a/Knowledge_Integrity_-_Wikimedia_Research_2030.pdf
> > >
> > >
> > >
> > >
> > >
> > > ------------------------------
> > >
> > > Message: 3
> > > Date: Thu, 22 Aug 2019 13:36:17 -0700
> > > From: Leila Zia <[hidden email]>
> > > To: Research into Wikimedia content and communities
> > >         <[hidden email]>
> > > Subject: [Wiki-research-l] Wikimania 2019 disinformation meetup
> > >         follow-up
> > > Message-ID:
> > >         <CAK0Oe2sodYJpkuhSqgo3dtfDr=
> > > [hidden email]>
> > > Content-Type: text/plain; charset="UTF-8"
> > >
> > > Hi,
> > >
> > > This message is for those of you who attended the disinformation
> > > meet-up [0] in Wikimania 2019 [1] or others who may be interested.
> > >
> > > * The notes from our meet-up are now posted at the bottom of the page [0].
> > >
> > > * I was tasked to see if space.wmflabs.org is the place for us to
> > > continue conversations about this topic. The answer is yes. Thanks to
> > > the help of Elena Lappen, we now have a dedicated subcategory for
> > > disinformation:
> > > https://discuss-space.wmflabs.org/c/research/disinformation . Feel
> > > free to subscribe, watch, and/or post new topics if you're involved in
> > > this space.
> > >
> > > * If you are new to this conversation, please read the purpose of the
> > > subcategory at
> > > https://discuss-space.wmflabs.org/t/about-the-disinformation-category/949
> > > and welcome! :)
> > >
> > > Best,
> > > Leila
> > >
> > > [0] https://wikimania.wikimedia.org/wiki/2019:Meetups/Disinformation
> > > [1] https://wikimania.wikimedia.org/wiki/2019:Program
> > >
> > >
> > >
> > > ------------------------------
> > >
> > > Message: 4
> > > Date: Thu, 22 Aug 2019 22:43:53 +0000 (UTC)
> > > From: Mohammed Sadat Abdulai <[hidden email]>
> > > To: Research Into Wikimedia Content and Communities
> > >         <[hidden email]>
> > > Subject: [Wiki-research-l] Upcoming Research Newsletter (special issue
> > >         on gender gap research): New papers open for review
> > > Message-ID: <[hidden email]>
> > > Content-Type: text/plain; charset=UTF-8
> > >
> > > Hi everyone,
> > > We’re preparing for the August 2019 research newsletter and looking for
> > > contributors. Please take a look at
> > > https://etherpad.wikimedia.org/p/WRN201908 and add your name next to any
> > > paper you are interested in covering. Our target publication date is
> > > 31 August 11:59 UTC. As usual, short notes and one-paragraph reviews are
> > > most welcome.
> > > For the August edition, we are planning a special issue focusing mainly
> > > on recent gender gap/gender bias research. (Upcoming special issue topics
> > > may include health and education.) There are about 20 papers from this
> > > area on our to-do list, all of which will be covered in the August issue,
> > > either as a mere list item or, with your help, in the form of a more
> > > informative write-up or review. They include:
> > >    - Analyzing Gender Stereotyping in Bollywood Movies
> > >    - Breaking the glass ceiling on Wikipedia
> > >    - Breastfeeding, Authority, and Genre: Women's Ethos in Wikipedia and
> > >      Blogs
> > >    - Cyberfeminism on Wikipedia: Visibility and deliberation in feminist
> > >      Wikiprojects
> > >    - Gender and deletion on Wikipedia
> > >    - Gender imbalance and Wikipedia
> > >    - Gender Markers in Wikipedia Usernames
> > >    - How do students trust Wikipedia? An examination across genders
> > >    - Investigating the Gender Pronoun Gap in Wikipedia
> > >    - It’s Not What You Think: Gender Bias in Information about Fortune
> > >      1000 CEOs on Wikipedia
> > >    - Mapping and Bridging the Gender Gap: An Ethnographic Study of Indian
> > >      Wikipedians and Their Motivations to Contribute
> > >    - People Who Can Take It: How Women Wikipedians Negotiate and Navigate
> > >      Safety
> > >    - Redressing Gender Inequities on Wikipedia Through an Editathon
> > >    - Similar Gaps, Different Origins? Women Readers and Editors at Greek
> > >      Wikipedia
> > >    - Simulation Experiments on (the Absence of) Ratings Bias in Reputation
> > >      Systems
> > >    - The Gendered Presentation of Professions on Wikipedia
> > >    - Who Counts as a Notable Sociologist on Wikipedia? Gender, Race, and
> > >      the “Professor Test”
> > >    - Who Wants to Read This?: A Method for Measuring Topical
> > >      Representativeness in User Generated Content Systems
> > >    - Women and Wikipedia. Diversifying Editors and Enhancing Content
> > >      through Library Edit-a-Thons
> > >
> > > Masssly and Tilman Bayer
> > >
> > > [1] Research:Newsletter - Meta
> > > [2] WikiResearch (@WikiResearch) on Twitter
> > >
> > >
> > > ------------------------------
> > >
> > > End of Wiki-research-l Digest, Vol 168, Issue 12
> > > ************************************************
> > >
> >
> >
> > ------------------------------
> >
> > Message: 2
> > Date: Fri, 23 Aug 2019 14:41:09 +1000
> > From: "Kerry Raymond" <[hidden email]>
> > To: "'Research into Wikimedia content and communities'"
> >         <[hidden email]>
> > Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
> > Message-ID: <001001d5596c$fe22a100$fa67e300$@gmail.com>
> > Content-Type: text/plain;       charset="utf-8"
> >
> > Yes, that was my thought. It would be difficult to know the sex (or the
> > gender) of an author from a name on a paper. There would inevitably be a
> > lot that you could not determine. And certainly in the sciences
> > multi-author papers are the norm, and even where you did know the
> > sex/gender of all the authors, do you assign some part-score? E.g. 0 for
> > all male, 1 for all female, 0.6 for 3 women and 2 men.
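
The part-score idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `part_score` and the labels "woman" / "man" / None are hypothetical, and the gender labels are assumed to come from reliable sources rather than from guessing by name. A paper with any unknown author is left unscored rather than estimated.

```python
def part_score(author_genders):
    """Return the share of women among a paper's authors.

    author_genders is a list whose entries are "woman", "man", or None
    (None meaning the gender is not known from a reliable source).
    Returns None when the list is empty or any author is unknown, so that
    unknowns are never silently guessed (an assumption of this sketch).
    """
    if not author_genders or any(g not in ("woman", "man") for g in author_genders):
        return None
    women = sum(1 for g in author_genders if g == "woman")
    return women / len(author_genders)


# The three cases mentioned in the message above.
all_male = part_score(["man", "man"])
all_female = part_score(["woman", "woman", "woman"])
mixed = part_score(["woman", "woman", "woman", "man", "man"])  # 3 women, 2 men
unknown = part_score(["woman", None])
```

Whether to drop papers with unknown authors, or to score only the known authors, is a design choice that the thread leaves open.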
> >
> > But I am curious why you are asking the question. Is it that the
> > writing/research of women is under-represented in Wikipedia citations? If
> > so, without conducting any research, I'd say "yes it is under-represented".
> > But my reason would be that women are under-represented as
> > writers/researchers in the first place. And certainly the older the
> > source, the more likely it is to be written by a man. So to investigate
> > gender bias in citations in Wikipedia, you would have to estimate the
> > proportion of men/women (or at least their outputs) over time in a given
> > discipline and then ask the question, "taking into account the time of
> > publication of a citation and the proportion of men/women active in this
> > discipline at that time, do Wikipedia citations show a sex/gender bias?".
> > Hmm ... very tricky.
> >
> > I'd be inclined to suggest starting with a much simpler task. Pick a
> > discipline (preferably one with a professional society that can tell you
> > its estimate of the male/female ratio over (say) the past 5 years),
> > limit the Wikipedia articles to topics in that discipline, and limit the
> > citations to those published within the last 5 years. Indeed, perhaps
> > limit it to publications that are principally from the same country or
> > countries as the professional society from which you get the data (as
> > clearly men/women's participation in any discipline can vary between
> > countries for cultural reasons). Then you have some way to gauge whether
> > Wikipedia is showing more or less gender bias in its citations than the
> > discipline itself exhibits through publication. Quite a challenge!
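
The comparison described above (Wikipedia citations versus a discipline baseline) could be sketched roughly as follows. This is a minimal illustration, not a recommended study design: the function name, the use of a normal-approximation z-score, and the numbers in the usage lines are all assumptions made for the example, and the baseline share is assumed to come from the professional society.

```python
import math


def compare_to_baseline(women_authorships, total_authorships, baseline_share):
    """Compare the share of women among cited authorships with a baseline.

    Returns (observed_share, difference, z), where z is a normal-approximation
    z-score of the observed share under the baseline share. A negative
    difference means women are cited less often than the baseline suggests.
    """
    if total_authorships <= 0:
        raise ValueError("total_authorships must be positive")
    if not 0 < baseline_share < 1:
        raise ValueError("baseline_share must be strictly between 0 and 1")
    observed = women_authorships / total_authorships
    std_err = math.sqrt(baseline_share * (1 - baseline_share) / total_authorships)
    difference = observed - baseline_share
    return observed, difference, difference / std_err


# Made-up numbers, purely for illustration: 30 of 200 cited authorships are
# women, against a hypothetical society estimate of 25% women in the field.
observed, difference, z = compare_to_baseline(30, 200, 0.25)
```

The z-score treats authorships as independent draws, which co-authored papers violate, so it should be read as a rough screening figure only.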
> >
> > And of course, it is not Wikipedia that adds citations. It is individual
> > contributors who add citations. Does the sex/gender of the contributor have
> > any correlation with any observed bias? Again, the task is made more
> > difficult because a lot of Wikipedians don't identify their sex/gender.
> >
> > The other thing to be alert to is the difference in how (I believe)
> > Wikipedians cite compared to researchers. As a researcher, I will of course
> > be reading papers in my field all the time and what I read will influence
> > my subsequent work. Therefore when I write about my research, my citations
> > refer to papers that I have already read and whose authors may be
> > familiar to me from their other work, from having met them at conferences,
> > from private correspondence, etc. However as a Wikipedian, I am only
> > partially operating that way (mostly when I write new articles or
> > significantly expand them, that is, when I am doing the research). A lot of
> > the time I am adding citations relating to content other people (often new
> > users) have added/changed without citation. These come up on my watchlist
> > all the time. What do I do? Of course I could revert saying "no citation
> > provided", but that's not the way to encourage new contributors nor to grow
> > the encyclopedia, so if the information seems plausible (not obviously
> > vandalism), I will attempt to find a citation for it (using tools like
> > Google and other topic-specialised search tools). This is what I call the
> > "lucky dip" mode of citing, as obviously I have no idea what the source was
> > for the original contributor. The sources I find from my search may not
> > already be known to me (frequently they are not).
> >
> > Or to summarise, IMHO, researchers (or Wikipedians in "new content mode")
> > cite a source already known to them and whose authors may be known to them,
> > and could consciously or unconsciously engage in some discrimination in
> > citation based on sex/gender or other criteria, whereas Wikipedians in
> > "updating mode" are likely to be citing a source not previously known to
> > them, may be happy just to have found a source, and are unlikely to be
> > spending a lot of their time researching the authors of that source to the
> > extent that they could then consciously or unconsciously exercise
> > discrimination on sex/gender. If I invest any extra effort in such
> > situations, it's probably because the wording of the source is a close
> > match to the Wikipedia article, which raises the question of copyright
> > violation (which needs to be dealt with by deletion or rewriting) or of the
> > source being a Wikipedia mirror (which is obviously not an acceptable
> > citation).
> >
> > So I suspect whether a citation was added by the same contributor as the
> > content it supports or by a subsequent contributor probably makes a
> > difference to the likelihood of conscious/unconscious discrimination.
> >
> > Also, finally, Wikipedia often cites web pages and other sources that do
> > not have any individual authorship, e.g. government websites. Remember that
> > Wikipedia prefers open citations over paywalled citations, and a lot of the
> > publications behind paywalls are individually authored.
> >
> > Your proposed research has a lot of interesting challenges and a number of
> > limitations. I'm not saying don't do it, but I am saying start very small
> > and see if you can find any evidence to support your hypothesis before
> > embarking on a larger study. Because contributor behaviour is what you are
> > trying to study, you probably need to do both quantitative and qualitative
> > experiments. E.g. I have described the two modes of citation I use, but I
> > cannot say how typical my behaviour is.
> >
> > Kerry
> >
> > -----Original Message-----
> > From: Wiki-research-l [mailto:
> [hidden email]]
> > On Behalf Of Leila Zia
> > Sent: Friday, 23 August 2019 3:44 AM
> > To: Research into Wikimedia content and communities <
> > [hidden email]>
> > Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
> >
> > Hi Greg,
> >
> > A few comments if you're going to go with "proportion of male vs female
> > authors of the source material used as citations in arbitrary
> > articles":
> >
> > * Please differentiate between sex (female, male, ...) and gender (woman,
> > man, ...). My understanding from your initial email is that you want to
> > stay focused on gender, not sex.
> >
> > * Unless you have reliable sources about the gender of an author, I would
> > not recommend trying to predict what the gender is. (As you may know,
> this
> > is not uncommon in social media studies, for example, to predict the
> gender
> > of the author based on their image or their name.
> > These approaches introduce biases and social challenges.)
> >
> > * Re your question about whether WMF has resources to look into this
> > question in-house: I can't speak for the whole of WMF, however, I can
> share
> > more about the Research team's direction. As part of our future work, we
> > would like to "help contributors monitor violations of core content
> > policies and assess information reliability and bias both granularly and
> at
> > scale". [1] The question you proposed can fall under assessing bias in
> > content (considering citations as part of the content). I expect us to
> > focus first on the piece about violations of core content policies and
> > information reliability and come back to the bias question later. As a
> > result, we won't have bandwidth to do your proposal in-house at the
> moment.
> > Sorry about that.
> >
> > I hope this helps.
> >
> > Best,
> > Leila
> >
> > [1] Section 2 of our Knowledge Integrity whitepaper:
> >
> >
> https://upload.wikimedia.org/wikipedia/commons/9/9a/Knowledge_Integrity_-_Wikimedia_Research_2030.pdf
> >
> >
> > On Thu, Aug 22, 2019 at 9:57 AM Greg <[hidden email]> wrote:
> > >
> > > Hi Kerry,
> > > Those are all very interesting ways to look at this. I was thinking
> > > mostly along the lines of your first bullet point, but I'd be
> > > interested in research in any of those areas.
> > >
> > > Thanks,
> > > Greg
> > >
> > > On Thu, Aug 22, 2019 at 5:00 AM
> > > <[hidden email]>
> > > wrote:
> > >
> > > > Send Wiki-research-l mailing list submissions to
> > > >         [hidden email]
> > > >
> > > > To subscribe or unsubscribe via the World Wide Web, visit
> > > >         https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> > > > or, via email, send a message with subject or body 'help' to
> > > >         [hidden email]
> > > >
> > > > You can reach the person managing the list at
> > > >         [hidden email]
> > > >
> > > > When replying, please edit your Subject line so it is more specific
> > > > than "Re: Contents of Wiki-research-l digest..."
> > > >
> > > >
> > > > Today's Topics:
> > > >
> > > >    1. gender balance of wikipedia citations (Greg)
> > > >    2. Re: gender balance of wikipedia citations (Kerry Raymond)
> > > >
> > > >
> > > > --------------------------------------------------------------------
> > > > --
> > > >
> > > > Message: 1
> > > > Date: Wed, 21 Aug 2019 20:19:18 -0700
> > > > From: Greg <[hidden email]>
> > > > To: [hidden email]
> > > > Subject: [Wiki-research-l] gender balance of wikipedia citations
> > > > Message-ID:
> > > >         <
> > > > [hidden email]>
> > > > Content-Type: text/plain; charset="UTF-8"
> > > >
> > > > Greetings!
> > > >
> > > > I was looking for information about the gender balance of Wikipedia
> > > > citations and no one I've asked knows of any work on this topic. Do
> > you?
> > > >
> > > > I think this is an important question.
> > > >
> > > > Here's what I've learned so far:
> > > >
> > > > Wikipedia citations are currently in the form of text strings. There
> > > > is also an initiative to place citations in an annotated structured
> > > > repository (wikicite). I do not know the current status of wikicite
> > > > or if/when this could be used for this inquiry--either to examine
> > > > all, or a sensible subset of the citations.
> > > >
> > > > My perspective is that understanding the gender balance is
> > > > necessary and urgent. The balance could be better, the same, or
> > > > worse than the citation balances we already know, and the scale of
> the
> > effect is quite large.
> > > >
> > > > Is this a line of inquiry that the wikimedia/wikicite community is
> > > > interested in pursuing? If so, what is the best way to get started?
> > > > Does the WMF have the resources and interest to look into this matter
> > inhouse?
> > > >
> > > > Thanks for your thoughts.
> > > >
> > > > Greg
> > > >
> > > >
> > > > ------------------------------
> > > >
> > > > Message: 2
> > > > Date: Thu, 22 Aug 2019 13:53:45 +1000
> > > > From: "Kerry Raymond" <[hidden email]>
> > > > To: "'Research into Wikimedia content and communities'"
> > > >         <[hidden email]>
> > > > Subject: Re: [Wiki-research-l] gender balance of wikipedia citations
> > > > Message-ID: <00ed01d5589d$33e31ed0$9ba95c70$@gmail.com>
> > > > Content-Type: text/plain;       charset="UTF-8"
> > > >
> > > > Could you elaborate a bit more on what you mean by the gender
> > > > balance of citations?
> > > >
> > > > Are you talking about:
> > > >
> > > > * proportion of male vs female authors of the source material used
> > > > as citations in arbitrary articles>
> > > > *  the quality/quantity of citations in biography articles of men vs
> > women?
> > > > * the quality/quantity of citations in articles that are gendered by
> > > > some other criteria (e.g. reader interest, romantic comedy vs action
> > film)?
> > > >
> > > > Kerry
> > > >
> > > > -----Original Message-----
> > > > From: Wiki-research-l
> > > > [mailto:[hidden email]]
> > > > On Behalf Of Greg
> > > > Sent: Thursday, 22 August 2019 1:19 PM
> > > > To: [hidden email]
> > > > Subject: [Wiki-research-l] gender balance of wikipedia citations
> > > >
> > > > Greetings!
> > > >
> > > > I was looking for information about the gender balance of Wikipedia
> > > > citations and no one I've asked knows of any work on this topic. Do
> > you?
> > > >
> > > > I think this is an important question.
> > > >
> > > > Here's what I've learned so far:
> > > >
> > > > Wikipedia citations are currently in the form of text strings. There
> > > > is also an initiative to place citations in an annotated structured
> > > > repository (wikicite). I do not know the current status of wikicite
> > > > or if/when this could be used for this inquiry--either to examine
> > > > all, or a sensible subset of the citations.
> > > >
> > > > My perspective is that understanding the gender balance is
> > > > necessary and urgent. The balance could be better, the same, or
> > > > worse than the citation balances we already know, and the scale of
> the
> > effect is quite large.
> > > >
> > > > Is this a line of inquiry that the wikimedia/wikicite community is
> > > > interested in pursuing? If so, what is the best way to get started?
> > > > Does the WMF have the resources and interest to look into this matter
> > > > in-house?
> > > >
> > > > Thanks for your thoughts.
> > > >
> > > > Greg
> > > > _______________________________________________
> > > > Wiki-research-l mailing list
> > > > [hidden email]
> > > > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Re: gender balance of wikipedia citations

Federico Leva (Nemo)
WereSpielChequers, 02/09/19 17:10:
> If you can come up with software that
> identifies sources that we aren't using but should then that would make for
> some interesting reports on Wikiprojects, or an interesting opportunity for
> the wiki library.

Good point. This also came up at a research meetup at Wikimania, where
the question was what corpora could be used (one proposal is the
Internet Archive): see notes at
https://wikimania.wikimedia.org/wiki/2019:Meetups/Affiliates_and_Research

A related proposal was
<https://github.com/eggpi/citationhunt/issues/137>

>
> I'd particularly like to see something along the lines of a bot that sends
> messages to active Wikipedia editors "As an editor who has been active in
> topic zzzzzz we would like to send you a free copy of the new book xxxxxx
> by yyyyy. Click here to arrange your free copy"

I would be interested in helping you do this (for Italian-language
editors?). I think it can easily be written in a way that doesn't sound
like a promotion for the specific item.

Federico


Re: gender balance of wikipedia citations

Greg-2
Hi WSC,

That is an interesting idea, and the question of the corpus to draw from
(or build) is key. In my own work, I have used the following techniques:

- Direct solicitation of domain experts. I express my interest in
consulting a diverse range of perspectives and ask for book titles
and/or the names of people from underrepresented demographics who are
doing work in the field.

- Name database. I start from a list of underrepresented domain experts (I
have found such lists on Wikipedia itself as well as at places like
500womenscientists) and check whether any of the people listed have books
on the topic I'm researching.
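
The name-database technique could be sketched roughly as below. This is
only an illustration under assumptions: the function names, the catalog
layout (a list of dicts with "title" and "authors" keys) and the sample
names are all hypothetical placeholders, and exact matching on normalized
names will miss initials, married names and transliterations.

```python
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents and collapse whitespace so variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def books_by_listed_experts(experts, catalog):
    """Return catalog entries with at least one author on the experts list.

    `experts` is a list of name strings; `catalog` is a list of dicts with
    "title" and "authors" keys (a hypothetical layout, not a real data source).
    """
    wanted = {normalize(name) for name in experts}
    return [
        book
        for book in catalog
        if any(normalize(author) in wanted for author in book["authors"])
    ]


if __name__ == "__main__":
    # Placeholder data for illustration only.
    experts = ["Ana Pérez"]
    catalog = [
        {"title": "Book One", "authors": ["ana  perez", "B. Jones"]},
        {"title": "Book Two", "authors": ["C. Smith"]},
    ]
    for book in books_by_listed_experts(experts, catalog):
        print(book["title"])
```

The matches are candidates to read and assess, not sources to add
automatically.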

For now, my interest remains focused on understanding the current gender
balance of the citations on Wikipedia. To this end, I am still very
interested in hearing from someone with knowledge of Wikidata about how
best to use it and any limitations I should be aware of.
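
As a possible starting point, here is a minimal sketch of what such a
Wikidata query might look like. It assumes the cited works already exist
as Wikidata items with author (P50) statements whose author items carry a
sex or gender (P21) statement; whether that holds for a useful share of
Wikipedia's citations is exactly the open question. The function names
and the example item IDs are placeholders, and the step of sending the
query to the query service is deliberately left out.

```python
from collections import Counter


def build_query(work_qids):
    """Build a SPARQL query listing the gender of each author of the given works.

    Uses P50 (author) and P21 (sex or gender). Authors recorded only as
    plain strings (P2093, author name string) are not returned at all,
    which is one known limitation of this approach.
    """
    values = " ".join(f"wd:{qid}" for qid in work_qids)
    return f"""
SELECT ?work ?author ?genderLabel WHERE {{
  VALUES ?work {{ {values} }}
  ?work wdt:P50 ?author .
  OPTIONAL {{ ?author wdt:P21 ?gender . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
"""


def tally_gender(bindings):
    """Count rows per gender label in SPARQL JSON result bindings.

    Each row is one (work, author) pair, so an author of several works is
    counted once per work. Rows without a gender are counted as "unknown".
    """
    counts = Counter()
    for row in bindings:
        label = row.get("genderLabel", {}).get("value", "unknown")
        counts[label] += 1
    return counts


if __name__ == "__main__":
    # Placeholder item IDs; substitute the items of the works of interest.
    print(build_query(["Q1", "Q2"]))
```

The "unknown" count matters as much as the others: if it is large, any
balance figure computed from the remaining rows says little.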

Thanks,
Greg

On Mon, Sep 2, 2019 at 10:17 AM Federico Leva (Nemo) <[hidden email]>
wrote:

> WereSpielChequers, 02/09/19 17:10:
> > If you can come up with software that
> > identifies sources that we aren't using but should then that would make
> > for some interesting reports on Wikiprojects, or an interesting
> > opportunity for the wiki library.
>
> Good point. This also came up at a research meetup at Wikimania, where
> the question was what corpora could be used (one proposal is the
> Internet Archive): see notes at
> https://wikimania.wikimedia.org/wiki/2019:Meetups/Affiliates_and_Research
>
> A related proposal was
> <https://github.com/eggpi/citationhunt/issues/137>
>
> >
> > I'd particularly like to see something along the lines of a bot that
> > sends messages to active Wikipedia editors "As an editor who has been
> > active in topic zzzzzz we would like to send you a free copy of the new
> > book xxxxxx by yyyyy. Click here to arrange your free copy"
>
> I would be interested in helping you do this (for Italian-language
> editors?). I think it can easily be written in a way that doesn't sound
> like a promotion for the specific item.
>
> Federico