
Wikipedia:Wikipedia Signpost/2015-02-25/Recent research


This is an old revision of this page, as edited by NeilK (talk | contribs) at 16:34, 25 February 2015 (added pics, better links, a few typos fixed, critique added). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Recent research


A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

First Women, Second Sex: Gender Bias in Wikipedia

by Maximilianklein (talk)
"it is not women's inferiority that has determined their historical insignificance; it is their historical insignificance that has doomed them to inferiority" ~ Beauvoir

The gender gap in Wikipedia can mean several things: a gap in editors, a gap in content, and of course the relation between the two. "First Women, Second Sex: Gender Bias in Wikipedia"[1] addresses the gap from the content side, with justification drawn from many Simone de Beauvoir quotes. The authors use an ensemble of three methods - DBpedia metadata, language modelling, and network theory - to show not just inequality in encyclopedia inclusion, but degrees of sexism in how biographies are included. For instance, how different genders meet notability is quantifiably different, as is the centrality of biographies in the link structure.

The initial metadata technique is an inspection of DBpedia data mashed up with a separate dataset from previous research based on pronoun-counting techniques. This method is a bit shaky as it relies on the combination of two derived datasets, especially in an era when Wikidata can deliver data closer to the source. Nevertheless, the researchers find that 15.5% of the biographies in their final dataset are of women, which corroborates the 16.1% and 15.6% estimates obtained by alternative techniques[2]. Digging further, biographies are separated by subclass: athletes, politicians, military personnel, and all others are more heavily male - only artists and royalty are female-biased. Another finding from this type of infobox scraping is that female biographies are much more likely to have the spouse parameter filled.
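The pronoun-counting idea behind the earlier dataset can be illustrated with a minimal sketch. The pronoun lists, the `min_ratio` threshold, and the function names `infer_gender` and `female_share` are assumptions made for illustration; they are not taken from the paper or from the previous research it builds on.

```python
import re
from collections import Counter

# Assumed pronoun lists for this sketch only.
FEMALE = {"she", "her", "hers", "herself"}
MALE = {"he", "him", "his", "himself"}


def infer_gender(text, min_ratio=2.0):
    """Guess a biography's gender by comparing pronoun counts.

    Returns "female", "male", or None when neither pronoun set
    outnumbers the other by at least `min_ratio`.
    """
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    f = sum(counts[w] for w in FEMALE)
    m = sum(counts[w] for w in MALE)
    if f > 0 and f >= min_ratio * m:
        return "female"
    if m > 0 and m >= min_ratio * f:
        return "male"
    return None


def female_share(texts):
    """Share of female biographies among those that could be labelled."""
    labels = [infer_gender(t) for t in texts]
    labelled = [g for g in labels if g is not None]
    if not labelled:
        return 0.0
    return labelled.count("female") / len(labelled)
```

Biographies with no clear pronoun majority are left unlabelled and excluded from the share, which is one reason a derived dataset of this kind can differ from figures based on explicit metadata.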

Moving into the natural language realm, the paper inspects bigrams of the biographies' text. The top words associated with men are "played", "football" and "league"; for women, the top are "actress", "women's" and "her husband". This already starts to hint at the notion that men are notable for what they do, rather than only their static characteristics. To investigate further, Linguistic Inquiry and Word Count (LIWC) and two measures - frequency and burstiness - are employed for semantic classification. The semantic category where male biographies score significantly higher is cognitive mechanics, which encompasses words like "became", "known", and "made"; meanwhile, female biographies have significantly more sexual words like "love", "passion", and "sex".
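As a rough illustration of how gender-associated words can be surfaced, the sketch below compares how often a word appears in female versus male biographies. It is a simplification under stated assumptions: it works on single words rather than bigrams, it uses document frequency only, and it does not reproduce LIWC or the paper's burstiness measure. The function names `doc_frequency`, `gender_skew`, and `top_words` are hypothetical.

```python
import re
from collections import Counter


def doc_frequency(texts):
    """Fraction of texts in which each word appears at least once."""
    df = Counter()
    for text in texts:
        df.update(set(re.findall(r"[a-z']+", text.lower())))
    n = len(texts)
    return {word: count / n for word, count in df.items()}


def gender_skew(female_texts, male_texts):
    """Document frequency in female texts minus that in male texts.

    Positive values lean female, negative values lean male.
    """
    f = doc_frequency(female_texts)
    m = doc_frequency(male_texts)
    return {w: f.get(w, 0.0) - m.get(w, 0.0) for w in set(f) | set(m)}


def top_words(skew, k):
    """The k most female-skewed words, ties broken alphabetically."""
    return sorted(skew, key=lambda w: (-skew[w], w))[:k]
```

A real analysis would also need a significance test and stop-word handling, since common function words can show large raw differences on small samples.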

The last domain explored is network structure. Each biography links to, and is linked from, other biographies, and together they form a directed graph. The first interesting thing to note is that in chi-squared testing between four link types (female-female, female-male, male-male, male-female), only female-female links occur more often than expected. Next, a PageRank ranking of the graph is computed, which determines the importance or "centrality" of each biography. It is found that any subset formed by removing the lowest-ranked articles has a lower female ratio than the full set.
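Both network measurements can be sketched on a toy graph. The code below is an illustrative sketch, not the authors' implementation: it compares observed link-type counts with the counts expected if source and target gender were independent (the quantity a chi-squared test examines), and it computes PageRank by plain power iteration. The function names, the damping factor of 0.85, and the iteration count are assumptions.

```python
from collections import Counter


def link_type_counts(edges, gender):
    """Observed counts per (source gender, target gender) pair, and the
    counts expected if source and target gender were independent."""
    observed = Counter((gender[s], gender[t]) for s, t in edges)
    total = sum(observed.values())
    src, dst = Counter(), Counter()
    for (gs, gt), count in observed.items():
        src[gs] += count
        dst[gt] += count
    expected = {(gs, gt): src[gs] * dst[gt] / total for gs in src for gt in dst}
    return observed, expected


def pagerank(nodes, edges, damping=0.85, iterations=100):
    """Power-iteration PageRank; dangling nodes spread rank uniformly."""
    nodes = list(nodes)
    n = len(nodes)
    out = {v: [] for v in nodes}
    for s, t in edges:
        out[s].append(t)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if out[v]:
                share = damping * rank[v] / len(out[v])
                for t in out[v]:
                    new[t] += share
        rank = new
    return rank


def female_share_of_top(rank, gender, k):
    """Female ratio among the k highest-ranked biographies (k >= 1)."""
    top = sorted(rank, key=rank.get, reverse=True)[:k]
    return sum(1 for v in top if gender[v] == "female") / len(top)


# Toy data: two male and two female biographies.
gender = {"a": "male", "b": "male", "c": "female", "d": "female"}
edges = [("b", "a"), ("c", "a"), ("d", "a"), ("a", "b"), ("c", "d"), ("d", "c")]
observed, expected = link_type_counts(edges, gender)
rank = pagerank(gender, edges)
```

In this toy graph the female-female count exceeds its expected value, and the female share among the top two biographies by PageRank is lower than in the full set, mirroring the pattern described above on a much smaller scale.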

The authors wrap up their conclusions within the context of feminist theory. They argue that the notion of gender roles is evident in Wikipedia in the way that metadata shows that men are more often known to be sportspeople, and women to be artists, royalty or spouses of someone else. Likewise, the language of biographies is biased. That "her husband" and "first woman" are top terms in female articles shows a failure of the Finkbeiner test. Furthermore, the authors claim this exhibits "objectification" in light of the evidence that the "cognitive processes" of men were shown to be more significant than those of women, and that the "sexual" category is the only one in which women are more frequently described than men. Finally, as viewed from the network structure results, female biographies are less central to the encyclopedia. This is said to be because of historical philosophy and today's notability guidelines, that "reason and objectivity are gendered male" - a feminist metaphysical view. The authors imagine that the tendency of female articles to link to other female articles more than expected is due to women-led efforts to address the gender gap.

Overall, this article provides a wide variety of methods to measure the gender gap, which proves a high-level point from many perspectives. It is situated in feminist thought, but multiple returns to Beauvoir make the final analysis seem superficial and general. Additionally, the simplifying assumptions of English-only and derived datasets leave open the criticism that the larger points cannot be disentangled from a few extra biases introduced by language- and processing-inherited lenses. The authors admit as much in their limitations, where they also acknowledge not questioning the gender binary. What we have here, though, is an increment to a growing pile of methods and techniques proving the gender gap, which, ideologically, does not need, but can always benefit from, additional statistical legitimacy.


Wikipedia’s SOPA Strike considered as international political movement

by NeilK (talk)
A screenshot of the English Wikipedia landing page, symbolically its only page during the blackout on January 18, 2012.
A paper[3] written by prolific Wikipedian Piotr Konieczny revisits the SOPA Strike. This was a 24-hour blackout of the English Wikipedia in 2012 to protest against proposed American copyright legislation, accompanied by tools for citizens to contact their representatives on the issue. The author argues this event demonstrates a new political opportunity structure for international movements, such as the free culture movement, to influence national policies.

A chronology of the events leading up to the SOPA Strike on Wikipedia is presented. The author then analyzes Wikipedia’s forums debating whether and how to restrict access to the site for a day. Debate participants are classified by such characteristics as national origin, history of editing Wikipedia, and stated arguments for and against, and simple quantitative analyses of population percentages and relative contribution are performed. Konieczny then tests various hypotheses about the nature of the protest, to see which one fits the facts.

Konieczny concludes that experienced Wikipedians were generally supportive of a protest but were more likely to express misgivings about losing neutrality. Americans also participated in a greater proportion than their prevalence on the English Wikipedia. However, the process also allowed non-US citizens and free culture idealists to have significant leverage over the debate on Wikipedia, and thus on American national politics. Konieczny tries to show that Wikipedia is thus an international social movement within the broader free culture movement. Konieczny ends the paper with a speculation that the many pro-blackout single-purpose accounts may reflect a new political consciousness among the young and internet-savvy.

Konieczny's analysis gives us a very detailed, fascinating picture of what arguments were made in public on Wikipedia forums during a crucial few weeks. However, this may omit some of the most influential discussions, by insiders, taking place person-to-person and in chat rooms. The paper also omits discussion of the influence of the Wikimedia Foundation, as an American institution responding to an American legal threat.

When Konieczny asserts the existence of a rising transnational "Net Generation", he presents very little evidence. A skeptical or quietist Wikipedian might still conclude that the encyclopedia wasn't acting as an organ of democracy, but was briefly overrun by a Twitter trending topic. If Konieczny is right, we may see other internet-based communities also being pressed into service, or more permanent institutions being developed to serve this new community.

Full disclosure: I (NeilK) was intimately involved with the SOPA Strike movement on Wikipedia, as a technologist on the WMF staff, and as a concerned Wikipedian who weighed in on the very forums analyzed in this paper, in favor of a blackout.


Briefly

  • ...:
  • ...:

Other recent publications

A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.

  • "..."
  • "..."

References

  1. ^ Graells-Garrido, Eduardo (2015-02-08). "First Women, Second Sex: Gender Bias in Wikipedia". arXiv:1502.02341 [cs].
  2. ^ "Wiki Research Mailing List Discussion".
  3. ^ Konieczny, Piotr (2014-09-29). "The day Wikipedia stood still: Wikipedia's editors' participation in the 2012 anti-SOPA protests as a case study of online organization empowering international and national political opportunity structures". Current Sociology. I (23): 77–93. doi:10.1177/0011392114551649. Retrieved 2015-02-25.
Supplementary references and notes: