A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, edited jointly with the Wikimedia Research Committee and republished as the Wikimedia Research Newsletter.
How Wikipedia built governance capability, 2001–2009
This paper looks at the topic of Wikipedia governance in the context of online social production, which is contrasted with traditional, contract-bound, hierarchical production models that characterize most organizational settings. Building on the dynamic capabilities theory, the authors introduce a new concept, "collective governance capability", which they define as "the capability of a collective arrangement to steer a production process and an associated interaction system". The authors ask the research question, "How does a collective governance capability to create and maintain value emerge and evolve in online social production?"
Figure from the paper: "The number of monthly contributors and the number of contributor clusters in the English Wikipedia from January 2001 to December 2009."
Quantitative analysis: The authors processed a dump of the full history of the English Wikipedia's first nine years. For each of the 108 months from January 2001 to December 2009 and each editor, that editor's activity was described by the following numbers: "the number of edits and pages edited, median [Levenshtein] edit distance and article length change, the number of reverted edits, and reverts done [...., in] four namespaces: encyclopedia articles, article talk pages, policies and guidelines, and policies and guidelines talk pages". A cluster analysis is then performed for each month to group editors into sets of similar editing behavior. The authors report: "we identify a slow initiation period followed by a period of extremely rapid growth, and, finally, levelling out and a slight decline. In the first phase, there is only a minimal differentiation of contributors into clusters. The second phase of exponential growth is characterized by increasing differentiation of contributors, while the number of clusters stabilizes in the third phase. The statistics provide only a very rough depiction of a complex system, but they certainly suggest that, whatever governance mechanisms have been in place, they have had to deal with dramatically different circumstances over the years."
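The per-editor, per-month description above can be illustrated with a minimal sketch. It is not the authors' code: the revision record layout (`month`, `editor`, `page`, `old_text`, `new_text`, `reverted`) and the function names are assumptions made for illustration. It also ignores the split into four namespaces, the count of reverts done, and the clustering step itself.

```python
from collections import defaultdict
from statistics import median


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings via row-by-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def monthly_features(revisions):
    """Summarize each (month, editor) pair as a small feature dictionary.

    `revisions` is an iterable of dicts with the hypothetical keys
    month, editor, page, old_text, new_text, reverted.
    """
    groups = defaultdict(list)
    for rev in revisions:
        groups[(rev["month"], rev["editor"])].append(rev)

    features = {}
    for key, revs in groups.items():
        distances = [levenshtein(r["old_text"], r["new_text"]) for r in revs]
        length_changes = [len(r["new_text"]) - len(r["old_text"]) for r in revs]
        features[key] = {
            "edits": len(revs),
            "pages": len({r["page"] for r in revs}),
            "median_edit_distance": median(distances),
            "median_length_change": median(length_changes),
            "reverted_edits": sum(1 for r in revs if r["reverted"]),
        }
    return features
```

Each resulting dictionary corresponds to one editor in one month. A clustering algorithm could then be run separately on each month's set of such vectors, which is the step the paper describes next.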
Qualitative analysis: Building on these three phases identified via descriptive statistics, the authors construct a "theoretical narrative ... [using] a highly selective representation of empirical material that advances the plot of capability-building", including discussion of the history of policies, processes and events such as IAR, 3RR, FAR, bot policy, flagged revisions, the 2005 Nature study comparing Wikipedia's quality with Britannica's, and the Seigenthaler affair the same year.
The researchers note that Wikipedia governance has changed significantly over the years, becoming less open and more codified, which they seem to regard as a positive change. The authors' main conclusions stress, first, that governance can itself be a dynamic, evolving process; second, that new kinds of governance mechanisms make it possible to create significant value by harnessing knowledge resources that would be very difficult to capture through a market or corporate system; and third, that the lack of a contractually sanctioned governance framework means that people have to learn to deal directly with each other through peer-based interaction and informal agreements, which in turn creates opportunities for self-improvement through learning. Fourth, the authors note that the new governance models are constantly evolving, meaning they have a very fluid structure that is difficult to describe and may be better understood as changing combinations of different, semi-independent governance mechanisms that complement one another. Finally, they stress the importance of technology in making those new models of governance possible.
Readability of plastic surgery articles examined
The readability of online patient materials on plastic surgery topics was recently assessed by teams from Beth Israel Medical Center at the Harvard Medical School. Readability scores are generally expressed as a grade level: higher grade levels indicate that the content is more difficult to read. According to the authors, "nearly half of American adults have poor or marginal health literacy skills and the NIH (National Institute of Health) and AMA (American Medical Association) have recommended that patient information should be written at the sixth grade level". The aim of their research was to calculate readability scores for the most popular web pages displaying procedure information and compare the results to the sixth grade reading level recommendation.
The core author group published two papers, "Online Patient Resources for Liposuction", in Annals of Plastic Surgery, and "Assessment of Online Patient Materials for Breast Reconstruction", in Journal of Surgical Research. The authors concentrated on the topics of "liposuction" and "tattoo information" in one paper, and focused solely on the topic of "breast reconstruction" in the second paper. Readability scores were assessed in both papers, but the breast reconstruction paper added an analysis of ‘complexity’ and ‘suitability’ to evaluate reading level more comprehensively.
For each search term, the websites selected for analysis were the top 10 links returned by a Google search query. These top 10 links were treated as the 10 most commonly visited websites for that search term.
The authors concluded that online patient information for ‘liposuction’ and ‘breast reconstruction’ is ‘too difficult’ for many patients, as the readability scores of all 20 websites (10 each) far exceed a 6th-grade reading level. The most popular ‘liposuction’ websites averaged a 13.6 grade level. By comparison, ‘tattoo information’ scored at a 7.8 grade level.
Health care information available at the most popular websites for ‘breast reconstruction’ had an average readability score of 13.4, with 100% of the top 10 websites providing content far above the recommended 6th grade reading level. Wikipedia.org fell at the more difficult end of the range for both terms, with scores above the 14th grade level for ‘liposuction’ and above the 15th grade level for ‘breast reconstruction’.
When other metrics such as ‘complexity’ and ‘suitability’ were applied to the breast reconstruction websites, the content appeared to be friendlier towards less educated readers. Complexity analysis using PMOSE/iKIRSCH yielded an average score of 8th–12th grade level. In a testament to how images and typography enhance readability, the breast reconstruction paper also employed the SAM ‘suitability’ formula, which gives weight to the contribution that images, bulleted lists, subheadings, and video make to the readability of content. By this metric, 50% of the websites were ‘adequate’. Wikipedia.org was found to be ‘unsuitable’, along with Komen.org, BreastReconstruction.com, WebMD.com, and MedicineNet.com.
In conjunction with the ‘readability score’, the PMOSE and SAM metrics helped to achieve a more comprehensive view of a patient’s ability to read and comprehend the breast reconstruction material.
Liposuction paper methodology
After articles from the 10 websites with liposuction content were stripped of images and videos, the plain text content was analyzed using ten established readability formulas. These included Coleman–Liau, Flesch–Kincaid, Flesch reading ease, FORCAST, Fry graph, Gunning fog, New Dale–Chall, New Fog count, Raygor estimate, and SMOG. All readability formulas in this paper relied on some combination of word length, syllable count, word complexity, and sentence length. Longer word lengths and sentence lengths compute to higher reading levels. Similarly, words of three or more syllables increase the grade level readability scores. These text-based readability scores do not include the impact that images or graphics have on readers.
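To make concrete how such formulas combine sentence length and syllable counts, here is a minimal sketch of two of the ten measures named above, the Flesch–Kincaid grade level and the Gunning fog index, using their standard published coefficients. The syllable counter is a rough vowel-group heuristic and the function names are assumptions for illustration; this is not the tooling the paper's authors used, and real readability software counts syllables more carefully.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, discounting a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_counts(text: str):
    """Return (sentences, words, syllables, complex_words) for non-empty text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    # "Complex" words are those with three or more syllables.
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    return len(sentences), len(words), syllables, complex_words


def flesch_kincaid_grade(sentences: int, words: int, syllables: int) -> float:
    """Flesch-Kincaid grade level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(sentences: int, words: int, complex_words: int) -> float:
    """Gunning fog index from raw counts."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


def grade_levels(text: str) -> dict:
    """Convenience wrapper returning both scores for a passage."""
    sentences, words, syllables, complex_words = text_counts(text)
    return {
        "flesch_kincaid": flesch_kincaid_grade(sentences, words, syllables),
        "gunning_fog": gunning_fog(sentences, words, complex_words),
    }
```

Calling `grade_levels` on a short passage of one-syllable words and then on a passage dense with surgical terminology shows the effect described above: longer sentences and polysyllabic words both push the estimated grade level up.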
In an effort to compare readability scores for a procedure ‘similar’ to liposuction, the authors performed the same type of analysis on the term ‘tattoo information’. Not surprisingly, the query for ‘tattoo information’, a simpler procedure, yielded content with an average readability score at the 7.8 grade level.
Based on this wide gap of 5.8 grade levels in readability scores between ‘liposuction’ and ‘tattoo’ literature, the authors pose the question, “So why is this (tattoo) information significantly easier to read than liposuction?” The authors do present good example strategies for rewriting some liposuction content at lower reading levels. However, they do not convincingly explain why the two procedures should have similarly low readability levels. The average education levels of the target audiences for "liposuction" and "tattoo information" are not well documented in the paper, and it is questionable whether they are equal.
According to ASPS statistics, 50% of liposuction patients are over 40 years old. Are 50% of the people seeking tattoos over age 40? While age does not equal reading level, it may certainly give a hint.
Furthermore, the authors downplay the complexity of the liposuction procedure in comparison to tattooing. Liposuction is an invasive procedure performed by a credentialed surgeon and anesthesiologist under IV or general anesthesia in an accredited outpatient surgery center. The names of the tools, equipment, and anesthetics used in the technique are not simple, common words.
Unlike surgeons, tattoo artists do not require any type of formal medical training or certification. The tattoo procedure does not involve the complexities of pre-operative clearance, fat extraction, fluid and electrolyte regulation, anesthesia administration, or vital sign monitoring. Consequently, a description of the liposuction procedure is bound to be longer and more technical, and is likely to require a higher reading level than a description of tattooing.
Top 10 Google links used in methodology
One consideration not discussed by these and other published authors evaluating online content readability is that Google reportedly uses the Dale–Chall and Flesch–Kincaid readability formulas in its Penguin algorithm. However, rather than punish high (difficult) readability scores, the algorithm is thought to punish low grade level readability scores. In 2013, the UK analytics company MathSight determined[supp 1] that the Penguin algorithm penalized websites with low grade level readability scores. After the MathSight finding, many SEO experts concluded that Google favors content written at a higher educational level.
In light of this, and given the typical methodology of obtaining the data set from Google’s top 10 links, one must question whether Google would ever rank a medical content website with a grade 6 readability score higher than a website with a grade 13 readability score. Perhaps even more importantly, most website publishers want what Google wants, and competition is fierce for a spot in the top 10 links. Therefore, as long as online content publishers believe that Google favors well-written, well-researched, sophisticated content, it might be a tough sell to persuade medical content publishers to simplify their content to a sixth grade reading level.
Fukushima discussions in the English and Japanese Wikipedias: Similar to several other pieces of research, this paper looks at the social production of knowledge in the context of a single, controversial Wikipedia topic: this time, the Fukushima Daiichi nuclear disaster. The authors compare the discussions in the English and Japanese Wikipedias, noting that (as we would expect) the English one attracts a more global audience. Both communities were primarily focused on writing an encyclopedic article, though, contrary to the authors' expectation, it was the English Wikipedia editors who were more likely to raise topics not directly related to the creation of the article. Overall, although the paper is primarily descriptive and does not provide much discussion to enhance existing social theories, it creates a framework for understanding the types of discourse among Wikipedia contributors, which other researchers can use to analyze various topics on Wikipedia.
Other recent publications
A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.
"Wikipedia and medicine: quantifying readership, editors, and the significance of natural language"
"One-shot Wikipedia: An edit-sprint toward information literacy" From the abstract: "In this case study, a Wikipedia-editing activity was incorporated into two-hour one-shot instruction sessions. ... While a great deal of attention has been paid to teaching with multi-week Wikipedia assignments and coursework, evidence from this project suggests that Wikipedia-related activities can be used effectively within much narrower time constraints."
"Unsupervised biographical event extraction using Wikipedia traffic" From the introduction: "We hypothesise that when a notable event happens to a person, traffic to their Wikipedia page peaks abruptly, and an edit is made to their page describing the event. To explore this hypothesis, a simple outlier-based method is applied to extract peaks (short periods of sudden activity) from Wikipedia page traffic data, which are used to locate page edits which align to sentences contributing to the notability of the page subject."
"The Internet School of Medicine: use of electronic resources by medical trainees and the reliability of those resources" (blog summary: )
"Wikipedia knowledge community modeling" (book chapter / reference work entry)
"Domain-specific semantic relatedness from Wikipedia structure: a case study in biomedical text" (book chapter)
"Wikipedia – challenges and new horizons in enhancing medical education"
"Coverage of European parties in European language Wikipedia editions"
"Context-aware detection of sneaky vandalism on Wikipedia across multiple languages"
"Google and Wikipedia in the professional translation process: a qualitative work" (related paper by the same author)
"Coordination and efficiency in decentralized collaboration" (conference paper submitted to ICWSM 2015). From the abstract: "we consider the trade-offs inherent in coordination in [decentralized on-line collaboration environments], balancing the benefits to collaboration with the cost in effort that could be spent in other ways. We consider two diverse domains that each contain a wide range of collaborations taking place simultaneously – Wikipedia and GitHub – allowing us to study how coordination varies across different projects. We analyze trade-offs in coordination along two main dimensions, finding similar effects in both our domains of study: first we show that, in aggregate, high-status projects on these sites manage the coordination trade-off at a different level than typical projects; and second, we show that projects use a different balance of coordination when they are "crowded", with relatively small size but many participants."
Alexander Hogue, Joel Nothman and James R. Curran. 2014. Unsupervised biographical event extraction using Wikipedia traffic. In Proceedings of Australasian Language Technology Association Workshop, pages 41–49. http://www.aclweb.org/anthology/U14-1006
Egle, Jonathan P.; Smeenge, David M.; Kassem, Kamal M.; Mittal, Vijay K. (April 2015). "The Internet School of Medicine: Use of electronic resources by medical trainees and the reliability of those resources". Journal of Surgical Education. 72 (2): 316–320. doi:10.1016/j.jsurg.2014.08.005. ISSN 1878-7452. PMID 25487347.