Wikipedia:Wikipedia Signpost/2018-12-01/Recent research

Why do the most active Wikipedians burn out?; only 4% of students vandalize: And other new research results.
A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, edited jointly with the Wikimedia Research Committee and republished as the Wikimedia Research Newsletter.

"Volunteer Retention, Burnout and Dropout in Online Voluntary Organizations: Stress, Conflict and Retirement of Wikipedians" by Piotr Konieczny

Reviewed by Bri

This paper[1] begins with a review of prior research on the reasons for editor dropout on Wikipedia, which focuses on the stress of interpersonal conflict and on overburdened volunteers, especially admins. It then presents the methods and findings of new research on "more experienced and active Wikipedians", the 1% contributing the most time and content. One startling finding from the survey of the 300 most active Wikipedians (with a 41% survey response rate!) is a lack of recognition from the wider academic and professional community: volunteer Wikipedia editing is not often treated as a "legitimate" volunteer activity that contributes, for example, to professional development. The author also states that "the current Wikimedia Foundation efforts directed at increasing positive reinforcement, developed with a focus on increasing the retention of new editors ... may be much less efficient ... when it comes to ... long-term highly active contributors" and concludes that more research is needed on interpersonal conflict as a motivation for retirement.


  • The Wikimedia Foundation's "Audiences" department (which develops and maintains software features for Wikipedia and its sister sites) has published a series of draft essays designed to inform their longer-term product development planning, each drawing heavily from existing research (much of which has been covered in this research report over the year). They focus on the topics of "Trust", "Experience", "Scale", "Augmentation", "Culture", and "Tools".

Conferences and events

See the research events page on Meta-wiki for upcoming conferences and events, including submission deadlines.

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions are always welcome for reviewing or summarizing newly published research.

Compiled by Tilman Bayer

From the proceedings of OpenSym 2018:

Only 4% of students vandalize Wikipedia – motivated by boredom, amusement or ideology (according to their peers)

From the abstract:[2] "This research aims to find the extent to which a particular group of university students vandalize Wikipedia, while also exploring their perceptions of vandalism. Data is obtained from a questionnaire sent to university students in educational psychology, early and primary childhood education, and related master’s programs, as well as a focus group involving a sample of these students and interviews with editors in charge of maintaining Wikipedia. [...] it seems that students and editors have some preconceived ideas (boredom, amusement, or ideological motivations) about what pushes individuals to vandalize."

Among the 928 survey participants, only 39 (4%) reported having vandalized Wikipedia. Younger students vandalized more often ("there is a meaningful difference between students under 23 (5.3 % of them vandalize) and both students from 24 to 30 (1.9 %) and from 31 to 40 (0%)"), whereas there was no significant difference between male and female respondents.
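As a quick check of the headline figure, the share of self-reported vandals can be recomputed from the two counts given above. This is only an illustrative sketch: the helper name `share` is made up here, and the age-group percentages are not recomputed because the underlying group sizes are not given in this summary.

```python
def share(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Counts taken from the summary above: 39 of 928 respondents.
vandal_share = share(39, 928)
print(f"{vandal_share:.1f}%")  # prints "4.2%"
```

The unrounded value is a little above 4%, which matches the "only 4%" in the headline after rounding.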

Could gamification increase participation in Wikimedia Commons?

From the abstract:[3] ".... in comparison to photosharing sites like Flickr and mobile apps like Instagram, Commons is largely unknown to the general public and under-researched by scholars. We conducted an exploratory study to determine if an alternative means of contribution—a mobile application that gamifies implicitly desirable and useful behavior—could broaden awareness of and participation in Commons. Our findings from an online survey (N=103) suggest that by creating value around implicitly desirable behaviors, we can create new opportunities and alternative pathways for both increasing and broadening participation in peer-production communities such as Commons."

"Triggering" article contributions by adding factoids

From the paper:[4] "The analysis shows that the introduction of a few key terms acts as milestones for the evolution of the article. When a factoid is added, more knowledge related to that factoid is likely to be added. However, different users get triggered differently, leading to the inclusion of diversified knowledge into the articles." (compare also "Do less active participants make active participants more active?" below)

See also our earlier coverage of other OpenSym 2018 papers:

Other publications:

"Do less active participants make active participants more active? An examination of Chinese Wikipedia"

From the abstract:[5] "In this study, we probe the indirect influence of less active participants' contributing behaviors on the quality of knowledge collaboration. [...] Using the edit data of featured articles in the Chinese Wikipedia, we examine the proposed causal path. The main findings of this study are as follows: the productivity of active participants of a Wikipedia article increases when they are triggered by less active participants' editing activities; the additional edits of active participants triggered by less active participants can improve the quality of an article; and less active participants play a major role in reviving the editing work of dormant articles. These findings reveal that less active participants play a substantial role in knowledge collaboration in online communities, as their contributing behaviors sustain collaborative work and eventually improve the quality [of Wikipedia]." (compare also "Triggering" paper above)

Filling in missing Wikipedia infobox attributes

From the paper's[6] introduction: "51% of entity attributes in English Wikipedia infoboxes are not described in English Wikipedia articles. We aim to fill in this knowledge gap via a system that can take an entity as input and automatically generate a natural language description."

"Neural Article Pair Modeling for Wikipedia Sub-article Matching"

From the abstract:[7] "Nowadays, editors tend to separate different subtopics of a long Wikipedia article into multiple sub-articles. This separation seeks to improve human readability. However, it also has a deleterious effect on many Wikipedia-based tasks that rely on the article-as-concept assumption, which requires each entity (or concept) to be described solely by one article. This underlying assumption significantly simplifies knowledge representation and extraction [...] In this paper we provide an approach to match the scattered sub-articles back to their corresponding main-articles, with the intent of facilitating automated Wikipedia curation and processing." (see also related earlier coverage: "Problematizing and Addressing the Article-as-Concept Assumption in Wikipedia")

"World Influence of Infectious Diseases from Wikipedia Network Analysis"

From the abstract:[8] "We consider the network of 5 416 537 articles of English Wikipedia extracted in 2017. Using the recent reduced Google matrix (REGOMAX) method we construct the reduced network of 230 articles (nodes) of infectious diseases and 195 articles of world countries. [...] PageRank and CheiRank algorithms are used to determine the most influential diseases with the top PageRank diseases being Tuberculosis, HIV/AIDS and Malaria. From the reduced Google matrix we determine the sensitivity of world countries to specific diseases integrating their influence over all their history including the times of ancient Egyptian mummies. The obtained results are compared with the World Health Organization (WHO) data demonstrating that the Wikipedia network analysis provides reliable results with up to about 80 percent overlap between WHO and REGOMAX analyses."
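For readers unfamiliar with the two ranking algorithms named in the abstract, here is a minimal sketch on a toy link graph. It is not the paper's REGOMAX method and uses no Wikipedia data: the node names, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for illustration. PageRank favors nodes with many incoming links, while CheiRank is PageRank computed on the same graph with every link reversed, so it favors nodes with many outgoing links.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict {node: [linked nodes]}."""
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                # Split this node's rank evenly among its link targets.
                w = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += w
            else:
                # Dangling node: spread its rank evenly over all nodes.
                w = damping * rank[v] / n
                for t in nodes:
                    new[t] += w
        rank = new
    return rank


def reverse(links):
    """Return the same graph with every link direction inverted."""
    rev = {}
    for source, targets in links.items():
        rev.setdefault(source, [])
        for t in targets:
            rev.setdefault(t, []).append(source)
    return rev


def cheirank(links, **kwargs):
    """CheiRank: PageRank of the link-reversed graph."""
    return pagerank(reverse(links), **kwargs)


# Hypothetical toy graph: "hub" links out a lot, "auth" is linked to a lot.
toy = {
    "hub": ["x", "y", "z"],
    "x": ["auth"],
    "y": ["auth"],
    "z": ["auth"],
    "auth": ["hub"],
}
pr = pagerank(toy)
cr = cheirank(toy)
print(max(pr, key=pr.get), max(cr, key=cr.get))
```

On this toy graph the top PageRank node is the heavily linked-to one and the top CheiRank node is the one with the most outgoing links, which is the distinction the two measures are meant to capture.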

"Navigating the Spoken Wikipedia"

From the abstract:[9] "The Spoken Wikipedia project unites volunteer readers of encyclopedic entries. Their recordings make encyclopedic knowledge accessible to persons who are unable to read [...]. However, [these recordings] can only be consumed linearly [...]. We present a reading application which uses an alignment between the recording, text and article structure and which allows to navigate spoken articles, through a graphical or voice-based user interface (or a combination thereof). We present the results of a usability study in which we compare the two interaction modalities. We find that both types of interaction enable users to navigate articles and to find specific information much more quickly compared to a sequential presentation of the full article." (see also code on GitHub)

Participation patterns on Wikidata

From the abstract and paper:[10] "This paper builds upon previous research, where we identified six common participation patterns, i.e. roles, in Wikidata. In the research presented here, we study the applicability of sequence analysis methods by analyzing the dynamics in users’ participation patterns. The sequence analysis is judged by its ability to answer three questions: (i) 'Are there any preferable role transitions in Wikidata?'; (ii) 'What are the dominant dynamic participation patterns?'; (iii) 'Are users who join earlier more turbulent contributors?' [answer: "the earlier an user joins Wikidata, the more turbulent his/her dynamic participation pattern is"] Our data set includes participation patterns of about 20,000 users in each month from October 2012 to October 2014."
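Question (i) about preferred role transitions can be illustrated by counting how often one role follows another in users' month-by-month role sequences. The sketch below is not the authors' method: the role labels `role_a` and `role_b` and the two sequences are invented placeholders (the six roles are not named in this summary), and no turbulence measure is computed.

```python
from collections import Counter


def transition_counts(sequences):
    """Count consecutive (from_role, to_role) pairs over all sequences."""
    counts = Counter()
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[(current, following)] += 1
    return counts


def transition_rates(counts):
    """Turn counts into per-origin-role relative frequencies."""
    totals = Counter()
    for (origin, _), n in counts.items():
        totals[origin] += n
    return {(a, b): n / totals[a] for (a, b), n in counts.items()}


# Hypothetical data: one list of monthly roles per user.
monthly_roles = [
    ["role_a", "role_a", "role_b"],
    ["role_b", "role_a", "role_a"],
]
counts = transition_counts(monthly_roles)
rates = transition_rates(counts)
for pair, rate in sorted(rates.items()):
    print(pair, round(rate, 2))
```

A transition with a rate well above the others for the same origin role would be a candidate "preferable" transition in the sense of question (i).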


  1. ^ Konieczny, Piotr (2018-10-02). "Volunteer Retention, Burnout and Dropout in Online Voluntary Organizations: Stress, Conflict and Retirement of Wikipedians". In Coy, Patrick G. Research in Social Movements, Conflicts and Change. Research in Social Movements, Conflicts and Change. 42. Emerald Group. pp. 199–219. doi:10.1108/S0163-786X20180000042008. ISBN 1787568954.
  2. ^ Sierra, Ángel Obregón; Castanedo, Jorge Oceja (2018). "University Students in the Educational Field and Wikipedia Vandalism". Proceedings of the 14th International Symposium on Open Collaboration. OpenSym '18. New York, NY, USA: ACM. pp. 16:1–16:7. doi:10.1145/3233391.3233540. ISBN 9781450359368. Closed access publication (behind paywall); free PDF on conference website.
  3. ^ Menking, Amanda; Rangarajan, Vaibhavi; Gilbert, Michael (2018). ""Sharing Small Pieces of the World": Increasing and Broadening Participation in Wikimedia Commons". Proceedings of the 14th International Symposium on Open Collaboration. OpenSym '18. New York, NY, USA: ACM. pp. 13:1–13:12. doi:10.1145/3233391.3233537. ISBN 9781450359368. Closed access publication (behind paywall); free PDF on conference website.
  4. ^ Chhabra, Anamika; Iyengar, S. R. Sudarshan (2018). "Characterizing the Triggering Phenomenon in Wikipedia". Proceedings of the 14th International Symposium on Open Collaboration. OpenSym '18. New York, NY, USA: ACM. pp. 11:1–11:7. doi:10.1145/3233391.3233535. ISBN 9781450359368. Closed access publication (behind paywall); free PDF on conference website.
  5. ^ "Do less active participants make active participants more active? An examination of Chinese Wikipedia". Decision Support Systems. 2018-08-08. doi:10.1016/j.dss.2018.08.002. ISSN 0167-9236. Closed access publication (behind paywall).
  6. ^ Wang, Qingyun; Pan, Xiaoman; Huang, Lifu; Zhang, Boliang; Jiang, Zhiying; Ji, Heng; Knight, Kevin (2018-09-05). "Narrating a Knowledge Base". arXiv:1809.01797 [cs].
  7. ^ Chen, Muhao; Meng, Changping; Huang, Gang; Zaniolo, Carlo (2018-07-31). "Neural Article Pair Modeling for Wikipedia Sub-article Matching". arXiv:1807.11689 [cs].
  8. ^ Rollin, Guillaume; Lages, José; Shepelyansky, Dima (2018-09-24). "World Influence of Infectious Diseases from Wikipedia Network Analysis". bioRxiv: 424465. doi:10.1101/424465. Retrieved 2018-10-22.
  9. ^ Rohde, Marcel; Baumann, Timo. Navigating the Spoken Wikipedia (PDF). SLPAT 2016 Workshop on Speech and Language Processing for Assistive Technologies, 13 September 2016, San Francisco, USA. Germany: Universität Hamburg, Department of Informatics, Natural Language Systems Group. p. 5.
  10. ^ Cuong, To Tu; Müller-Birn, Claudia (2016). "Applicability of Sequence Analysis Methods in Analyzing Peer-Production Systems: A Case Study in Wikidata". In Emma Spiro, Yong-Yeol Ahn (eds.). Social Informatics. Lecture Notes in Computer Science. Springer International Publishing. pp. 142–156. doi:10.1007/978-3-319-47874-6_11. ISBN 9783319478746. Closed access publication (behind paywall).