"Problematizing and Addressing the Article-as-Concept Assumption in Wikipedia"
- Reviewed by Thomas Niebler
In several Wikipedia-based systems and scientific analyses, researchers have assumed that no two articles in Wikipedia represent the same concept, i.e. that each article is a semantically closed description of a specific item, for example "New York City". In a paper published at CSCW'17, however, Lin et al. showed that this “article-as-concept” assumption does not in fact hold: the article about "New York City" has a separate sub-article about the "History of New York City", which describes a topic very closely related to “New York City” and could just as easily be merged into the original article. This practice of splitting lengthy articles into several smaller ones ("summary style", more specifically "article size") may improve readability for human users, but seriously impairs many studies based on the “article-as-concept” assumption. Using a simple classification approach with features based on both the link structure and semantic aspects of the title and the context, the authors found that 70.8% of the top 1000 visited pages have been split up into articles and sub-articles, with an average of 7.5 sub-articles per article. They conclude that the existence of sub-articles is not the exception, but the rule.
A drawback of the proposed sub-article relationship detection method, as stated in the paper, is that it is trained only on explicitly encoded sub-article relationships; it remains unclear how to detect implicit relationships, i.e. those where no editor has linked the sub-article with the main article. Still, this is a first step towards a deeper analysis of the Wikipedia page network, with the aim of making it more readable for humans while keeping it easily exploitable by algorithms.
85% of German scientists use Wikipedia, and other European media survey results
- Summary by Tilman Bayer
A survey among 1,354 German academic researchers about their professional use of social media found Wikipedia to be the most widely used site as of 2015, used by 84.7% of respondents. Among German internet users in general, 79% use Wikipedia. Only 2% of these Wikipedia readers think it's "never reliable", and 80% consider it "mostly" ("größtenteils") reliable. A report by the German Monopolkommission (which advises the government on antitrust matters) on potential monopoly problems in the Internet search engine market highlighted Wikipedia as the one among the top 10 websites in Germany that is by far the most dependent on Google, which accounts for around 80% of its traffic (according to third-party data from SimilarWeb that is not quite consistent with the Wikimedia Foundation's own data).
In France, surveys by the Institut national de la statistique et des études économiques (INSEE) found that from 2011 to 2013, the share of people who use the internet to consult Wikipedia ("or any other collaborative online encyclopedia") rose from 39% to 51%. Wikipedia usage was higher among younger internet users and among those with degrees: 82% among 16-24 year olds, 54% among 25-54 year olds, and only 31% among 55-74 year olds. The corresponding Eurostat data gave 45% for the entire European Union as of 2015.
In contrast, Ofcom found that only 2-4% of UK 12-15 year olds use Wikipedia as their first stop for information as of 2015.
In the meantime, a 2016 Knight Foundation report, based on a study by Nielsen, found that "Among mobile sites [in the US], Wikipedia reigns in terms of popularity (the app does well too) and amount of time users spend on the entity. Wikipedia’s site reaches almost one-third of the total mobile population each month".
Conferences and events
See the research events page on Meta-wiki for upcoming conferences and events, including submission deadlines.
Other recent publications
Other recent publications that could not be covered in time for this issue include the items listed below. Contributions are always welcome for reviewing or summarizing newly published research.
- Compiled by Tilman Bayer
- "Intellectual interchanges in the history of the massive online open-editing encyclopedia, Wikipedia" From the abstract: "[Its] open-editing nature may give us prejudice that Wikipedia is an unstable and unreliable source; yet many studies suggest that Wikipedia is even more accurate and self-consistent than traditional encyclopedias. Scholars have attempted to understand such extraordinary credibility, but usually used the number of edits as the unit of time, without consideration of real time. In this work, we probe the formation of such collective intelligence through a systematic analysis using the entire history of English Wikipedia articles, between 2001 and 2014. ... [We] find the existence of distinct growth patterns that are unobserved by utilizing the number of edits as the unit of time. To account for these results, we present a mechanistic model that adopts the article editing dynamics based on both editor-editor and editor-article interactions. ... [The] model indicates that infrequently referred articles tend to grow faster than frequently referred ones, and articles attracting a high motivation to edit counterintuitively reduce the number of participants. We suggest that this decay of participants eventually brings inequality among the editors, which will become more severe with time."