The few who write Wikipedia
The views expressed in this special report are those of the author only; responses and critical commentary are invited in the comments section.
Edit distribution of all Wikipedians, as of 8 January 2014
On 15 January, the English Wikipedia turned thirteen years old. In that time, it has grown from a small site known to only a select few into one of the most popular websites on the internet. At the same time, recent data suggests that there is a power law among users, where the comparatively few users who write most of Wikipedia also have most of the edits. The result is that there is going to be bias in what is created, and how we deal with it as Wikipedians will shape the future of the site. This also raises the question of what we should do to combat this bias: there are many ideas, but it remains to be seen whether they will work.
Every Wednesday, various charts are updated that show trends in editing. These include lists of the top editors, the top article creators, and overall bot edit counts, as well as a list of the editors who have made the most edits in the last thirty days, which is updated less often than the others. Over the past few years, there have been periodic attempts to decipher this information and figure out what it all means, although as far as I know, no one in the Wikimedia Foundation has published reports using it. When I came across these lists in 2011, I decided to put the numbers on a chart to see what they meant, and some interesting trends came up. Fast forward to two weeks ago, when I decided to update the charts for the first time since November 2012; I had no idea what I would discover.
One of the more interesting trends that I found during the many hours I spent building the charts was how many edits a rather select few Wikipedians have when compared to the rest of the site's users. In terms of overall numbers, 45% of the edits on Wikipedia have been made by a combined ten thousand editors and the 850+ bots on the site. When charted on a line graph, the edit counts follow a distinct power law that rises sharply for both bots and editors. Interestingly, the top bot (Cydebot) has more than three times as many edits as Koavf, the editor with the highest edit count on the site. This high number of edits has helped push the bots to a significant percentage of the overall edits on the site, totaling 12%. As of the publication of this article, there are 20,590,000+ users on the site, meaning that .052% of Wikipedia's users (bots included) account for nearly half of the edits.
Even more surprising were the numbers on article creators. Most Wikipedians who are active on the site have written an article or two, some as simple as a stub and some that have been expanded to a Featured Article. Other users focus on expanding existing articles, drawing on their knowledge of a specific subject area. Still others, myself included, have created anywhere from hundreds to tens of thousands of articles. Creating an article thoroughly takes time and dedication, so it is likely that many of these articles were created as stubs. This is reflected in the fact that the top 3,000 editors have written 55% of the articles on the entire site. The next 2,000 editors have written only 5% of the articles, but adding them in shows that 60% of the articles on this site have been written by 5,000 users, which equates to about .024% of the site's overall users. Of note are the numerous IP addresses that show up on these page creation lists, as before 2005 users were allowed to anonymously submit articles (a feature which was removed because of the Seigenthaler incident). On the list, the IP address 22.214.171.124 has 983 live article creations, a number which places it at 459th on the list.
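For readers who want to check the percentages themselves, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope check: it treats the "20,590,000+" and "850+" figures as exact counts, and the variable and function names are made up for illustration. Under those assumptions, the results land close to the shares quoted above.

```python
# Figures quoted in the article; names are illustrative, not from any tool.
TOTAL_USERS = 20_590_000      # "20,590,000+" users, assumed exact for this check
TOP_EDITORS = 10_000          # editors behind 45% of edits (together with bots)
BOTS = 850                    # "850+" bots, assumed exact for this check
TOP_CREATORS = 3_000 + 2_000  # the two tiers of article creators


def share_of_users(group_size: int, total: int = TOTAL_USERS) -> float:
    """Return group_size as a percentage of all registered users."""
    return 100 * group_size / total


editor_share = share_of_users(TOP_EDITORS + BOTS)
creator_share = share_of_users(TOP_CREATORS)
articles_by_top_creators = 55 + 5  # percent of articles, summed over both tiers

print(f"Top editors and bots: {editor_share:.4f}% of users")
print(f"Top article creators: {creator_share:.4f}% of users")
print(f"Articles written by those creators: {articles_by_top_creators}%")
```

Because the real user and bot counts are slightly higher than the rounded figures used here, the true shares would be marginally smaller still.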
What does this all mean?
Top article creators when compared to the rest of the community
One question that should be asked about the fact that so few editors are writing so many articles is why this is occurring. Wikipedia can often be harsh to new users, as the number of rules, both written and unwritten, can scare off even the most dedicated of writers. Those who stay seem to be the ones who want to contribute and write more for the site, but the data seems to show that these are an incredibly select few individuals when compared to the over twenty million usernames that have been registered over the years. Furthermore, with declining editor counts, this is only going to become more of an issue over the years, as the Wikipedians who are left will probably start expanding into more niche topics, ones that are not easily researchable by the average person with stable internet access.
Another question that this brings up is what the costs are of having so few editors write so many articles. In theory, having fewer users write more articles brings standardization to the site, as there are fewer differences in prose and article quality. In reality, though, having so few users means that there is going to be an implicit bias in what is written, to degrees which have already been shown through the work of the Wikimedia Foundation. With the already low numbers of women on the site, this means that there will be more coverage of male-oriented topics. If such a topic is not covered immediately, there is a good chance that an article on it will be created in the coming years. Unfortunately, this means that whatever female-oriented topics are out there will probably get further neglected, as there is less of a chance that someone will even know that the subject exists, never mind whether it is notable enough for an article (when in doubt, go for it). The number of these super page creators only exacerbates the problem, as it means that the users who are mass-creating pages are probably not covering neglected topics, and this tilts our coverage disproportionately towards male-oriented topics.
Finally, the last question that is brought up is why the vast majority of editors are responsible for only the remaining 40% of the articles. Most users are aware of the Article wizard, while fewer know about Articles for creation (side note: if you can, please volunteer there, as it has been flooded in the past couple of years by new articles and is in need of knowledgeable Wikipedians for reviews). Oftentimes, articles created by inexperienced users in either of these two venues are deleted or shot down before the users have any idea what is going on. This is discouraging and dissuades users from helping out. Other times, they will come seeking help, but will get discouraged when the topic that they have been working on is deemed non-notable. Most likely, many more Wikipedians out there have attempted to create an article, but because those articles were deleted, the data skews slightly further towards experienced Wikipedians, who then go on to hold a slightly larger majority of article creations as well.
What can we do to fix this?
The Teahouse has been a successful model for helping new editors along in the process, and it has found great success by providing them with guidance. Additionally, mentoring editors and guiding them towards articles that they might not have originally thought of working on can be a good way to direct their enthusiasm into something positive. By channeling talent and by encouraging and redirecting editors onto viable paths, it is possible to ensure that a greater amount of knowledge will be present on the site in the coming years. Finally, the Wikipedia Education Program and Wiki Education Foundation have also attempted to make inroads in the classroom, by encouraging students to become more involved with the community through their school work.
The final part of this is whether or not these attempts will work. A community that is dedicated to addressing and fixing the issues that exist on the site is a community that will succeed. In the past, many ideas for reform have been met with resistance from the community, often with mixed results. Other times, approaches to fixing these issues run counter to what others in the community want to do, so some editors end up unintentionally (or intentionally, for that matter) sabotaging the efforts of reform-minded users, although this is to be expected in a large community where people have differing views.
In the end, it is up to us as a community to ensure that the site continues for another thirteen successful years, as we are part of one of the greatest social, intellectual, and academic experiments on the internet. Our success in the coming years will be based on how we choose to address these issues, so it is imperative that we attempt to correct them while there are still people interested in editing the site, so that we can continue to strive to be the most important encyclopedia in the world.
of the top Wikipedian editors, which is similar to the article creation and bot curves