User:Dragons flight/Log analysis

Due to technical difficulties, Wikimedia's internal statistics for the English Wikipedia have not been compiled since October 2006. To partially fill that gap, I have compiled an independent analysis based on Wikipedia's log files and on the edit histories of a substantial fraction of Wikipedia's articles (118,793 articles, ~6% of all articles), which I downloaded for this purpose.

The results of this analysis are summarized below. The most surprising result is that the activity of the Wikipedia community appears to have been declining during the last six months. Where relevant, I have scaled the number of articles to show what would be expected from the full 2 million articles on Wikipedia. Dragons flight 22:07, 9 October 2007 (UTC)
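The scaling mentioned above amounts to multiplying each count measured in the sample by the ratio of total articles to sampled articles. The following is only a minimal sketch, assuming a simple proportional extrapolation; the function name is hypothetical, and the actual analysis may have applied a more careful weighting.

```python
# Figures stated in the text above.
SAMPLED_ARTICLES = 118_793    # articles whose edit histories were downloaded
TOTAL_ARTICLES = 2_000_000    # approximate size of the English Wikipedia

# Each sampled article stands in for roughly this many real articles.
scale_factor = TOTAL_ARTICLES / SAMPLED_ARTICLES


def scale_to_full_wiki(sample_count, sampled=SAMPLED_ARTICLES, total=TOTAL_ARTICLES):
    """Extrapolate a count observed in the sample to all articles (hypothetical helper)."""
    return sample_count * total / sampled
```

Under this assumption, every quantity counted in the sample (for example, edits in a given month) is multiplied by a factor of about 16.8 before plotting.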

Edit rate

Rate at which articles are being edited, separated into categories based on who is editing.
Same as left, except excluding edits marked as "reverts" and edits that were reverted

The rate at which edits were being made to Wikipedia articles appears to have peaked between February and April 2007 and has declined since. This decline is unprecedented for Wikipedia, whose history has been marked by nearly exponential growth for much of its existence. As discussed below, several other statistics show declines beginning around the same period. Though it may be purely coincidental, this time frame also corresponds to the Essjay controversy appearing in the press.

In addition, unregistered editors, i.e. those identified only by an IP address, still account for approximately a third of all edits made to articles.

The short, abrupt drop in late 2006 is associated with an interval of hardware problems that affected Wikipedia's availability.

Edits per article

Number of edits versus the frequency of occurrence in a sample of approximately 120,000 articles. Mean, median, and mode are also indicated.
Number of edits versus the fraction of articles having at least that many edits.

At present, the median Wikipedia article has just 16 edits, with 30% of articles having fewer than 10 edits. Only about 9% of articles have more than 100 edits, and 0.5% of articles have more than 1000 edits.

For technical reasons, this sample does not include any articles created during a three-week period in September 2007, and consequently it will slightly overestimate the average number of edits per article.
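The summary statistics quoted in this section can be computed directly from a list of per-article edit counts. The sketch below is illustrative only: the function names are hypothetical, and the sample list is made-up data rather than the real distribution.

```python
from collections import Counter
from statistics import mean, median


def fraction_with_at_least(edit_counts, threshold):
    """Fraction of articles having at least `threshold` edits."""
    counts = list(edit_counts)
    return sum(c >= threshold for c in counts) / len(counts)


def summarize_edit_counts(edit_counts):
    """Mean, median, mode, and tail fractions for per-article edit counts."""
    counts = list(edit_counts)
    n = len(counts)
    return {
        "mean": mean(counts),
        "median": median(counts),
        # Most frequent edit count; ties are broken by first occurrence.
        "mode": Counter(counts).most_common(1)[0][0],
        "fewer_than_10": sum(c < 10 for c in counts) / n,
        "more_than_100": sum(c > 100 for c in counts) / n,
        "more_than_1000": sum(c > 1000 for c in counts) / n,
    }


# Made-up edit counts for eight articles, for illustration only.
example_counts = [1, 2, 2, 5, 16, 40, 150, 1200]
example_summary = summarize_edit_counts(example_counts)
```

Because a handful of heavily edited articles pull the mean far above the median, the median is the more representative figure for a typical article.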

Revert rate

Percentage of all edits to Wikipedia articles that were marked as reverts or that were reverted.
Cumulative number of edits per article over time, and a reduced version in which edits which were reverted or marked as reverts are removed.
Article edits divided into categories of editor and types of edits.

Reverts, i.e. edits that undo the edits of others, are a common part of the Wikipedia editing experience. I have attempted to identify reverts based on keywords conventionally used to mark them in edit summaries, e.g. "revert", "rv", "undid", "vandalism", etc. In addition, I assumed that the immediately preceding edit was the one that was reverted. Using these approximations, it appears that the majority of reverted content comes from unregistered editors identified only by an IP address, and further that admins spend much more of their time making reverts than the other groups do. Nonetheless, contributions by unregistered editors are still a substantial fraction of the normal edits being contributed to articles, and non-admin registered users collectively issue approximately three times as many reverts as admins.
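The heuristic described above can be sketched roughly as follows. This is not the code used for the analysis: the regular expression covers only the keywords named in the text (the full keyword list is not given), and the function name, labels, and example summaries are hypothetical.

```python
import re

# Keywords named in the text; the real list was longer ("etc.") and is unknown.
REVERT_PATTERN = re.compile(r"\b(?:revert\w*|rvv?|undid|vandal\w*)\b", re.IGNORECASE)


def classify_edits(summaries):
    """Label each edit of one article as 'normal', 'revert', or 'reverted'.

    `summaries` holds the edit summaries in chronological order. An edit whose
    summary matches a revert keyword is labeled 'revert', and the immediately
    preceding edit is assumed to be the one that was undone.
    """
    labels = ["normal"] * len(summaries)
    for i, summary in enumerate(summaries):
        if REVERT_PATTERN.search(summary or ""):
            labels[i] = "revert"
            # Assumption in this sketch: a preceding edit that is itself a
            # revert keeps its 'revert' label rather than becoming 'reverted'.
            if i > 0 and labels[i - 1] == "normal":
                labels[i - 1] = "reverted"
    return labels


# Made-up edit summaries, for illustration only.
example_labels = classify_edits([
    "added history section",
    "blanked page",
    "rv vandalism",
    "fixed typo",
    "Undid revision 123 by Example",
    "Reverted edits",
])
```

A keyword approach like this misses reverts with blank or unconventional summaries and mislabels edits that merely mention vandalism, so the resulting counts are approximations in both directions.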

The frequency of reverts and reverted content appears to have been increasing with time; most recently, approximately 20% of all article edits are either reverts or edits that are reverted. There is also a notable seasonality, with fewer reverts during the Northern Hemisphere summer, possibly indicating an association with the school year.

Note that reverts can include both vandalism and content disputes, so the revert rate does not necessarily translate to simple vandalism rate. It is possible, for example, that the frequency of content disputes has been increasing independent of changes in vandalism.

New articles, new users, new administrators

Rate of article creations. The early spikes are the result of rambot and other automated processes creating a large number of new articles.
Rate of creation of new users. Note: Vertical lines are 6 month increments.
Rate at which users become administrators.

Article creation plateaued in early 2006 after unregistered users were prohibited from creating new articles following the Seigenthaler controversy, but the overall rate has not noticeably declined.

Like the overall edit rate, the rate of new account creation peaked in early 2007 and has declined ~30% since.

The rate at which new administrators were created was greatest in late 2005 and has remained relatively steady since mid-2006.

Uploads and admin actions

Deletions broken into segments for articles, images, and everything else.
A comparison of image uploads to image deletions.
Blocks and unblocks
Protections and unprotections

Article deletions, blocks, protections and uploads have all decreased in recent months. However, image deletions have increased as a result of recent efforts to more strictly enforce the criteria for non-free content.

Administrative action records are partly confused by the action of secretive adminbots that have been run by Curps (partial bot description), Betacommand, Misza13 (partial bot description) and others.

Data files

The following pages contain much of the data used in generating the images shown above, as well as additional data extracted from Wikipedia's log files.