User:Shadowjams/Stats


A statistical analysis of pages

This is a rudimentary categorization of a statistical sample of Wikipedia pages.

I analyzed 121 pages chosen at random from the Wikipedia mainspace in May 2010, using the "random article" API function. I then assigned each page a category and recorded whether it met certain criteria (listed below).
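
As an illustration, a random sample like this could be requested from the MediaWiki API roughly as follows. This is a sketch and not necessarily the exact call used for this sample: the endpoint URL, the batch size, and the helper names `random_pages_url` and `titles_from_response` are assumptions made for the example. It only builds the request URL and parses a response of the documented shape; it does not contact the network.

```python
from urllib.parse import urlencode

# Assumed endpoint for the English Wikipedia API.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def random_pages_url(limit=10, namespace=0):
    """Build a list=random query URL (namespace 0 is the article mainspace)."""
    params = {
        "action": "query",
        "list": "random",
        "rnnamespace": namespace,
        "rnlimit": limit,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def titles_from_response(data):
    """Extract page titles from a decoded JSON response."""
    return [page["title"] for page in data["query"]["random"]]


# Hand-written response in the API's format, used here instead of a live call.
example_response = {"query": {"random": [{"id": 1, "ns": 0, "title": "Example"}]}}

print(random_pages_url(limit=10))
print(titles_from_response(example_response))
```

Repeating the request in batches until 121 titles are collected would give a sample of the same size as the one described here.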

These results should be taken only as a rough overview of the encyclopedia. I may have made mistakes in my analysis, and the sample size is relatively small. However, the results may still reveal some patterns in the encyclopedia.
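
To give a sense of how much uncertainty a sample of 121 pages carries, the sketch below estimates a 95% margin of error for a reported percentage. It uses the simple normal approximation, which is an assumption of this example and not part of the original analysis; the helper name `margin_of_error` is likewise illustrative.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a proportion p from n samples."""
    return z * math.sqrt(p * (1 - p) / n)


n = 121            # pages in the sample
stub_share = 0.38  # share of pages counted as stubs

moe = margin_of_error(stub_share, n)
print(f"stubs: {stub_share:.0%} +/- {moe:.1%}")
```

Under this approximation the stub figure carries a margin of roughly 9 percentage points in either direction, and the smaller categories are proportionally less certain.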

The largest category is geographic articles, which cover towns, states, and geographic features. Of those, 71% are stubs. Of the biographies, about 29% are stubs. Sports articles make up about 10% of the sample, with half of those covering soccer/football topics. About 67% of the sports articles are biographies, regardless of the sport.

Similar projects and comparisons

Others have done similar reports at different times. While the methodologies differ, some statistics come out strikingly similar. For example, the 17% share of geography articles appears to be a consistent finding, as do the frequencies of some other categories. Dantheox's assessment that 35% of articles were stubs is also close to the 38% I found four years later.

Other editors' reports

Results

Attributes Percentage
Stub 38%
Biography 23%
Biography of a living person 13%
Almanac-type entry 7%
Pop culture topic 13%
Sports related 10%
Soccer/Football related 5%

A page may have more than one attribute, so these percentages do not sum to 100%.

Category Percentage
Actor / Actress 1.65%
Business or organization 6.61%
Computer or videogame 1.65%
Disambiguation page 7.44%
Film 1.65%
Food 1.65%
Geographic 17.36%
Health / Medicine 2.48%
Historical 5.79%
Military 2.48%
Music 10.74%
Politics or government 9.09%
Science and nature 7.44%
Sports 9.92%
Transportation 5.79%
Television 1.65%
All others 6.61%

Categories are mutually exclusive; each page was assigned exactly one category.
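
Because each page has exactly one category, the percentages above can be converted back to whole page counts out of 121 as a consistency check. The sketch below does this by rounding; the counts are derived from the published percentages and are not taken from the original tally.

```python
SAMPLE_SIZE = 121

# Percentages copied from the category table above.
CATEGORY_PERCENT = {
    "Actor / Actress": 1.65,
    "Business or organization": 6.61,
    "Computer or videogame": 1.65,
    "Disambiguation page": 7.44,
    "Film": 1.65,
    "Food": 1.65,
    "Geographic": 17.36,
    "Health / Medicine": 2.48,
    "Historical": 5.79,
    "Military": 2.48,
    "Music": 10.74,
    "Politics or government": 9.09,
    "Science and nature": 7.44,
    "Sports": 9.92,
    "Transportation": 5.79,
    "Television": 1.65,
    "All others": 6.61,
}

# Recover the nearest whole number of pages for each category.
counts = {name: round(pct * SAMPLE_SIZE / 100) for name, pct in CATEGORY_PERCENT.items()}
total = sum(counts.values())

for name, count in counts.items():
    print(f"{name}: {count}")
print(f"Total: {total}")
```

The recovered counts add up to 121, which matches the stated sample size.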