# Wikipedia:Does Wikipedia traffic obey Zipf's law?


If accesses to Wikipedia's article pages obey Zipf's law, we can expect a roughly linear relationship between log(hits) and log(hit rank) for Wikipedia pages. (Note: the hit data in the graph has been scaled in such a way that 10000 hits are equivalent to 1% of the total access rate.)

This appears to be the case in practice for pages with rank between 5 and 1000, based on data from WikiCharts, as of September 2006.

The five most popular pages deviate significantly from the straight line, but the fit is close from rank 6 onward. The slope of this part of the log-log graph is approximately −1/2, suggesting that the hit rate is inversely proportional to the square root of the page rank:

$$
\begin{aligned}
\log(\text{hits}) &\approx -\frac{1}{2}\log(\text{hit rank}) + \log H_{0},\\[8pt]
\text{thus, hits} &\approx \frac{H_{0}}{\sqrt{\text{hit rank}}}\\[8pt]
\text{or, } &\frac{\text{number of hits for article ranked}\;n}{\text{number of hits for article ranked}\;m} \approx \sqrt{\frac{m}{n}}
\end{aligned}
$$

Note: These scaled hit rates are derived from actual hit data counts over a particular period, and thus reflect actual hit counts for a statistical sample of user hits over that period, rather than statistical estimates of a theoretical underlying constant hit rate from those hit counts. The error bars in the WikiCharts data apply to the hit rates as an estimator of an underlying hit rate, and do not apply here.
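As a rough illustration of how such a slope could be estimated, the sketch below fits a least-squares line to log(hits) against log(rank). The data here is synthetic, generated from the model itself with an arbitrary constant `H0`; it is not the WikiCharts data, and the function name `fit_loglog_slope` is only an illustrative choice.

```python
import math


def fit_loglog_slope(ranks, hits):
    """Return (slope, intercept) of the least-squares line through
    the points (log10(rank), log10(hits))."""
    xs = [math.log10(r) for r in ranks]
    ys = [math.log10(h) for h in hits]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic example only: hits generated from hits = H0 / sqrt(rank),
# with H0 chosen arbitrarily. Real data would be noisier.
H0 = 1000.0
ranks = list(range(6, 1001))  # ranks 6..1000, the range where the trend holds
hits = [H0 / math.sqrt(r) for r in ranks]

slope, intercept = fit_loglog_slope(ranks, hits)
print(f"slope = {slope:.3f}, H0 estimate = {10 ** intercept:.1f}")
```

Because the synthetic points lie exactly on the model curve, the fitted slope comes out at −1/2 and the intercept recovers `H0`; with measured hit counts the fit would only be approximate.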

## How much traffic does the least popular page get?

Although this data does not directly tell us anything about the traffic of pages outside the most popular 1000, if we assume that Zipf's law continues to hold for the remaining 2.7 million (as of 2009) Wikipedia article pages, we can extrapolate the traffic expected for less popular pages, and in particular for the least popular page, at rank 2.7 million.

Compared to the page with rank 6, which is probably the first point that fits the trend, this suggests that the least popular Wikipedia article might get $\sqrt{\frac{6}{2{,}700{,}000}}\approx 0.0015$ times as much traffic.

Given that the actual unscaled hit rate of the page with rank six is about 100,000 hits per day, that suggests that the least popular page will get about 150 hits per day. In fact, though, it is common for stubs and articles about little-researched subjects to get fewer than 150 hits in a month.
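The extrapolation can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes the inverse-square-root relation holds all the way down to the last rank; the constant and function names are illustrative, and the input figures are the ones quoted in the text.

```python
import math


def relative_traffic(rank_n, rank_m):
    """Ratio hits(rank_n) / hits(rank_m), assuming hits ∝ 1 / sqrt(rank)."""
    return math.sqrt(rank_m / rank_n)


# Figures quoted in the text.
REFERENCE_RANK = 6                # first rank that fits the trend
REFERENCE_HITS_PER_DAY = 100_000  # approximate unscaled hit rate at rank 6
LEAST_POPULAR_RANK = 2_700_000    # approximate article count as of 2009

ratio = relative_traffic(LEAST_POPULAR_RANK, REFERENCE_RANK)
estimated_hits_per_day = REFERENCE_HITS_PER_DAY * ratio

print(f"traffic ratio: {ratio:.4f}")
print(f"estimated hits per day: {estimated_hits_per_day:.0f}")
```

The ratio rounds to 0.0015 and the estimate to roughly 150 hits per day, matching the figures above. That estimate is far higher than the observed traffic of many stubs, which suggests the square-root trend does not continue unchanged into the long tail.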