User:Mr.Z-man/analysis

From Wikipedia, the free encyclopedia

This page estimates how many problematic biographies of living persons (BLPs) the English Wikipedia has hosted.

The analysis here makes the following assumptions:

  1. The ratio of BLPs to all articles has stayed constant since 2005.
  2. The percentage of BLPs that are problematic enough to potentially generate a complaint has remained constant since 2005.
    1. For every BLP that actually generates a complaint to OTRS, 1.5 more are problematic but unreported. (If the number of complaints is c, the actual number of problematic bios is 2.5c)
  3. Wikipedia's reach before 2005 was low enough that problems on BLPs were not nearly as significant as they are today.

Data

The OTRS "quality" queue was searched for all tickets that were created between the beginning of July 2009 and the end of December 2009 and were not closed in a way that suggested they were spam or duplicates. Based on that search, Wikimedia gets ~6.6 complaints per day regarding BLPs. At the time of the search, Wikipedia had approximately 430,000 BLPs and 3,172,000 articles.

Historical data for number of articles is from Wikipedia:Size of Wikipedia#The data set.

Analysis

Using this information, we can find:

  • 13.56% of articles are BLPs.
  • 0.00384% of BLPs are potentially complaint-inducing on any given day (6.6 reported complaints × 2.5, divided by 430,000 BLPs).
  • Because the number of articles has grown roughly linearly since 2005, the number of "bad BLPs" per day can be approximated by a linear function B(d),
    • where B is the number of potentially-complaint-inducing BLPs per day and d is the number of days since 1 January 2005.
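The two percentages above can be reproduced from the figures in the Data section. The following is a minimal sketch of that arithmetic; the variable names are illustrative and do not come from the original analysis. It assumes the 0.00384% figure already includes the 2.5 multiplier for unreported problems, which is the reading under which the numbers agree.

```python
# Figures quoted in the Data section.
complaints_per_day = 6.6        # reported BLP complaints per day
blp_count = 430_000             # BLPs at the time of the search
article_count = 3_172_000       # all articles at the time of the search

# Assumption from the list at the top of the page: 1.5 unreported
# problematic BLPs for every reported one, so the multiplier is 2.5.
unreported_per_reported = 1.5
problem_multiplier = 1 + unreported_per_reported

blp_share = blp_count / article_count
problematic_per_day = complaints_per_day * problem_multiplier
daily_problem_rate = problematic_per_day / blp_count

print(f"{blp_share:.2%}")            # 13.56%
print(f"{daily_problem_rate:.5%}")   # 0.00384%
```

Without the multiplier, the reported-only rate would be 6.6 / 430,000, or roughly 0.0015% per day.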

Results

Integrating this line gives an estimate of the number of potentially-complaint-inducing BLPs since the beginning of 2005: 221,343.
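The integration step can be sketched as follows. For a line B(d) = m·d + b, the integral from d = 0 to d = D is m·D²/2 + b·D. The function name and the slope, intercept and day count used here are toy values chosen only to make the arithmetic easy to check. They are not the coefficients of the fit used on this page, and the sketch does not reproduce the figure above.

```python
def integrate_line(slope: float, intercept: float, days: float) -> float:
    """Exact integral of B(d) = slope * d + intercept for d from 0 to days."""
    return slope * days ** 2 / 2 + intercept * days


# Toy values (hypothetical, NOT the page's fitted line):
# B(d) = 2d + 10 integrated over 5 days.
toy_total = integrate_line(slope=2.0, intercept=10.0, days=5.0)

# Cross-check with the trapezoid rule, which is exact for a straight line:
# (B(0) + B(5)) / 2 * 5 = (10 + 20) / 2 * 5
trapezoid_total = (10.0 + 20.0) / 2 * 5.0

print(toy_total)        # 75.0
print(trapezoid_total)  # 75.0
```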

Note that this is only a very rough estimate. Changing one of the parameters, such as the ratio of unreported to reported problems, can increase or decrease the final result by several thousand.
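As a rough illustration of this sensitivity, the sketch below rescales the final figure for a different unreported-to-reported ratio. It assumes the estimate is directly proportional to the multiplier (1 + ratio), which follows if every other parameter is held fixed. The function name and the alternative ratio of 1.55 are hypothetical and were chosen only for illustration.

```python
def rescale_estimate(estimate: float, old_ratio: float, new_ratio: float) -> float:
    """Rescale an estimate that is proportional to (1 + unreported ratio)."""
    return estimate * (1 + new_ratio) / (1 + old_ratio)


page_estimate = 221_343   # result quoted above, computed with a ratio of 1.5

# Hypothetical alternative: 1.55 unreported problems per reported complaint.
adjusted = rescale_estimate(page_estimate, old_ratio=1.5, new_ratio=1.55)

print(round(adjusted))                  # 225770
print(round(adjusted - page_estimate))  # 4427
```

Under that assumption, moving the ratio from 1.5 to 1.55 shifts the result by about 4,400.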