Insensitivity to sample size

From Wikipedia, the free encyclopedia

Insensitivity to sample size is a cognitive bias that occurs when people judge the probability of obtaining a sample statistic without regard to the sample size. For example, in one study subjects assigned the same probability to obtaining a mean height above six feet (183 cm) in samples of 10, 100, and 1,000 men. In other words, variation is more likely in smaller samples, but people may not expect this.[1]
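The height example can be made concrete with the normal approximation for a sample mean, whose standard error is σ/√n. The population figures below (mean height 70 inches, standard deviation 3 inches) are hypothetical values chosen for illustration, not data from the study. Heights are also assumed to be independent and normally distributed, and the function name is illustrative.

```python
import math


def prob_mean_exceeds(threshold: float, mu: float, sigma: float, n: int) -> float:
    """P(sample mean of n heights > threshold), assuming i.i.d. normal heights."""
    standard_error = sigma / math.sqrt(n)  # spread of the sample mean shrinks like 1/sqrt(n)
    z = (threshold - mu) / standard_error
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability of a standard normal


# Hypothetical population: mean 70 in (177.8 cm), standard deviation 3 in.
# Six feet = 72 in.
for n in (10, 100, 1000):
    print(n, prob_mean_exceeds(72.0, mu=70.0, sigma=3.0, n=n))
```

With these hypothetical figures the probability is roughly 0.02 for 10 men, on the order of 10^-11 for 100 men, and far below 10^-90 for 1,000 men, which is the dependence on sample size that the subjects ignored.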

In another example, Amos Tversky and Daniel Kahneman asked subjects the following question:

A certain town is served by two hospitals. In the larger hospital about 45 babies are born each day, and in the smaller hospital about 15 babies are born each day. As you know, about 50% of all babies are boys. However, the exact percentage varies from day to day. Sometimes it may be higher than 50%, sometimes lower.

For a period of 1 year, each hospital recorded the days on which more than 60% of the babies born were boys. Which hospital do you think recorded more such days?

  1. The larger hospital
  2. The smaller hospital
  3. About the same (that is, within 5% of each other) [1]

56% of subjects chose option 3, and 22% chose each of options 1 and 2. However, according to sampling theory, the larger hospital is much more likely than the smaller one to report a sex ratio close to 50% on a given day, so the correct answer is the smaller hospital (see the law of large numbers).
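The hospital question can be checked with an exact binomial calculation. This sketch assumes exactly 45 and 15 births every day, that each birth is independently a boy with probability 0.5, and a 365-day year; these are simplifying assumptions, and the function name is illustrative.

```python
from math import comb


def prob_more_than_60pct_boys(births: int, p_boy: float = 0.5) -> float:
    """Exact binomial P(boys / births > 0.6) for a single day."""
    # "More than 60%" means 100 * k > 60 * births; integer arithmetic avoids float rounding.
    return sum(
        comb(births, k) * p_boy**k * (1 - p_boy) ** (births - k)
        for k in range(births + 1)
        if 100 * k > 60 * births
    )


for name, births in (("larger", 45), ("smaller", 15)):
    p = prob_more_than_60pct_boys(births)
    print(f"{name} hospital: P = {p:.3f}, about {365 * p:.0f} such days per year")
```

Under these assumptions the smaller hospital has about a 0.151 chance per day (roughly 55 days a year) and the larger about 0.068 (roughly 25 days), so the smaller hospital should record more than twice as many such days.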

Relative neglect of sample size was also found in a different study of statistically sophisticated psychologists.[2]

Tversky and Kahneman explained these results as being caused by the representativeness heuristic, according to which people intuitively judge samples as having similar properties to their population without taking other considerations into account. A related bias is the clustering illusion, in which people under-expect streaks or runs in small samples. Insensitivity to sample size is a subtype of extension neglect.[3]

Howard Wainer and Harris L. Zwerling observed that kidney cancer rates are lowest in U.S. counties that are mostly rural, sparsely populated, and located in traditionally Republican states in the Midwest, the South, and the West, and that they are also highest in counties that are mostly rural, sparsely populated, and located in traditionally Republican states in the Midwest, the South, and the West. Various environmental and economic reasons were advanced for these facts, but the pattern has nothing to do with the impact of such factors on the incidence of kidney cancer. The explanation is sample size: because small rural counties have few residents, the incidence of a given cancer there is more likely to lie far from the mean, in one direction or the other, than the incidence of the same cancer in much more heavily populated urban counties.[4]
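A small simulation can make the sample-size explanation concrete. It is a sketch under hypothetical numbers, not the real county data: every county is given the same true rate of 5 cases per 10,000 people, half the counties have 1,000 residents and half have 100,000, and case counts are drawn from a Poisson distribution as an approximation to the binomial. The helper names `poisson` and `simulate_counties` are illustrative.

```python
import math
import random


def poisson(lam: float, rng: random.Random) -> int:
    """Draw a Poisson(lam) count using Knuth's multiplication method.

    Adequate for the small means used here; exp(-lam) underflows for lam above roughly 700.
    """
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_counties(rate=5e-4, n_small=500, small_pop=1_000,
                      n_large=500, large_pop=100_000, seed=1):
    """Return (observed_rate, size_label) for every county, sorted by rate.

    Hypothetical parameters, not real kidney-cancer data. Every county shares
    the SAME true rate; only the population differs.
    """
    rng = random.Random(seed)
    counties = []
    for label, pop, count in (("small", small_pop, n_small),
                              ("large", large_pop, n_large)):
        for _ in range(count):
            cases = poisson(rate * pop, rng)  # Poisson approximation to Binomial(pop, rate)
            counties.append((cases / pop, label))
    counties.sort()  # ascending by observed rate
    return counties


counties = simulate_counties()
print("20 lowest rates come from:", {label for _, label in counties[:20]})
print("20 highest rates come from:", {label for _, label in counties[-20:]})

# De Moivre: the standard error of an observed rate is sqrt(rate * (1 - rate) / population),
# so a county 100 times smaller has an observed rate about 10 times noisier.
for pop in (1_000, 100_000):
    print(pop, math.sqrt(5e-4 * (1 - 5e-4) / pop))
```

Because no county here has a higher or lower true rate than any other, any ranking by observed rate is pure sampling noise; with these parameters both extremes of the ranking should be filled entirely by small counties, mirroring the pattern described above.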


  1. Tversky, Amos; Kahneman, Daniel (1974). "Judgment under uncertainty: Heuristics and biases". Science. 185 (4157): 1124–1131. PMID 17835457. doi:10.1126/science.185.4157.1124.
  2. Tversky, Amos; Kahneman, Daniel (1971). "Belief in the law of small numbers". Psychological Bulletin. 76 (2): 105–110. doi:10.1037/h0031322.
  3. Kahneman, Daniel (2000). "Evaluation by moments, past and future". In Kahneman, Daniel; Tversky, Amos (eds.). Choices, Values and Frames. p. 708.
  4. Jones, Ben (22 January 2015). "Avoiding Data Pitfalls, Part 2: Fooled by Small Samples". Data Remixed. Retrieved 12 April 2017.