# Talk:Law of large numbers

WikiProject Statistics (Rated C-class, Top-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

C  This article has been rated as C-Class on the quality scale.
Top  This article has been rated as Top-importance on the importance scale.
WikiProject Mathematics (Rated C-class, High-priority)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mathematics rating:
 C Class
 High Priority
Field: Probability and statistics
One of the 500 most frequently viewed mathematics articles.

## Possible merger?

Should Borel's law of large numbers get merged into this article and made a redirect page? Michael Hardy (talk) 16:04, 10 January 2008 (UTC)

Yes, certainly. — ciphergoth 07:47, 21 May 2008 (UTC)
The problem with this is to make it clear exactly what "Borel's law of large numbers" is in the context of the larger article, since presumably Borel's law of large numbers is notable enough to be mentioned specifically. Melcombe (talk) 09:43, 16 June 2008 (UTC)

## Merge from Statistical regularity?

Correction of misdirected merger proposal from March 2008. Melcombe (talk) 13:11, 12 May 2008 (UTC) -- and copying from Talk:Law of Large Numbers corrected again -- ciphergoth 07:44, 21 May 2008 (UTC)

Is there any information over at Statistical regularity that might be of use? I added a {{mergeto}} tag to that article. If there is nothing worthwhile then perhaps simply replace it with a redirect? -- Preceding unsigned comment added by User A1 (talk • contribs) 02:43, 30 March 2008
It looks to me very much like the two articles are covering the same ground, so yes, a merger makes sense to me. I can't see any material in that article that needs to be copied into this one. — ciphergoth 07:44, 21 May 2008 (UTC)

Agreed. In fact the other article should probably simply be removed. OliAtlason (talk) 15:28, 21 May 2008 (UTC)

I second that. Just replace Statistical regularity with a redirect to LLN. Aastrup (talk) 13:18, 12 June 2008 (UTC)
I think that there may have been an intention that Statistical regularity should be part of a collection of non-mathematically-formal articles partly related to gambling, as in mention of Gambler's fallacy. Melcombe (talk) 09:47, 16 June 2008 (UTC)
I've read a bit on the subject and it would seem that it's an umbrella term. I invite you to read Whitt, Ward (2002) Stochastic-Process Limits, An Introduction to Stochastic-Process Limits and their Application to Queues, Chapter 1: Experiencing Statistical Regularity. The first chapter is available online [1]. Aastrup (talk) 22:14, 16 June 2008 (UTC)
And I've changed the Statistical regularity article. -- Preceding unsigned comment added by Aastrup (talk • contribs) 22:20, 16 June 2008 (UTC)

## Interpreting

Consider the paragraph:

Interpreting this result, the weak law essentially states that for any nonzero margin specified, no matter how small, with a sufficiently large sample there will be a very high probability that the average of the observations will be close to the expected value, that is, within the margin.

This is simply explaining in words what convergence in probability is. I don't consider it useful. I'll remove it shortly if no-one objects. Aastrup (talk) 22:17, 23 July 2008 (UTC)

It is true that it is simply an explanation of convergence in probability in words. However, this may be very insightful to those who are not well versed in probability theory or even mathematical formalism. The section that includes this paragraph would be substantially poorer without this explanation. I suggest leaving it in. OliAtlason (talk) 07:34, 24 July 2008 (UTC)
I think any time you can add text interpretation to a math article it is hugely helpful, even if those who already know it all find it just extra words. PDBailey (talk) 22:01, 24 July 2009 (UTC)
Generally I agree. But in this instance I don't. We are talking about convergence. The sample average converges towards the mean; this should simply be interpreted as the distance between the two growing smaller as the number in the sample increases and tends to infinity. That we are dealing with two forms of convergence, and that their exact definitions differ from one another, is not of interest to the non-mathematician. And if the reader is interested in the exact difference between the two, then there's an article about convergence of random variables. Aastrup (talk) 08:50, 25 July 2009 (UTC)
And it is repetitive. Convergence in probability is stated in words twice. It would be neater if it just said "Convergence in probability" and anybody who didn't know the term could go there and learn. Aastrup (talk) 08:55, 25 July 2009 (UTC)

## Citations

I removed the annoying references/citations tag and added a few references. Should there be more citations? Should I have left the tag where it was? I don't think so. Aastrup (talk) 19:44, 24 July 2009 (UTC)

## What if there is no expected value?

From reading this article many can get the wrong impression that a sequence of averages always converges almost surely, and converges to the expected value. But in reality the law of large numbers only works when the expected value of the distribution exists, and there are many heavy-tailed distributions which don't have an expected value. Take, for example, the Cauchy distribution. A sequence of sample means won't converge, because the average of n samples drawn from the Cauchy distribution has *exactly* the same distribution as a single sample. I think the article definitely needs a section about this misconception, with examples and a neat graph of a diverging sequence of averages, but as you might see, my English is too bad for writing it myself. --87.117.185.161 (talk) 12:53, 21 November 2009 (UTC)
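A quick simulation can illustrate the Cauchy point made above. This is only a sketch using the Python standard library; the helper names (`cauchy_sample`, `normal_sample`, `iqr_of_sample_means`) and the choice of 2000 replicates are illustrative assumptions, not anything taken from the article. It estimates the interquartile range (IQR) of the sample mean for a few sample sizes. For normal draws the spread should shrink roughly like 1/sqrt(n), while for standard Cauchy draws it should stay near 2 for every n, since the quartiles of a standard Cauchy variable are -1 and +1.

```python
import math
import random
import statistics


def cauchy_sample(rng):
    """Draw one standard Cauchy variate by inverse-CDF sampling."""
    return math.tan(math.pi * (rng.random() - 0.5))


def normal_sample(rng):
    """Draw one standard normal variate (has a mean, for comparison)."""
    return rng.gauss(0.0, 1.0)


def iqr_of_sample_means(draw, n, replicates, rng):
    """Estimate the IQR of the mean of n draws from `replicates` repetitions."""
    means = [
        sum(draw(rng) for _ in range(n)) / n
        for _ in range(replicates)
    ]
    q1, _median, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


if __name__ == "__main__":
    rng = random.Random(12345)  # fixed seed so the run is repeatable
    for n in (1, 10, 100):
        iqr_cauchy = iqr_of_sample_means(cauchy_sample, n, 2000, rng)
        iqr_normal = iqr_of_sample_means(normal_sample, n, 2000, rng)
        print(f"n={n:4d}  IQR of mean: Cauchy={iqr_cauchy:.3f}  normal={iqr_normal:.3f}")
```

The IQR is used instead of the standard deviation because the Cauchy distribution has no finite variance, so an empirical standard deviation would not be a meaningful measure of spread here.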

## A Proof of the Strong Law of Large Numbers

There is a proof of the Strong Law of Large Numbers that is accessible to students with an undergraduate study of measure theory. It is established by applying the dominated convergence theorem to the limit of indicator functions, and then using the Weak Law of Large Numbers on the resulting limit of probabilities. Would this be appropriate for inclusion with the group of articles on the Laws of Large Numbers? Insightaction (talk) 21:17, 20 January 2010 (UTC)

## Image

I have created and uploaded an image similar to the current image, but in SVG format instead of GIF and with source code available. It also looks a little different and has different data (new data may be generated by anyone with the inclination using my provided source code (or their own)). I would like to propose that we switch to my image, File:Largenumbers.svg. --Thinboy00 @175, i.e. 03:11, 3 February 2010 (UTC)

Nice job! --Steve (talk) 06:39, 3 February 2010 (UTC)
By the way, I was motivated to make one myself, so I just put in the animated gif of red and blue balls. I think my animated one and your non-animated one are complementary and both should be in the article...different readers may respond better to one or the other. Let me know if you have any comments or suggestions! :-) --Steve (talk) 08:54, 27 February 2010 (UTC)
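For anyone who wants to generate data of the kind plotted in such an image, here is a minimal sketch of running averages of die rolls. It is not the source code provided with File:Largenumbers.svg; the function names and the seed are hypothetical choices made for illustration. It also counts how often the distance from 3.5 grows from one roll to the next, since the running average approaches 3.5 only in the long run and not at every step.

```python
import random


def running_means(rolls):
    """Return the running average after each roll."""
    means = []
    total = 0
    for count, value in enumerate(rolls, start=1):
        total += value
        means.append(total / count)
    return means


def count_error_increases(means, target=3.5):
    """Count the steps at which |mean - target| got larger, not smaller."""
    errors = [abs(m - target) for m in means]
    return sum(1 for prev, cur in zip(errors, errors[1:]) if cur > prev)


if __name__ == "__main__":
    rng = random.Random(2010)  # fixed seed so the run is repeatable
    rolls = [rng.randint(1, 6) for _ in range(1000)]
    means = running_means(rolls)
    for n in (10, 100, 1000):
        print(f"n={n:5d}  running mean={means[n - 1]:.3f}")
    print("steps where the error grew:", count_error_increases(means))
```

Plotting `means` against the roll index with any plotting tool should give a curve of the same general shape as the images discussed here.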

## Merge suggestion

It has been suggested at the Wikipedia:Proposed mergers page, that Law of averages be merged with Law of large numbers (LLN). Please state your comments regarding this action. --TitanOne (talk) 20:58, 3 March 2010 (UTC)

• Oppose Makes no sense on the average. History2007 (talk) 21:40, 12 March 2010 (UTC)
• Oppose The two ideas are not the same, and there's already a cross-reference in both articles' "See also" sections. I don't think an article describing a theorem with an established proof should be merged with an article describing a lay term and its common usage in a false belief. (Although some texts may use the term "law of averages" to refer to the law of large numbers, the Gambler's fallacy usage is more common and the focus of the article.) The difficulty in preventing students from conflating the so-called-"law" of averages/Gambler's fallacy with the law of large numbers is an extremely common problem for introductory probability and statistics instructors. I agree that Law of averages needs to be merged, but I think it should be merged into Gambler's fallacy instead (maybe with a disambiguation note at the top for users arriving via Law of averages who intend to find the Law of large numbers). --Firefeather (talk) 20:45, 26 March 2010 (UTC)
• Disagree.... Looking through "google books" and the web, the term "Law of averages" refers to either law of large numbers (ratio of heads to tails approaches 1, which is true), or gambler's fallacy (a run of heads is compensated by a run of tails, so difference between heads and tails approaches 0, which is false). The way it's presented here now is gambler's fallacy. And gambler's fallacy seems somewhat more common online, but I didn't look enough to be sure. Therefore my initial impression is that this page should be a disambiguation or short article sending interested readers to learn more at either law of large numbers or gambler's fallacy. What do other people think? --Steve (talk) 22:53, 3 March 2010 (UTC)
• Delete this article (meaning Law of Averages), with redirect to LLN. The cited source uses the term “law of averages” as a synonym for LLN, and does not provide the interpretation given in this article. The examples section looks like WP:OR. The first one describes the gambler's fallacy, for which we already have a fairly developed article; the second example should be dubbed “idiot’s fallacy” or something like that (really, is there a person who would think that out of 99 coin tosses, exactly 49.5 of them should be heads?); the third example is just a corollary from the strong law of large numbers, or from the second Borel-Cantelli lemma; the last example is not even funny: people don't think that in the long run a good team and a bad team would perform equally, as that would contradict the mere notion of “skill”. // stpasha » 22:18, 1 June 2010 (UTC)
• Note: As above, the discussion was also occurring on the Law of averages talk page despite it being pointed out that discussion is here, so I have copied the immediately above to here. Melcombe (talk) 09:41, 4 June 2010 (UTC)
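The distinction drawn in this thread between the proportion of heads (which settles down) and the difference between heads and tails (which does not) can be checked numerically. The sketch below is an illustration only; `flip_summary` is a hypothetical helper name and the sample sizes are arbitrary. For a fair coin, the heads fraction should get close to 0.5 as n grows, while the absolute difference between heads and tails is typically of order sqrt(n) and so tends to get larger, not smaller.

```python
import random


def flip_summary(n, rng):
    """Flip a fair coin n times; return (heads fraction, heads minus tails)."""
    heads = sum(rng.getrandbits(1) for _ in range(n))
    tails = n - heads
    return heads / n, heads - tails


if __name__ == "__main__":
    rng = random.Random(2010)  # fixed seed so the run is repeatable
    for n in (100, 10_000, 1_000_000):
        fraction, difference = flip_summary(n, rng)
        print(f"n={n:9d}  heads fraction={fraction:.4f}  heads - tails={difference:+d}")
```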

## When the LLN does not hold?

I'm looking for a quick answer, trying to resolve a certain issue. Does the LLN hold even when we're collecting samples from and for a model/function that has infinite VC dimension?

I was reading some papers on statistical learning (http://dl.acm.org/citation.cfm?id=76371). They mention that "C is uniformly learnable if and only if the VC dimension of C is finite," where "a learning function for C is a function that, given a large enough randomly drawn sample of any target concept in C, returns a region in E (a hypothesis) that is with high probability a good approximation to the target concept." My understanding of "concept" here is function. Perhaps my understanding of "concept" is wrong or the LLN has limitations. — Preceding unsigned comment added by 150.135.222.152 (talk) 04:02, 18 October 2011 (UTC)

• Yes; if the VC dimension is infinite the learning function (with high probability) still converges pointwise to the correct characteristic function. But the convergence won't be uniform. Unfortunately, this is hardly a "quick" answer.... Ben Standeven (talk) 17:48, 29 August 2014 (UTC)

## Finite variance

The article says that finite variance is not required, without citation or justification. Every single other source I saw said that finite variance is in fact needed. This writing titled Law of Large Numbers even specifically says that finite variance is needed, and uses the Cauchy distribution as an example where the variance is not finite, and the Law of Large Numbers does not hold (in the section 'Cauchy case'). — Preceding unsigned comment added by 62.49.144.162 (talk) 10:23, 29 May 2012 (UTC)

There are citations separately (slightly later) for the weak and strong forms of the law and when they hold. The article is quite specific about what these laws mean, but it may be that your source says the "law" means something else, such as the variance of the mean decreasing to zero (for which the variance would need to exist): in fact your source seems to be using this in its proof. In this case your source is stating a sufficient condition for the law to hold ... it can and does hold under weaker conditions. The "law" is the description of the behaviour of the mean, not really any one statement of conditions under which it can be said/shown to hold. However, the article could do with better citations. Melcombe (talk) 12:44, 29 May 2012 (UTC)
The Cauchy distribution is a bad example; it does not have a mean, and hence no finite variance or higher moments. And the proof using characteristic functions does not seem to use the assumption of finite variance. I'll add a citation. -- Preceding unsigned comment added by 83.89.65.201 (talk) 08:25, 30 June 2012 (UTC)
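A distribution with a finite mean but infinite variance makes a cleaner test case than the Cauchy distribution. The sketch below assumes a Pareto distribution with shape parameter alpha = 1.5 and minimum value 1, drawn with the standard library's `random.paretovariate`; for 1 < alpha <= 2 the mean is alpha/(alpha - 1) and the variance is infinite. The helper name `pareto_sample_mean` and the sample sizes are illustrative assumptions. The sample mean should still drift towards 3, only more slowly and more erratically than in the finite-variance case.

```python
import random


def pareto_sample_mean(alpha, n, rng):
    """Average of n Pareto(alpha) draws with minimum value 1."""
    return sum(rng.paretovariate(alpha) for _ in range(n)) / n


if __name__ == "__main__":
    alpha = 1.5                      # 1 < alpha <= 2: finite mean, infinite variance
    true_mean = alpha / (alpha - 1)  # equals 3 for alpha = 1.5
    rng = random.Random(2012)        # fixed seed so the run is repeatable
    for n in (100, 10_000, 200_000):
        mean = pareto_sample_mean(alpha, n, rng)
        print(f"n={n:7d}  sample mean={mean:.3f}  (true mean {true_mean:.1f})")
```

A simulation like this is not a proof, of course; it only shows that the behaviour described by the law is plausible without a finite variance.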

Why on earth does an article on the law of large numbers lead to Nasim Taleb's vanity page? I have also deleted the rest of the sentence, which was unencyclopedic, and unnecessary. If you disagree, could you please show a reference from the serious lln literature that mentions lightning or references the black swan?

Otherwise it's not appropriate. — Preceding unsigned comment added by 82.132.235.94 (talk) 19:29, 31 August 2012 (UTC)

First of all, you removed information without explaining in an edit summary -- twice. That makes the edits subject to revert. Secondly, it might help the rest of us who can't read your mind to explain what you are referring to with your comment "lead to Nasim Taleb's vanity page". And finally, give a detailed explanation as to why you think the sentence "This assumes that all possible die roll outcomes are known and that Black Swan events such as a die landing on edge or being struck by lightning mid-roll are not possible or ignored if they do occur" is "unencyclopedic and unnecessary"; it is linked to a page with well sourced explanations as to why it is encyclopedic. Additionally, if you think Black swan theory is unencyclopedic you need to make your case at Talk:Black swan theory. Something very important you need to learn about Wikipedia: It is a collaborative project; it is not your personal website or plaything. Cresix (talk) 19:42, 31 August 2012 (UTC)
I agree with deleting the sentence. It makes a simple thing sound complicated.
There is a universal common-sense intuitive understanding of what it means to "roll a die". According to that understanding, black swan events are irrelevant and the simple sentence is completely correct.
Go find a random person on the street and ask him to roll five dice and calculate the average. Suppose that one of the five dice lands on a corner, or is eaten by a squirrel. The person will naturally and immediately pick up a die and re-roll it to get the fifth number to average. In effect, they will be ignoring the "bad roll". What else would you expect them to do? If the squirrel eats the die, then maybe they would count it as "rolling a 10 trillion"?? No! That's silly, no one would ever think to do that. The only sensible thing to do is to re-roll. If they did not have a spare die, they would say "I was not able to successfully roll the die, go ask someone else." People understand that "rolling a die" is not complete until you have an answer which is either 1,2,3,4,5,6.
There are two requirements for the term "black swan events" to be technically applicable: The events are (1) rare, and (2) sufficiently consequential to affect long-term average behavior. For example, the performance of a stock market investor, even averaged over 15 years, may be significantly altered by the amount he lost in a single hour during a market crash. So that's a black swan event. When you are rolling dice, a black swan event is impossible because the distribution of possible numerical results is so restricted: 1,2,3,4,5,6. It is never 10 billion!
So again, black swan events should certainly not be mentioned in the context of rolling dice. It is an irrelevant tangent. --Steve (talk) 01:13, 1 September 2012 (UTC)
Agree with Steve. McKay (talk) 03:30, 1 September 2012 (UTC)

## A few technical remarks

"with the accuracy increasing as more dice are rolled." This is not correct, and in the figure the accuracy for n=100 is greater than for n=200 or even 300.

"Convergence in probability is also called weak convergence of random variables". I don't think this is standard or fortunate. Convergence in distribution is already called weak convergence. The MSC (Mathematics Subject Classification) category 60F05 is "Central limit and other weak theorems", meaning theorems with convergence in distribution, not convergence in probability (as far as I know).

"Differences between the weak law and the strong law". It may be interesting to add here that the Weak Law may hold even if the expected value does not exist (see e.g. Feller's book). This underlines that, in their full generality, none of the laws follows directly from the other.

"Uniform law of large numbers". The uniform LLN holds under quite weaker hypotheses. This is definitely uninteresting to the average reader, but a reference to the Blum-DeHardt LLN or the Glivenko-Cantelli problem might be very valuable to a small fraction of readers.

"Borel's law of large numbers, named after Émile Borel, states that if an experiment is repeated a large number of times, independently under identical conditions, then the proportion of times that any specified event occurs approximately equals the probability of the event's occurrence on any particular trial;" The LLN as stated might as well be Bernoulli's original LLN from 1713. There seems to be no reason to attribute *that* statement to Borel. Compare e.g. http://www.encyclopediaofmath.org/index.php/Borel_strong_law_of_large_numbers93.156.35.219 (talk) 02:30, 2 January 2013 (UTC)

## Why?

OK, I'll be up front and admit that the math on this page is beyond me. I looked up the Law of Large Numbers to try and find out why it happens. (I mean why it happens, not how it happens.) So can someone explain in plain (or even complicated) English why something that is random each time you do it (e.g. tossing a coin or betting on roulette) tends to give a pattern over a large number of incidences? Why is it that we can anticipate (roughly) what the average will be, rather than its being completely random and not able to be anticipated? Surely there's a place for that issue in the article, if there is some literature on it. Thanks. 89.100.155.6 (talk) 20:36, 25 January 2013 (UTC)
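One standard way to see why, sketched here under the extra assumption that the observations $X_1, \dots, X_n$ are independent with common mean $\mu$ and finite variance $\sigma^2$ (a simplifying assumption; the law itself holds more generally), is that averaging divides the variance by n:

$$
\operatorname{Var}(\overline{X}_n) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{\sigma^2}{n}
$$

Chebyshev's inequality then bounds the chance of the average being far from $\mu$:

$$
P\bigl(|\overline{X}_n - \mu| \geq \varepsilon\bigr) \leq \frac{\sigma^2}{n\,\varepsilon^2} \longrightarrow 0 \quad \text{as } n \to \infty
$$

In words: each outcome is random, but the random deviations of independent outcomes partly cancel when added, so their total grows more slowly than n and the average is squeezed towards $\mu$.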

## Difference between the Strong and the Weak Law - mistake

"In particular, it implies that with probability 1, we have that for any ε > 0 the inequality $|\overline{X}_n -\mu| < \varepsilon$ holds for all large enough n.[1]". I do not believe this sentence (if I am wrong, please ignore me and delete this post). If you fix ε > 0 then for every n there is some small probability that $|\overline{X}_n -\mu| >= \varepsilon$. Indeed, with non-zero probability all the first n tosses are say > \mu + ε.

I agree that the statement is wrong. However, your explanation makes no sense to me. The strong law uses almost sure pointwise convergence. The wrong statement corresponds to uniform convergence.

--93.219.149.62 (talk) 19:54, 6 March 2014 (UTC)
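Writing the quantifiers out may help separate the readings discussed in this section. This is offered as an aid to the discussion, not as a ruling on the disputed sentence; it uses the same symbols $\overline{X}_n$, $\mu$ and $\varepsilon$ as above, with $N$ introduced here as the index from which the inequality holds.

Weak law (convergence in probability), for every $\varepsilon > 0$:

$$
\lim_{n\to\infty} P\bigl(|\overline{X}_n - \mu| \geq \varepsilon\bigr) = 0
$$

Strong law (almost sure convergence), where $N$ may depend on the outcome:

$$
P\Bigl(\lim_{n\to\infty} \overline{X}_n = \mu\Bigr) = 1
\iff
\forall \varepsilon > 0:\ P\bigl(\exists N\ \forall n \geq N:\ |\overline{X}_n - \mu| < \varepsilon\bigr) = 1
$$

A uniform reading, with one $N$ chosen in advance for all outcomes, which the strong law does not imply:

$$
\forall \varepsilon > 0\ \exists N:\ P\bigl(\forall n \geq N:\ |\overline{X}_n - \mu| < \varepsilon\bigr) = 1
$$

Under the second reading, "large enough n" means large enough for the particular sequence of outcomes, which is compatible with every fixed n having a positive probability of $|\overline{X}_n - \mu| \geq \varepsilon$.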

## Condition E(|X|) < ∞ is the same as the random variable X having a Lebesgue-integrable expectation

What if we used a different legitimate integration method to define the expectation, such as:

1. REDIRECT [[2]]
2. REDIRECT [[3]]

which are much more general than the Lebesgue one?

Then we will surely have random variables with finite expectation for which the LLN does not hold. http://www.math.vanderbilt.edu/~schectex/ccc/gauge/venn.gif -- Preceding unsigned comment added by Itaijj (talk • contribs) 20:23, 2 February 2014 (UTC)

## Methodological mistakes

In my opinion, many statements expressed on this page are not correct, such as:

"It follows from the law of large numbers that the empirical probability of success in a series of Bernoulli trials will converge to the theoretical probability. For a Bernoulli random variable, the expected value is the theoretical probability of success, and the average of n such variables (assuming they are independent and identically distributed (i.i.d.)) is precisely the relative frequency."

"The LLN is important because it "guarantees" stable long-term results for the averages of random events."

"According to the law of large numbers, if a large number of six-sided die are rolled, the average of their values (sometimes called the sample mean) is likely to be close to 3.5, with the precision increasing as more dice are rolled."

This confuses conclusions from the mathematical theorem proven from Kolmogorov's axioms (of which there are very few, because the axioms are very weak and do not provide a definition or constraints strong enough for a meaningful interpretation of probability) with its intuitive interpretation, which requires additional assumptions equivalent to assuming the law itself true a priori. See a more elaborate explanation here:

http://math.stackexchange.com/questions/777493/do-the-kolmogorovs-axioms-permit-speaking-of-frequencies-of-occurence-in-any-me

Stable relative frequencies in the real world are discovered empirically and are not conclusions from any mathematical theorem. Ascribing "P()'s" to events with a frequency interpretation in mind is the same as already assuming that the relative frequencies of those events converge, in the limit of an infinite number of trials, to a definite number, this "P()". The only thing the theorem allows one to conclude is that, if all the relative frequencies involved in the given reasoning are stable in the first place, the difference between the measured mean from a finite number of trials and the "ideal" mean is likely to be less than so and so.

Jarosław Rzeszótko (talk) 06:56, 2 May 2014 (UTC)