Talk:Law of large numbers/Archive1

How the article looks

This article is in dire need of being written so that non-mathematicians can successfully find their way through the first sentence. -- Wapcaplet 04:33, 1 Mar 2004 (UTC)

I agree. Is there a way to explain it for the lay person?

The law of large numbers basically says this: as the number of entities in a specific group increases, the likelihood of a particular event occurring (however unlikely it may be) also grows. If you roll a die once, the chance of rolling a ONE is one-sixth, but if you roll a die 100 times, the likelihood of rolling a ONE at some point during those 100 rolls is very near 100%.

• No, the LLN is a much stronger result than this. It says that you are likely to get close to 17 ONEs in 100 rolls of a die. This is the fundamental thing that makes statistics worthwhile: if you have a lot of observations then the average of the sample is close to the average of the population. This is why you can, e.g., do clinical trials in only a few thousand people and extrapolate to approving drugs for a population of hundreds of millions. I agree that the article could use revision in the introduction. I don't really like the proof, either. Only a very small number of people will understand it, and for them there is a simpler, shorter proof under similar assumptions: by algebra the variance of a sum is proportional to the number of summands, and so the variance of an average decreases as the number of summands increases. (TSL)
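To make the difference between the two readings concrete, here is a minimal Python sketch. The function names are illustrative only and do not come from the article. It computes the exact chance of seeing at least one ONE in n rolls (the "at least once" reading), and simulates how the fraction of ONEs settles near 1/6 as the number of rolls grows (the LLN reading).

```python
import random


def prob_at_least_one(p: float, n: int) -> float:
    """Exact probability that an event of probability p occurs
    at least once in n independent trials."""
    return 1.0 - (1.0 - p) ** n


def count_ones(n_rolls: int, rng: random.Random) -> int:
    """Roll a fair six-sided die n_rolls times and count the ONEs."""
    return sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 1)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    print("P(at least one ONE in 100 rolls):", prob_at_least_one(1 / 6, 100))
    for n in (100, 10_000, 100_000):
        ones = count_ones(n, rng)
        # The fraction of ONEs should drift toward 1/6 (about 0.1667).
        print(n, ones, ones / n)
```

The first quantity only says that a ONE will almost certainly show up somewhere; the second shows the stronger statement, that the proportion of ONEs is close to 1/6 (about 17 in 100).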

It should be noted that the proof of the weak law (with convergence in probability) has to be more or less as it is. A semi-intuitive argument using the decreasing variances may be useful to the understanding, but that doesn't really prove convergence in probability.

Can't you make this argument? I may be missing something, but it looks right to me:
Chebyshev's inequality tells us that $\operatorname{P}( \left| \overline{X}_n-\mu \right| \geq \varepsilon) \leq \frac{\sigma^2}{{n\varepsilon^2}}$ (as already noted).
By elementary properties of variance, $\lim_{n \rightarrow \infty} \operatorname{Var}(\overline{X}_n) = \lim_{n \rightarrow \infty} \frac{\sigma^2}{n} = 0$.
Therefore, by Chebyshev's inequality, $\lim_{n \rightarrow \infty} \operatorname{P}( \left| \overline{X}_n-\mu \right| \geq \varepsilon) \leq 0$, and since probabilities are non-negative the limit equals 0.
--Delirium 08:08, 4 November 2005 (UTC)
Never mind, I suppose this may run into some trouble with limits also being defined in terms of epsilons, juxtaposed with the presence of an explicit epsilon in the same ratio. This explanation seems more intuitive to me, but perhaps the one in the article is more rigorous. --Delirium 08:09, 4 November 2005 (UTC)
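One way to address the worry about the two epsilons is to keep $\varepsilon$ fixed throughout and use a different symbol, say $\delta$, for the tolerance in the definition of the limit. A sketch, assuming independent identically distributed variables with finite variance $\sigma^2$ (so that $\operatorname{Var}(\overline{X}_n) = \sigma^2/n$):

$$
\begin{aligned}
&\text{Fix } \varepsilon > 0 \text{ and } \delta > 0, \text{ and choose an integer } N \geq \frac{\sigma^2}{\delta\,\varepsilon^2}. \\
&\text{Then for all } n \geq N: \quad
0 \;\leq\; \operatorname{P}\left( \left| \overline{X}_n - \mu \right| \geq \varepsilon \right)
\;\leq\; \frac{\sigma^2}{n\varepsilon^2}
\;\leq\; \frac{\sigma^2}{N\varepsilon^2}
\;\leq\; \delta .
\end{aligned}
$$

Since $\delta$ was arbitrary, $\lim_{n \rightarrow \infty} \operatorname{P}( \left| \overline{X}_n-\mu \right| \geq \varepsilon) = 0$ for every fixed $\varepsilon > 0$, which is exactly convergence in probability. Equivalently, $\operatorname{P}( \left| \overline{X}_n-\mu \right| < \varepsilon) \geq 1 - \frac{\sigma^2}{n\varepsilon^2} \rightarrow 1$.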

Another possible definition of the LLN

The long-run relative frequency of repeated independent events gets closer and closer to the true relative frequency as the number of trials increases. 202.7.183.131 11:42, 18 January 2006 (UTC)

• True relative frequency? What is that? The law states that the average of repeated independent events converges toward the expectation. That is it. Aastrup 20:11, 18 January 2006 (UTC)

A proof of the strong law would be rather long, but I think that the strong law is such an important result that it ought to be shown with its proof. What do you think?

Some notes on history

I'm putting some notes on history here to remind myself to add a history section later. Someone else can feel free to do so though, especially if you know more about it than I do:

• Throughout the 19th century, the "law of large numbers" simply meant the weak law (convergence in probability); I think Bernoulli proved this, but I have to look that up.
• Some interesting stuff happened to clarify/extend it in the late 19th and early 20th centuries
• Émile Borel proved a special case of the strong law (almost-sure convergence) in 1909, for Bernoulli trials (cite?)
• Francesco Cantelli provided the first relatively general proof of the strong law (which he called "uniform convergence in probability") in his 1917 paper: "Sulla probabilità come limite della frequenza." Atti Reale Accademia Nazionale Lincei 26: 39-45.
• Aleksandr Yakovlevich Khinchin coined the now-current term "strong law of large numbers" to describe what Cantelli had called "uniform convergence", in a short published letter of 1928: "Sur la loi forte des grands nombres." Comptes Rendus de l'Académie des Sciences 186: 285-87.
• Andrey Kolmogorov proved that the strong law holds in cases other than independent identically-distributed variables, subject to some other conditions. I think this was in 1929 (cite?).

--Delirium 09:43, 4 November 2005 (UTC)

intro

In the first sentence, do the words 'average' and 'mean' alternate on purpose? Spencerk 04:16, 5 December 2005 (UTC)

I think by 'average' they are referring to the average of the sample from a large population, but the term 'mean' refers to the entire population. However, I think the first sentence is wrong, because the law should refer to the size of the sample, not the size of the population. I will let someone else change it, if they agree.

Possibly more important, the Wall Street Journal noted that people frequently misuse the 'law of large numbers'. For example, in a January appearance on CNBC, eBay chief executive Meg Whitman said, "Now, our businesses are getting larger and we will obviously face the law of large numbers, but we have actually changed the trajectory of the growth curve in our two largest businesses over the last three quarters."

Suggested First Paragraph revision

The Law of Large Numbers is a fundamental concept in statistics and probability. Stated in a formal style of language the law is described as follows:

If an event of probability p is observed repeatedly during independent repetitions, the ratio of the observed frequency of that event to the total number of repetitions converges towards p as the number of repetitions becomes arbitrarily large.

This means that the more units of something that are measured, the closer that sample average will be to the average of all of the units -- including those that were not measured. (The term "average" here means specifically "the arithmetic mean".)

For example, the average weight of 10 apples taken from a barrel of 100 apples is probably closer to the "real" average weight than the average weight of 3 apples taken from that same barrel. This is because the sample of 10 is a larger number than the sample of 3. And then, if you took a sample of 99 apples out of 100 apples, the average would be almost exactly the same as the average for all 100 apples.
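The apple example can be tried out numerically. The following is only a sketch under made-up assumptions: the barrel's weights are drawn from a normal distribution with mean 150 g and standard deviation 20 g, and the helper names are hypothetical. It draws samples of 3, 10 and 99 apples without replacement and reports how far, on average, the sample mean lands from the barrel's true mean.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def mean_abs_error(population, sample_size, trials, rng):
    """Average distance between the sample mean and the population mean,
    over repeated draws of sample_size items without replacement."""
    true_mean = mean(population)
    errors = [
        abs(mean(rng.sample(population, sample_size)) - true_mean)
        for _ in range(trials)
    ]
    return mean(errors)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    # Hypothetical barrel: 100 apple weights in grams (assumed distribution).
    barrel = [rng.gauss(150.0, 20.0) for _ in range(100)]
    for k in (3, 10, 99):
        print(k, "apples: typical error", round(mean_abs_error(barrel, k, 2000, rng), 2), "g")
```

Note that sampling from a finite barrel without replacement is not quite the independent-repetitions setting of the formal statement, but it shows the same tendency: larger samples give averages closer to the "real" one.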

While this rule may appear to be self-evident to many readers, the development and use of this law allows statisticians to draw conclusions or make forecasts that would not be possible otherwise. In particular, it permits precise measurement of the likelihood that an estimate is close to the "right" number.

There are two versions of the Law of Large Numbers: one is called the "weak" law and the other the "strong" law. This article will describe both versions in technical detail, but in essence the two do not describe different actual laws but instead refer to different ways of describing the convergence of the sample mean to the population mean. The weak law states that as the sample size grows larger, the difference between the sample mean and the population mean will approach zero. The strong law states that as the sample size grows larger, the probability that the sample mean and the population mean will be exactly equal approaches 1.0.

One of the most important applications of the Law of Large Numbers is called the Central Limit Theorem, which, generally, describes how sample means tend to occur in a Normal Distribution around the mean of the population regardless of the shape of the population distribution, especially as sample sizes get larger. (See the article Central Limit Theorem for details of this application, including some important limitations.) This helps statisticians evaluate the reliability of their results because they are able to make assumptions about a sample and extrapolate their results or conclusions to the population from which the sample was derived with a certain degree of confidence. See Statistical hypothesis testing as an example.

The phrase "law of large numbers" is also sometimes used in a less technical way to refer to the principle that the probability of any possible event (even an unlikely one) occurring at least once in a series increases with the number of events in the series. For example, the odds that you will win the lottery are very low; however, the odds that someone will win the lottery are quite good, provided that a large enough number of people purchased lottery tickets.

The remainder of this article will assume the reader has a familiarity with mathematical concepts and notation.

I offer this as a revision to the first paragraph to make the concept more accessible to readers who may not be familiar with statistics. It may not be right yet, but with editing perhaps it can be incorporated into the article. --Blue Tie 04:47, 11 July 2006 (UTC)

Oh... I also like the term "Miracle of Large Numbers" instead of "Law of Large Numbers"... :-) --Blue Tie 04:48, 11 July 2006 (UTC)

Didn't you mean "stated in informal language"? Michael Hardy 14:57, 11 July 2006 (UTC)

Well, not a bad point from the perspective of a mathematician. However, I did not mean formal definition. I meant that the language was more formal than ordinary conversation (and I think this is a typical usage of the term "formal" when discussing general mathematical concepts in English words rather than describing "forms", "proofs" or other specific items). My audience is someone who does not know what the Law of Large Numbers is or even, perhaps, what statistics are or that there are such things as formal proofs. However, if you think that the language is wrong, perhaps it should be changed. However, I would not agree to the word "informal" for the audience I was looking to speak to. They would NOT agree! --Blue Tie 15:42, 11 July 2006 (UTC)

One reason I commented is that the proposed language does not distinguish between the weak and strong laws. That's OK if you're being informal and will come to the precise statement later. Michael Hardy 17:03, 11 July 2006 (UTC)

You are right, it does not distinguish between those two laws. I will see if I can adjust it by adding something in that regard, because even though the target audience may not have a full understanding, the opening paragraph should be a reasonably complete summary and an introduction to the topic. --Blue Tie 02:38, 12 July 2006 (UTC)

I have made some changes to the paragraph. PLEASE COMMENT AND CORRECT.--Blue Tie 12:24, 12 July 2006 (UTC)

I am still looking for comments or criticisms. --Blue Tie 21:45, 16 July 2006 (UTC)

What is this?

This is not the LLN as I learned it in college (two schools). What I learned was:

If a numerically-valued random event has known probabilities, the probability distribution of the average of a large number of independent occurrences of the event is concentrated near the expectation based on the probabilities.

Almost every mention of expectation in the article has been removed in favor of some concept of subsets samples vs. whole population. Who first stated the law formally and what was the original form? Gazpacho 19:36, 2 October 2006 (UTC)

How is your statement above functionally different from this statement:
If an event of probability p is observed repeatedly during independent repetitions, the ratio of the observed frequency of that event to the total number of repetitions converges towards p as the number of repetitions becomes arbitrarily large.

Older versions of the page can be viewed on the "History Tab". Select a version of interest.

Incidentally, there is no discussion of subsets here. There are examples, for people who have not studied the Law of Large Numbers, showing how it applies in concrete cases. Is that what you meant?

It appears that you may not have a substantive problem... the page is not incorrect... but that you may have a problem with how it is worded. The effort is to make the first few paragraphs of the page more accessible to people who are not experts. Do you have an improvement?

--Blue Tie 19:57, 2 October 2006 (UTC)

In the LLN proof using Chebyshev, don't you need strict inequalities when switching to the probability *within* epsilon of mu?

I did not write the original equations here, but looking at them they seem correct, even tautological. But perhaps if you were to add a constraint of epsilon > 0 you would be right. But I think without that constraint they do not have to be strict. --Blue Tie 22:10, 2 October 2006 (UTC)

The use of the word "converges" is troublesome as it suggests a certainty of outcome in the long run. I have edited the intro accordingly. Gazpacho 23:31, 2 October 2006 (UTC)

I have a couple of problems with your edit. First of all, you have made the reading of the article more dense and oblique rather than easier. You have done so because the term "converges" bothers you but that is not only "a" standard term for such processes but in this case it is "the" standard term particularly with respect to the "Strong" version of the law. Finally, you have removed some helpful descriptive content.
This is not an emotional matter for me nor is it a matter of ownership, but I think your edits have degraded the article. Before I edit them back I want to discuss it. First of all... why do you think there is a problem with the word "converges"? The truth is that the proof is that the mean does indeed converge to p... or that the probability of the mean being equal goes exactly to 1.00. It is not a matter of being "close" and your edit is technically wrong on that matter and somewhat contradicts the rest of the article. That cannot be permitted. Either the math must change or the heading must change. --Blue Tie 00:30, 3 October 2006 (UTC)

I believe it's important not to assume that people understand how "converges" is used here because that invites the misinterpretations that lead to the Gambler's fallacy. I have suggested a different wording. Gazpacho 01:18, 3 October 2006 (UTC)

We could link to Convergence so that they could understand the correct terminology. I am not sure how to deal with problems where people change the meaning of words, but I don't think that we should compromise Wikipedia for those people. I realize "converge" is standard terminology and it has a very definite and accurate meaning in this context where it is supposed to be that this is the limit of the probability or the probability function leading inexorably "almost surely" to mu. But, we could look at a thesaurus for other terms: assemble, coincide, combine, come together, concenter, concentrate, concur, encounter, enter in, focalize, focus, join, meet, merge, mingle, rally, unite. None of these quite get to the same meaning as converge, but "meet" and "coincide" are sort of close. Maybe those would fit.
Saying "converging in probability toward p" is redundant. Not good form. I do not understand your objections to the clearer and more precise wording. --Blue Tie 02:23, 3 October 2006 (UTC)

I read your edit and I understand what you are trying to do though the wording seems awkward. As I grasp it, it seems that you are concerned that people will believe that they can predict the outcome of an individual random event based upon past events. I do not think that converge leads to that possible error. I think that the concept of statistical inferencing can lead to that misunderstanding though. Somehow I think you have got the wrong things labeled as the problem here. Something looks off about this. --Blue Tie 02:35, 3 October 2006 (UTC)

Maybe a separate section about "misconceptions" could be the answer. I do not know of anyone who has these misconceptions but I think maybe you have had some experience with people who have. --Blue Tie 02:37, 3 October 2006 (UTC)

Further Suggested Revision

I still think the article could use some simplification. Thoughts on these first few paragraphs?

The law of large numbers is a fundamental concept in statistics and probability that describes how the average of a random sample from a large population approaches the average of the whole population.

In formal language:

If an event of probability p is observed repeatedly during independent repetitions, the ratio of the observed frequency of that event to the total number of repetitions converges towards p as the number of repetitions becomes arbitrarily large.

In statistics, this says simply that as the size of a random sample increases, the average of the sample converges to the true average.

For example, consider flipping a fair coin (that is, it comes up heads 50% of the time). It is certainly possible that if we flip the coin 10 times, we may end up with only 3 heads (30%). The Law of Large Numbers shows that as you flip the coin more and more (say 10,000 times), the percentage of coin flips that are heads will come ever closer to 50%. Alternatively, it becomes less probable to maintain a rate of 30% heads as more coins are flipped, as this is below the true average.

While this rule may appear self-evident, it allows statisticians to draw conclusions or make forecasts that would not be possible otherwise. In particular, it permits precise measurement of the likelihood that an estimate is close to the "right" or true number.

Note, however, that the value of a single observation cannot be predicted using previous observations of independent events. (For example, if a coin has come up heads 30% of the time, it is incorrect to say that the coin is "due" to come up heads; see the Gambler's Fallacy.)

--Topher0128 21:28, 25 October 2006 (UTC)
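For anyone who wants to check the coin-flip example in the proposed text, here is a small Python sketch. The function names are illustrative, not taken from any article. It simulates the fraction of heads for growing numbers of flips, and computes the exact binomial probability of seeing 30% heads or fewer with a fair coin.

```python
import math
import random


def heads_fraction(n_flips: int, rng: random.Random) -> float:
    """Flip a fair coin n_flips times and return the fraction of heads."""
    heads = sum(1 for _ in range(n_flips) if rng.random() < 0.5)
    return heads / n_flips


def prob_heads_at_most(n_flips: int, k_max: int) -> float:
    """Exact probability of at most k_max heads in n_flips fair flips."""
    favourable = sum(math.comb(n_flips, k) for k in range(k_max + 1))
    return favourable / 2 ** n_flips


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so runs are repeatable
    for n in (10, 100, 10_000):
        # The simulated fraction should move toward 0.5 as n grows.
        print(n, "flips: fraction of heads =", heads_fraction(n, rng))
    for n in (10, 100, 1000):
        # Chance of staying at or below 30% heads shrinks rapidly with n.
        print(n, "flips: P(at most 30% heads) =", prob_heads_at_most(n, (3 * n) // 10))
```

With 10 flips, 3 or fewer heads happens about 17% of the time; the same 30% rate becomes far less probable as the number of flips grows, which is the point of the example. Each individual flip is still 50/50, so nothing here makes heads "due".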