Talk:Law of total probability

Statistics High‑importance

	This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
High	This article has been rated as High-importance on the importance scale.

Mathematics High‑priority

	This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
High	This article has been rated as High-priority on the project's priority scale.

Gibberish for anyone wanting to learn

The only people who could possibly get anything out of this, already know.

mistake?

"because $\Pr(A\mid B_n)$ is finite." Shouldn't that be 0/0, infinite, or undefined? 128.163.8.202 (talk) 10:40, 16 September 2015 (UTC)

Statement

I like the statement of the definition of total probability. It is readable, and specialist terms are well referenced, which enables the lay reader to gain some understanding of what is meant. Is there a good example of this proposition working in practice? Blueawr (talk) 19:04, 31 August 2012 (UTC)
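
One possible worked example, as a minimal sketch. The numbers are made up for illustration and are not taken from the article or this discussion: two factories supply 60% and 40% of a market's light bulbs (the partition $B_1$, $B_2$), and their bulbs last past 5000 hours with probability 0.99 and 0.95 respectively (the values $P(A\mid B_n)$). The helper name `total_probability` is hypothetical.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """Return sum_n P(A | B_n) * P(B_n) for a finite partition B_1, ..., B_k."""
    if sum(priors) != 1:
        raise ValueError("the P(B_n) must sum to 1")
    return sum(p_b * p_a_given_b for p_b, p_a_given_b in zip(priors, likelihoods))

# Illustrative (assumed) inputs, kept as exact fractions.
priors = [Fraction(60, 100), Fraction(40, 100)]        # P(B_1), P(B_2)
likelihoods = [Fraction(99, 100), Fraction(95, 100)]   # P(A | B_1), P(A | B_2)

p_a = total_probability(priors, likelihoods)
print(p_a, float(p_a))
```

With these inputs the overall chance that a purchased bulb lasts past 5000 hours is 0.6 × 0.99 + 0.4 × 0.95 = 0.974.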

Notations

I read somewhere that $P(A|B):={\frac {P(A\cap B)}{P(B)}}$

in which case the two "different" notions of total probability would be equivalent, since obviously

$P(A\cap B_{n})=P(A\mid B_{n})P(B_{n})$ for $n=1,2,3,\ldots$

Note however that I have very little knowledge of the whole domain, so maybe I just missed the point...

--lu 10:42, 26 Apr 2005 (UTC)

This is good. The article really should relate to this. Jackzhp (talk) 15:55, 12 March 2011 (UTC)
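
Spelling out the step behind this equivalence, as a sketch that assumes the $B_n$ are pairwise disjoint, cover the sample space, and each satisfy $P(B_n)>0$: the event $A$ is the disjoint union of the sets $A\cap B_n$, so countable additivity followed by the ratio definition of conditional probability gives

${\begin{aligned}P(A)&=\sum _{n}P(A\cap B_{n})\\&=\sum _{n}P(A\mid B_{n})P(B_{n}).\end{aligned}}$

If some $P(B_{n})=0$, the first line still holds, and those terms contribute $P(A\cap B_{n})=0$ to the sum.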

conditional probability

The conditional probability of A given B is only defined if the probability of B is non-zero. This is really unsatisfactory in this context, since you end up having to partition the space, avoiding zero-measure sets.

There must be a way around this technicality. Or is it necessary? --67.103.110.175 02:55, 27 May 2006 (UTC)

OK, self, the statement should run more like, given Bi, mutually exclusive, whose probabilities sum to one ... Mathworld[1] So you can ignore that part of the space with zero probability, or include it, according to the situation. All that is important is that the part of the space with positive probability be covered. --67.103.110.175 02:55, 27 May 2006 (UTC)

It is absolutely necessary to be able to condition on continuous random variables. That entails conditioning on events of probability zero and getting nonzero conditional probabilities. Michael Hardy 21:52, 27 May 2006 (UTC)

Thanks for reverting and keeping the page accurate. However, in this case, it is a vanity and disservice not to mention that P(A|B) will not be defined for P(B)=0, in the discrete case. As for the continuous case, these are going to be a countable collection of events covering the space, so zero probability events remain insubstantial. Ozga 00:02, 28 May 2006 (UTC)

They are not at all insubstantial in the continuous case. When one finds E(Pr(A | X)), where X is a continuous random variable, the event on which one conditions always has probability 0. Michael Hardy 01:04, 30 May 2006 (UTC)

As in E(Pr(A | X))=Pr(A), as one reintegrates up the slices of A? --Ozga 17:28, 31 May 2006 (UTC)
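
A numerical illustration of "reintegrating the slices", as a minimal sketch under assumptions that are not from this thread: X and Y are independent Uniform(0, 1) variables and A = {Y ≤ X²}, so that Pr(A | X = x) = x² even though each event {X = x} has probability 0. The function names are hypothetical.

```python
import random

def cond_prob_a_given_x(x):
    # In the assumed toy model, Pr(A | X = x) = Pr(Y <= x**2) = x**2 for 0 <= x <= 1.
    return x ** 2

def expected_cond_prob(n_slices=1000):
    # E[Pr(A | X)] = integral over [0, 1] of x**2 * f_X(x) dx with f_X = 1,
    # approximated here by the midpoint rule.
    h = 1.0 / n_slices
    return sum(cond_prob_a_given_x((i + 0.5) * h) for i in range(n_slices)) * h

def simulated_prob_a(n_samples=200_000, seed=0):
    # Direct Monte Carlo estimate of Pr(A), with no conditioning involved.
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        hits += y <= x ** 2
    return hits / n_samples

print(expected_cond_prob(), simulated_prob_a())
```

Both numbers should land close to 1/3, which is Pr(A) in this toy model.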

I find this sentence confusing: "In the discrete case, the statements above are equivalent to the following statement, which also holds in the continuous case and is not the same as the law of alternatives." I think it should be reworded. Also, it is rather tricky to define the conditional probability of an event given the value of a continuous random variable. I think it is more or less

$\mathrm {Pr} (A|X=x)=\lim _{\epsilon \rightarrow 0}{\frac {\mathrm {Pr} (A\cap \{x-\epsilon <X<x+\epsilon \})}{\mathrm {Pr} (\{x-\epsilon <X<x+\epsilon \})}},$

provided that the density function of X is continuous at x, but take that with a grain of salt. --130.94.162.64 23:48, 15 June 2006 (UTC)
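
The shrinking-window ratio can be watched converging in a toy model. This is only a sketch under assumed conditions, not a general definition: X and Y are independent Uniform(0, 1) and A = {Y ≤ X²}, so the numerator has the closed form ∫ t² dt over the window. The function name is hypothetical.

```python
def shrinking_window_ratio(x, eps):
    """Pr(A and x-eps < X < x+eps) / Pr(x-eps < X < x+eps) in the assumed toy model."""
    if not (0.0 <= x - eps and x + eps <= 1.0):
        raise ValueError("window must stay inside [0, 1]")
    # Numerator: integral of t**2 over (x - eps, x + eps), in closed form.
    numerator = ((x + eps) ** 3 - (x - eps) ** 3) / 3.0
    # Denominator: Pr(x - eps < X < x + eps) for X ~ Uniform(0, 1).
    denominator = 2.0 * eps
    return numerator / denominator

for eps in (0.1, 0.01, 0.001):
    print(eps, shrinking_window_ratio(0.5, eps))
```

Algebraically the ratio equals x² + ε²/3 here, so at x = 0.5 the printed values approach 0.25 as ε shrinks.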

Indeed, it can be considered "tricky", since probably most people who grasp the idea intuitively do not know the Radon-Nikodym theorem, on which one of the usual definitions relies. But nonetheless perfectly doable. Michael Hardy 00:56, 16 June 2006 (UTC)

I still find that sentence confusing. From reading the entire article, I get the impression that the "law of total probability" has a perfectly unambiguous meaning that can be applied in a consistent way to both the discrete and continuous cases. The disclaimer that "nomenclature is not wholly standard" seems both a little unnecessary and rather foreboding to me. Perhaps instead, one could mention (in the beginning of that section) that in the discrete case it is also called the "law of alternatives". Also, the article should have references. --130.94.162.64 18:46, 17 June 2006 (UTC)

I think that's exactly what we should do. MisterSheik 22:34, 16 February 2007 (UTC)

conditional probability no2

The text states: "The law of total probability can also be stated for conditional probabilities. Taking the $B_{n}$ as above, and assuming $X$ is not mutually exclusive with $A$ or any of the $B_{n}$ ". Note that it assumes that $X$ is not mutually exclusive with any of the $B_{n}$. Doesn't this imply that $X$ is the sample space if each $B_{i}$ has a single event? — Preceding unsigned comment added by 134.58.253.57 (talk) 08:06, 11 May 2012 (UTC)

Definition

From my user talk page 83.67.217.254 14:39, 26 August 2007 (UTC)

Your edit was horribly wrong. We DO NOT want to condition on the event's actually occurring; we DO want to allow general random variables and not just indicator variables of events; and the fact that we sometimes condition on events of probability 0 and therefore need to talk about density functions in order to apply the result to those cases in no way means the identity is wrong. Michael Hardy 14:27, 26 August 2007 (UTC)

Hey, thanks for the flowers Michael! :-) You were referring to this edit, which you reverted with a charming and welcoming comment. I'll be back later. 83.67.217.254 14:39, 26 August 2007 (UTC)

In the meantime you may want to reflect on the fact that the definition of conditional probability does not involve random variables, but events. In other words, in the definition of P(A|B), both A and B are events, not random variables. 83.67.217.254 14:44, 26 August 2007 (UTC)

You are writing nonsense. The definition there is the definition of conditional probability given an event. There is also such a thing as conditional probability given a random variable. That is the concept needed here. I have a Ph.D. in statistics and I know this material. Michael Hardy 17:02, 28 September 2007 (UTC)

And I repeat: the edit I reverted was horribly horribly wrong. Michael Hardy 17:04, 28 September 2007 (UTC)

I agree the definition is inconsistent —Preceding unsigned comment added by 221.47.185.2 (talk) 01:54, 28 September 2007 (UTC)

OK, I've got a minute or two and I will elaborate (barely):

The conditional probability of an event given another event is just a number, not a random variable, so there would be no sense in evaluating its expected value, as this law does.
To say that this is true of any event is clearly nonsense because almost any specific concrete example you pick would be a counterexample proving that the proposed law is in fact false. And I would claim that that is trivial and obvious.
This law is found in many many many textbooks. Look it up. Don't insist on making a complete crackpot of yourself.

Michael Hardy 18:01, 28 September 2007 (UTC)
PS: The word definition that you've chosen as a heading is wrong. Concepts have definitions; propositions have statements. This is a proposition, not a concept. If you don't even know such elementary things, you shouldn't be acting the way you are. Michael Hardy 18:02, 28 September 2007 (UTC)

I have now added a new section to conditional probability that I hope will clear up the confusion on this point. Michael Hardy 19:47, 25 October 2007 (UTC)

I find it very interesting that your new definition almost exactly replicates my original edit. While I'm glad you are starting to understand, conditional probability given a continuous random variable is still not covered (unsurprisingly) by this novel definition of yours. Therefore, as I was suggesting in my edit comment, the definition of the law of total probability for continuous variables is still ill-defined. Horribly horribly yours, 83.67.217.254 13:01, 27 October 2007 (UTC)

It comes nowhere near replicating your edit, since you wrote of the conditional probability given an event, not about the conditional probability given a random variable, and your statement was clearly erroneous: you didn't need to understand what the law of total probability says in order to see that the statement as you wrote it was wrong, since virtually any example you picked would be a counterexample. I have not given any new definition in this article. I added a new section to a different article. There is nothing "novel" in the definition I gave. It is standard. I did not give the case of continuous random variables. The case I did give should make the idea clear.

Do not speak of "the definition of the law". One defines concepts; one states propositions. This law is a proposition, not a concept.

It is becoming apparent that you are here only to have fun arguing and you're dishonest. Michael Hardy 20:47, 27 October 2007 (UTC)

Other editors can judge for themselves whether or not your definition of conditional probability given a (discrete) random variable is equivalent to my edit on this article. You are entitled to your opinion and I am no longer interested in this aspect.

Since this definition of probability given a random variable is standard, you should have no problems referencing it (with page numbers if possible) and extending it (with references) to continuous variables. Until then, the current definition of the law of probability will not cover continuous variables.

By the way, as a minor point, I inform you that, despite your insistence to the contrary, I will keep writing about the "definition of the law", as opposed to, I suppose, the "statement of the law", since I find no ambiguity and you seem to understand what I mean perfectly well.

As for your accusation of "fun arguing"; I don't know what your concept of "fun" is, but I can assure you that your groundless and repeated personal attacks are not. In particular, I'd be interested to know when exactly I have been dishonest.

Also, in my user page you accuse me of "cowardice" for editing Wikipedia "anonymously". Firstly, I am not editing any more anonymously than you are. Wikipedia does not require the user names to correspond to real names, and provides no way to anybody to verify whether that is the case. If by "anonymous" you mean "unregistered", I'll have you know that I have been a happy unregistered user for quite some time, that Wikipedia welcomes unregistered users (and so should its editors) and that edits by unregistered users should not be discriminated against. This means by the way that you are not assuming good faith. Anyway, if you think I am a "coward" for not registering under my real name, I beg to differ and put to you that I find it foolish for anybody to register under their real name and run the risk (for some users apparently higher than for others) to have a permanent public record of being incompetent and behaving like a dick.

Finally, can I ask you to kindly rein in your use of bold and italics? It makes you look hysterical and desperate, and it's not conducive to a civil discussion. Thanks. 83.67.217.254 12:32, 28 October 2007 (UTC)

Definition of the law? The definition of a law, proposition or theorem makes as much sense as the definition of a sentence. Would you say, "The first sentence of the Gettysburg Address has 40 words, one main clause, one adverbial phrase, two adjectival phrases, . . ."? No, you would say, "The first sentence of the Gettysburg Address is, 'Four score and seven years ago . . .'" What you are calling the definition of the law is the law. What you see is what you get. Robert O'Rourke (talk) 04:37, 8 September 2010 (UTC)

It's starting to become apparent that you are a retired lawyer with a lot of time on his hands. Now you're trying to make it appear that I said it's cowardly to edit Wikipedia anonymously. You know that I said I have nothing against anonymous editing and that I sometimes edit anonymously myself. I never said you're a coward simply for editing anonymously. I said that your behavior is that of a coward and that in your case it is out of cowardice that you edit anonymously. Michael Hardy 16:19, 28 October 2007 (UTC)

Can you prove this accusation? Or are you just failing to assume good faith? Actually, don't answer that, please let's stick to the more important issue at hand. 83.67.217.254 16:28, 28 October 2007 (UTC)

I think I can prove that in some of your comments you acted as if the accusation is true. Michael Hardy 23:34, 31 October 2007 (UTC)

So you can't prove your accusation(s). 83.67.217.254 00:26, 1 November 2007 (UTC)

As I said, it's becoming apparent that you're a retired lawyer with time on his hands. Michael Hardy 02:40, 1 November 2007 (UTC)

I agree with Michael Hardy. The edit was clearly wrong and the reversion was necessary. You can't change the definition of a term in a proposition and expect it to remain true. In this case it is untrue. If N is an event, Pr(A|N) is a number. The expectation of a number is the number, so E[Pr(A|N)]=Pr(A|N). The law now states that Pr(A|N)=Pr(A), which isn't always true. Robert O'Rourke (talk) 22:27, 22 August 2010 (UTC)
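
A small concrete case may help separate the two notions being argued over. This is a sketch on a made-up fair-die example (the events and helper names are illustrative and not from the article): conditioning on the event N gives a single number, while conditioning on the indicator random variable of N gives a random variable that takes the value Pr(A | N) on N and Pr(A | Nᶜ) on its complement.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))           # outcomes of a fair six-sided die

def pr(event):
    """Probability of an event under the uniform measure on OMEGA."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def pr_given(event, given):
    """Pr(event | given) for an event `given` of positive probability."""
    return pr(event & given) / pr(given)

A = frozenset({2, 4, 6})                 # "the roll is even"
N = frozenset({1, 2, 3})                 # "the roll is at most 3"
N_c = OMEGA - N

# Conditioning on the event N: one number, and its expectation is itself.
number = pr_given(A, N)

# Conditioning on the indicator of N: average the two possible values
# of Pr(A | 1_N), weighted by Pr(N) and Pr(N^c).
expectation = pr_given(A, N) * pr(N) + pr_given(A, N_c) * pr(N_c)

print(number, expectation, pr(A))
```

Here Pr(A | N) = 1/3 differs from Pr(A) = 1/2, whereas the expectation of the conditional probability given the indicator variable recovers 1/2.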

Another problem with the edit [2] is its recommendation for bringing in the conditional density function. An authority in probability theory recommends an alternative approach: "We have now defined a conditional expectation E(Y|X) in terms of a conditional distribution, and this is quite satisfactory as long as one deals only with one fixed pair of random variables X, Y. However, when one deals with whole families of random variables the non-uniqueness of the individual conditional probabilities leads to serious difficulties, and it is therefore fortunate that it is in practice possible to dispense with this unwieldy theory. Indeed, it turns out that a surprisingly simple and flexible theory of conditional expectation can be developed without any reference to conditional distributions."[1] He goes on to define conditional expectation given a sigma-algebra and conditional expectation given one or more random variables as the conditional expectation given the sigma-algebra they generate. With conditional expectation defined, conditional probability can be defined as the conditional expectation of an indicator function. Robert O'Rourke (talk) 03:29, 25 August 2010 (UTC)

Law of total probability, conditional probability and conditional expectation

As per discussion above, the description currently given does not cover continuous random variables. I am no expert in this subject, but I suspect that the definition of the law currently suggested by this article cannot be extended to continuous variables without introducing the concepts of Borel sets, sigma field generated by a variable and the conditional expectation (as opposed to conditional probability) of a variable given that sigma field (see e.g. Mikosch 1998, sections 1.4.2--1.4.3).

I also suspect that the unreferenced definition of conditional probability given a random variable is original research. I believe that the "law of alternatives" reported in a separate section of the article is a better description covering discrete random variables. Incidentally, I think that contrary to what the article currently states, the partition defined by a discrete random variable is indeed a special case of a generic partition of the probability space.

Thanks for any insights. 83.67.217.254 23:09, 31 October 2007 (UTC)

It is not original research; it is found in many textbooks. I'll dig one out and cite it. For the most general definition, some measure-theoretic stuff is needed. Maybe I'll add them when I'm feeling more ambitious. But I think the very simplest example, which is the only one I gave, fully conveys what the concept is supposed to accomplish. 23:33, 31 October 2007 (UTC)

Yes, you do need to introduce Borel sets and sigma-algebras. To paraphrase Harry Truman, if you can't stand the integral, get away from the law of total probability.

It is a problem that Conditional expectation#Definition of conditional probability only goes as far as defining the conditional probability given a Borel sigma-algebra. Here [3] is a definition of conditional probability given a random variable as the conditional probability given the Borel sigma-algebra generated by the random variable. With this definition, the proof is simple. Robert O'Rourke (talk) 22:52, 22 August 2010 (UTC)

Please do not be overly ambitious if this is not your area of expertise. Hopefully this request will attract an expert third opinion. 83.67.217.254 00:21, 1 November 2007 (UTC)

The discrete version doesn't require sigma-algebras; the most general versions do. Michael Hardy (talk) 23:24, 22 August 2010 (UTC)

References

[1] Feller, William (1971). An Introduction to Probability Theory and Its Applications, Vol. 2, 2nd ed. New York: John Wiley & Sons. p. 162. ISBN 0471257095.

Assessment comment

The comment(s) below were originally left at Talk:Law of total probability/Comments, and are posted here for posterity. Following several discussions in past years, these subpages are now deprecated. The comments may be irrelevant or outdated; if so, please feel free to remove this section.

Geometry guy 18:38, 21 May 2007 (UTC) This has been improved a bit since last May and had several references added, and I imagine this article will always be fairly short, so I'm upgrading from Stub to Start class. If you disagree please explain why on this comments page. --Qwfp (talk) 12:42, 24 January 2008 (UTC)

Last edited at 12:42, 24 January 2008 (UTC). Substituted at 20:03, 1 May 2016 (UTC)

Pr(A) vs. P(A)?

Why does this article use "Pr(A)" rather than "P(A)" as most articles (and references) do? — Preceding unsigned comment added by 99.46.205.49 (talk) 14:14, 7 November 2017 (UTC)

"Law" or "Theorem"

I suppose there is a tradition in mathematics teaching that a proposition is called a "law" if establishing it as a theorem or a definition is less important than drilling it into students' heads. However, since this is a topic in mathematics, it would be appropriate to have a section that indicates a proof of this theorem. It seems to me that any coherent discussion of mathematical probability should mention probability spaces and probability measures, although it need not do so in the introduction.

The content of mathematical articles in Wikipedia is schizophrenic. Some are detailed and formal. Some, like the current article, are informal; they treat the topic in a Platonic sense, as if the subject is a physically real phenomenon and the mathematics is merely an observer's description instead of a definition. To a non-specialist the Platonic view is more useful than the formal view. However, mathematical articles should acknowledge the basic organization of mathematics (definitions, assumptions, theorems), even if they choose not to present a lot of formal detail.

I see no reason for the current article to describe the "law" of total probability with the more general term "proposition" if the specific term "theorem" is appropriate.

Tashiro~enwiki (talk) 08:42, 31 August 2018 (UTC)

the alternative 'these terms are simply omitted from the summation'

OK, so events of zero probability can form a partition: for a continuous random variable Y, the events (Y=k), for k in the range of Y, actually partition the sample space of Y's probability space.

But how would you use the 'alternative' to compute P(A) for some event A (i.e. an element of the sigma-algebra of Y's probability space)? Thewriter006 (talk) 22:54, 28 March 2021 (UTC)

the law of total probability for conditional probabilities doesn't need to assume independence

In the original article the derivation assumes independence, but you don't need to assume any independence to arrive at the original equation. Having the independence comment makes it seem like the derivation is only valid with the assumption.

${\begin{aligned}P(A|C)&=\sum _{n}P(A|C\cap B_{n})P(B_{n})\\&=\sum _{n}{\frac {P(A\cap C\cap B_{n})P(B_{n}\cap C)}{P(C\cap B_{n})P(C|B_{n})}}\\&=\sum _{n}{\frac {P(A\cap C\cap B_{n}){\cancel {P(B_{n}\cap C)}}}{{\cancel {P(C\cap B_{n})}}P(C|B_{n})}}\\&=\sum _{n}{\frac {P(A\cap C\cap B_{n})}{P(C|B_{n})}}\\&=\sum _{n}{\frac {P(B_{n})P(C|B_{n})P(A|C\cap B_{n})}{P(C|B_{n})}}\\&=\sum _{n}{\frac {P(B_{n}){\cancel {P(C|B_{n})}}P(A|C\cap B_{n})}{\cancel {P(C|B_{n})}}}\\&=\sum _{n}P(B_{n})P(A|C\cap B_{n})\end{aligned}}$ 71.178.191.95 (talk) 00:24, 17 August 2023 (UTC)
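
One way to probe a claim like this is a small numerical check. The sketch below uses a made-up fair-die example in which $C$ is deliberately not independent of the $B_{n}$, and compares $P(A\mid C)$ with two weighted sums: one weighted by $P(B_{n}\mid C)$ and one weighted by $P(B_{n})$. The events and helper names are illustrative assumptions, not taken from the article.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))           # outcomes of a fair six-sided die

def pr(event):
    """Probability of an event under the uniform measure on OMEGA."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def pr_given(event, given):
    """Pr(event | given) for an event `given` of positive probability."""
    return pr(event & given) / pr(given)

A = frozenset({1, 2, 4})
C = frozenset({1, 2, 3, 4})
partition = [frozenset({1, 2, 3}), frozenset({4, 5, 6})]   # B_1, B_2

lhs = pr_given(A, C)
# Weights P(B_n | C):
weighted_by_conditional = sum(
    pr_given(A, C & B) * pr_given(B, C) for B in partition
)
# Weights P(B_n):
weighted_by_unconditional = sum(
    pr_given(A, C & B) * pr(B) for B in partition
)

print(lhs, weighted_by_conditional, weighted_by_unconditional)
```

On this example only the sum weighted by $P(B_{n}\mid C)$ reproduces $P(A\mid C)$. The two sums coincide whenever $C$ is independent of each $B_{n}$, since then $P(B_{n}\mid C)=P(B_{n})$.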
