Talk:Conditional probability

	This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.StatisticsWikipedia:WikiProject StatisticsTemplate:WikiProject StatisticsStatistics articles
???	This article has not yet received a rating on Wikipedia's content assessment scale.
???	This article has not yet received a rating on the importance scale.

Philosophy: Logic Start‑class Mid‑importance

This article is within the scope of WikiProject Philosophy, a collaborative effort to improve the coverage of content related to philosophy on Wikipedia. If you would like to support the project, please visit the project page, where you can get more details on how you can help, and where you can join the general discussion about philosophy content on Wikipedia.PhilosophyWikipedia:WikiProject PhilosophyTemplate:WikiProject PhilosophyPhilosophy articles

Start

This article has been rated as Start-class on Wikipedia's content assessment scale.

Mid

This article has been rated as Mid-importance on the project's importance scale.

Associated task forces:
/
	Logic

Mathematics Start‑class Mid‑priority

	Mathematics portal This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.MathematicsWikipedia:WikiProject MathematicsTemplate:WikiProject Mathematicsmathematics articles
Start	This article has been rated as Start-class on Wikipedia's content assessment scale.
Mid	This article has been rated as Mid-priority on the project's priority scale.

I've struck out the sentence about decision trees. There is certainly no sense in which conditional probability calculations are generally easier with decision trees. Decision trees can indeed be interpreted as conditional probability models (or not), but in any event, they are a very, very small part of the world of conditional probability, and making an unwarranted assertion about a minor topic is out of place. Wile E. Heresiarch 17:13, 1 Feb 2004 (UTC)

Wrong?

Conditional probability is the probability of some event A, given that some other event, B, has already occurred

...

In these definitions, note that there need not be a causal or temporal relation between A and B. A may precede B, or vice versa, or they may happen at the same time.

This statement is totally confusing - if event B has already occurred, there has to be a temporal relation between A and B (i.e. B happens before A). --Abdull 12:50, 25 February 2006 (UTC)[reply]

I've reworded it. --Zundark 14:32, 25 February 2006 (UTC)[reply]

Great, thank you! --Abdull 11:24, 26 February 2006 (UTC)[reply]

Since the subject of the article is completely formal, I dislike the references to time, expressions like "temporal relation" or one event "preceeding" another, because I find them informal in this context. In the framework of probability space where we are working time is not formally introduced: what "time" does an event $A$ take place? In fact, when we specifically want to represent or model how our knowledge of the world (represented by random variables) is growing as time passes, we can do it by means of filtrations. And I feel the same goes for the "causal relation", in the article such notion is not defined formally.--zeycus 15:22, 23 February 2007 (UTC)[reply]

The purpose of this paragraph is to dispel the common misconception that conditional probability has something to do with temporal relationships or causality. The paragraph is necessarily informal, as a probability space does not even have such concepts. (By the way, contrary to your suggestion on my Talk page, this paragraph was added by Wile E. Heresiarch on 10 February 2004. The rewording I mentioned above did not touch this paragraph, it simply removed incorrect suggestions of temporal relationships elsewhere in the article. All this can be seen from the edit history.) --Zundark 08:36, 24 February 2007 (UTC)[reply]

I apologize for attributing you the paragraph. I understand what you mean, but I think it is important to separate formal notions from informal ones. So I will add a short comment afterwards. --zeycus 9:42, 24 February 2007 (UTC)

Undefined or Indeterminate?

In the Other Considerations section, the statement If $P(B)=0$ , then $P(A\mid B)$ is left undefined. seems incorrect. Is it not more correct to say that $P(A\mid B)$ is indeterminate?

$IfP(B)=0,thenP(A\cap B)=0$ regardless of $P(A)orP(A\mid B)$ .

Bob Badour 04:36, 11 June 2006 (UTC)[reply]

It's undefined. If you think it's not undefined, then what do you think its definition is? --Zundark 08:54, 11 June 2006 (UTC)[reply]

Indeterminate as I said, the definition of which one would paraphrase to incalcuable or unknown. However, an indeterminate form can be undefined, and the consensus in the literature is to call the conditional undefined in the abovementioned case. There are probably reasons for treating it as undefined that I am unaware of, and changing the text in the article would be OR. Thank you for your comments, and I apologize for taking your time. -- Bob Badour 00:07, 12 June 2006 (UTC)[reply]

Something about this is bothering me. Suppose $X$ is normal standard. I am considering $A\equiv X=0$ and $B\equiv X\in \{0,5\}$ , for example. Clearly $P(A)=P(B)=0$ . However, I feel that $P(A\mid B)$ should be defined, and in fact equal to ${\frac {f(0)}{f(0)+f(5)}}$ where $f$ is the density function of $X$ . In order to informally justify this, I would define $A_{\epsilon }=(-\epsilon ,\epsilon )$ and $B_{\epsilon }=(-\epsilon ,\epsilon )\cup (5-\epsilon ,5+\epsilon )$ for any $\epsilon >0$ . Then, if I am not wrong, $\lim _{\epsilon \to 0^{+}}{P(A_{\epsilon }\mid B_{\epsilon })}={\frac {f(0)}{f(0)+f(5)}}\approx 0.999996273$ .

Suppose someone tells me that a number has been obtained from a normal standard variable, that it is 0 or 5, and that I have a chance for a double-nothing bet trying to guess which one of them it was. Shouldn't I bet for the 0? And how can I argument it, if not with the calculations above? Opinions are most welcome. What do you think? -- zeycus 18:36, 22 February 2007 (UTC)[reply]

I think you are absolutely right. However, the theory needed to obtain this is a lot more complicated than the theory needed to understand conditional probability as such. Should the article state clearly from the start that we are deling with discrete distributions only, and then perhaps have a last section dealing with generalization to continuous distributions?--Niels Ø (noe) 19:33, 22 February 2007 (UTC)[reply]

It's already valid for continuous distributions, at least now that I've cleaned up the definition section. It's not usual to define P(A|B) when P(B) = 0, but if someone can find a decent reference for this then it might be worth adding to the article. --Zundark 11:25, 24 February 2007 (UTC)[reply]

I was not able to find any source defining

P(A\mid B)

when

P(B)=0

. I posted in a math forum, and after an interesting discussion someone gave a nice argument (a bit long to be copied here) justifying why it does not make sense. I consider the point clarified. --zeycus 15:30, 28 February 2007 (UTC)[reply]

I'm afraid these comments are almost entirely wrong. It is perfectly possible to condition on events of probability zero, and this is in fact common. Consider tossing a coin. If one does not know if the coin is fair or not, in the Bayesian world one assigns a probability distribution to the parameter p representing the probability of getting a head. This distribution reflects the degree of belief one has in the fairness of the coin. In the event that this distribution is continuous it is perfectly reasonable to condition on the event that

p=1/2

, even though this event has probability zero. To define these conditional probabilities rigourously requires measure theory, and this approach agrees with the naive interpretation given in first level courses. A good reference is Basic Stochastic Processes by Zastawniak and Brzezniak. PoochieR 21:45, 6 November 2007 (UTC)[reply]

To repeat myself from above: Should the article state clearly from the start that we are deling with discrete distributions only, and then perhaps have a last section dealing with generalization to continuous distributions?--Niels Ø (noe) 12:31, 7 November 2007 (UTC)[reply]

No, because we aren't dealing only with discrete distributions. --Zundark 12:39, 7 November 2007 (UTC)[reply]

Some of the most important modern uses of conditional probability are in Martingale theory, with direct practical applications in all areas of mathematical finance. It is simply impossible to deal with these without conditioning on events of probability zero, so I think it's important that you should include these. A way round would be to make it clear that the definition you have given is a naive definition, which only works for conditioning on events with probability > 0; however to give the definition which works for conditioning on any event requires the use of measure theory. The measure theoretic definition agrees with the naive definition where that is applicable. The natural way to express the measure theoretic formulation is in terms of conditional expectations, conditional on sigma-algebras of events; in this formulation

P(A|B)=E(I(A)|\sigma (I(B)))

, where

I(A)

is the indicator random variable of event A. A better reference than the one I gave before is: Probability with Martingales, David Williams, Ch.9. PoochieR 09:41, 8 November 2007 (UTC)[reply]

I am quite happy to edit your definition with respect to

P(A|B)

when

P(B)=0

but you cannot leave it as it is. The correct definition to make the discrete case correspond with the more general case is to define

P(A|B)=0

when

P(B)=0

. There are no problems then with the naive interpretation, and the benefit of agreeing with the more sophisticated approach. In many ways it is similar to the debates that used to go on regarding

0!

.PoochieR 18:16, 15 November 2007 (UTC)[reply]

Not to be too rude, PoochieR, but

P(A|B)=0

whenever

P(B)=0

is missing an obvious case: we clearly want

P(B|B)=1

! Don't forget that the marginal distribution is, nonetheless, intended to be a distribution. ub3rm4th (talk) 10:10, 14 December 2008 (UTC)[reply]

This is a general encyclopedia. I think it's important to write a readable and accessible article, as far as possible, and as a matter of presentation, I think we do that best by limiting ourselves to discrete situations. The purely continuous cases (requiring integrals and such), and the mixed cases (requiring measure theory) can either be treated

further down in the article,
in separate articles,
or by reference to external sources like MathWorld.

--Niels Ø (noe) (talk) 14:00, 23 November 2007 (UTC)[reply]

About conditioning on zero-probability condition: see also Conditioning (probability). Boris Tsirelson (talk) 16:34, 14 December 2008 (UTC)[reply]

Use of 'modulus signs' and set theory

Are the modulus signs in the "Definition" section intended to refer to the cardinality of the respective sets? It's not clear from the current content of the page. I think the set theory background to probability is a little tricky, so perhaps more explanation could go into this section?

I absolutely agree.--Niels Ø (noe) 14:13, 29 January 2007 (UTC)[reply]

I may be wrong, but it seems to me that the definition $P(A\mid B)={\frac {\mid A\cap B\mid }{\mid B\mid }}$ is not just unfortunate, but simply incorrect. Consider for example the probability space $(\Omega ,F,P)$ with $\Omega =\{a,b,c,d\}$ , the set of events $F=2^{\Omega }$ and probabilities $P(\{a\})=0.4$ , $P(\{b\})=0.3$ , $P(\{c\})=0.2$ and $P(\{d\})=0.1$ . Let $A=\{a\}$ and $B=\{a,b\}$ . Then $P(A\mid B)={\frac {P(A\cap B)}{P(B)}}={\frac {P(A)}{P(B)}}={\frac {4}{7}}$ . However, ${\frac {\mid A\cap B\mid }{\mid B\mid }}={\frac {\mid A\mid }{\mid B\mid }}={\frac {1}{2}}$ . --zeycus 4:46, 24 February 2007 (UTC)

The text talks about elements randomly chosen from a set. The author's intent clearly is that this implies symmetry.--Niels Ø (noe) 08:29, 24 February 2007 (UTC)[reply]

Yes, you are absolutely right. But then, why defining conditional probability only in that particular case, when it makes sense and is usually defined for any probabilistic space with the same formula

P(A\mid B)={\frac {P(A\cap B)}{P(B)}}

. --zeycus 8:43, 24 February 2007 (UTC)

I agree this is a weak section in the article; one should not have to guess about the author's intentions. Anyway, I think the idea is to gereralize from the fairly obvious situation with symmetry to the general formulae. Of course, that kind of reasoning does not really belong under the heading "Definition". Go ahead; try your hand at it!--Niels Ø (noe) 10:07, 24 February 2007 (UTC)[reply]

I've restored the proper definition. --Zundark 11:06, 24 February 2007 (UTC)[reply]

Valid for continuous distributions?

Two events A and B are mutually exclusive if and only if P(A∩B) = 0...

Let X be a continuous random variable, e.g. normally distributed with mean 0 and standard deviation 1. Let A be the event that X >= 0, and B the event that X <= 0. Then, A∩B is the event X=0, which has probability 0, but which is not impossible. I don't think A and B should be called exclusive in this case. So, either the context of the statement from the article I quote above should be made clear (For discrete distributions,...), or the statement itself should be modified.

Would it in all cases be correct to say that A and B are exclusive if and only if A∩B = Ø ? Suppose U={0,1,2,3,4,5,6}, P(X=0)=0 and P(X=x)=1/6 for x=1,2,3,4,5,6 (i.e. a silly but not incorrect model of a die). Are A={X even}={0,2,4,6} and B={X<2}={0,1} mutually exclusive or not?--Niels Ø (noe) 14:13, 29 January 2007 (UTC)[reply]

I wonder if that definition is correct. In the article mutually exclusive, n events are defined as exclusive if the occurrence of any one of them automatically implies the non-occurrence of the remaining n − 1 events. Very similarly, in mathworld:

n events are said to be mutually exclusive if the occurrence of any one of them precludes any of the others.

As Niels said, that is in fact stronger than saying

P(A\cap B)=0

. Somehow, I think the definition now in the article should be labeled as "almost mutually exclusive". Shouldn't we just say that

A

and

B

are mutually exclusive just if

A\cap B=\emptyset

, and avoid all this fuss?--User:Zeycus 10:03, 20 March 2007 (UTC)[reply]

No answer in three weeks. In a few days, if nobody has anything to say, I will change the definition in the article.--User:Zeycus 14:03, 9 April 2007 (UTC)[reply]

Done.--User:Zeycus 8:30, 13 April 2007 (UTC)

PLAIN ENGLISH

Thank you for your fine work. However, it would be useful to more people if you would provide a definition of your math notation such as http://upload.wikimedia.org/math/6/d/e/6de3a4670340b7be5303b63574cb3113.png

An example?

Here's an example involving conditional probabilities that makes sense to me, and usually also to the students to whom I teach this stuff. It clearly shows the difference between P(A|B) and P(B|A).

As the example is currently in the article, there's no need to repeat it here.--Niels Ø (noe) (talk) 11:28, 16 December 2007 (UTC)[reply]

So that's my example. Do you like it? Should I include it in the article? Can you perhaps help me improve on it first?--Niels Ø (noe) 11:48, 13 February 2007 (UTC)[reply]

No replies for 10 days. I don't know how to interpret that, but I'll now be bold and ad my example to the article.--Niels Ø (noe) 09:25, 23 February 2007 (UTC)[reply]

Suggestions for improvement:

* It's a bit confusing that the odds of having the disease and odds of false positive BOTH being 1%. It would be better to have one be different, say 10%.

* I think some people (including myself) see things better graphically. You can represent the same problem (using your original numbers) as a 1.0 x 1.0 square, with one edge divided up into 0.99 and 0.01, and the other edge divided up into 0.99 and 0.01. Now you have one large rectangle (0.99 x 0.99) which represents the those that test negative and are negative, and a tiny rectangle (0.01 x 0.01) that represents those that are testing negative but are positive. The remaining two tall and skinny rectangles (0.99 x 0.01) and (0.99 x 0.01) represent those who are testing positive. One of those skinny rectangles represents positive and testing positive, the other represents negative and testing positive. Those are about the same size, so that would give you the half the false positive rate. I think exploding the rectangles, exaggerating the size of the 0.01 portons, and clearly labelling them would help to.

Clemwang 04:05, 22 March 2007 (UTC)[reply]

I agree with Clemwang that a diagram would make the explanation more intuitive.Jfischoff (talk) 01:03, 5 August 2009 (UTC)[reply]

So, my example has been in the article for about half a year now. My text has some imperfections - e.g. the way equations are mixed into sentences, which is grammatically incorrect. I hoped someone with a better command of English than myself might correct that, but nothing has happened. I wonder, did anyone actually read this example?

Replies to Clemwang: I don't think having all three probabilities equal 1% is really a problem. Of course, without making the example less realistic, one might let them be 1%, 2% and 3%, say. - I like the type of diagram you suggest; in my experience, they are good for understanding this type of situation, but (surprisingly to me), tree diagrams are more helpful for solving problems (i.e. fewer students mess things up that way). In this particular example, the graphical problem of comparing 1% to 99% is severe; the best solution would actually be if someone could come up with a meaningful example to replace mine, where the three probabilities are 10%, 20% and 30%, say.--Niels Ø (noe) 11:23, 17 September 2007 (UTC)[reply]

Is it just me or is there a typo in the example of the article where it gives the final result of false positives as .5% Shouldn't it be 50%, as it says on this page? 146.244.153.149 22:22, 29 September 2007 (UTC)[reply]

I'm not sure what you mean. It says at one place: "0.99% / 1.98% = 50%". Reading "%" as "times 0.01", it just says 0.0099 / 0.0198 = 0.50, which is correct.--Niels Ø (noe) 06:53, 2 October 2007 (UTC)[reply]

A comment on the (quite good!) example currently in the article: remove the sentence "With the numbers chosen here, the last result is likely to be deemed unacceptable: half the people testing positive are actually false positives." This is misleading and makes the example seem more mysterious than it is. The reason the probability of not having the disease conditioned on a positive test is so large is because the disease is *so* rare... Very little harm is done to the public in general, only the small percentage who tested positive. It is a small point but for the effect of increasing the wow factor of the example I think some of its simplicity is being hidden. If no comment in 10 days (today is july 14 2009), I will remove the sentence. —Preceding unsigned comment added by 76.169.198.27 (talk) 07:45, 14 July 2009 (UTC)[reply]

I'm glad you like the example (I made it up!). I think that sentence serves a purpose: To help a reader get his/her mind arounhd the example and fully understand the difference between A|B abd B|A. Alternatively, the numbers could be tweaked a bit to make the false positives more acceptable - but I think it is better as it is.--Noe (talk) 09:04, 14 July 2009 (UTC)[reply]

Improving the Independence Section

Someone ought to expand the discussion of independence to n terms. For instance, three events E, F, G are independent iff:

P(EFG)=P(E)P(F)P(G),

P(EF)=P(E)P(F)

P(EG)=P(E)P(G)

P(GF)=P(G)P(F)

And so on. Verbally, every combination k (as in n choose k) of the n events (k=2,3,...,n), must be independent for ALL of them to be independent of each other. Most textbooks I've seen include independence definitions for more than two events.

—The preceding unsigned comment was added by 171.66.41.25 (talk • contribs).

This article is only concerned with independence as far as it relates to conditional probability. The general case is covered in the article on statistical independence. --Zundark 21:27, 20 July 2007 (UTC)[reply]

WikiProject class rating

This article was automatically assessed because at least one WikiProject had rated the article as start, and the rating on other projects was brought up to start class. BetacommandBot 03:52, 10 November 2007 (UTC)[reply]

P(A | B,C) etc

The article is all about the conditional probability of A given B. What about the probability of A given B AND C? (Plus extensions to more variables.)

Maybe the answer is blindingly obvious to people who know about the subject, but to an ignoramus with only three numerate degrees to his name, who nevertheless finds probability theory the most counter-intuitive maths he's ever studied, it's as clear as mud.

--84.9.73.211 (talk) 18:17, 1 February 2008 (UTC)[reply]

A, B and C would be "events". An event is any subset of the "sample space" U, i.e. the set of all possible outcomes. What you call P(A|B,C) or P(A|B and C) would be

P(A|B\cap C)

, i.e. the probability of A given the event

B\cap C

has happened. Here,

B\cap C

is the intersection of B and C, i.e. the event that happens if both B and C happen at the same time. Confused? Try reading this again, with the following dicing events in mind:

U={1,2,3,4,5,6}

A={X even}={2,4,6}

B={X>3}={4,5,6}

C={X<6}={1,2,3,4,5}

B\cap C

={4,5}

A\cap B\cap C

={4}

Then,

P(A|B\cap C)={\frac {P(A\cap B\cap C)}{P(B\cap C))}}={\frac {1/6}{2/6}}={\frac {1}{2}}

.

As this is equal to P(A), in this case A happens to be independent of

B\cap C

.

Did that help?--Noe (talk) 20:01, 1 February 2008 (UTC)[reply]

Well, partially, but I'd hoped for a formula in terms of conditional and marginal probabilities. Moreover, I think bringing time into it confuses the issue. For example, suppose A = "It rains today", B = "It rained yesterday", C = "It rained the day before yesterday". Clearly the events described in B and C can't happen "at the same time" in any sense, although the propositions B and C can both be true. P(A|B,C) then means "the probability that it rains today knowing that it rained on the previous two days". OK, suppose I know P(A), the probability of rain on any single day; P(A) = P(B) = P(C) because the day labels are arbitrary. Suppose I also know P(A|B), the probability of rain on one day given rain the previous day; P(A|B) = P(B|C) by the same argument. How do I work out P(A|B,C) in terms of these probabilities (and possibly others)?

You need to supply something like the probability of it raining three days in a row -- P(A,B,C|I); or alternatively, the probability of it raining both the day after and the day before a rainy day -- P(A,C|B).

Does C give you any more information about A than you already have through B ? Maybe it does, maybe it doesn't. It depends, given the data, or the physical intuition, that you're assessing your probabilities from. Jheald (talk) 12:35, 11 March 2008 (UTC)[reply]

BTW, the

\cap

notation used here (intersection) is applicable to sets. When dealing with logical propositions such as the ones above, it's more appropriate to use the notation of conjunction:

\wedge

. However, most publications seem to use the comma notation instead. Also Pr(.) is often used nowadays for a single probability value, to distinguish it from p(.) or P(.) for a probability density. So Pr(A|B,C) would be my preference.

--84.9.94.255 (talk) 00:02, 11 March 2008 (UTC) (formerly 84.9.73.211)[reply]

Merge with Marginal distribution

The Marginal distribution article does not present enough information to stand on its own. It should be merged into this article.

Neelix (talk) 14:52, 13 April 2008 (UTC)[reply]

I disagree. Marginal distribution should be expanded. Michael Hardy (talk) 15:47, 13 April 2008 (UTC)[reply]

Oppose, per Michael Hardy. Marginal distribution is a sufficiently important and distinctive idea that it deserves its own article. Plus it's a rather different thing from a conditional distribution. Jheald (talk) 16:06, 13 April 2008 (UTC)[reply]

Maybe I should expand on this a bit. "Marginal probability" is a rather odd concept. The "marginal probability" of an event is merely the probability of the event; the word "marginal" merely emphasizes that it's not conditional, and is used in contexts in which it is important to emphasize that. So the occasions when it's important to emphasize that are very context-dependent. For those reasons I can feel a certain amount of sympathy for such a "merge" proposal. But on the other hand, just look at the way the concept frequently gets used, and that convinces me that it deserves its own article. Wikipedia is quite extensive in coverage, and it's appropriate that articles are not as clumped together as if coverage were not so broad. Michael Hardy (talk) 18:05, 13 April 2008 (UTC)[reply]

That makes does make sense. I will remove the merge suggestion. The marginal distribution article, however, still requires expansion. I will place a proper notice on that article. Neelix (talk) 18:10, 13 April 2008 (UTC)[reply]

First impression

I feel that the whole page needs rewriting. The following statement, for example, cannot be a definition because it contains many implications and does not make sense when taken alone:

Marginal probability is the probability of one event, regardless of the other event. Marginal probability is obtained by summing (or integrating, more generally) the joint probability over the unrequired event. This is called marginalization. The marginal probability of A is written P(A), and the marginal probability of B is written P(B). —Preceding unsigned comment added by 207.172.220.58 (talk) 15:42, 14 May 2008 (UTC)[reply]

That paragraph was not optimally clear; I've tried to rewrite it, but there ought to be a section on Marginal probability in the main body of the article. An example with a table of joint probabilities in which the margins are the marginal probabilities might help to clarify this; something like:

     B1   B2   B3  | TOT
A1   23%  17%  31% |  71% 
A2   16%   4%   9% |  29%
-------------------+-----
TOT  39%  21%  40% | 100%

but preferably with something concrete, meaningful, and realistic, instead of the abstract Ai and Bj. The totals, given in the margins, are marginal probabilities. --Lambiam 09:18, 19 May 2008 (UTC)[reply]

Distributions and variables?

The first sentence of the lead says:

This article defines some terms which characterize probability distributions of two or more variables.

I think this sentence is superfluous, and makes things harder than they need be. One can teach good parts of probability theory, including conditional probability, without ever mentioning distributions or variables. E.g., rolling a die, you have a space U={a,b,c,d,e,f} (representing 1,2,3,4,5,6, but I use letters to emphasize that I'm NOT introducing a random variable taking numerical values), a simple probability function P(i)=1/6 for all i in U, events like A and B being subsets of U, an extended probability function P(A) (in this symmetrical case, P(A) = n(A)/n(U) where n counts elements in a set). With A = {even} = {b,d,f} and B = {more than four eyes} = {e,f}, you can find P(B|A), say.--Noe (talk) 08:55, 18 September 2008 (UTC)[reply]

I completely agree. 58.88.53.246 (talk) 16:47, 27 September 2008 (UTC)[reply]

Conditioning on a random variable

What does it mean to condition on a continuous random variable? The definition given here does not seem to extend to such cases. 58.88.53.246 (talk) 16:22, 27 September 2008 (UTC)[reply]

Other Considerations Section

I am removing the "other considerations" section. The full text of the section was as follows:

If B is an event and P(B) > 0, then the function Q defined by Q(A) = P(A|B) for all events A is a probability measure.

Many models in data mining can calculate conditional probabilities, including decision trees and Bayesian networks.

These remarks are stated completely out of context and I feel they do not belong here, at least without furhter explanation. Instead, I have moved the remark about P(A|B) being a probability measure to the article on probability spaces, where it fits more naturally. I have simply removed the remark about data mining entirely; perhaps a section on "applications of conditional proability" (or bayesan theory) could use it, but as an isolated remark it seems irrelvant. Birkett (talk) 08:04, 16 December 2008 (UTC)[reply]

condition should be called C not an A (in which case non condition is called C)

I think it would be much cleaner to keep condition as C. I know letters don't matter in math and it is contextual but nevertheless ...
It made me little confused when I read it first time. 86.61.232.26 (talk) 15:54, 1 January 2009 (UTC)[reply]

Specific properties of sets ?

Hi all. Any source I am picking to look up conditional probability doesn't mention anything about the properties that events $A$ and $B$ need to pertain to. That is, it is always stated that it can be any $A$ and $B$ .

Now since events are mere sets of elementary outcomes of that probability space $\Omega$ , wouldn't it make sense to add the restriction that for $P(A|B)$ , $A$ needs to be a subset of $B$ since otherwise an elementary outcome not included in $B$ wouldn't be altered in its probability of occurring when $B$ had occurred?

Also see page 134 of this book: free book on probability

Don't mind me if I'm wrong ... totally not an expert.

--Somewikian (talk) 15:32, 23 January 2009 (UTC)[reply]

I don't get your point. What exactly would be bad about the set of probability space A not being a subset of B?MartinPoulter (talk) 17:05, 23 January 2009 (UTC)[reply]

I am not certain whether by 'probability space A' you mean the same thing as 'event A'. Anyways, I didn't read the aforementioned page in that textbook carefully enough (check it out yourself: textbook p. 134 or p. 142 in the pdf file I gave a link to above). The thing they are doing is deriving the equation for conditional probability and they do so using the condition that for an elementary outcome

\omega _{i}

the conditional probability after occurrence of an event

E

that does not include said elementary outcome

\omega _{i}

is set to zero. This might sound more confusing than it actually is ... please check out the textbook I mentioned and let me know whether you think including this here would make sense (I find the derivation given in that textbook pretty nice and easy to grasp ... an I am a total amateur).--Somewikian (talk) 21:23, 23 January 2009 (UTC)[reply]

An elementary outcome not included in

B

WILL BE altered in its probability of occurring when

B

had occurred. Namely, its probability will VANISH! —Preceding unsigned comment added by 79.176.243.81 (talk) 19:27, 23 January 2009 (UTC)[reply]

You're correct, but it might just be because the equation for conditional probability is derived and defined that way.--Somewikian (talk) 21:24, 23 January 2009 (UTC)[reply]

But this is quite natural. In the light of the new information, that outcome becomes impossible (known not to happen). What else could be its new probability, if not zero?

Logically thinking I agree with you absolutely ... I am just someone that likes his definitions right and complete. I think I will just add a section on deriving the equation given in this article - no harm in that I hope. --Somewikian (talk) 08:21, 24 January 2009 (UTC)[reply]

It is grossly wrong to say that A must be a subset of B in order that P(A|B) exist. If you were to insist on that, you would be throwing out MOST of the situations in which conditional probability is used. And also most of the situations in which you see it in elementary exercises in textbooks. Michael Hardy (talk) 17:12, 24 January 2009 (UTC)[reply]

Oh please I already said I got the section in that textbook I was referring to wrong. Did you just not see that or did you just have to make a point.

According to that textbook: the derivation of

P(A|B)

includes setting for each elementary event/occurrence

\omega _{i}\not \in B

the conditional probability to zero, as in

P(\omega _{i}|B)=0\forall \omega _{i}\not \in B)

.

Again though: I am not a stats buff. I am a complete amateur at stats - so please go and blame the authors of that book and not me since I am just citing their work.--Somewikian (talk) 17:36, 24 January 2009 (UTC)[reply]

There, I added a section on deriving the equation ... bash me for errors and please correct them while you're at it. --Somewikian (talk) 08:28, 25 January 2009 (UTC)[reply]

Question

IF A and B are mutually exclusive event,are they independent event? —Preceding unsigned comment added by 203.78.9.149 (talk) 09:05, 3 February 2009 (UTC)[reply]

Usually not. Drawing a card from a 52-card deck of playing cards, "Aces" and "Kings" are mutually exclusive:

P(A)=4/52; P(K)=4/52; P(A and K)=0; P(A|K)=P(K|A)=0;

but "Aces" and "Hearts" are independent:

P(A)=4/52; P(H)=13/52; P(A and H)=P(Ace of Hearts)=1/52=P(A)*P(H); P(A|H)=1/13=P(A); P(H|A)=1/4=P(H);

and "Aces" and "Unicorns" (which do not exist) are both mutually exclusive and (arguably) independent:

P(A)=4/52; P(U)=0; P(A and U)=0=P(A)*P(U); P(A|U) undefined(?); P(U|A)=0=P(U).

--Noe (talk) 14:16, 3 February 2009 (UTC)[reply]

Reducing sample space

The related Dutch article about conditional probability states:

In probability theory we use the term 'conditional probability' if we know that an event, say B, has happened, by which the possible outcomes are reduced to B.

However, on this page it is defined as the probability of some event A, given the occurrence of some other event B, without saying anything about statistical dependency. Further on it says that if A and B have no impact on the probability of each other, events A and B are independent, which means that P(A|B) = P(A).

This doesn't say anything about the situation being conditional or not; it only says that probabilities are in both perspectives the same. "given the occurrence of some other event B" is a perspective rather than a (limited) situation.

Furthermore, there is a fundamental difference between reducing the sample space and affecting probabilities. Throwing two dice don't affect each other's probabilities, but the total sample space is reduced by knowing the outcome of one die.

What is your opinion about the given definition on the Dutch page? Heptalogos (talk) 10:52, 13 February 2009 (UTC)[reply]

Question

Given P(B|A) = 1, and P(B|~A)=p, can I find P(A|B) without knowing anything about P(A) or P(B)? I have an algorithm, which if correct, gives the correct result all the time. If incorrect, it gives the correct result 10% of the time. It seems that if I get a correct result, I should be able to have some confidence that the algorithm is correct. Intuitively, I think it should be possible, but I can't seem to formulate it right. —Preceding unsigned comment added by 71.193.61.249 (talk) 17:21, 10 July 2009 (UTC)[reply]

Short answer: no, it depends on P(A). Longer answer: See Bayes theorem or ask at Wikipedia:Reference desk/Mathematics. —3mta3 (talk) 17:47, 10 July 2009 (UTC)[reply]

Clarity

This article would be very confusing to someone who is not mathematically minded (e.g. me). When reading the article I was confronted with a large wall of formulae with no clear explanation of what they meant. There needs to be a section in this article with a clear explanation with examples of what conditional probability is and how it can be used by a non mathematically minded person without the use of complex formulae. —Preceding unsigned comment added by 79.74.192.244 (talk) 10:38, 3 August 2009 (UTC)[reply]

What does this notation mean and does it make sense?

P(X\in A\mid Y\in B)={\begin{cases}{\frac {\int _{y\in B}\int _{x\in A}f_{X,Y}(x,y)\,dx\,dy}{\int _{y\in B}\int _{x}f_{X,Y}(x,y)\,dx\,dy}}&{\text{if }}B{\text{ has positive measure}}\\\\{\frac {\sum _{y\in B}\int _{x\in A}f_{X,Y}(x,y)\,dx\,dy}{\sum _{y\in B}\int _{x}f_{X,Y}(x,y)\,dx\,dy}}&{\text{if }}B{\text{ has measure zero}}\end{cases}}

Does the notation above mean the following?

P(X\in A\mid Y\in B)={\begin{cases}{\frac {\int _{y\in B}\int _{x\in A}f_{X,Y}(x,y)\,dx\,dy}{\int _{y\in B}\int _{x\in \Omega }f_{X,Y}(x,y)\,dx\,dy}}&{\text{if }}B{\text{ has positive measure}}\\\\{\frac {\sum _{y\in B}\int _{x\in A}f_{X,Y}(x,y)\,dx\,dy}{\sum _{y\in B}\int _{x\in \Omega }f_{X,Y}(x,y)\,dx\,dy}}&{\text{if }}B{\text{ has measure zero}}\end{cases}}

If not, what does it mean?

If B is an uncountable set of measure 0 and all the terms of the sum are 0, then what is meant? I don't see how the proposed definition (if a definition is what it is supposed to be) makes sense. Michael Hardy (talk) 11:00, 9 September 2009 (UTC)[reply]

I agree, it doesn't make sense in its present form. Moreover, trying to define such a quantity misses the whole point of the Borel–Kolmogorov paradox (i.e. that one should condition on a sigma algebra, not an event). I would suggest using the formal definition on Conditional_expectation#Definition_of_conditional_probability (though I was the one who wrote it, so am somewhat biased).–3mta3 (talk) 12:10, 9 September 2009 (UTC)[reply]

Being equally biased, I want to say that I did my best explaining all that in Conditioning (probability). Boris Tsirelson (talk) 13:55, 9 September 2009 (UTC)[reply]

I think Michael Hardy meant

P(X\in A\mid Y\in B)={\begin{cases}{\frac {\int _{y\in B}\int _{x\in A}f_{X,Y}(x,y)\,dx\,dy}{\int _{y\in B}\int _{x\in \Omega }f_{X,Y}(x,y)\,dx\,dy}}&{\text{if }}B{\text{ has positive measure}}\\\\{\frac {\sum _{y\in B}\int _{x\in A}f_{X,Y}(x,y)\,dx}{\sum _{y\in B}\int _{x\in \Omega }f_{X,Y}(x,y)\,dx}}&{\text{if }}B{\text{ has measure zero}}\end{cases}}

In any case I think the approach to justifying the measure zero case using the limit of conditioning on a set that shrinks to the required set, as in Conditioning (probability) should be said explicitly as it understandable without measure theory. Melcombe (talk) 16:28, 9 September 2009 (UTC)[reply]

Yes, but let us not forget that the limit depends crucially on the choice of the shrinking sequence of non-negligible sets. (This is also the origin of the Borel–Kolmogorov paradox.) In some lucky cases we have a "favorite" sequence; in other cases we do not. Boris Tsirelson (talk) 17:32, 9 September 2009 (UTC)[reply]

Another problem with such approach is that it can lead to a non-regular conditional probability (also generating paradoxes); see Talk:Regular conditional probability#Non-regular conditional probability. Boris Tsirelson (talk) 17:36, 9 September 2009 (UTC)[reply]

I am the guilty party for the erroneous

dy

terms in the discrete case. Likewise, I agree the notation

\int _{x\in \Omega }

is much clearer than just plain

\int _{x}

. As for the Borel–Kolmogorov paradox, yes, "the relevant sigma field must not be lost sight of", to quote Billingsley. If there is a way to make the statement rigorous that would be great. Thanks all who've worked on these articles. Btyner (talk) 01:08, 10 September 2009 (UTC)[reply]

I do not know what was meant initially, but I do know one case that really leads to formulas of this kind. Let Y be a random variable whose distribution is a mix of an absolutely continuous part (that is, having a density) and a discrete part (that is, a finite or countable collection of atoms) (thus, no singular part is allowed), while X — absolutely continuous. Now, if a measurable subset B of the real line contains no atoms of Y and is of positive probability (w.r.t. Y) then only the absolutely continuous part is relevant, and the formula with integrals holds. If, however, B is of zero Lebesgue measure but still of positive probability w.r.t. Y (thus, contains at least one atom) then only the discrete part is relevant, and the formula with a sum over atoms y holds. Note however that Omega will not appear in the formulas; integrals are on the real line (and its subsets). Boris Tsirelson (talk) 08:21, 10 September 2009 (UTC)[reply]

(Unindenting for ease) I am unclear about the supposed relation between the various articles but, assuming that "Conditioning (probability)" is aimed at those who work at the level of measure theory, can this one (conditional probability) be aimed at those looking for something simpler? For example in the above formula, can "measure zero" be replaced by "probability zero". The text here is making enough assumptions that the complicated situations Boris Tsirelson refers to cannot occur. I think what is needed here is just a pointer that more complicated cases are discussed in the other articles. Melcombe (talk) 09:18, 10 September 2009 (UTC)[reply]

As for me, "Conditioning (probability)" is aimed to connect three levels. Boris Tsirelson (talk) 10:58, 10 September 2009 (UTC)[reply]

Here is an instructive example.

First, the conditional probability of the event

Y>0

given that Y belongs to the two-point set

\{-a,2a\}

(where a>0) is

P(Y>0\mid Y\in \{-a,2a\})={\frac {2f_{Y}(2a)}{f_{Y}(-a)+2f_{Y}(2a)}}.

If you (the reader) believe that it is just

{\frac {f_{Y}(2a)}{f_{Y}(-a)+f_{Y}(2a)}}

, then substitute it into the total probability formula and observe a wrong result for the unconditional probability

P(Y>0).

Or alternatively, think about an infinitesimal interval

(a,a+da)

and the corresponding set

(-a-da,-a)\cup (2a,2a+2da).

(You see, "the relevant sigma field must not be lost sight of", indeed.)

Second, the conditional density of X (under the same condition on Y) is

{\frac {f_{X,Y}(x,-a)+2f_{X,Y}(x,2a)}{\int (f_{X,Y}(x,-a)+2f_{X,Y}(x,2a))\,dx}}.

Boris Tsirelson (talk) 10:51, 10 September 2009 (UTC)[reply]

Thus, the naive summation over points of B (without special weights) does not work even under most usual assumptions on the joint distribution. As I like to say it, a point (of a continuum) is not a measurement unit; thus, it is senseless to say that one point is just one half of a two-point set. Sometimes it is two thirds (or another fraction)! Boris Tsirelson (talk) 11:09, 10 September 2009 (UTC)[reply]

I don't follow your choice of case to consider, but it seems you are saying that the limit of

{\frac {\sum _{y_{i}\in B}f_{X,Y}(x,y_{i})\,\delta y_{i}}{\sum _{y_{i}\in B}\int _{x\in \Omega }f_{X,Y}(x,y_{i})\delta y_{i}\,dx}}

as the δy_i approach zero, depends on their relationship as they approach zero, which looks correct. But can the formula in the article be saved, sensibly, by restricting B to a single point. Melcombe (talk) 12:50, 10 September 2009 (UTC)[reply]

Yes, this is what I say. Yes, for a single point the formula is just a formula from textbooks (rather than someone's original research or error), and can (and should) be saved. With some stipulations about "almost everywhere" if you want it to be rigorous, or without, if this is not the issue. Boris Tsirelson (talk) 14:17, 10 September 2009 (UTC)[reply]

Now about my case to consider. I mean that we do not observe a random variable Y but only such a function of it:

A={\begin{cases}-Y&{\text{if }}Y<0,\\0.5Y&{\text{if }}Y>0.\end{cases}}

Boris Tsirelson (talk) 14:23, 10 September 2009 (UTC)[reply]

I have replaced the text with the above conclusion as to what is wanted in the definition section. Unfortunately the overall structure is now poor because there is now something like a "derivation" in the definition section, and then a derivation section. Melcombe (talk) 14:52, 21 September 2009 (UTC)[reply]

Replacement for inaccurate bit

Suggested replacement for incorrect text (but I was hoping someone would make a good change themselves).--

For example, if X and Y are non-degenerate and jointly continuous random variables with density ƒ_{X,Y(x, y) then, if B has positive measure,}

P(X\in A\mid Y\in B)={\frac {\int _{y\in B}\int _{x\in A}f_{X,Y}(x,y)\,dx\,dy}{\int _{y\in B}\int _{x\in \Omega }f_{X,Y}(x,y)\,dx\,dy}}.

The case where B has zero measure can only be dealt with directly in the case that B={y₀}, representing a single point, in which case

P(X\in A\mid Y=y_{0})={\frac {\int _{x\in A}f_{X,Y}(x,y_{0})\,dx}{\int _{x\in \Omega }f_{X,Y}(x,y_{0})\,dx}}.

It is important to note that if A has measure zero the conditional probability is zero. An indication of why the more general case of zero measure cannot be dealt with in a similar way can be seen by noting that that the limit, as all δy_i approach zero, of

P(X\in A\mid Y\in {\text{ one of }}(y_{i},y_{i}+\delta y_{i}))={\frac {\sum _{y_{i}}f_{X,Y}(x,y_{i})\,\delta y_{i}}{\sum _{y_{i}}\int _{x\in \Omega }f_{X,Y}(x,y_{i})\delta y_{i}\,dx}}

depends on their relationship as they approach zero. Melcombe (talk) 12:49, 11 September 2009 (UTC)[reply]

Much improved, but the last bit seems a little strange as y_i seems to denote separate things on the left and righthand sides. Btyner (talk) 01:08, 12 September 2009 (UTC)[reply]

Where is the problem? There are the y_i and the δy_i, which I think is a fairly usual notation. It isn't δ×y_i. The apparent sum over y_i might be better as a sum over i ... this chanfge might help. Melcombe (talk) 09:00, 14 September 2009 (UTC)[reply]

The LHS looks like a function of the pair (y_i, δy_i) is being defined, yet the RHS has no dependence on y_i due to the summation. Btyner (talk) 00:07, 15 September 2009 (UTC)[reply]

The LHS contains "one of", which means the union, and is similar to the summation.Boris Tsirelson (talk) 05:52, 15 September 2009 (UTC)[reply]

I didn't use the union symbol mainly because I was too lazy to look up how to do this, but also because I am somewhat against using over-sophisticated maths symbols where something simpler will do, particularly in what can be or should be non-technical articles or sections. We just need to find someything that fits in with the level of the rest of the article. Melcombe (talk) 09:25, 15 September 2009 (UTC)[reply]

And here is an interesting experimental fact: out of two readers, one (me) understood your notation, and one (Btyner) did not. Boris Tsirelson (talk) 12:49, 15 September 2009 (UTC)[reply]

Indeed, it occurred to me late last night that there might be an implicit union there. So

(y_{i},y_{i}+\delta y_{i})

is the open interval from y_i to y_i + δy_i, rather than

\{y_{i},y_{i}+\delta y_{i}\}

, which is how I (wrongly) interpreted it the first time. Anyway it seems to me you'd want the left endpoint to be closed not open. Would it be too cumbersome to write

P(X\in A\mid Y\in \cup _{i}[y_{i},y_{i}+\delta y_{i}))={\frac {\sum _{i}f_{X,Y}(x,y_{i})\,\delta y_{i}}{\sum _{i}\int _{x\in \Omega }f_{X,Y}(x,y_{i})\delta y_{i}\,dx}}

? Thanks all, Btyner (talk) 00:35, 16 September 2009 (UTC)[reply]

Looks OK. But would it be preferable (more usual?) to use an interval centered on y_i? And perhaps an approximation sign rather than an equals? Melcombe (talk) 09:28, 16 September 2009 (UTC)[reply]

Definitely prefer

\approxeq

. As for the interval, I think uncentered is traditional because it is cleaner for proofs involving the cdf. Btyner (talk) 19:15, 19 September 2009 (UTC)[reply]

I propose adding a link in external references

The proposed link is http://www.opentradingsystem.com/quantNotes/Conditional_probability_.html

It provides examples of calculations with Conditional probability.

Does anyone object?

Wrong?

Undefined or Indeterminate?

Use of 'modulus signs' and set theory

Valid for continuous distributions?

An example?

Improving the Independence Section

WikiProject class rating

P(A | B,C) etc

Merge with Marginal distribution

First impression

Distributions and variables?

Conditioning on a random variable

Other Considerations Section

condition should be called C not an A (in which case non condition is called C)

Specific properties of sets ?

Question

Reducing sample space

Other articles

Question

Clarity

What does this notation mean and does it make sense?

Replacement for inaccurate bit

I propose adding a link in external references