Talk:Simpson's paradox

WikiProject Mathematics (Rated B-class, Mid-priority)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mathematics rating: B Class, Mid Priority
Field: Probability and statistics

WikiProject Statistics (Rated B-class, Mid-importance)
This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.
This article has been rated as B-Class on the quality scale.
This article has been rated as Mid-importance on the importance scale.

Changes to first few paragraphs

I'd like to change the first few paragraphs of this article to make it friendlier to folks afraid of math, and was wondering what other people thought. Here's a possibility:

Simpson's paradox is a statistical paradox described by E. H. Simpson in 1951, in which the accomplishments of several groups seem to be reversed when the groups are combined. This seemingly impossible result is encountered surprisingly often in social science and medical statistics.
As an example, suppose two people, Ann and Bob, are let loose on Wikipedia. In the first test, Ann improves 60 percent of the articles she edits while Bob improves 90 percent of the articles he edits. In the second test, Ann improves just 10 percent of the articles she edits while Bob improves 30 percent.
Both times, Bob improved a much higher percentage of articles than Ann - yet when the two tests are combined, Ann has improved a much higher percentage than Bob!
The result comes about this way: In the first test, Ann edits 100 articles, improving 60 of them, while Bob edits just 10 articles, improving 9 of them. In the second test, Ann edits only 10 articles, improving 1 of them, while Bob edits 100 articles, improving 30 of them. When the two tests are added together, both edited 110 articles, yet Ann improved 69 of them (63 percent) while Bob improved only 40 of them (36 percent)!
Seems reasonable enough to me, although I wouldn't say "accomplishments" for "successes". "Success" in statistical jargon is not necessarily a positive thing! How about "ratings" instead?
I presume you are intending to leave the remaining paragraphs unchanged? -- Securiger
That was my thought, yes. So I'll go ahead and do this, then. DavidWBrooks 13:13, 17 Feb 2004 (UTC)
(However, looking it over again, I'll do my arithmetic correctly before I post it! Oops ... DavidWBrooks)
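For anyone who wants to check the arithmetic mentioned above, here is a minimal Python sketch that recomputes the totals from the counts quoted in the proposal (60 of 100 and 1 of 10 for Ann; 9 of 10 and 30 of 100 for Bob). The variable and function names are my own and purely illustrative. On those counts the combined figures come out as 61 of 110 for Ann and 39 of 110 for Bob, so the reversal itself still holds.

```python
from fractions import Fraction

# (improved, edited) per test, copied from the counts in the proposal above
ann = [(60, 100), (1, 10)]
bob = [(9, 10), (30, 100)]

def combined(tests):
    """Overall success rate: total improved divided by total edited."""
    improved = sum(i for i, _ in tests)
    edited = sum(n for _, n in tests)
    return Fraction(improved, edited)

for test_no, ((ai, an), (bi, bn)) in enumerate(zip(ann, bob), start=1):
    # Bob is ahead in each test taken separately
    print(f"Test {test_no}: Ann {ai}/{an}, Bob {bi}/{bn}")

print("Combined Ann:", combined(ann), f"({float(combined(ann)):.0%})")
print("Combined Bob:", combined(bob), f"({float(combined(bob)):.0%})")
```

The combined rate is just total successes over total attempts, which is why the test with 100 edits dominates each person's overall figure.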
Is it a problem that the example explicitly refers to Wikipedia? (I'm thinking WP:SELF.) Avram 21:24, 10 March 2006 (UTC)


I am not a frequent editor but shouldn't description come before the examples and not the other way around? —Preceding unsigned comment added by (talk) 12:08, 28 November 2010 (UTC)

Nice work

I have recently been browsing the logic & game theory articles. This is the best I have seen so far. Congratulations to all concerned.

John Moore 309 12:36, 24 April 2006 (UTC)

I just read this article too, having come from Texture filtering and I am very impressed! This article is brilliant! -- 15:48, 27 January 2007 (UTC)

The same paradox?

I wonder if this is the same paradox and if it could be used as an example. I find it very easy to understand — and from real life.

Assume a population with 50% men and 50% women, in which competence is spread in the same way in both groups. Imagine a situation where women are required to have more competence to get a promotion to management. You will then notice that women on the management level are more competent than male managers and that women in sub-management are more competent than men on the same level. This seems paradoxical at first considering that, on the whole, women and men are equally competent. Samulili

It's a nice example. In order to convince myself (and perhaps others) that it's the same paradox, I'll now assume that on average, the women are slightly less competent than the men (no offence, just to sharpen the paradox and make it clearer that Simpson is involved), and I'll add some numbers:
Suppose we have 100 men and 100 women. 18 of the men are highly competent, and 14 of them are in the management. Of the 82 less competent men, 6 are in the management. 17 of the women are highly competent, but only 8 of them are in the management. Of the 83 less competent women, 2 are in the management. Then, of the women in the management, 8/10=80% are highly competent, and of the sub-management women, 9/90=10% are highly competent. Of the men, only 14/20=70% of those in the management group are highly competent, and only 4/80=5% in the sub-management group are highly competent. So, in both groups, more of the women than of the men are highly competent, but combined, only 17/100=17% of the women are highly competent, while 18/100=18% of the men are.
Conclusion: This is indeed a Simpson paradox, and the only change compared to that suggested above is that I made it a little sharper by making the women less competent over all instead of just equally competent. However, I like the original better, and I think someone should go ahead and add it to the article. I'm afraid it takes skills beyond mine to write it in a simple way that makes it clear that it is a Simpson's paradox.--Niels Ø 20:04, 2 May 2006 (UTC)
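The numbers in this example can be checked mechanically. The following Python sketch assumes only the head counts given above; the dictionary layout and the function name are hypothetical and chosen just for this illustration.

```python
from fractions import Fraction

# (highly competent, less competent) head counts, taken from the comment above
counts = {
    "men":   {"management": (14, 6), "sub-management": (18 - 14, 82 - 6)},
    "women": {"management": (8, 2),  "sub-management": (17 - 8, 83 - 2)},
}

def share_highly_competent(sex, level=None):
    """Share of highly competent people within one level, or overall if level is None."""
    levels = [level] if level else list(counts[sex])
    high = sum(counts[sex][lv][0] for lv in levels)
    total = sum(sum(counts[sex][lv]) for lv in levels)
    return Fraction(high, total)

for level in ("management", "sub-management", None):
    label = level or "overall"
    w = share_highly_competent("women", level)
    m = share_highly_competent("men", level)
    print(f"{label}: women {float(w):.0%}, men {float(m):.0%}")
```

Women lead within each level, yet men lead overall, because a larger share of the men sit in the management group, where the proportion of highly competent people is much higher.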

How is this a paradox?

For Ann, the time that she royally screwed up barely counts, while the time that she did poorly counts the most. For Bob, the time that he royally screwed up hugely affected his total, while the time that he did amazing barely counts at all. I don't quite see why the results are surprising. Anyone care to enlighten me?

It all makes sense in the end, but it's still initially surprising for most people who are not aware of the explanation or suspect it. If you only know the partial percentages, then the total percentages would come as a surprise to most people. Obviously, once the weights are introduced, the initial surprise is exchanged for comprehension, but then a paradox is only a seemingly self-contradictory statement anyway, so I see nothing wrong with calling this a paradox. -Kvaks 01:09, 2 September 2005 (UTC)
It's not strictly a paradox, since there is a straightforward solution. But it's widely known by that name, so we ought to keep it. --best, kevin ···Kzollman | Talk··· 04:19, September 2, 2005 (UTC)
I do not support the idea that the phenomenon is not "really" a paradox. Many good paradoxes are based on representing a situation in such a way that a false conclusion seems obvious.--Niels Ø 08:18, 6 October 2006 (UTC)
Agreed with Niels. The key point to remember is that in the baseball batting average example, there are large differences in the number of at-bats between years. Rock8591 (talk) 06:18, 9 August 2009 (UTC)

One of the finer Wiki entries

The storytelling conceit, complete with sly reference to those other Simpsons, "Bart" and "Lisa," works well for me. This kind of explanation helps me in explaining a concept to others, even as I work to fully grasp it myself. The inclusion of the Wikipedia within the definition does not seem overly self-referential, as one observer has worried. Entries like this are the reason I seek out Wikipedia's take on things before looking to other, traditional sources. Thanks for an entertaining and elucidating entry! Matthew Treder 18:42, 2 May 2006 (UTC)

Agreed. The examples are clear, well written, and logical. And the references to Bart & Lisa Simpson are not only clever and fun, they also make it EXTREMELY easy for many people to remember this phenomenon as well as its associated name. If we name them Dick & Jane it would be far less memorable. How great it is when practicality and humor intersect! Jon Miller
Indeed! --WikiSlasher (talk) 13:01, 11 December 2007 (UTC)
The first (graphical) example could be made a whole lot clearer if the symbols x and y and relationships between them were explicitly defined. I would be pleased to contribute to this cause, but -- well, I am still bewildered by it. The other (real life) examples work quite well and make the fictitious, original, self-referent narrative unnecessary. Finally, if a vote gets taken, please cast mine in favor of "fallacy" -- not to exclude "paradox" but to strengthen the importance of this entry. (talk) 19:46, 2 January 2011 (UTC)
Why did I not get this earlier... mattbuck (talk) 14:22, 11 December 2007 (UTC)

I'm new to commenting here, so I apologize if I'm doing this wrong.

The question was raised as to whether or not it's appropriate for this article to reference Wikipedia [WP:Self]. I believe it may be, but should certainly be discussed. The point of avoiding self references, as I read that guideline, is to not use phrases such as "elsewhere on this site" or "in another Wikipedia article". The point is NOT to pretend that Wikipedia doesn't exist.

The article could reference bowling or mowing lawns or a great host of other activities where the characters' performance can be quantified. I suspect the Wikipedia reference was used simply because the author assumes that those reading it will be familiar with the process.

However, I don't believe that the act of editing Wikipedia articles is a good example of much of anything, because most people I know who read Wikipedia have never edited anything. I've been reading for years and only today even created an account to post anything. So the example took a little more effort for me to understand than many other possible analogies could have.

And, continuing that thought and going back to the self reference guideline, the plan as I have understood it is to eventually do a printed Wikipedia. Regardless of the form, any time this article appears outside the website the chances of the reader understanding the example become greatly diminished.

In other words, I like the example used here, but a different example may be more comprehensible and practical.

Ha! That's funny! Thanks for putting Bart and Lisa in the Simpson's paradox. -- 03:02, 26 August 2006 (UTC)

A word

The Lisa-Bart example ends in this sentence: But it is possible to retell the story so that it appears obvious that Bart is more diligent. Would it not be more natural to say "tell" instead of "retell", since it is the original statement of the situation that appears to have this conclusion?--Niels Ø 08:18, 6 October 2006 (UTC)

Good point. Thanks. I've changed that line to something that I think is even better: But it is possible to have told the story in a way which would make it appear obvious that Bart is more diligent. --Keeves 12:13, 6 October 2006 (UTC)

The kidney case

I expanded the text on the two factors at the end of the section to relate more specifically to the medical example. Reading what I've written, it seems natural to ask: Why did doctors give the inferior treatment B to the milder cases, when A is better in those cases too? I have not consulted the references on this case story, but perhaps someone who has (or will) can answer my question. I imagine one of two answers: (i) Before this particular investigation, they did not know that B was inferior even in the milder cases. (ii) Treatment A is more expensive, and is therefore primarily given to those patients who need it the most. In fact, if there are no other confounding variables involved, and if A is more expensive than B, then, within a given budget, the largest number of cures is obtained by treating as many as possible from the large-stone-group with A.--Niels Ø 13:29, 13 October 2006 (UTC)

Thanks for your changes, it reads more clearly. I don't have access to the original study, but from the review and title it appears to compare surgery, ultrasound and/or using catheters. Unsurprisingly the open surgery (treatment A) is the most effective, and probably is the most expensive, with the greatest post-treatment complications. TobyK 13:36, 31 October 2006 (UTC)

Suggested addition to aid paradoxical comprehension

existing section under 'Explanation by example' subtitle

[Who is more accomplished? Lisa and Bart's mutual friends think Lisa is better—her overall success rate is higher. But it is possible to have told the story in a way which would make it appear obvious that Bart is more diligent.]

append with the addition of

+ [However, some will note that the use of statistical analysis to present a biased view is not uncommon, for example in politics. On close inspection, one may find that Bart's edits are of a higher quality, elucidating complex subjects poorly understood by the general populace. Although Lisa and Bart's mutual friends think Lisa is better, history may judge Bart's legacy to humanity to be more significant.]

This may help answer those who fail to comprehend the paradoxical nature

Teeteetee 09:51, 2 March 2007 (UTC)

How so? The quality of the edits is unrelated to the paradox we're dealing with here; it's entirely about the number of edits.--Niels Ø (noe) 09:56, 2 March 2007 (UTC)
Extracted from the article's sub-section. . . .
" worth of work/Success/managed/achieved successful/worse/we feel/disappointed/accomplished/mutual friends think/better/diligent "
Are these "entirely about the number of edits" ? Teeteetee 19:34, 4 March 2007 (UTC)
OK, I didn't put that as clearly as I should have. The point is, we need not distinguish very good edits from minor improvements; that's not what the example is about. Whether they elucidate complex subjects is utterly irrelevant. However, the words accomplished and diligent that you quote may be misleading for the same reason: They seem to suggest that some edits not merely improve articles but display particular diligence, which (though of course true) is, as I said, utterly irrelevant.--Niels Ø (noe) 20:25, 4 March 2007 (UTC)
I do not understand your meaning.
I have tried several times to understand.
If you could avoid criticising existing aspects of the article I might better understand.
Do you agree with the following statement ?
"If Bart only edited one article (and that one edit brought about world peace), Lisa's lifetime of editing thousands of articles may statistically appear better (to friends, family, politicians, religious leaders, and others viewing the statistical view), but may be judged by history to be worth less than Bart's one edit."
Teeteetee 11:52, 8 March 2007 (UTC)
Sure, but it's got nothing to do with Simpson's paradox. The Bart-and-Lisa example is solely about the number of edits that were improvements, and the number that were not. It does not distinguish between large improvements and small improvements.--Niels Ø (noe) 16:24, 8 March 2007 (UTC)
By using "it"(in the sentence above "It does not distinguish..."), I assume you mean Simpson's Paradox.
If so, you appear to be writing "Simpson's Paradox does not distinguish between large improvements and small improvements"
or, put alternatively,
When Simpson's Paradox occurs improvements can be difficult to distinguish.
Teeteetee 17:29, 12 March 2007 (UTC)
If you are seriously suggesting changes to the article, I think you should either be bold and make those changes, or explain clearly at this talk page what you'd like to change, and why. I've no idea what your point is.--Niels Ø (noe) 22:01, 12 March 2007 (UTC)
Thank you for the advice, but I was bold on 1 March 2007. Also, I hoped I had clearly explained my suggestion above (at 09:51, 2 March 2007)
My original article edit can be found here> [4] at the end of the 'Explanation by example' section.Teeteetee 12:31, 13 March 2007 (UTC)

Well, I believe I have made my concerns clear, whereas I do not understand what your point is. Do you think your contribution is related to Simpson's paradox, or does it merely offer an alternative angle on the Lisa-and-Bart example, an angle unrelated to Simpson's paradox? Do you actually understand Simpson's paradox, or are you trying to understand it?--Niels Ø (noe) 12:57, 13 March 2007 (UTC)

I believe I understand Simpson's paradox.
I also believe context aids understanding.
I was attempting to provide others with some context. Teeteetee 13:50, 3 April 2007 (UTC)
Then I am at a loss. I am certain I understand Simpson's paradox, and I am certain it (in the Bart-Lisa-example) has nothing to do with distinguishing between large and small improvements. The context is clear (wikipedia editing, some edits being improvements, others not). Adding more context - irrelevant to the paradox - will confuse matters by having readers trying to understand how it is relevant. Please explain, what is the point?--Niels Ø (noe) 14:49, 3 April 2007 (UTC)

How is the Electoral College an example of Simpson's paradox?

In both the Lisa/Bart example and the kidney stones example, there is a 3x2 table with 6 entries. How can the Electoral College data be presented in this way? There are the 2 parties, so that's the "2" dimension. But what is the "3" dimension?

Example           | the "2" dimension | the "3" dimension
Lisa / Bart       | Lisa / Bart       | Week 1 / Week 2 / Total
kidney stones     | Treatment A/B     | small stones / large stones / together
Electoral College | Rep / Dem         | ??? / ??? / total number of Electoral College votes

--Occultations 21:46, 15 May 2007 (UTC)

I suspect the analogy is that one can "win" the nationwide popular vote but, under certain circumstances, lose in the College. The College cannot reproduce the paradox exactly, since the outcome in each state depends only on the sign of the difference in votes, not on its magnitude: one could not lose the College if every state was won. Baccyak4H (Yak!) 03:07, 16 May 2007 (UTC)
I've removed the Electoral College example; it's not an example of Simpson's paradox. Unless, that is, someone can show how it fits the 3x2 table pattern. --Occultations 12:53, 28 May 2007 (UTC)

Do we need the fake example?

We have four different real-world examples now, some with statistics. Do we need the "bart/lisa" fake example to explain it any more? At the very least, I'd like to move the real examples up above the pretend one - I think lots of people stop reading when the article lurches into "explaining" mode. - DavidWBrooks 23:41, 22 May 2007 (UTC)

I was about to make an almost identical heading. It's a pretty asinine self-reference in addition to being original research. Milto LOL pia 04:32, 23 May 2007 (UTC)
I agree with the removal of fake examples (as I've just done with the baseball example). This section should be moved below the examples, and then transformed into a general discussion of what may cause the paradox to appear (talking about weighted averages, confounding variables, etc). Schutz 07:12, 23 May 2007 (UTC)
Then I'll do the move, and we can do the transformation later. - DavidWBrooks 10:00, 23 May 2007 (UTC) .. oops, never mind: somebody already did.
You're still welcome to do the transformation now that I have done the move :-) Schutz 13:44, 23 May 2007 (UTC)
But that will require thought and skill - I hoped I could get away with a nice, mindless move. - DavidWBrooks 14:00, 23 May 2007 (UTC)
Too late :-) I'll think about the transformation, but, as you say, it requires quite a bit of thinking first. Before that, I'll add a few more references and reformat the examples, and hopefully (if I can get around to doing it), add 2 images. Schutz 21:27, 23 May 2007 (UTC)

I have readded the example after User:Miltopia removed it, since the consensus above was for now to move the example rather than delete it. We all agree that we have enough real examples and do not need fake examples on top of that; however, this section is the only one that goes beyond giving an example and also discusses the question of weighted averages. I don't think it is very good, or that it covers everything it should, but at the moment it is better than nothing. If nothing happens with it in the near future, then it can be removed. Schutz 07:44, 24 May 2007 (UTC)

The Bart-Lisa example is pointless and misleading. The whole point of Simpson's paradox is that differences in underlying groups may be causing changes that lead to misleading results when the groups are not taken into account - the underlying groups are important in themselves and must be investigated for a proper analysis. But in the Bart-Lisa case, the underlying groups are 'week 1' and 'week 2'. Why are the success rates of editing being divided into weeks? The only reason for doing so would be that the success rates are changing consistently across weeks for both Bart and Lisa. But I can see no obvious reason why 'week' would be an appropriate grouping factor. This example gives the impression that you should divide your data into different groups for no reason and assess across those meaningless groups - perhaps doing so until you get the answer you want (e.g. Bart should be better than Lisa. We don't see it across both weeks, so we divide into weeks and aha! there we see it. If we hadn't seen it within weeks, maybe we should divide into days...) (talk) 14:39, 18 February 2010 (UTC)
I don't understand this comment, or maybe I misunderstand the definition of Simpson's Paradox. I thought that the term applied to any case where the two group results agreed with each other but disagreed with the aggregate result--regardless of the underlying cause. If so, then the Bart-Lisa example fits the definition perfectly.
And, it was by far the easiest one for me to understand because one didn't need to understand or even know anything beyond the simple data that were presented. After I thought about the Bart-Lisa case for a long time, it suddenly hit me how it is possible--even easy!--for the seemingly paradoxical situation to occur. The underlying cause in the Bart-Lisa case is not systematic (a lurking variable). The cause is random variability. The case is very plausible, since the sample sizes are so low. But does it matter what the source of the paradox is? (Random variability vs. lurking variable?). It seems to me that the paradox is the paradox regardless of the underlying causality. (talk) 23:46, 18 August 2014 (UTC)


Would it be an idea to add Correlation does not imply causation into the 'See also' section? Apologies if this has already been covered, I don't find any references to it. Flex Flint 08:57, 17 July 2007 (UTC)

I'd also suggest that Milo Schield's fine paper "Simpson's Paradox and Cornfield's Conditions" ( be added to the references, and mention Cornfield's conditions somewhere in the main sections. Haruhiko Okumura (talk) 08:42, 14 August 2008 (UTC)

The correlation/causation issue is important in its own right, but has little to do with "Simpson's Paradox." I would suggest removing this part of the text in the extant introduction. Scrooge62 (talk) 18:28, 2 December 2009 (UTC)
Yes, I think Correlation does not imply causation should have a link somewhere from this article - if it's not appropriate at any other point, it should be in "See also".
I think the lead is fine as it stands. Correlation/causation is a much wider topic than Simpson's paradox, but it seems to me the ONLY relevance of Simpson's paradox is that it is ONE of the counterexamples that can be used to reject the intuition saying that correlation DOES imply causation.--Noe (talk) 08:15, 3 December 2009 (UTC)
I would emphatically stress that Correlation does not imply causation is very strongly connected to Simpson's Paradox. Correlation is based on the unconditional (or marginal) relationship between two variables. But causation would be based on their conditional relationship controlling for confounding factors. The fact that a conditional relationship can have the opposite sign of an unconditional relationship is precisely Simpson's Paradox and is also precisely the reason why correlation cannot be taken to imply causation. No two concepts could be more strongly related! -- --Geomon (talk) 06:10, 18 January 2010 (UTC)
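To make the conditional-versus-marginal point concrete, here is a small Python sketch. The data points are hypothetical, invented purely for illustration, and the two "strata" stand in for the levels of some confounding factor. Within each stratum the covariance of x and y is negative, while the pooled covariance is positive.

```python
def covariance(xs, ys):
    """Population covariance of two equal-length sequences; only its sign matters here."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data, invented for illustration: (x values, y values) per stratum
strata = {
    "stratum 1": ([1, 2, 3], [6, 5, 4]),
    "stratum 2": ([6, 7, 8], [11, 10, 9]),
}

# Conditional relationship: covariance computed inside each stratum
within = {name: covariance(xs, ys) for name, (xs, ys) in strata.items()}

# Marginal relationship: covariance computed after pooling both strata
all_x = [x for xs, _ in strata.values() for x in xs]
all_y = [y for _, ys in strata.values() for y in ys]
pooled = covariance(all_x, all_y)

print("within strata:", within)
print("pooled:", pooled)
```

The sign flips because the second stratum has both larger x and larger y than the first, so the difference between strata dominates the pooled relationship.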

Vector vs. Line

I reverted a diff [5] changing vector to line in one instance. First, the section it's in is called "Vector Interpretation", so referring to vectors is the expected language of that section. Second, the word change was made in only one instance, making the whole paragraph internally inconsistent as it switched from line in the first instance to vector in all others. qitaana (talk) 22:17, 26 February 2008 (UTC)

Low birth weight paradox

How is this an example of Simpson's paradox? From the information given, I see only a medical "paradox", not a statistical one. (talk) 22:23, 15 May 2009 (UTC)

I agree. It looks like the example states that, given that a child is low birth weight, it has a lower infant mortality rate if born to a smoking mother. It would only be an example of Simpson's paradox if, given the child is born to a smoking mother, it has a lower infant mortality rate if it were low birth weight. JokeySmurf (talk) 05:36, 16 May 2009 (UTC)
I don't see how that would be Simpson's paradox either. If low birth weight meant lower mortality in both smokers and non-smokers, but higher mortality in the population as a whole, that would be an example of Simpson's paradox. (talk) 13:52, 16 May 2009 (UTC)
It's poorly stated, but the paradox is that normal birth weight infants of smokers have about the same mortality rate as normal birth weight infants of non-smokers, and low birth weight infants of smokers have a much lower mortality rate than low birth weight infants of non-smokers, but infants of smokers overall have a much higher mortality rate than infants of non-smokers. This is (of course) because many more infants of smokers are low birth weight, and low birth weight babies have a much higher mortality rate than normal birth weight babies. The reference does explicitly state that it is an example of Simpson's paradox. (talk) 20:18, 8 July 2009 (UTC)
Page updated accordingly. (talk) 07:52, 12 February 2014 (UTC)

Health care disparities

The newly added section Health care disparities sounds interesting. However, as it stands, I don't think it belongs. EITHER it should be expanded to make it an illuminating example of the paradox, OR it should be removed or boiled down to at most one sentence and a reference.--Noe (talk) 08:00, 24 September 2009 (UTC)

Stigler's law

To where it reads, Since Edward Simpson did not actually discover this statistical paradox, I propose to add [note 1: See Stigler's law]. To see how this would affect the over-all appearance of this article, view the proposed revision in my sandbox. --Pawyilee (talk) 02:31, 22 February 2010 (UTC)

There being no objection, I moved it into the article. --Pawyilee (talk) 14:34, 23 February 2010 (UTC)

Kidney stones

user:DavidWBrooks recently removed the first table in the kidney stone example, which showed only the results when no distinction is made for kidney stone sizes. As the section now stands, I don't find it satisfactory. I think it needs to be made clearer that false conclusions may be drawn when the lurking variable is not identified. One way to clarify this would be to put back the table (reverting half the edit in question), and I'm inclined to do that - but I'll wait and see...-- (talk) 15:37, 7 April 2010 (UTC)

I removed it because it seemed redundant, unnecessary - the current table (it seems to me) shows everything that the first table showed; in fact, it contains that entire table. Listing two different tables made it seem, I thought, as if something changed between them, but the second table was merely an expansion. However, if others disagree, then I certainly will bow to the majority. - DavidWBrooks (talk) 17:19, 7 April 2010 (UTC)
Yes, the second table contains all the info, but the way the section reads now fails to make an important point clear. The easiest way to fix that is to revert your edit, but I'm sure there are other ways (and probably better ways) to fix it. Feel free.-- (talk) 07:11, 8 April 2010 (UTC)
It seems to me that these sentences following the table make the point clear: "The paradoxical conclusion is that treatment A is more effective when used on small stones, and also when used on large stones, yet treatment B is more effective when considering both sizes at the same time. In this example the "lurking" variable (or confounding variable) of the stone size was not previously known to be important until its effects were included." But perhaps not; perhaps the matter needs to be expanded or clarified. - DavidWBrooks (talk) 13:18, 8 April 2010 (UTC)
What made the fallacy clearer was that the "combined case" and the "obvious" conclusion was stated before the extra information was added and the refined conclusion reached. I think this was a more paedagogical presentation.-- (talk) 18:47, 8 April 2010 (UTC)
We have clarified our disagreement: It struck me as redundant, even a bit confusing. Anybody else have an opinion? - DavidWBrooks (talk) 19:28, 8 April 2010 (UTC)


Although this situation is called Simpson's paradox, this article is very useful in illustrating a fallacy in statistics that can be corrected. Of course, Simpson's paradox goes away when one properly accounts for external variables. For example in the Male/Female admissions lawsuit, the statistics can be shown with a common weighting of departments (apples-to-apples-comparison). If this is done, there is no paradox. —Preceding unsigned comment added by Fulldecent (talkcontribs) 06:38, 14 November 2010 (UTC)

True - like most paradoxes, it's only paradoxical when described in a misleading way. You can see a paradox as a challenge to find the right way of describing the situation. - Do you suggest a change to the article?-- (talk) 11:04, 14 November 2010 (UTC)
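The "common weighting of departments" idea mentioned above can be sketched as follows. This is only an illustration: the admission counts are hypothetical (they are not the figures from the lawsuit), and the function names are my own. The sketch weights each department by its share of all applicants, using the same weights for men and women, which is one possible choice of common weighting.

```python
from fractions import Fraction

# Hypothetical (admitted, applicants) counts, invented for illustration only
data = {
    "A": {"men": (48, 80), "women": (14, 20)},
    "B": {"men": (4, 20),  "women": (24, 80)},
}

def crude_rate(group):
    """Overall admission rate, ignoring departments."""
    admitted = sum(data[d][group][0] for d in data)
    applicants = sum(data[d][group][1] for d in data)
    return Fraction(admitted, applicants)

def standardized_rate(group):
    """Admission rate using the same department weights for every group."""
    total = sum(n for d in data for _, n in data[d].values())
    rate = Fraction(0)
    for d in data:
        dept_total = sum(n for _, n in data[d].values())
        admitted, applicants = data[d][group]
        rate += Fraction(dept_total, total) * Fraction(admitted, applicants)
    return rate

for group in ("men", "women"):
    print(group, "crude:", crude_rate(group), "standardized:", standardized_rate(group))
```

With these made-up counts, women have the higher admission rate in each department and the lower crude rate overall; once both groups are given the same department weights, the ordering agrees with the per-department comparison.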

on my removal of "how likely"

I've just removed the section on "how likely" Simpson's paradox is. The reason for this is that in order to make sense of the statement you need to assume a probability distribution for the entries of a 2x2x2 table (presumably what the section's statement about "assuming certain conditions" was a reference to). My basic argument here is that without a statement of those "certain conditions" the statement is essentially meaningless, so we have to go to the paper to find out what it means.

Omitting explicit reference to a probability distribution in choosing an object "at random" is commonly done in elementary expositions of statistical concepts when the situation is simple enough that the distribution can be inferred from the surrounding context, or there is in some other sense enough "intuition" to suggest a natural choice. Examples like the "Bertrand paradox" show that this is not unproblematic. I would argue that here, there is not enough context or "intuition" to give those without any precise understanding of statistics/probability any sense of what it means to fill in a 2x2x2 table "at random" according to the distribution assumed in the Perlman paper, and it is potentially misleading to present the context-free assertion as if it has enough context to determine an intuitive meaning. (I should point out that the paper itself makes no claims that this distribution is "the only one" worth considering--- nor does it argue, for example, that actual statistical practice in filling in 2x2x2 tables is at all comparable to the model they assume when they calculate the .0166 figure cited here. It just computes various probabilities in a model.) (talk) 06:28, 30 January 2011 (UTC)

Definition needed for "quality modifier"?

In the Bart/Lisa example, which I found the most helpful example, especially the graph, do we need to define "quality modifier" or at least provide a link to another article? Frankly, I am not sure what is meant by this phrase. How can numbers be qualitatively different? Clearly, this is a use of "quality" that is different than the common use, where quality is usually contrasted with quantity. Here's the sentence: "Also when the two tests are combined using a weighted average, overall, Lisa has improved a much higher percentage than Bart because the quality modifier had a significantly higher percentage. Therefore, like other paradoxes, it only appears to be a paradox because of incorrect assumptions, incomplete or misguided information, or a lack of understanding a particular concept."--Bruce Hall (talk) 04:19, 8 March 2011 (UTC)

Presentation of tables

Except in the sex bias case (where there are more than two departments), many of the examples presented seem to have exactly the same structure and could be presented in the same format. Presenting the various examples in different formats seems confusing to me. If the point of doing so is that different readers will "get it" from different ways of presenting it, it would be clearer to present THE SAME example in different formats. My suggestion would be to present the various examples in more or less the same way. Personally, I prefer the presentation of the kidney stone example with the "group 1, 2, 3, 4" and the two effects listed. What do you think?-- (talk) 15:46, 8 March 2011 (UTC)

Another example

DavidWBrooks has reverted my newly added example of Simpson's Paradox, which read as follows:

The National Assessment of Educational Progress average test score in mathematics for American 9-year-old children rose, from 1978 to 2004, by 10.0%. But the average score of white 9-year-olds rose by 10.3%, that of Hispanics by 13.3%, that of blacks by 16.7%, and that of all others by 12.8%.[1] Thus while no racial/ethnic group experienced a gain of less than 10.3%, the children as a whole experienced a gain of only 10.0%, a result that is due to the shift over time in the percentages of the various groups in the total.

His reason for reverting was that (1) it's a weak example since the composite went up, not down, and (2) we really don't need another example. As for (1), I see his point, but I propose adding the following to illustrate the importance of the example:

Jack Jennings of the Center on Education Policy uses these data in asserting[2] that when the composite data are used, "one important trend tends to be overlooked -- namely, the notable gains made by African American and Latino students in reading and math achievement".

As for point (2), I definitely disagree with it -- just five examples are not enough in my opinion. The more examples we have, the more likely a reader is to find one that resonates with him, one that he can latch onto as, for him, a memorable example. Different people will be prone to latch onto different examples, but I'd bet that more people will latch onto the test scores example than, say, the kidney stone research example.

Comments, anyone? Duoduoduo (talk) 21:08, 9 May 2011 (UTC)

Since I removed it, my thinking is obvious! I think (just my opinion) Duoduoduo might be interested in this example at least as much for its intriguing policy implications, rather than as a good example of the paradox. The point of examples in a wikipedia article is to illustrate the concept, not to make an argument for it or to cover all possible bases - and five examples is way more than enough to do that. IMHO, of course! - DavidWBrooks (talk) 22:27, 9 May 2011 (UTC)
I think it's a good real world example that seems to differ from the existing examples because it is not the actions that are changing over time but the composition of the actors. Mathematically it's the same, but I find it intuitively quite different. If space is an issue, I prefer it to the made up Bart/Lisa, Wikipedia-emphasizing example (though I like the thoughtful math put into that example). -- Michael Scott Cuthbert (talk) 15:32, 13 April 2012 (UTC)

How many examples do we need?

The article has six examples of the paradox appearing in real-world situations, which is IMHO excessive. I'd like to remove this one, because it's the least detailed and informative; in fact, it's kind of confusing:

Health care disparities
An examination of racial differences in the management of localized prostate cancer in Pennsylvania simultaneously revealed that whites were more likely to receive prostate surgery than blacks, that whites and blacks were equally likely to get surgery, and that blacks were more likely to get surgery than whites. This example statistical analysis used hypothetical data. All of the above conclusions were correct, but they reflected answers to subtly different questions that relied on different parsings of the same aggregate data.[15]

Any objections? - DavidWBrooks (talk) 17:38, 23 September 2011 (UTC)

Agree to deleting this example. For one thing, the above text is a travesty of the referenced material. For another, the data in the reference do not in fact provide an example of Simpson's paradox as there is no reversal. Qwfp (talk) 18:01, 23 September 2011 (UTC)
Let's do it, then. *POOF!!!!*- DavidWBrooks (talk) 18:10, 23 September 2011 (UTC)

Civil Rights Act of 1964

"This arose because regional affiliation is a very strong indicator of how a congressman or senator voted, but party affiliation is a weak indicator." This statement is obvious from the chart, and can even be made more formal. Let's say I pick a Senator or Congressman that voted on the Civil Rights Act of 1964 in a uniformly random way, and you have to guess whether they voted for or against the Civil Rights Act. You get to ask one of two questions: "Do they represent a formerly Confederate State?", or "Are they a Republican or Democrat?". Which question do you ask?

Which question you should ask is obvious: asking about the party affiliation is absolutely no use in making your guess; the best you can do after asking this question is to simply guess "yes", which will be correct 70% of the time. On the other hand, if you ask what region of the country they represent, you can be correct 91% of the time: you'll be right 90% of the time by guessing "yes" if they come from the north (which happens 75% of the time), and you'll be right 94% of the time by guessing "no" if they come from the south (which happens 25% of the time). Obscuranym (talk) 15:56, 14 June 2012 (UTC)
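The 91% figure above can be checked with a few lines of Python. This is only a sketch that reuses the rounded percentages quoted in this comment (a 75%/25% regional split, 90% and 94% accuracy within each region, and a 70% overall "yes" share); the variable names are made up for illustration, and nothing is recomputed from the actual roll call.

```python
# Rounded figures quoted in the comment above (not recomputed from the roll call).
share_north, share_south = 0.75, 0.25  # fraction of legislators from each region
acc_north = 0.90  # accuracy of guessing "yes" for a northern legislator
acc_south = 0.94  # accuracy of guessing "no" for a southern legislator
acc_party = 0.70  # best accuracy after the party question: always guess "yes"

# Law of total probability: weight each region's accuracy by that region's share.
acc_region = share_north * acc_north + share_south * acc_south

print(f"ask about region: {acc_region:.0%}")
print(f"ask about party:  {acc_party:.0%}")
```

With these inputs the regional question comes out at 0.675 + 0.235 = 0.91, matching the 91% stated above.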

I have removed the Civil Rights example, partly because we're getting too many real-world examples - we still have four, all of which are pretty well known - and partly because it's not sourced. The data and analysis may be accurate, but they're not reported elsewhere, as the other examples are. Since we don't really need this many examples - the situation is quite clear that it crops up in reality in many different circumstances - it's no drawback to just get rid of it. - DavidWBrooks (talk) 21:33, 14 June 2012 (UTC)
I think the Civil Rights Act example should be restored. A quick Google search confirms that many math websites use this example when discussing Simpson's Paradox and specifically cite this Wikipedia article. This suggests that math educators consider it at least as relevant an example as the others. In fact, the Wikipedia article on the Civil Rights Act itself links back to this article. This example should not have been removed without updating that page as well. The example is not original research since the data is taken directly from the CRA Wikipedia article. The text in the example could be improved but the example should be restored. I will update the text and restore the example unless there is a reasonable objection. - Ricklethickets (talk) 11:11, 21 July 2012 (UTC)
If you must, but add a good reference and make sure it's clear, because the previous wordy example wasn't. The websites that "cite" the article seem to be mostly wikipedia scrapers - a very circular argument, at best. To be honest, I think it's a lame example of Simpson's paradox because the issue it examines is not very clear, a mix of geography and politics that requires a knowledge of that period of American history to seem surprising. The other examples are much more straightforward. - DavidWBrooks (talk) 13:38, 21 July 2012 (UTC)

Suggesting an initial simpler example

I suggest putting an even simpler and clearer example at the beginning, like this:

Boys and girls applied for a physics or math scholarship

10 boys

10 girls

2 boys applied for math => 1 awarded, which is 50%

8 boys applied for physics => 2 awarded, which is 25%

9 girls applied for math => 4 awarded, which is 44.4%

1 girl applied for physics => 0 awarded, which is 0%

In total:

3 boys out of 10 got a scholarship, which is 30%

4 girls out of 10 got a scholarship, which is 40%
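The figures in this example can be verified with a short Python sketch. The counts are copied from the list above; the dictionary layout and function names are just one way to arrange them, chosen for illustration.

```python
from fractions import Fraction

# (awarded, applied) per subject, copied from the example above.
data = {
    "boys": {"math": (1, 2), "physics": (2, 8)},
    "girls": {"math": (4, 9), "physics": (0, 1)},
}


def rate(group, subject):
    """Award rate for one group within one subject."""
    awarded, applied = data[group][subject]
    return Fraction(awarded, applied)


def overall(group):
    """Award rate for one group with both subjects pooled."""
    awarded = sum(a for a, _ in data[group].values())
    applied = sum(n for _, n in data[group].values())
    return Fraction(awarded, applied)


for subject in ("math", "physics"):
    print(f"{subject}: boys {float(rate('boys', subject)):.1%}, "
          f"girls {float(rate('girls', subject)):.1%}")

print(f"overall: boys {float(overall('boys')):.1%}, "
      f"girls {float(overall('girls')):.1%}")
```

Boys have the higher rate within each subject, yet girls have the higher pooled rate, because most girls applied for math, where awards were more frequent.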

--Wisamzaqoot (talk) 22:17, 31 July 2012 (UTC)

This is clearly sexist. (talk) 07:27, 20 October 2013 (UTC)

Psychologists section

If psychologists really say this to their subjects, then it is they who are confused:

"Psychological interest in Simpson's paradox seeks to explain why people deem sign reversal to be impossible at first, offended by the idea that a treatment could benefit both males and females and harm the population as a whole. "

I think I understand what is trying to be said, but no sane person could describe a situation where a treatment benefits males and females but not the population as a whole. Every member of the population is M or F. Therefore every member will benefit. End of story.

Now, you could certainly have a Simpson's paradox in such a case, but you would have to say something like "a treatment could benefit both males and females, yet a group receiving the treatment did worse on average than a group not receiving it"

When you say it like this, though, it's not very counterintuitive if you've read the earlier part of the article. Hence I think it's not needed and I just removed the illogical clause. — Preceding unsigned comment added by Wstrong (talkcontribs) 19:10, 16 February 2013 (UTC)

Graph of Bart & Lisa Example

I believe that the 'Bart' graph (identified as the lower one) in this example is seriously misleading. The percentages of articles improved by Bart (14.2% in the 1st week and 100% in the 2nd week) do not fit at all with the graph, which appears to show the opposite: a much greater percent improvement in the first week.

Less seriously, but still confusingly, the heights of the bars appear to be only crude estimates. Bart's high contribution of 100% appears to be nearly the same height as Lisa's high contribution of 71.4%, while Lisa's low contribution of 0% appears to be the same height as Bart's low contribution of 14.2%.

All in all, a reader has a better chance of understanding the paradox if he/she ignores this graph entirely. Stoddj (talk) 21:51, 8 July 2013 (UTC)

Introductory Graph

The graph next to the introduction seems to be misleading: in that case the groups are distinguished by the variable in which the trend appears, whereas in the examples the groups are distinguished by some other variable. Yehoshua2 (talk) 06:15, 9 September 2013 (UTC)

I agree. The examples and the graphic are quite different forms of Simpson's paradox and I don't see how one is explained by the other. So it seems odd to have a linear-trend-reversal as the most prominent graphic but then only give examples of ratio-reversal. I suggest adding (or replacing) another example that explains that graphic in more detail.
Alternatively, why not use the introductory graphic of the Gerrymandering article as the lead since it is the prime example of ratio-reversal? Georg.anegg (talk) 11:32, 8 November 2017 (UTC)
Perhaps a change, but don't use the gerrymandering graphic - that's not an obvious illustration of Simpson's paradox at all. It will confuse readers who know nothing about the topic. - DavidWBrooks (talk) 11:59, 8 November 2017 (UTC)


This paradox is related to the old joke about the man who left Scotland for England and thereby raised the average IQ of both countries. It is particularly clear in the joke that the overall average cannot change, because the two situations (before and after) are simply different partitionings of the same set. Yet the sub-averages can both move in the same direction. (They both go up if the Scotland average is higher than the England average and the man is in between.)

You could modify the joke so that the overall average moves in the opposite direction, e.g. if two men leave Scotland and only one goes to England. But that would not only reduce the elegance of the joke; in fact, the constancy of the overall average in this case brings out the true nature of the paradox: not only can the overall average and the sub-averages move in strictly opposite directions, but more generally the overall average and the sub-averages are decoupled in a surprising sense. — Preceding unsigned comment added by (talk) 14:26, 16 January 2014 (UTC)

Note that the initial populations of Scotland and England need not have any particular ratio for this to work. I haven't thought for long enough yet about the precise relationship between this and the other examples.
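The joke can be made concrete with a small Python sketch. The IQ scores below are hypothetical, invented purely for illustration; the only property that matters is that the man who moves scores between the two countries' means.

```python
from statistics import mean

# Hypothetical IQ scores, invented purely for illustration.
scotland = [105, 110, 120]
england = [90, 100]
mover = 105  # lies between England's mean (95) and Scotland's mean (about 111.7)

# The same five people, partitioned differently after the move.
scotland_after = scotland.copy()
scotland_after.remove(mover)
england_after = england + [mover]

print("Scotland mean:", mean(scotland), "->", mean(scotland_after))
print("England mean: ", mean(england), "->", mean(england_after))
print("Overall mean: ", mean(scotland + england), "->",
      mean(scotland_after + england_after))
```

Both sub-averages rise while the overall average stays put, since the before and after situations partition the same set of people.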

Does Simpson's Paradox always disappear when causal relations are brought into consideration?

Does Simpson's Paradox always disappear when causal relations are brought into consideration, as the text currently implies?

In the example of Lisa and Bart, I do not see any causal relationships being brought into consideration. (talk) 00:58, 16 August 2014 (UTC)

"UC Berkeley gender bias" departments[edit]

What are the respective departments? I don't seem to be able to find them in the given citation or the actual research paper. (talk) 20:13, 23 May 2016 (UTC)


  1. ^ [1] for the racial breakdown, and [2] for the combined results.
  2. ^ [3]

Let's remove the whole Bart and Lisa section

Somebody placed a "tone" hatnote on this article but didn't give any details about what bothered them. I hate it when editors do that and often remove such hatnotes, under the assumption that if they can't be bothered to explain their reasoning then the rest of us shouldn't be hassled by their concerns, but in this case he/she/it has a point.

I suspect the problem is the long and clumsy section titled "Description," which gives an imaginary example of the paradox involving Bart and Lisa (because no wikipedia article is allowed to exist without a Simpsons reference) and which is, indeed, written in an unnecessarily loose tone.

I do not believe the section is necessary at all, because we have several real-world examples that provide just as much illustration. I would like to kill that section altogether. What do others think? - DavidWBrooks (talk) 18:28, 25 August 2016 (UTC)

Well, I don't know if "no wikipedia article is allowed to exist without a Simpsons reference", but given the title of this page, I see why this particular one would be a good candidate for a Simpsons reference ;) Joking aside, I've never been a big fan of this section -- "long and clumsy" sums it up well indeed. In my mind, it can go. Schutz (talk) 18:11, 30 August 2016 (UTC)
It's been a week. I want to be cautious about making such a big change, deleting something that's been in the article for so long, so I'll ask again: What do others think of deleting the section? - DavidWBrooks (talk) 23:47, 1 September 2016 (UTC)
Actually, now that I have reread the section in more detail, I see that it has changed since I last read it. In the past, the numbers used as examples were on the order of 100, making the Bart and Lisa example similar to other, real-life, examples. I see that the example now uses a total of 5 events in each case, making it very easy to understand what is happening. In addition, one editor had the good idea of adapting two of my figures (the weighing scales and the vector interpretation) to this actual example. I think the result is quite nice, and actually adds to the whole article. However, everything from "Here are some notations:" still seems long and clumsy to me -- not to mention that it adds little to what is in the tables. I would at least remove this part of the section, but the rest of the example is useful. (Ideally, we should find a real example using such low values, and use that one instead...). Schutz (talk) 08:54, 2 September 2016 (UTC)
It still seems unnecessary to me, but you're right that the "notation" section is really unnecessary. I have removed that portion, and the "tone" hatnote. - DavidWBrooks (talk) 13:05, 4 September 2016 (UTC)

Trump section

"This aggregate phenomenon of poor voters being seemingly less likely than the rich to vote for Trump is driven by the following facts: (1) the majority of voters are white; (2) the white are more likely to be rich; (3) the white are more likely to vote for Trump. Once we control for race, we find that poor voters are in fact more likely than the rich to vote for Trump."

I don't like the choice of words here. Is controlling for race necessary? One could also say, once we control for cherrypicking, we find that poor voters are less likely to vote for Trump. DaßWölf 00:56, 14 November 2016 (UTC)

P.S. Also, we could easily imagine many ways to partition the 10,000 rich and 1,000 poor voters in the example (e.g. college graduates vs. others, baseball fans vs. others...) where the poor voters end up also less likely to vote for Trump. I can think of no obvious way to decide which partition is the most meritorious. DaßWölf 01:00, 14 November 2016 (UTC)

I have removed it - it is original research and speculation. Plus, we have enough real-world examples. - DavidWBrooks (talk) 01:02, 14 November 2016 (UTC)

The number of examples

Do we really need six examples (counting Bart and Lisa) for the paradox? They take up more space than all the other sections combined. I would keep only two or maybe three at most; that should be illustrative enough for anyone IMO. DaßWölf 18:38, 3 June 2017 (UTC)

It is borderline excessive. We could certainly toss the low birth-weight paradox, referring to it in "see also" - and frankly, I've never been a fan of the Bart and Lisa item, as the above discussion shows; I don't think we need a fake example when we have such detailed descriptions of real-life examples. - DavidWBrooks (talk) 20:52, 3 June 2017 (UTC)
So I have removed the low birthweight paradox. - DavidWBrooks (talk) 01:36, 28 June 2017 (UTC)

Cogent visual argument up front

We should start off with a far more cogent visual argument?

Like some sketch with all of four groups misleadingly reversing the true overall trend, something akin to the image held back from us here at my Pat LaVarre [@PELaVarre] (29 July 2017). "I see this trend here, there, everywhere. And it reverses when I zoom out. ~~ Jabawack MethodsManMD Aug/2016 ..." (Tweet) – via Twitter. 

Ouch: to see this sketch you would have to click out into Twitter, because I cannot upload that sketch directly here, since I cannot "attest that I own the copyright on" this image. I only know I remixed it from Twitter's video transcode of their GIF of someone's two slides; its copyright status is unknown to me.

It tumbled across my desk as F. Perry Wilson [@MethodsManMD] (10 August 2016). "A study suggesting pasta consumption can reduce BMI is a great example of Simpson's Paradox..." (Tweet) – via Twitter. 

It reached me today as a retweet from Emilio Ferrara [Jabawack] (29 July 2017).

I had forgotten which paradox was this paradox of 1899 Pearson et al., 1903 Yule, 1951 Simpson, but googling your work reminded me in a moment, thank you.

Pelavarre (talk) 19:06, 29 July 2017 (UTC)

A popular variant of Simpson's paradox?

Given that Men are more likely to be affected than women by some disease, and that young people are more likely to be affected, it is reasonable to conclude that young Men are the most affected group. But it's not true, of course.
— Preceding unsigned comment added by Georg.anegg (talkcontribs) 11:15, 8 November 2017 (UTC)

Suppose we have the following data of people affected by a disease, say:

         under 30   over 30
Men      9/10       0/3
Women    0/5        1/1

Thus Men (total) have a higher rate (9/13) than Women (1/6),

and under 30's (total) have a higher rate (9/15) than over 30's (1/4).

Yet it is not true that Men-under-30 have a higher rate (9/10) than Women-over-30 (1/1). Do you consider this a variant of Simpson's Paradox? In some sense, it's a double application of the fallacious subset principle:
If Men have a higher rate than Women, then Men-under-30 have a higher rate than Women-under-30.
If under 30's have a higher rate than over 30's, then Women-under-30 have a higher rate than Women-over-30.
Hence Men-under-30 have a higher rate than Women-under-30 who have a higher rate than Women-over-30.

I feel like this is a fallacy that's committed extremely commonly (e.g. by newspapers) and as such deserves mention. (I have looked in other places too but couldn't find it. Please let me know if I missed it.) Georg.anegg (talk) 12:08, 3 November 2017 (UTC)
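The rates quoted in this example can be checked with a short Python sketch. The cell counts are taken from the table above; the `rate` helper and the key names are hypothetical, introduced only for this check.

```python
from fractions import Fraction

# (affected, total) for each cell of the table above.
cells = {
    ("men", "under30"): (9, 10),
    ("men", "over30"): (0, 3),
    ("women", "under30"): (0, 5),
    ("women", "over30"): (1, 1),
}


def rate(sex=None, age=None):
    """Pooled rate over all cells matching the given sex and/or age group."""
    picked = [counts for (s, a), counts in cells.items()
              if sex in (None, s) and age in (None, a)]
    affected = sum(k for k, _ in picked)
    total = sum(n for _, n in picked)
    return Fraction(affected, total)


print("men vs women:      ", rate(sex="men"), "vs", rate(sex="women"))
print("under 30 vs over 30:", rate(age="under30"), "vs", rate(age="over30"))
print("men under 30 vs women over 30:",
      rate("men", "under30"), "vs", rate("women", "over30"))
```

Both marginal comparisons point one way (men above women, under 30 above over 30), while the single cell with the highest rate is women over 30.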

It doesn't strike me as much different than the examples we already have - not really worth adding as yet another example to the article, in my opinion. - DavidWBrooks (talk) 12:30, 3 November 2017 (UTC)

Implications for decision making needs sources and is probably wrong

This section suggests that obviously Treatment A should always be preferred. I think that's wrong. Think about it. If you saw a product on Amazon with 4 stars and 1000 ratings (small stones-treatment B) vs a product with 4.5 stars and 10 ratings (small stones-treatment A), you'd probably trust the 4 stars of the first product more than the 4.5 of the second. So it's entirely possible that you'd similarly prefer treatment B to A on small stones. I can't prove it on the actual number set without math I don't know, but logically it seems plausible. At least enough to demand further evidence and sourcing for the assertion to the contrary. (talk) 05:36, 26 February 2018 (UTC)


Harper, M.


Lesser, L. (Winter 2010). Confounded. The Mathematical Intelligencer, 32(4), 53.  — Preceding unsigned comment added by (talk) 21:09, 4 June 2018 (UTC)