Talk:Regression toward the mean

WikiProject Statistics (Rated C-class, High-importance)

This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

WikiProject Mathematics (Rated C-class, High-importance)

This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

Field: Probability and statistics

Issue with some phrasing

In the *Misunderstandings* section, there is this sentence: "So for every individual, we expect the second score to be closer to the mean than the first score." This is not true. For example, individuals whose first score is exactly the mean can only stay there or move away from it, so on average they end up farther from the mean. Regression toward the mean is specifically a phenomenon that affects the highest and lowest performers. Everyone else should expect some movement relative to the mean, but not necessarily toward it.

In fact, a few of the statements surrounding that one are related and similarly misleading. I will now try to edit the page myself to clarify this statement. --Ihearthonduras (talk) 18:27, 14 November 2017 (UTC)
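
The two readings of "closer to the mean" in this thread can be separated with a short calculation. This is only a sketch under an assumption that is not stated in the thread: the two scores X and Y are standardized (mean 0, variance 1) and jointly normal with correlation ρ, where 0 < ρ < 1. Under that assumption, Y given X = x is normal with mean ρx and variance 1 − ρ².

```latex
\begin{align*}
\mathbb{E}[Y \mid X = x] &= \rho x, & |\rho x| &\le |x|, \\
\mathbb{E}\bigl[\,|Y| \mid X = 0\,\bigr] &= \sqrt{\tfrac{2(1-\rho^2)}{\pi}} > 0 .
\end{align*}
```

The first line says the expected second score is never farther from the mean than the first score, with equality only at x = 0. The second line says that someone who scored exactly the mean is expected to land a positive distance away from it on the second test. So, in this model, the quoted sentence holds for the expected score of anyone not exactly at the mean, but not for the expected distance from the mean of someone who starts there.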

Different use in finance

I added a note to the end of the introduction, because I'm almost certain that "mean reversion" as used in finance is fundamentally different from "reversion to the mean" or "regression to the mean" as described here. I don't think the Wikipedia article on Mean reversion (finance) is clear on this. As I understand it, as used in science and statistics, mean reversion is an effect that shows up when genuinely independent random samples are drawn successively from a fixed population having a constant frequency distribution.

As used in finance, it seems to be referring to a situation in which performance over successive time periods is not independent, but shows a negative correlation from one time period to the next. A fluke period of low returns is not followed by a typical period of average returns, simply due to the nature of a random process. On the contrary, a period of low returns has an actual tendency to be followed by a compensating period of high returns. Thus, as the holding period increases, the variability of the average return decreases faster than it would if the process were a random walk.

The law of large numbers says that if you throw 10 heads in a row, then flip a coin 100 times more, the proportion of heads for the whole 110 throws will be closer to 50/50, not because there's any tendency to throw more tails after a long series of heads, but simply because the most likely outcome is that the 100 additional throws will be split 50/50, so the percentage for the whole series will decline from 10/10 = 100% heads to (10 + 50) / 110, or about 55%. I've talked to a couple of financial specialists who have been quite definite that in finance, "mean reversion" does not just mean swamping out an unusual run with a series that simply has the mean value. It means active compensation: a run of low stock returns will (supposedly) tend to be followed, not by a run with mean-value stock returns, but by a run of higher-than-mean stock returns.

In the article, I'm doing my best to present this by paraphrasing what Jeremy Siegel says, but I admit that I'm going just a little farther by using the word "compensation." Dpbsmith (talk) 15:33, 22 December 2011 (UTC)
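
The distinction drawn above between dilution and active compensation can be put in numbers. The sketch below is illustrative only: the function names are hypothetical, and the AR(1) model with a negative lag-1 autocorrelation `phi` is an assumed stand-in for "compensating" returns, not a model taken from Siegel or from the article.

```python
from fractions import Fraction


def pooled_heads_fraction(prior_heads, prior_flips, extra_flips, p=Fraction(1, 2)):
    """Expected overall fraction of heads after adding independent flips.

    The extra flips are assumed independent with heads probability p, so the
    earlier run is only diluted, never compensated for.
    """
    expected_extra_heads = p * extra_flips
    return (prior_heads + expected_extra_heads) / (prior_flips + extra_flips)


def variance_of_average(n, phi, sigma2=1.0):
    """Variance of the mean of n consecutive values of a stationary AR(1) process.

    phi is the assumed lag-1 autocorrelation and sigma2 the variance of a
    single value. phi = 0 is the independent case and gives sigma2 / n.
    """
    correction = 1.0 + 2.0 * sum((1 - k / n) * phi**k for k in range(1, n))
    return sigma2 / n * correction


# Dilution only: 10 heads in 10 flips, followed by 100 more fair flips.
print(pooled_heads_fraction(10, 10, 100))  # 6/11, about 54.5%

# Independent returns (phi = 0) versus negatively correlated returns (phi = -0.2).
for n in (1, 5, 10, 20):
    print(n, variance_of_average(n, 0.0), variance_of_average(n, -0.2))
```

With `phi = 0` the variance of the average falls as 1/n, which is the random-walk benchmark. With a negative `phi` it falls faster, which is the sense in which compensation differs from mere dilution in this toy model.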

I think, with regard to finance, the random parts of the time series are generally modeled as a stationary process. I would conjecture with 95% confidence that stationary processes exhibit a "regression toward the mean" sort of phenomenon. This is probably the missing link that you want between these two articles. --Ihearthonduras (talk) 18:33, 14 November 2017 (UTC)
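
For one simple family of stationary processes the conjecture above can be checked directly. The following is a sketch that assumes a first-order autoregressive model, AR(1), with mean μ, coefficient φ and mean-zero noise ε that is independent of the past. None of these symbols come from the thread, and the result is not claimed for every stationary process.

```latex
\begin{align*}
X_{t+1} - \mu &= \varphi\,(X_t - \mu) + \varepsilon_{t+1}, \qquad |\varphi| < 1, \\
\mathbb{E}[X_{t+1} \mid X_t] - \mu &= \varphi\,(X_t - \mu), \\
\bigl|\mathbb{E}[X_{t+1} \mid X_t] - \mu\bigr| &= |\varphi|\,|X_t - \mu| \le |X_t - \mu| .
\end{align*}
```

In this model the expected next value is always at least as close to the mean as the current one. When φ = 0 the expectation is exactly the mean, which matches the statistical sense of regression to the mean for independent draws. When φ < 0 the expectation lies on the opposite side of the mean, which matches the "compensation" reading described for finance.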

Why is it MORE likely that high performers are unlucky the next day?

Seems like they should be just as likely to be unlucky (or lucky) the next day as they were on the first day. Danielx (talk) 02:44, 20 December 2013 (UTC)

They aren't. However, if they are a high scorer for this event, then it's more likely that they have been lucky this time and entirely probable they won't be next time. Alpha3031 (talk) 13:17, 3 April 2015 (UTC)
They are. Note that the disagreement here hinges on a subtle difference in the frame of reference for the (un)luckiness. The highest performers are likely *to have had* (past) luck on the exam. The implicit assumption here, however, is that everyone is equally likely *to have* (future) luck on an exam, and that this luck is generated independently of the course and the individual (e.g. you get sick, your dog dies, etc.). Supposing the expected luck to be 0, the high performers are just as likely to be lucky on the second exam as they were likely to be lucky before going into the first exam: expected luck = 0. However, their high performance on the first exam is evidence of good luck on the first exam. In fact, they are more likely *to have had* (past) good luck on the first exam than they are *to have* (future) good luck on the next exam. For example, once the first exam has taken place, the expected value of a first-exam high performer's luck on that exam is positive (say, 1), while before the second exam has taken place, the expected value of the same person's luck on the second exam is 0. --Ihearthonduras (talk) 18:16, 14 November 2017 (UTC)
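
The past-versus-future distinction in the reply above can be illustrated with a small simulation. This is a sketch under assumptions that are not in the thread: each exam score is skill plus luck, skill and luck are independent standard normal draws, and luck is redrawn independently for the second exam. The function name and parameters are hypothetical.

```python
import random
import statistics


def luck_of_top_performers(n_students=20000, top_fraction=0.1, seed=0):
    """Return (mean exam-1 luck, mean exam-2 luck) for the top scorers on exam 1."""
    rng = random.Random(seed)
    students = []
    for _ in range(n_students):
        skill = rng.gauss(0.0, 1.0)
        luck1 = rng.gauss(0.0, 1.0)  # luck on the first exam
        luck2 = rng.gauss(0.0, 1.0)  # independent luck on the second exam
        students.append((skill + luck1, luck1, luck2))

    # Select the top performers using only the first exam score.
    students.sort(key=lambda s: s[0], reverse=True)
    top = students[: int(n_students * top_fraction)]

    past_luck = statistics.fmean(s[1] for s in top)
    future_luck = statistics.fmean(s[2] for s in top)
    return past_luck, future_luck


past_luck, future_luck = luck_of_top_performers()
print(f"mean luck on exam 1 among top scorers: {past_luck:.2f}")
print(f"mean luck on exam 2 among top scorers: {future_luck:.2f}")
```

In this setup the top scorers' average luck on the first exam comes out clearly positive, because they were selected on a score that includes that luck, while their average luck on the second exam stays near 0.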

Regression effect/fallacy

The explanation given of regression towards mediocrity seems plausible. But it does not explain why this phenomenon also occurs with entirely random data. Generate (x, y) pairs from a bivariate normal distribution with the same marginal distributions and correlation 0.5. The regression effect will show up, and no genetic theory will account for it.

This is already discussed in the article. I'm just wondering whether the example from genetics really explains anything that is not already an artifact of the definition of the regression line. TerryM--re (talk) 22:52, 24 May 2016 (UTC)
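
The experiment described above is easy to run. The sketch below uses only the Python standard library. The sample size, the seed, the cutoff of 1.0 for "high" x values and the function names are arbitrary choices made for illustration. The pairs are built as y = ρx + sqrt(1 − ρ²)·noise, which gives standard normal marginals with correlation ρ.

```python
import math
import random


def simulate_pairs(n=50000, rho=0.5, seed=1):
    """Draw n (x, y) pairs from a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho**2) * noise
        pairs.append((x, y))
    return pairs


def least_squares_slope(pairs):
    """Slope of the least-squares regression line of y on x."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    return cov / var_x


pairs = simulate_pairs()
slope = least_squares_slope(pairs)

# Compare the two coordinates among pairs selected for a high x value.
high = [(x, y) for x, y in pairs if x > 1.0]
mean_x_high = sum(x for x, _ in high) / len(high)
mean_y_high = sum(y for _, y in high) / len(high)

print(f"regression slope of y on x: {slope:.3f}")
print(f"mean x given x > 1: {mean_x_high:.3f}")
print(f"mean y given x > 1: {mean_y_high:.3f}")
```

The fitted slope comes out near ρ rather than 1, and the selected group's mean y sits between 0 and its mean x. No causal mechanism is built into the simulated data, so the effect here follows from the imperfect correlation alone.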


"If your favorite sport team won the championship last year, what does that mean for their chances for winning next season? To the extent this is due to skill (the team is in good condition, with a top coach etc.), their win signals that it's more likely they'll win next year. But the greater the extent this is due to luck (other teams embroiled in a drug scandal, favourable draw, draft picks turned out well etc.), the less likely it is they'll win next year."

I don't see this as a good example of regression to the mean at all. There are so many feedback loops at work (increased investment, morale, attracting better-quality people, etc.) that they are going to override any theoretical underlying probability based on 'normal conditions'. Also, winning a championship is binary (you either do or you don't), so it might be more useful to talk about whether the final rank is higher or lower than the team's average rank. Btljs (talk) 11:53, 19 January 2018 (UTC)