Talk:Regression toward the mean

WikiProject Statistics (Rated C-class, High-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as C-Class on the quality scale and as High-importance on the importance scale.

WikiProject Mathematics (Rated C-class, High-importance)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mathematics rating: C-Class, High-importance. Field: Probability and statistics

Confusion of terms and concepts

In the *Other Examples* section the author appears to repeatedly confuse probability and statistics. They claim "If your favorite sport team won the championship last year, what does that mean for their chances for winning next season? To the extent this is due to skill (the team is in good condition, with a top coach etc.), their win signals that it's more likely they'll win next year." This badly confuses the statistics of the victorious match (1 win out of 1 trial, implying a 100% likelihood) with the probability of future matches, and then compares that probability to the statistical sample as such. How could it be more likely to win than 100%? The entire section needs to be revised or removed. -- Preceding unsigned comment added by 184.54.35.21 (talk) 08:05, 15 April 2018 (UTC)

Examples

"If your favorite sport team won the championship last year, what does that mean for their chances for winning next season? To the extent this is due to skill (the team is in good condition, with a top coach etc.), their win signals that it's more likely they'll win next year. But the greater the extent this is due to luck (other teams embroiled in a drug scandal, favourable draw, draft picks turned out well etc.), the less likely it is they'll win next year."

I don't see this as a good example of regression to the mean at all. There are so many feedback loops at work (increased investment, morale, attracting better quality people etc.) that they are going to override any theoretical underlying probability based on 'normal conditions'. Also, winning a championship is binary (you either do or you don't), so it might be more useful to talk about whether the final rank is higher or lower than the team's average rank. Btljs (talk) 11:53, 19 January 2018 (UTC)

Confusion about what is being implied

The "Other statistical phenomena" section of this article says, "For example, following a run of 10 heads on a flip of a fair coin (a rare, extreme event), regression to the mean states that the next run of heads will likely be less than 10..." but what does that actually mean? The odds of throwing 10 heads in a row haven't changed at all because of that rare first event. They are still 1/1024, as they always were, and they don't change over time; previous outcomes are completely irrelevant to any future probability. There is no God evening things out, trying to make things fairer.

It is true that a run of 10 heads from a fair coin is a rare event, so if you keep throwing the coin it is highly probable that you will not throw another 10 heads in a row. But the probability of that same rare event happening again has not changed: the odds of a second run of 10 heads are exactly the same as they were for the first. When you look at all the throws, say a couple of hundred throws later, the ratio of heads to tails is highly likely, but only highly likely, to be close to 50/50. That definitely does not mean that the second run of 10 throws was less likely to end up as 10 heads than the first, which seems to be what is subtly implied in this Wikipedia article. It is true that following a run of 10 heads the next run of heads will likely be less than 10, but the next run was always likely to be less than 10, even before the first run, so what of significance is being said there?!

Even if you did by chance throw another 10 heads the second time, your calculated odds for a third set of 10 throws would still be exactly the same as for the very first run: a 1/1024 chance of getting 10 heads in a row! Once an extreme pure-chance event happens, it does not lessen the likelihood of another one. Getting two extreme pure-chance events is less likely than getting just one, but once one has happened it has absolutely no effect on the future probability of another. The "Other statistical phenomena" section confuses me; what on Earth is it implying!? It almost sounds like a confidence trick, as if a scam is being marketed.
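
For what it's worth, the point about independence can be checked by brute-force enumeration. The sketch below is only an illustration, not anything taken from the article: the function names are made up, and the enumeration uses blocks of 5 flips rather than 10 purely to keep the number of sequences small. It suggests that the quoted sentence can only be read as a statement about the expected number of heads in the next block (half the block size, which is below the extreme count just observed), not as a change in the odds.

```python
from fractions import Fraction
from itertools import product


def prob_all_heads(n):
    """Probability that n independent flips of a fair coin are all heads."""
    return Fraction(1, 2) ** n


def second_block_given_extreme_first(n):
    """Enumerate every sequence of 2*n fair flips, keep those whose first
    n flips are all heads, and summarise the second block of n flips.

    Returns (P(second block all heads | first block all heads),
             E[heads in second block | first block all heads]).
    """
    kept = 0             # sequences whose first block is all heads
    both_extreme = 0     # ... and whose second block is all heads too
    heads_in_second = 0  # total heads in the second block over kept sequences
    for seq in product("HT", repeat=2 * n):
        first, second = seq[:n], seq[n:]
        if first.count("H") == n:
            kept += 1
            heads_in_second += second.count("H")
            if second.count("H") == n:
                both_extreme += 1
    return Fraction(both_extreme, kept), Fraction(heads_in_second, kept)


print(prob_all_heads(10))  # 1/1024

cond_prob, cond_mean = second_block_given_extreme_first(5)
print(cond_prob, prob_all_heads(5))  # 1/32 1/32 (conditioning changes nothing)
print(cond_mean)                     # 5/2 (the ordinary expected count)
```

So the conditional probability of a second extreme block equals the unconditional one, while the expected count in the second block is just the usual mean. On this reading, "regression to the mean" adds nothing beyond independence in the coin example.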

First sentence

"[...] the phenomenon that if a variable is extreme on its first measurement, it will tend to be closer to the mean or average on its second measurement, and if it is extreme on its second measurement, it will tend to have been closer to the average on its first." Put another way, this sentence says that if you measure something twice, one of the measurements will be closer to the mean than the other, which is a rather meaningless statement. Or one could understand it as saying that if one measurement is extreme, then the two measurements will differ from each other. This of course depends on the measurement in question. Not a very good way to start an article. I would have corrected it, but I came here to learn about the concept, so I don't really have what it takes to make this correction. --178.8.24.240 (talk) 15:44, 28 November 2018 (UTC)
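
A small simulation may help show what the quoted sentence is claiming. This is a rough sketch under assumptions that are not in the article: each measurement is a stable component plus independent noise, both standard normal, and "extreme" means above an arbitrary threshold of 2.0. The function names are hypothetical. Under that model the claim is about averages within a selected group, and it runs in both directions in time.

```python
import random


def simulate_pairs(n_pairs=100_000, seed=0):
    """Draw (first, second) measurement pairs that share a stable component.

    Assumed model: measurement = stable part + independent noise,
    with both parts standard normal.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        stable = rng.gauss(0.0, 1.0)
        first = stable + rng.gauss(0.0, 1.0)
        second = stable + rng.gauss(0.0, 1.0)
        pairs.append((first, second))
    return pairs


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def conditional_means(pairs, threshold=2.0):
    """Average both measurements within each 'extreme' group."""
    extreme_first = [(a, b) for a, b in pairs if a > threshold]
    extreme_second = [(a, b) for a, b in pairs if b > threshold]
    return {
        "first_given_extreme_first": mean(a for a, _ in extreme_first),
        "second_given_extreme_first": mean(b for _, b in extreme_first),
        "second_given_extreme_second": mean(b for _, b in extreme_second),
        "first_given_extreme_second": mean(a for a, _ in extreme_second),
    }


for name, value in conditional_means(simulate_pairs()).items():
    print(f"{name}: {value:.2f}")
```

In runs of this sketch, the group selected for an extreme first measurement averages noticeably closer to the population mean of 0 on the second measurement, and the group selected for an extreme second measurement averages closer to 0 on the first. Individual pairs can still move either way.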

Picking nits - 'regression AWAY from the mean'

Regarding this: individuals that measure very close to the mean should expect to move away from the mean. I found a couple of references to 'regression away from the mean' online, but something's wrong here. A person who scores 100 on an IQ test (the mean, by definition) may indeed have a different score in a retest, and so would have moved 'away from the mean.' But is this 'regression'?

The word regression implies a return to an expected value. In the case of normally distributed IQ scores, the most probable score of a random test-taker (also the mean) is given the value of 100. So a test-taker who does, in fact, score 100 on a first test is already at the most probable value. So when, after a second test, a score of 98 results, to what has the test-taker 'regressed'?

The only way this makes sense is if each test score is considered a single outcome in a sample of test results - the sample representing a part of all possible test results. So if the testee took 100 tests, we might find a range of scores (error on IQ tests is within 4 points, I believe). If the mean of the 100 test scores were 102, then the first score of 100 would, in later tests, regress toward the mean value of all possible test results.

In this case, there is some logic to a regression away from the population mean, but only in the sense that the first score of 100 was away from the test-taker's true ability, and thus what we really have is a regression of the test-taker's score to his or her personal mean test score.

The other way to think of it - the only other way I can think of - is that the movement away from the mean after a first score is not a regression at all, but simply error variance. Thus, if a testee's true IQ were 100, we would expect later test scores to be different, but only because test results are not perfect measures of intelligence - there is error in the testing process.

Another way to think of the first case also involves IQ. If two parents each have an IQ of 100, we do not expect their children to 'regress away from the mean.' In fact, their children will be expected to have IQs normally distributed with a mean of ... 100. The fact that siblings have a mean IQ difference of about 11 points does not come from any regression away from the mean. It is simply the result of the probability distribution of IQ, centered on the mean, with lesser probabilities as you move out from the mean.

MarkinBoston (talk) 18:50, 17 April 2019 (UTC)
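
The error-variance reading above can be sketched numerically. Everything here is an assumption made for illustration: scores are modelled as a normally distributed true ability plus independent normal test error, the population SD is taken as 15, and the error SD of 4 is borrowed loosely from the "within 4 points" guess in the comment. The function name is hypothetical.

```python
import math
import random


def simulate_retest(n_people=200_000, total_sd=15.0, error_sd=4.0, seed=1):
    """Retest people whose first score landed within 1 point of 100.

    Assumed model: score = true ability + independent test error,
    with the true-ability SD chosen so that observed scores have SD total_sd.
    """
    true_sd = math.sqrt(total_sd ** 2 - error_sd ** 2)
    rng = random.Random(seed)
    near_mean = []  # (first, second) scores for people who scored about 100
    for _ in range(n_people):
        ability = rng.gauss(100.0, true_sd)
        first = ability + rng.gauss(0.0, error_sd)
        if abs(first - 100.0) < 1.0:
            second = ability + rng.gauss(0.0, error_sd)
            near_mean.append((first, second))
    count = len(near_mean)
    return {
        "count": count,
        "mean_second": sum(s for _, s in near_mean) / count,
        "mean_abs_dev_first": sum(abs(f - 100.0) for f, _ in near_mean) / count,
        "mean_abs_dev_second": sum(abs(s - 100.0) for _, s in near_mean) / count,
    }


for name, value in simulate_retest().items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the average retest score of the selected group stays very close to 100, so the expected value does not shift, while the typical distance from 100 grows on the retest. That pattern is what one would expect from error variance spreading scores out, which seems consistent with the view that 'regression away from the mean' is not regression in the usual sense.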