Talk:Abelson's paradox

WikiProject Statistics (Rated Stub-class, Low-importance)

This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.


Notability[edit]

I have added the Notability template as this terminology does not seem to be in widespread usage: an improved indication of context might establish notability if there is important usage in some field of application. It is not widespread in "applied statistics" but might be in some sub-field such as "psychometrics". Melcombe (talk) 13:23, 18 May 2010 (UTC)

This term is referenced in secondary, independent sources, as already noted. (Good gracious! The page number from the Cohen reference - the Bible of applied statisticians! - is given, and the DOI reference goes on at length about Abelson's paradox!) The effect size debate that has captivated the social and behavioral sciences, medicine, etc., over the past half century, and which has culminated in differentiating between practical and statistical significance, has led to the discussion of these paradoxes. I can't do more than ask editors to actually read the references, other than to cut and paste quotes from them - which seems inappropriate according to Wiki citation rules and would lengthen the article without adding any value. (But, if I have to - sigh - I will.) I'll allow a period of time for comment from others before removing a notability tag that appears to have been placed despite the references, and the same with the "See Also" paradox. Edstat (talk) 14:21, 18 May 2010 (UTC)
But Wikipedia is intended for general readers, not for people who already know the topic being discussed. Why bother to write something that is inherently meaningless? Melcombe (talk) 14:41, 18 May 2010 (UTC)
The judgment that something is inherently meaningless is not constructive. There is a paradox in interpreting effect sizes obtained via applied statistics which is discussed and referenced in the literature. An encyclopedia article is not the place to review all that literature - rather, just to summarize the point. The summary here is that effect sizes, obtained in scientifically famous experiments or even in the humor of everyday life, cannot necessarily be interpreted based on their magnitude; in fact, some of the most important effect sizes are minuscule, but represent tremendous effects. Edstat (talk) 15:05, 18 May 2010 (UTC)
I didn't say that something was inherently meaningless, just that the text as written has no meaning. Melcombe (talk) 16:09, 18 May 2010 (UTC)
The meaning of Abelson's paradox is that practical importance is not captured by the magnitude of an effect size, contrary to all rules of thumb (and indeed agreed to by the originator of effect size rules of thumb - Jacob Cohen). This is what the text says, this is what the references say, and this is what I am saying. The question I raised on the "See also" page: what is it you are saying? Please clarify. Repeating the same phrase is clearly not bringing any clarity to your point. Edstat (talk) 16:16, 18 May 2010 (UTC)
I'll do that for you. The linked article (the one with the doi: identifier) devotes 1 sentence to this “paradox” (I guess this can be interpreted as “going on at length”). The sentence is the following: “In later work, Bob [Abelson] pointed out that small amounts of variance explained can sometimes correspond to important real-world effects, when the influence of a variable accumulates over repeated instances to produce a meaningful outcome (Abelson, 1985).” This is rather illuminating (at least more illuminating than the article itself). First, we learn that this is not called Abelson’s paradox, and in fact nobody even claims this is a paradox. Second, we can infer that Abelson was talking of the simple fact that even if R² is small in magnitude it could still be significantly different from zero, providing valid insight into the underlying phenomenon. Shouldn't come as a surprise to anybody who learned in a basic statistics course that nR² asymptotically has a χ² distribution, meaning that with low R² but high n the null might still be rejected. Third, we learn that there is no actual paradox here (you might want to look up the definition of the word), since the difference between a “rule” and a “rule of thumb” is precisely that the latter may not always hold.  // stpasha »  04:40, 2 June 2010 (UTC)
Apart from your insulting language, try looking up the Cohen reference on the page indicated or click this reference[1], or most of the references here: [2] - so no, you haven't done that for me.Edstat (talk) 13:59, 2 June 2010 (UTC)
The paradox has nothing to do with a small (vs. large) r being statistically significantly different from zero. One need not wait for an asymptotic condition to know this; for a one-tailed test, r = .05 (and hence r2 = .0025) is significant at the alpha = .05 level for n = 1,000, a factoid memorized in any intro stat class.* Moreover, the paradox applies even for tests of Ha: X, with X not equal to zero. The paradox pertains to the generally accepted isomorphic relationship between ES magnitude and practical significance. Edstat (talk) 15:16, 2 June 2010 (UTC)
And finally, regarding the statement "nobody calls this a paradox," check the title of Abelson's paper.Edstat (talk) 15:42, 2 June 2010 (UTC)
*Other examples: r = approximately .16 (r2 = .026) for n = 100; or r = approximately .23 (r2 = .053), for n = 50.Edstat (talk) 16:32, 2 June 2010 (UTC)
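(The approximate critical values of r quoted in this thread can be checked with a short sketch. This uses the standard transform t = r·sqrt(n − 2)/sqrt(1 − r²), inverted to solve for r, with a normal approximation to the t critical value; at these sample sizes the approximation is adequate, and the exact values differ only in the third decimal place.)

```python
from statistics import NormalDist

# Approximate critical value of r for a one-tailed test of H0: rho = 0.
# Inverts t = r * sqrt(n - 2) / sqrt(1 - r^2), i.e. r = t / sqrt(t^2 + df),
# using the normal approximation z ~ t for the critical value.
def critical_r(n, alpha=0.05):
    z = NormalDist().inv_cdf(1 - alpha)  # ~1.645 for alpha = .05
    df = n - 2
    return z / (z**2 + df) ** 0.5

# critical_r(1000) ~ 0.052, critical_r(100) ~ 0.164, critical_r(50) ~ 0.231,
# in line with the .05, .16 and .23 figures quoted above.
```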
I apologize if I may have sounded rude, that was unintentional. However I'm still not persuaded by your position that this topic is notable. But before we can engage in a meaningful discussion, can you please answer the following questions: 1) Are you either Sawilowsky, or Abelson, or Cohen in real life? 2) Have you actually read and understood the Abelson(1985) paper that you cite? 3) How would you assess your knowledge of statistics, on 0-10 scale?  // stpasha »  17:32, 2 June 2010 (UTC)
Apology accepted, and likewise. You aren't supposed to be persuaded "by my position" - you are supposed to be persuaded by references. It is frustrating trying to get editors to actually read references (e.g., how can anyone who has read the title of the Abelson article or gone to the Cohen reference say that no one calls this the Abelson paradox?)
In response to your questions:
1.) Very unusual question. See Wikipedia:About "http://en.wikipedia.org/wiki/Wikipedia:About "Wikipedia is written collaboratively by largely anonymous Internet volunteers who write without pay. Anyone with Internet access can write and make changes to Wikipedia articles<snip>. Users can contribute anonymously, under a pseudonym, or with their real identity, if they choose."
2.) I have read and understood the Abelson article. Please Wikipedia:Assume_good_faith that it has been read, and don't insult the editor to presume it has not been understood.
3.) My knowledge of statistics is not relevant. The premise of Wikipedia is to rely on material in secondary (and tertiary) sources as a substitute for content knowledge. The Cohen citation is independent, secondary, and reliable. However, if there is some statistical nuance that you believe is being misrepresented in the article, or on the discussion page, feel free to raise the issue. Making an ambiguous statement and repeating it over and over (as others have done) does nothing to clarify the issue at hand (if there indeed is one).
Some history on this page that concerns me: An editor asked me to make comments on an apparently contentious issue[3]. FYI, it was an editor who had done considerable, imho, unwarranted and inappropriate warring edits against me. However, the content is the only thing that matters, not editing cabals, their personalities, or their warring edits. I went to the contentious issue, and I agreed with the position of that editor. I wrote extensively to support her/his opinion. The other side didn't like it, and immediately began Wikipedia:Hound#Wikihounding, including on this page.
I actually expected AfD from some of the past warring editors, so I wasn't surprised. I am disappointed, however, that editors can go to those with admin rights, be blatantly inaccurate, and apparently be believed.
The bottom line: If any editor wants to actually read the references, discuss them, discuss the topic with clarity, I am willing to do so. If not, AfD!Edstat (talk) 18:44, 2 June 2010 (UTC)
Oh yes, I forgot to mention that if you had read the Abelson DOI reference of 2007 you would have noticed it was a eulogy for him, and Cohen, of course, passed away in 1998.Edstat (talk) 18:52, 2 June 2010 (UTC)
Here are two more references that are secondary, independent, and reliable that use the phrase "Abelson's paradox":
  • (1) Michael Borenstein, The Shift from Significance Testing to Effect Size Estimation, In: Alan S. Bellack and Michel Hersen, Editor(s)-in-Chief, Comprehensive Clinical Psychology, Pergamon, Oxford, 1998, Pages 313-349, ISBN 978-0-08-042707-2.[4], and
  • (2) Anthony R. Pratkanis, Anthony G. Greenwald, A Sociocognitive Model of Attitude Structure and Function, In: Leonard Berkowitz, Editor(s), Advances in Experimental Social Psychology, Academic Press, 1989, Volume 22, Pages 245-285, ISSN 0065-2601, ISBN 9780120152223, DOI: 10.1016/S0065-2601(08)60310-X. [5].Edstat (talk) 23:54, 3 June 2010 (UTC)

Update[edit]

Regarding the notability tag, editors are referred to the "See Also" discussion page.Edstat (talk) 22:23, 28 May 2010 (UTC)

Abelson's paper[edit]

It seems like all the references given here and at the Sawilowsky's paradox page point to psychology journals. Not being familiar with their literature, and whatever terminology is used, it is hard for me to make a fair judgment. So I decided to take a look at [http://psycnet.apa.org/journals/bul/97/1/129.pdf Abelson's paper] (which seems to be the “main” reference, where the notion is introduced). Here's what I've found:

First, the author poses a question: “What percentage of the variance in the athletic outcomes can be attributed to the skill of the players, as indexed by the past performance records?” Usually we first want to see how skill affects the mean of the outcome; however, in this model the variable in question is binomial, so the mean and the variance are closely related. Next Abelson comments: “This variance explanation question is analogous to those that characterize psychological investigations, but arises in a context where there exist strong intuitions”. Then the author considers a simple model where a batter may or may not strike a hit at any given time at bat. The model is Xit = Bi + eit, where i indexes players and t indexes times at bat (one might argue that the outcome must also depend on the skill of the pitcher, but this is disregarded in the paper). The dataset consists of two players. One is “significantly above average”, and the other is “significantly below”, which presumably is encoded as skill (S) = 0 or 1. The sufficient statistics for this dataset are the following:

Skill of batter          Hit (X = 1)   No hit (X = 0)
Above average (S = 1)    320           680
Below average (S = 0)    220           780

Abelson writes: “I circulated a one-item questionnaire to all graduate students and faculty in the department of psychology at Yale University. … The respondents were asked to refrain from answering if they knew nothing about baseball or the concept of variance explanation. Participants were asked to imagine a time at bat by an arbitrary chosen major league baseball player, and to estimate what percentage of the variance in whether or not the batter gets a hit is attributable to skill differentials among batters.” Abelson reports that the median answer in this questionnaire was 25%. What is the true answer? According to the data provided, for high-skill players the variance is 0.32·(1 − 0.32) = 0.2176, whereas for low-skill players it is 0.22·(1 − 0.22) = 0.1716. The difference “attributable to skill” is (0.2176 − 0.1716)/0.1716 = 26.8%. So people have guessed more-or-less correctly — what's the paradox then?
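(The arithmetic in the preceding paragraph can be reproduced directly; this is just a numerical sketch of that comment's own calculation, where the 26.8% figure is the relative difference of the two Bernoulli variances.)

```python
# Bernoulli variances p(1 - p) for the two skill groups in the table above,
# reproducing the 26.8% relative difference computed in this comment.
p_high, p_low = 320 / 1000, 220 / 1000       # hit rates 0.32 and 0.22
var_high = p_high * (1 - p_high)             # 0.2176
var_low = p_low * (1 - p_low)                # 0.1716
rel_diff = (var_high - var_low) / var_low    # ~0.268, i.e. 26.8%
```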

It appears that the paradox is that Abelson himself calculates a wrong answer. He runs a regression for the model given above. R² in such a regression is just the squared correlation coefficient. In this dataset the correlation between X and S is equal to 0.1126, which gives an R² of roughly 1% (somehow Abelson even forgets to divide by the variance of S, so that his “answer” is even smaller: 0.3%). He comments: “The single at bat is a perfectly meaningful context. I might put the question this way: As the team’s manager, needing a hit in a crucial situation, scans his bench for a pinch hitter, how much of the outcome variance is under his control? Answer: one third of 1%.” So apparently the essence of the paradox is in how this could have potentially been published in a peer-reviewed journal. Maybe it was reviewed by people without any knowledge of statistics...  // stpasha »  03:18, 9 June 2010 (UTC)

Discussing the article on the talk page probably violates the tag that says this is not a forum to discuss the subject, but I suppose those rules only apply to me. In any case, I can give a more lengthy response when time permits, but for now, might I gently suggest that instead of lambasting experts in the peer-reviewed psychology literature in general, and the APA flagship journal in particular (at least until the quantitative methods section split off to become Psychological Methods), and making smug remarks taunting with your self-estimated expertise, you might be better served by re-reading the article. If I understand your comments above, you mix the data from one part of the article with another. Omega squared is given as Sigma²/[Mu(1 − Mu)], where Sigma = .025 and Mu = .270, which indeed yields .00317, or about 1/3 of 1%, as stated on p. 131. Later, in illustrating various aspects of the paradox, Abelson gives a specific data set (see the table you quote) on page 132, for which Phi is .113,* which yields a variance estimate of .012769, whence he gets 1.3%.
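(The two figures quoted in this comment can be verified numerically; this sketch just plugs in the sigma, mu, and phi values cited from Abelson (1985), pp. 131-132.)

```python
# Omega squared as given above: sigma^2 / (mu * (1 - mu)),
# with sigma = .025 and mu = .270 from p. 131.
sigma, mu = 0.025, 0.270
omega_sq = sigma**2 / (mu * (1 - mu))   # ~0.00317, "one third of 1%"

# Squared phi coefficient for the p. 132 data set.
phi = 0.113
phi_sq = phi**2                          # 0.012769, about 1.3%
```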
Nice try, though.Edstat (talk) 03:13, 10 June 2010 (UTC)
(In other words, the author of the article discusses explained variance (the paradox of magnitude v. practical significance), but also raises potential questions about the paradox and attempts to answer them, and in so doing, brings out nuances regarding "differences" between two levels of variation.)Edstat (talk) 15:02, 10 June 2010 (UTC)
*Phi = [(320*780) - (680*220)]/sqrt(1000*1000*540*1460) = .112623. Edstat (talk) 03:58, 10 June 2010 (UTC)
"Abelson's reaction to his finding (and mine, and no doubt yours) was one of incredulity. But neither he nor the editor and referees of Psychological Bulletin (nor I) could find any fault with his procedure." (Cohen, 1988, p. 535) Edstat (talk) 15:17, 10 June 2010 (UTC)

2nd call for action[edit]

OK, has this article been given enough references (see article, not the talk page)? Can we now remove the notability tag, or nominate it for AfD?Edstat (talk) 22:25, 18 June 2010 (UTC)

Hearing none, and having added citations, I'm removing the tag.Edstat (talk) 12:20, 1 July 2010 (UTC)

More citations, wikilinks[edit]

I have added a ninth reference to this paradox, which should put the notability issue to rest. Similarly, I have added internal wiki links to key terms and hence removed the wiki tag. Edstat (talk) 15:54, 31 October 2010 (UTC)

There seems to be a lack of understanding here of what citations are for on Wikipedia. Apparently there are 9 references for the fact that Abelson's paradox was "by" Robert P. Abelson, but none for anything else that is said, including what it is about. Melcombe (talk) 13:47, 1 November 2010 (UTC)
Well, you could move the references to other spots in the text. For example,
  • "The accumulation of small effect sizes into big outcomes is sometimes seen in sport, where the difference between victory and defeat may be nothing more than a trimmed fingernail. In baseball Abelson (1985) found that batting skills explained only one third of 1% of the percentage of variance in batting performance (defined as getting a hit). Although the effect of batting skill on individual batting performance is "pitifully small," "trivial," and "virtually meaningless," etc.
which was said by Ellis (2010), can be moved to the end of the text, if this is your point.Edstat (talk) 12:31, 2 November 2010 (UTC)
I've moved the references that discuss the paradox to the application in the text, to differentiate them from the references that refer to the paradox by Abelson.Edstat (talk) 12:39, 2 November 2010 (UTC)

r2 of batting averages??[edit]

What does that mean? In order to have an r2 as I have always understood it, you need to have one or more explanatory variables and a response variable. Batting average is only one variable. Michael Hardy (talk) 00:42, 18 November 2010 (UTC)

Here the response variable is hit/miss at each time at bat, and the predictor is the skill of the player (another binary variable). Abelson confuses the “percentage of variance explained by the regressor” with the “percentage of variance explained by the linear regression” (R²), and then acts all surprised when the two quantities differ. The author is particularly astonished by the fact that, for a dataset concentrated at the vertices of a square (which is where binary-on-binary data are located), the degree of fit of a straight line turns out to be rather low. See my recount of the events in the #Abelson's paper section here on the talk page.  // stpasha »  08:52, 18 November 2010 (UTC)
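(For a concrete check of the R² being discussed in this thread, here is a sketch that reconstructs the 2×2 data set from the #Abelson's paper section and computes the correlation between the binary outcome and the binary skill indicator; its square is the R² of the simple linear regression.)

```python
import numpy as np

# Reconstruct the 2x2 data set from the table above:
# 1000 above-average at-bats (320 hits), 1000 below-average (220 hits).
S = np.repeat([1, 0], 1000)                          # skill indicator
X = np.concatenate([np.ones(320), np.zeros(680),     # hits for S = 1
                    np.ones(220), np.zeros(780)])    # hits for S = 0

r = np.corrcoef(X, S)[0, 1]   # the phi coefficient, ~0.1126
r_squared = r**2              # R^2 of the simple regression, ~0.0127
```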