"Very large sample sizes may reject the assumption of normality with only slight imperfections. But, industrial data with sample sizes of 200 and more, have easily passed the Anderson-Darling test."
This citation comes from text I wrote for the MVPstats help files. MVPstats is a statistical analysis software program. Although this software has evolved, it originally began in 1986 as a simple program to provide computation of the Anderson-Darling test statistic. The citation may be found here: http://mvpprograms.com/help/mvpstats/distributions/NormalityTestingGuidelines
This software was used quite frequently in work we were doing in industrial situations, which is why the reference is to "industrial data." I have personally tested thousands of distributions over the years, and yes, the statement is accurate. Anderson-Darling is one of the more powerful tests for normality. The question one is generally trying to answer in using this or any other test for normality is whether the data come from a distribution that can be adequately modeled with a normal distribution. As the citation suggests, with large sample sizes there may exist slight deviations from the normal/Gaussian distribution even though the model may be adequate. And yes, I have also seen much larger sample sizes easily pass the Anderson-Darling test.
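The large-sample point above can be illustrated numerically. This is only a sketch: the particular "slight imperfection" (Student's t with 10 degrees of freedom, which is close to normal), the sample size, and the seed are my own arbitrary choices, and scipy's anderson is used as the test implementation.

```python
import numpy as np
from scipy import stats

# Sketch of the point above: with a very large sample, even a slight
# departure from normality can be flagged by the Anderson-Darling test.
# The departure (t with 10 df) and size 200000 are illustrative choices.
rng = np.random.default_rng(0)
nearly_normal = rng.standard_t(df=10, size=200000)
res = stats.anderson(nearly_normal, dist='norm')
print(res.statistic, res.critical_values)
```

Whether the statistic actually crosses a given critical value depends on the draw, but with samples this large even mild tail differences tend to dominate.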
Mvpetrovich 17:09, 26 March 2007 (UTC)
According to the Stephens (1974) article cited in the reference section, the actual sample size correction is A^2* = A^2 * (1 + 4/n - 25/n^2) and the 5% statistic for normality is 0.787.
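For concreteness, Stephens' small-sample correction quoted above is easy to apply directly; the unadjusted A^2 value and sample size below are hypothetical, chosen only to show the arithmetic.

```python
# Stephens (1974) correction for the normality case:
# A^2* = A^2 * (1 + 4/n - 25/n^2), compared to the 5% critical value 0.787.
def adjusted_statistic(a2, n):
    return a2 * (1 + 4.0 / n - 25.0 / (n ** 2))

a2 = 0.70   # hypothetical unadjusted statistic
n = 50      # hypothetical sample size
a2_star = adjusted_statistic(a2, n)
print(a2_star)           # 0.70 * (1 + 0.08 - 0.01) = 0.749
print(a2_star > 0.787)   # False: fail to reject normality at the 5% level
```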
Who is right?
It would probably be a good idea to actually reference the book being cited by Shapiro: How to Test for Normality and Other Distributional Assumptions. —Preceding unsigned comment added by 188.8.131.52 (talk) 03:20, 2 August 2008 (UTC)
Critical values table
The article is missing the table of critical values. Unfortunately, I don't have access to Stephens (1974). Perhaps someone with access could supply the numbers (assuming it's not a large table)? Many thanks. pgr94 (talk) 14:38, 6 November 2008 (UTC)
Statistic and the integral
The statistic in the sum form and the integral are exactly equal, which can be shown in three pages by computing simple integrals. I see my change making this explicit was reverted; it should be added back. I'm not sure how to go about convincing people other than suggesting they do the exercise themselves :) --Kaba3 (talk) 23:10, 23 February 2011 (UTC)
- What you really need to do is to provide a citation for the result in the article itself ... it must have been published somewhere. Melcombe (talk) 17:01, 24 February 2011 (UTC)
@Kaba3 or (Pgr94?) You are right - the computing formula for the statistic (wish we had equation numbers) is given in D'Agostino and Stephens on page 101, so there is a citation showing it equals the integral form. I still think it is nicer to introduce the notion of distance first rather than rework the sentence to say "the statistic is given by ...": it helps to know what a statistic measures. I agree the connection should be made explicit. Isn't it possible to make that connection explicit at the computing formula?
- (Also found the statistic formula (labeled W-squared) in Anderson-Darling (1954) page 765 equation (2).) Mathstat (talk) 05:57, 26 February 2011 (UTC)
Can someone give a worked example how to do this?
This page has way too many boring formulas only a math guy would read. I really want some numbers plugged in so I can see what's happening.
Here's what I did with python:

 import numpy
 import scipy.stats
 x = numpy.random.randn(10000)
 scipy.stats.anderson(x)

this gives the following:

 (0.43368580228707287, array([ 0.576, 0.656, 0.787, 0.918, 1.092]), array([ 15. , 10. , 5. , 2.5, 1. ]))
So I compare the 0.43 (A^2 test statistic) to [ 0.576, 0.656, 0.787, 0.918, 1.092], see that it is lower than all of them, so the hypothesis of normality is not rejected at the 15% significance level. Is this right?
I repeated the above with x=numpy.random.rand(10000) (not a normal distribution) and get the following: (114.45988296709584, array([ 0.576, 0.656, 0.787, 0.918, 1.092]), array([ 15. , 10. , 5. , 2.5, 1. ]))
So I compare the 114 number to [ 0.576, 0.656, 0.787, 0.918, 1.092], see that it is higher than all of them, and reject the hypothesis of normality at the 1% significance level. Is this right?
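The comparison logic described in these two examples can be sketched as a loop over scipy's returned critical values (the seed and sample size below are arbitrary): reject normality at a given significance level when the A^2 statistic exceeds that level's critical value.

```python
import numpy as np
from scipy import stats

# Decision rule from the discussion above: compare A^2 against each
# critical value and report the decision at each significance level.
rng = np.random.default_rng(1)
result = stats.anderson(rng.standard_normal(10000), dist='norm')
for cv, sl in zip(result.critical_values, result.significance_level):
    decision = "reject" if result.statistic > cv else "fail to reject"
    print(f"{sl}%: A^2 = {result.statistic:.3f} vs {cv:.3f} -> {decision}")
```

For a normal sample the statistic should usually sit below all five critical values; for the uniform sample in the second example it exceeds all of them, so normality is rejected even at the 1% level, as described.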