Talk:Goodness of fit

WikiProject Statistics (Rated Start-class, High-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as Start-Class on the quality scale.
This article has been rated as High-importance on the importance scale.
WikiProject Mathematics (Rated Start-class, Mid-importance)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mathematics rating: Start-Class, Mid-importance
Field: Probability and statistics


The Anderson-Darling test should probably be mentioned on this page, as it tests the goodness of fit of a distribution. (talk) 19:49, 30 July 2009 (UTC)

In this equation: \chi^2 = \sum {(O - E)^2 \over E}, should it instead be \chi^2 = \sum {\left(\frac{O - E}{E}\right)^2}? Otherwise \chi^2 wouldn't be unitless if the observed and expected values have units. I'm going to make the change, but I'd like confirmation that this is the case. --Keflavich 15:19, 15 April 2006 (UTC)

I don't think this is the case, I'm changing it back

I'm no stats expert, but my textbook says otherwise —The preceding unsigned comment was added by (talkcontribs) 01:02, 21 April 2006.

It's because frequencies, not actual quantities, are used

So both O and E are dimensionless.
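
To make the point about frequencies concrete, here is a minimal sketch of the first formula above, \chi^2 = \sum (O - E)^2 / E, applied to counts. The function name `pearson_chi_squared` and the example counts are made up for illustration and are not taken from the article.

```python
def pearson_chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over categories, where O and E are counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical example: 100 draws sorted into two categories,
# with 50 expected in each under the hypothesised distribution.
observed = [44, 56]
expected = [50, 50]
stat = pearson_chi_squared(observed, expected)
print(stat)
```

Because O and E are counts, each term (O - E)^2 / E is a pure number, so no extra division by E is needed to make the statistic unitless. Here each category contributes 36/50, giving a total of about 1.44.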

Reduced Chi-Squared

It seems to me that the reference given for the reduced chi-squared gives a different formula than the one included in this article. The reference indicates that the reduced chi-squared is Chi^2/DOF, where DOF = #obs - #params - 1. Privong 13:34, 30 July 2007 (UTC)

Never mind. I see where the formula comes from. Perhaps it would be wise, though, to also put it in terms of the degrees of freedom? Privong 14:34, 30 July 2007 (UTC)
The formula is still wrong. Every source I can find (including the reference currently in the article) explicitly states that the sum of the squared residuals, each divided by the variance, is chi-squared, and that the reduced chi-squared is obtained by further dividing that total by the number of degrees of freedom. I'm going to update the article accordingly, since the consensus in all the sources I can find is that the reduced version is divided by the degrees of freedom. -- (talk) 17:21, 1 April 2009 (UTC)
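
As a rough sketch of the two quantities being distinguished in this thread: chi-squared is the sum of squared residuals, each divided by its variance, and the reduced version divides that total by the degrees of freedom. The function names, the data, and the choice of DOF = #obs - #params below are assumptions for illustration; the DOF is passed in by the caller, so the #obs - #params - 1 convention mentioned above can be used instead.

```python
def chi_squared(observed, predicted, sigma):
    """Sum over points of ((observed - predicted) / sigma)^2."""
    if not (len(observed) == len(predicted) == len(sigma)):
        raise ValueError("all inputs must have the same length")
    return sum(((o - p) / s) ** 2 for o, p, s in zip(observed, predicted, sigma))


def reduced_chi_squared(observed, predicted, sigma, dof):
    """Chi-squared divided by the number of degrees of freedom."""
    if dof <= 0:
        raise ValueError("degrees of freedom must be positive")
    return chi_squared(observed, predicted, sigma) / dof


# Hypothetical data: four measurements, each with standard deviation 0.5,
# compared against the predictions of a two-parameter model.
observed = [1.5, 1.5, 3.5, 3.5]
predicted = [1.0, 2.0, 3.0, 4.0]
sigma = [0.5, 0.5, 0.5, 0.5]
n_params = 2                        # assumption: e.g. slope and intercept
dof = len(observed) - n_params      # assumption: DOF = #obs - #params

chi2 = chi_squared(observed, predicted, sigma)
chi2_red = reduced_chi_squared(observed, predicted, sigma, dof)
print(chi2, chi2_red)
```

Every residual here is exactly one standard deviation, so chi-squared is 4 and, with 2 degrees of freedom, the reduced value is 2.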

More work?

I think this page needs more theoretical work than just an example. . . . Just my thoughts —Preceding unsigned comment added by (talk) 19:15, 28 May 2008 (UTC)

Pictures/diagrams would help. Charles Edwin Shipp (talk) 21:59, 19 October 2011 (UTC)

Very confusing as currently written

Both of the first two comments stem from confusion caused by defining O and E to be frequencies. I think it would be much clearer if the formulae were rewritten in terms of quantities and degrees of freedom, as in the cited article. Unfortunately, I don't have time to do this right now.

What does 'lack of fit' mean? The 'Lack of Fit F-value' of 0.57 implies that the lack of fit is not significant relative to the pure error. The value of 'P > F' is 0.7246, which means that there is a 72.46% chance that a 'Lack of Fit F-value' this large could occur due to noise. So over what range of the 'Lack of Fit F-value' is the lack of fit not significant relative to the pure error? And what does it mean for the lack of fit to be not significant relative to the pure error? That the model is not a good fit? -- Preceding unsigned comment added by (talk) 00:29, 19 March 2009 (UTC)
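
For what it's worth, here is a hedged sketch of how a lack-of-fit F-value is usually built when there are replicate measurements at each x. The scatter of replicates around their own group mean gives the "pure error", and the distance of the group means from the model's predictions gives the "lack of fit". The function name, the data, and the fitted line are made up for illustration and have nothing to do with the 0.57 quoted above.

```python
def lack_of_fit_f(groups, predictions, n_params):
    """F = (SS_lack_of_fit / (c - p)) / (SS_pure_error / (n - c)).

    groups:      one list of replicate observations per distinct x value
    predictions: the model's fitted value at each distinct x value
    n_params:    number of fitted model parameters (p)
    """
    n = sum(len(g) for g in groups)   # total number of observations
    c = len(groups)                   # number of distinct x values
    means = [sum(g) / len(g) for g in groups]
    # Pure error: replicates scattered around their own group mean.
    ss_pe = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    # Lack of fit: group means scattered around the model prediction.
    ss_lof = sum(len(g) * (m - p) ** 2
                 for g, m, p in zip(groups, means, predictions))
    df_lof, df_pe = c - n_params, n - c
    if df_lof <= 0 or df_pe <= 0:
        raise ValueError("not enough groups or replicates for this test")
    return (ss_lof / df_lof) / (ss_pe / df_pe)


# Hypothetical data: two replicates at each of x = 1, 2, 3.
groups = [[1.0, 3.0], [3.0, 5.0], [7.0, 9.0]]
# Least-squares line for these data is y = 3x - 4/3.
predictions = [5 / 3, 14 / 3, 23 / 3]
f_value = lack_of_fit_f(groups, predictions, n_params=2)
print(f_value)
```

A small F-value like this one (about 0.67) says the group means miss the fitted line by no more than the replicate scatter would lead you to expect, so there is no evidence that the model form is wrong. The reported 'P > F' is the upper tail probability of an F distribution with (c - p, n - c) degrees of freedom, which this sketch does not compute.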

Reduced chi-squared only for linear models

The article presents reduced chi-squared as if it were applicable to all kinds of models. However, this is not true. Comparing reduced chi-squared to unity is informative about the goodness of fit if and only if the model is purely linear! For nonlinear models it doesn't make sense. For a reference, look into Barlow 1993, I think in the part where he derives the chi-squared distribution. Unfortunately, he just states that as a matter of fact, as I am doing here, but does not give an explanation. Maybe somebody else can point to why reduced chi-squared doesn't make sense for nonlinear models?
Regards, Rene —Preceding unsigned comment added by (talk) 14:15, 3 August 2010 (UTC)

Notation and connection to regression, OLS, ANOVA and econometrics

When I studied this, goodness of fit was called R^2 and was defined in terms of SSR (sum of squared residuals), SSE (sum of squared errors), and SST (sum of squares total). These in turn were defined in terms of yhat, xhat, ybar, xbar, and the residuals, which were themselves defined in terms of the regression line, based on the estimated values of the coefficient on the independent variable and of the y-intercept. I think that these terms should at least be mentioned in the article. (talk) 00:31, 1 September 2010 (UTC)
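
A minimal sketch of the R^2 described above, for a straight-line least-squares fit. Since textbooks disagree on what the SSR/SSE acronyms stand for, the variables below are named `ss_res` (residual sum of squares) and `ss_tot` (total sum of squares) instead; the function name and the data are made up for illustration.

```python
def r_squared_simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares; return (slope, intercept, R^2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    yhat = [intercept + slope * xi for xi in x]
    # Residual sum of squares: observed minus fitted.
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    # Total sum of squares: observed minus mean.
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("y values must not all be equal")
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical data.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 4.0, 8.0]
slope, intercept, r2 = r_squared_simple_ols(x, y)
print(slope, intercept, r2)
```

For these numbers the fitted line is roughly y = 0.7 + 2.2x, the residual sum of squares is about 1.8 out of a total of 26, and R^2 comes out near 0.93.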

Copyright problem removed

Prior content in this article duplicated one or more previously published sources. The material was copied from: Infringing material has been rewritten or removed and must not be restored, unless it is duly released under a compatible license. (For more information, please see "using copyrighted works from others" if you are not the copyright holder of this material, or "donating copyrighted materials" if you are.) For legal reasons, we cannot accept copyrighted text or images borrowed from other web sites or published material; such additions will be deleted. Contributors may use copyrighted publications as a source of information, but not as a source of sentences or phrases. Accordingly, the material may be rewritten, but only if it does not infringe on the copyright of the original or plagiarize from that source. Please see our guideline on non-free text for how to properly implement limited quotations of copyrighted text. Wikipedia takes copyright violations very seriously, and persistent violators will be blocked from editing. While we appreciate contributions, we must require all contributors to understand and comply with these policies. Thank you. Danger (talk) 11:43, 9 October 2011 (UTC)


The opening appears to be widely plagiarized as far back as 2005 on the Psychology Wiki — Preceding unsigned comment added by (talk) 22:53, 11 December 2014 (UTC)

Multivariate Gof-tests?

The article does not mention whether there are multivariate tests for the fit of a multivariate distribution. If there are no such tests, shouldn't it at least suggest using multiple tests for the marginal assumptions? - (talk) 20:45, 9 April 2015 (UTC)

Error in example?

This sentence can't be correct: "A \chi_\mathrm{red}^2 < 1 indicates that the model is 'over-fitting' the data: either the model is improperly fitting noise, or the error variance has been overestimated."

Such an assertion would imply that linear relations (like the ideal gas law) are invalid.

The reference given earlier is to a paper that does not exist.

Createangelos (talk) 12:02, 21 April 2015 (UTC)

The claim about < 1 makes sense; your implication is confused; an exact linear law does not imply the measurement error disappears.
What ref?
Glrx (talk) 00:34, 24 April 2015 (UTC)
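
A small numerical illustration of the point in the reply, under made-up assumptions: the underlying law is exactly linear (y = 2x, with nothing fitted), the measurements scatter around it by one unit, and the only thing that changes is the error bar claimed for each measurement. The function name and all numbers are hypothetical.

```python
def reduced_chi_squared(measured, predicted, sigma, dof):
    """Chi-squared over degrees of freedom, with one common sigma for all points."""
    if dof <= 0:
        raise ValueError("degrees of freedom must be positive")
    chi2 = sum(((m - p) / sigma) ** 2 for m, p in zip(measured, predicted))
    return chi2 / dof


# Hypothetical exact linear law y = 2x; no parameters are fitted.
x = [1, 2, 3, 4, 5, 6]
predicted = [2.0 * xi for xi in x]
# Measurements that each miss the law by exactly one unit.
measured = [3.0, 3.0, 7.0, 7.0, 11.0, 11.0]
dof = len(measured)  # assumption: no fitted parameters, so DOF = #obs

with_true_sigma = reduced_chi_squared(measured, predicted, 1.0, dof)
with_inflated_sigma = reduced_chi_squared(measured, predicted, 2.0, dof)
print(with_true_sigma, with_inflated_sigma)
```

With the error bar equal to the actual scatter the reduced chi-squared is 1; claiming an error bar twice as large drops it to 0.25. The law itself is the same in both cases, so a value below 1 here reflects the overestimated error variance and says nothing against the linear relation.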