# Talk:Degrees of freedom (statistics)

WikiProject Statistics (Rated C-class, High-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

C  This article has been rated as C-Class on the quality scale.
High  This article has been rated as High-importance on the importance scale.
WikiProject Mathematics (Rated C-class, High-importance)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mathematics rating: C-Class, High-importance
Field: Probability and statistics
One of the 500 most frequently viewed mathematics articles.

## Confusion

As a beginner in statistics, I still don't understand what this means... can anyone explain it in more layman's terms? -- —Preceding unsigned comment added by 219.79.235.199 (talkcontribs)

I still don't get it.. --Blakeops 19:57, 4 June 2006 (UTC)

The way the concept is explained seems like Latin, German, and French to a layman trying to understand what a statistical degree of freedom actually is. To be comprehensible, the page needs to be put in simple language ... someone please explain ........

Maybe a better definition could be: The number of values in a data set that are free to vary when restrictions are imposed on the set. 149.169.118.217 20:53, 18 February 2007 (UTC) Rare.hero

I am really hoping the following sentence doesn't make sense: "In general, the degrees of freedom of an estimate is equal to the number of independent scores that go into the estimate minus the number of parameters estimated as intermediate steps in the estimation of the parameter itself.[2]" I am hoping that because it doesn't make sense to me. Briancady413 (talk) 23:01, 17 February 2010 (UTC)

### simplest context

The absolute simplest context in which degrees of freedom appears is the problem of estimating a small finite collection of discrete probabilities. For example, suppose you have a random variable that can take on the integer values 0, 1, 2, 3 and nothing else. You then have four probabilities P(X=0), P(X=1), P(X=2), P(X=3), and you know that these four must sum to 1. If the first three are estimated, the fourth can be calculated from them, so the fourth is not free to change on its own. (Note, though, that any one of the four could play the role of "the fourth"; only three are free at a time.) We say that this estimation problem has 3 degrees of freedom.

Degrees of freedom is analogous to asking how many independent variables are in the equation "a + b + c + d = 1". There aren't 4 independent variables in that equation if it is to remain true for real numbers. If the domain for those variables is all the reals, then there are 3 independent variables and one dependent variable. But I don't demand ahead of time that "a" be the dependent variable; I only demand that there be 3 independent variables and one dependent one. We call that 3 degrees of freedom.
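The a + b + c + d = 1 example can be sketched in a few lines of Python (the particular values 0.2, 0.5, 0.1 are arbitrary choices, not from any source):

```python
# The constraint a + b + c + d = 1 leaves three free choices; the fourth
# value is then determined.
a, b, c = 0.2, 0.5, 0.1        # three free choices
d = 1.0 - (a + b + c)          # the fourth is forced by the constraint
assert abs((a + b + c + d) - 1.0) < 1e-12
assert abs(d - 0.2) < 1e-9     # d had to come out near 0.2; it was not free
```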

Jeremiah Rounds

There's a really good visual demonstration of degrees of freedom in "Statistics: An Introduction using R" by Michael J. Crawley (Wiley, ISBN 13:978-0-470-02298-6) p36-37. To paraphrase: Suppose we had a sample of 6 numbers whose average is 5. The sum of these numbers must be 30, otherwise the mean would not be 5. |_| |_| |_| |_| |_| |_| Fill each box in turn with a positive or negative real number. The first could be any number, for example 3. |3| |_| |_| |_| |_| |_| The next could be anything, say 9. |3| |9| |_| |_| |_| |_| The next three could also be anything, say 4, 0 and 6. |3| |9| |4| |0| |6| |_| However, the last value can't be just any number: it has to be 8, because the numbers must add to 30. There is total choice in selecting the first five numbers but none in selecting the sixth. There are five degrees of freedom when selecting six numbers. In general there are (N-1) degrees of freedom when estimating the mean from a sample of size N.
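Crawley's box-filling example translates directly into code (same illustrative numbers as in the paraphrase above):

```python
# Five of the six values are free; the sixth is fixed by requiring the
# sample sum to be 30 (so that the mean is 5).
free_choices = [3, 9, 4, 0, 6]          # any five numbers at all
last = 30 - sum(free_choices)           # forced: must be 8
sample = free_choices + [last]
assert last == 8
assert sum(sample) / len(sample) == 5   # the mean is 5, as required
```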

I think Crawley's description is excellent for stats beginners (like me) to be able to visualize what degrees of freedom means in a common context. This example is based on his example rather than directly copied; maybe someone with legal knowledge can decide whether this is allowed to be included! Chris--138.38.152.186 15:32, 11 June 2007 (UTC) AWESOME - Thanks, you guys cleared this up for me. — Preceding unsigned comment added by 65.211.153.242 (talk) 00:02, 30 January 2012 (UTC)

## Cleanup

This page is a really horrible mess. I'll be back ... next week, I hope. Michael Hardy 17:51, 8 August 2005 (UTC) Still confusing; how is that??

This page is a mess and isn't very helpful. Could someone who knows something about statistics take the time to clean this up and make it useful? 67.170.10.225 06:24, 1 January 2006 (UTC)

## Plagiarism

I just removed content blatantly lifted from http://seamonkey.ed.asu.edu/~alex/computer/sas/df.html by a user at 70.162.179.53. This kind of crap is what makes Wikipedia look terrible. 12 April 2006 -- —Preceding unsigned comment added by 69.232.193.222 (talkcontribs)

## Link description

Umm.. Isn't the description behind that link incorrect? It says that degrees of freedom are given by the number of (independent) data points minus the number of model parameters, which is correct, but then in the first example "fits" a line to a single data point and says that df = 1-1 = 0. Last time I checked you needed two parameters to determine a line, not one, so shouldn't this be df = 1-2 = -1? The diagram with perfect fitting is correct, but df is said to be 1 whereas I'd expect 0. The third example is supposedly of overfitting, which the text correctly describes as having more model parameters than data points, but the diagram then has a line (still two parameters) fitted to three data points - isn't this just the other way around?!

[1] makes much more sense to me. AFAICS (in a model fitting context) it's about how many degrees of freedom the data has, in principle, available to vary from the best fit the model can produce, and the more it has them the more significant it is if the fit is good nevertheless. I'm not entirely sure that this is a good measure for the significance though. What if all of your models have only one parameter because you use a Space-filling curve to encode the true parameters in one real value? How to prevent a "model" that encodes all data points into one "parameter" using such a construct from being considered as a very good model of all data in the world? It's a pathological case for sure, but seems to me that it makes the df concept ill-defined, unless you can somehow restrict this using the concept of Hausdorff dimension or..? I can't see how the restriction could be defined, though. Maybe this is all nonsense, could someone enlighten me (and edit the article to a more respectable shape while you're at it? ;-) 82.103.198.180 22:07, 22 July 2006 (UTC)

The model fitting the straight line can have one parameter or two: if you fit a model with intercept and slope there are two terms (β0, β1); if you fit a sub-model without an intercept term there is one parameter estimated (β1). See Linear model for more information.
--Zven 01:53, 25 July 2006 (UTC)

## Degrees of Freedom confusion

Some of the confusion here is due to the term "Degrees of freedom" being used in different ways, with nuanced differences in meaning, in statistical theory and application.

The link [2] makes sense in that it alludes to this issue. Degrees of freedom is used to indicate the number of statistically independent pieces of information available with which to make inference. In statistical terms this is often stated as follows: Let Yi be independent and identically distributed random variables, i = 1, 2, ..., N; arising from a Normal distribution with mean 0 and variance 1 (the Standard Normal distribution).

In this special case, the sum of the squares of the Yi can be shown to have a Chi-square distribution with 'degrees of freedom' equal to N (see e.g. [3]). Thus historically, in the early days of the development of statistical theory, 'degrees of freedom' was used to refer to the dimension of the space containing the data of interest.

Hopefully an expert in information theory can chime in here and solidify this use of the term 'degrees of freedom'.

I offer the following to start clarifying the different ways in which 'degrees of freedom' is used. I'll return and re-edit this over time, it's a bit complex to do in one sitting.

I would thus propose that the opening sentence be modified to read something like:

In statistics the term degrees of freedom (df) is used in three ways: (1) to indicate the amount of information available with which to make inference about a characteristic of interest, (2) to indicate the complexity of a mathematical model used in the inference process, and (3) to refer to a characteristic of various probability distributions.

(1) Information content: 'Degrees of freedom' is a measure of the number of independent pieces of information with which to make inference about a characteristic of interest. Typically the pieces of information are called random variables and the characteristic of interest is the distribution of those random variables, or some mathematical quantity associated with that distribution. More specifically, the degrees of freedom of a set of random variables is the dimension of the space containing the set.

One of the most important distributions is the Normal distribution (the so-called Bell Curve), and its mean (the point about which the random variables cluster) and standard deviation (the degree to which the random variables deviate from the mean) are typically the mathematical quantities of interest.

(2) Model complexity: 'Degrees of freedom' refers to the number of parameters needed to completely specify a mathematical model of interest. For example, if the random variables at hand represent some measured quantity of people such as height or weight, the model of interest might be the Bell Curve that best represents the measured quantities, and two parameters are needed to completely specify the location and spread of that Bell Curve; namely the mean and the standard deviation. More specifically, the mean and standard deviation are a pair of real numbers; thus they can be represented by a point in two-dimensional space. The dimension of the space that can represent the parameters that specify a model of interest is called the 'degrees of freedom' of that model.

(3) Probability distributions: 'Degrees of freedom' refers to a characteristic of a probability distribution, such as the first central moment of the Chi-square distribution. Historically this naming is related to definition (1) above, for if Y1, Y2, ..., Yn are a collection of independent random variables each having a Standard Normal distribution, then the sum of the squares of the random variables has a Chi-square distribution with n 'degrees of freedom'.
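Definition (3) can be checked by simulation. A sketch in Python with numpy (the sample sizes are arbitrary choices, not from any referenced source):

```python
# Simulation check: the sum of squares of n independent standard normals
# behaves like a chi-square variable with n degrees of freedom, whose mean
# is n and variance 2n.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 200_000
sum_sq = (rng.standard_normal((reps, n)) ** 2).sum(axis=1)

assert abs(sum_sq.mean() - n) < 0.1        # mean close to n = 5
assert abs(sum_sq.var() - 2 * n) < 0.5     # variance close to 2n = 10
```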

In fitting a statistical model to data in N dimensions, the vector of residuals is constrained to lie in a subspace of N-P dimensions, where P is the dimension of the space spanned by the parameters of the statistical model. The total degrees of freedom is N. The model degrees of freedom is P. The residual degrees of freedom is N-P. The ratio of the model degrees of freedom to the residual degrees of freedom is an indication of the amount of information content of a model. This ratio is inversely related to information content - as the amount of information increases, the ratio of the model degrees of freedom to the residual degrees of freedom tends to zero. This condition is important in statistical distribution theory, and is a necessary condition in many theorems concerning the convergence of parameter estimates to the true underlying parameter value.
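A minimal numeric sketch of the N / P / N−P decomposition, with made-up data and a straight-line fit (intercept plus slope):

```python
# Straight-line fit: N = 5 observations, P = 2 parameters, leaving
# N - P = 3 residual degrees of freedom.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
X = np.column_stack([np.ones_like(x), x])      # design matrix
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares fit
e = y - X @ beta                               # residual vector

N, P = len(y), X.shape[1]
assert (N, P, N - P) == (5, 2, 3)
assert abs(e.sum()) < 1e-9   # residuals constrained to an (N-P)-dim subspace
```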

Rambling potential example perhaps more understandable to laypersons:

Statistical models are chosen typically to summarize or simplify the description of a complex set of information. Simplified descriptions are often extremely useful and allow decisions to be made that otherwise might be too complex. Summarizing a collection of weights of hundreds or thousands of people by modeling those weights as a Bell Curve allows many decisions to be made based on two numbers - the mean and standard deviation of the Bell Curve that best fits those hundreds or thousands of weights.

For example, airplane manufacturers need to model the amount of weight that airline passengers will add to an airplane. The manufacturer can either keep a database containing the weights of all potential passengers, a very complex set of information, or they can keep two numbers, the mean and standard deviation of typical travelers. The 'degrees of freedom' of the weights of passengers is large, and complex to assess and maintain. The 'degrees of freedom' of the Bell Curve model is small and manageable: two degrees of freedom representing the mean and standard deviation of the Bell Curve.

In order to estimate the mean and standard deviation, measurements of weight of some subset of potential travelers must be collected. No one would put much faith in a mean estimated from the weight of one potential traveler. Collecting the weights of several hundred or several thousand potential travelers will give a better idea of what an airplane manufacturer should plan for - thus the more data that goes in to an estimate of a distribution's parameters, the more 'degrees of freedom' or information content the estimate contains.

I offer the above to help clarify the confusion around 'degrees of freedom', but clearly much editing and refining is needed.

Smckinney2718 18:52, 3 November 2006 (UTC)

## Fractional values?

Who's written that fractional degrees of freedom are possible?? Why add confusion to an already confused article? --Gak 20:59, 8 February 2007 (UTC)

I don't know who added that, but I just added a bit on what they're used for. Michael Hardy 00:13, 9 February 2007 (UTC)

## Formal definition

Isn't there a formal definition of df in terms of fitted and observed values:

$\sum_i \frac{\partial \hat{y}_i}{\partial y_i},$ where $\hat{y}_i$ is the fitted and $y_i$ the observed value?

I think this should be added somewhere; maybe someone more expert than I am can do it. Stodge212
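For linear fits, Stodge212's sum is the trace of the hat matrix, which equals the number of fitted parameters. A numpy sketch with made-up x values (an illustration, not sourced from the article):

```python
# For a linear fit the fitted values are y_hat = H @ y with hat matrix
# H = X (X'X)^{-1} X', so sum_i d(y_hat_i)/d(y_i) = trace(H) = number of
# fitted parameters.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X = np.column_stack([np.ones_like(x), x])   # intercept + slope: 2 parameters
H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix
assert abs(np.trace(H) - 2.0) < 1e-8        # trace equals the parameter count
```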

## Removed proposal to merge Residual Mean Square into this article

I couldn't work out why it said that "It has been suggested that Residual Mean Square be merged into this article or section". There is no mention of this on either discussion page, the two articles don't seem like sensible candidates for merging, and there is a proposal on the Residual Mean Square article that it should be merged into errors and residuals in statistics, which seems much more appropriate. I have therefore removed the merging proposal. If anyone objects feel free to reinstate it! Missdipsy 17:58, 13 August 2007 (UTC)

## What a mess!!

This page is a mess. It will take time to fix. The statement that d.f. is always one less than the sample size is nonsense. The "n − r − 1" identity is just as bad.

I'll be back..... Michael Hardy 04:22, 4 October 2007 (UTC)

I agree that the page is a mess; it isn't even close to being correct. I was going to link to it as a reference in a discussion, but removed the link in order to avoid confusion. Unless somebody intends to fix this page, I suggest deleting it. — DAGwyn (talk) 00:43, 16 February 2008 (UTC)
I agree it needs a lot of work. I think several of us have it on our list of things to do, but this must be one about the hardest concept in statistics to explain in a way that is both accessible and correct. Suggesting deleting it is a bit extreme though – there should clearly be an article on this, and at present it has a flag at the top warning the reader that it's in need of attention.
Ultimately it's all Ronald Fisher's fault for introducing the term into statistics without defining it... Clearly he knew it from studying (and teaching) physics and personally I wish this article had never been split from degrees of freedom (physics and chemistry), but i wasn't around when that happened and i don't wish to revisit that decision as it would be another distraction from improving this article. Qwfp (talk) 10:05, 16 February 2008 (UTC)

## Error on Page

I don't want to edit it outright, but there is an error under the section "Residuals" on the line:

2. df = n-3 for cubic regression, because three points are needed to draw a cubic curve, this means 3 out of n points are lying on the curve, the rest n-3 are lying around the curve, this means standard error or aggregate fluctuations around the curve are because of n-2 points, so df = n -3 for cubic regression

You really need 4 points to draw a cubic; 3 points are only sufficient for a quadratic polynomial.

For this and other reasons, I agree with Michael Hardy that this page is in dire need of work; I just don't consider myself knowledgeable enough on this topic to do it myself. —Preceding unsigned comment added by Colinshep (talkcontribs) 18:00, 29 November 2007 (UTC)

## Novice Thoughts

I'm familiar with statistics, but struggle with the concept of degrees of freedom for some of the above mentioned reasons. In particular, as nobody ever seems to clearly define it, I struggle when the term dimension is introduced. Mathematically, dimension is a measure of linear independence (or presumably, transformed linear independence). I can see how this plays with the mean, but when you start looking at other moments, or parameters, the term dimension seems to lose its value. I would recommend that usage of the term dimension is done carefully, in a way that addresses these confusions directly, or alternatively, not used at all. —Preceding unsigned comment added by 192.91.173.42 (talk) 20:04, 12 March 2008 (UTC)

## Additions?

I think that one of the places someone might come across "degrees of freedom" is in analysis of variance tables of various types, so it might be worth mentioning these specifically. I note that there is an old redirect from Analysis of variance/Degrees of freedom to Analysis of variance but the latter doesn't really mention degrees of freedom and nor does it really discuss such tables... are there any articles on this? A second point relates to degrees of freedom in the context of likelihood ratio tests, where there is often an equivalence between the degrees of freedom of the test statistic and the number of extra parameters. This would at least allow some expansion beyond the linear-model content of the current article.

Melcombe (talk) 13:26, 12 June 2008 (UTC)
It might be possible to put in some R or Splus output and walk through it, talking about the significance of the df with regards to hypothesis testing etc. I've just tossed in a bit about the chi-square's df. I'm not sure how much we should be worrying about this, though, relative to getting the core of the article right. Is the article's explanation of df good as it is, now? (I think the real big problem is that as far as I can tell, even experts don't have that rigorous an idea about this, and I think there's been some tricks people have played with introducing new definitions over the years, fortunately all overlapping over the linear model bit.)--Fangz (talk) 15:24, 14 June 2008 (UTC)
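As a sketch of the kind of walk-through Fangz suggests (made-up counts, plain numpy arithmetic rather than actual R output):

```python
# Goodness-of-fit: for k categories the chi-square test has df = k - 1,
# because the observed counts are constrained to sum to the total.
import numpy as np

observed = np.array([18, 22, 20, 25, 15])     # k = 5 made-up category counts
expected = np.full(5, observed.sum() / 5)     # uniform null hypothesis
stat = ((observed - expected) ** 2 / expected).sum()
df = len(observed) - 1

assert abs(stat - 2.9) < 1e-9   # the test statistic for these counts
assert df == 4                  # k - 1 = 4 degrees of freedom
```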

## Geometric interpretation, definition.

I've started a new section on the geometric interpretation of degrees of freedom; in my view this is how d.f. should be defined, since it unifies all other definitions and interpretations. The section isn't complete yet (need to add more general examples, and relate to t and F distributions) -- will continue to work on it.

For now, leaving other sections untouched (even though they are poorly written and wrong in some places), will work on better integration later.

--Zaqrfv (talk) 23:58, 11 August 2008 (UTC)

## Splitsection: Effective degrees of freedom.

I've proposed a Splitsection for Effective degrees of freedom. Reasons:

1. This isn't degrees of freedom. It's about convenient distributional approximations and similar.
2. This section could, and probably should, be significantly expanded, which I'm reluctant to do when it's part of a larger (and important) page. Effective d.f. are used much more widely than is discussed here, for example in variance components models and Welch's t test. --Zaqrfv (talk) 21:55, 30 August 2008 (UTC)
Perhaps at some point this would be a good idea, but I don't see it best now, as the section is short and the article not overlong. That said, if you or others do expand the section, perhaps it should be revisited. Baccyak4H (Yak!) 20:09, 5 September 2008 (UTC)
But the argument isn't about length; it's about writing good, focused, scientific articles (which has very different requirements than, say, most articles in Special:Contributions/Baccyak4H). Effective d.f. has very little to do with the main topic of this page, and so expansion before splitting is inappropriate. --Zaqrfv (talk) 08:54, 10 September 2008 (UTC)
I'd support splitting for the reasons given, and also because this section is necessarily at a higher technical level than rest of the article, e.g. it requires an understanding of hat matrix and trace (linear algebra). (I'd prefer to keep this talk page for discussions of how to improve the article, however, rather than parenthetical asides about the general contributions of specific editors.) Qwfp (talk) 13:17, 10 September 2008 (UTC)
The section still seems pretty short, and nothing has been done for two years. Is it OK to remove the split notice? -- Eraserhead1 <talk> 17:36, 26 July 2010 (UTC)
So I went ahead and removed the notice.Fgnievinski (talk) 08:48, 2 November 2010 (UTC)

## Workable definition

It's nice to have a proper theoretical definition, but it is useless to many readers. Very few people relate to "dimension of the domain of a random vector", yet many people do need to work with basic statistics and know about degrees of freedom. Let's write for the common reader too.

As a start: "Statisticians use the terms "degrees of freedom" to describe the number of values in the final calculation of a statistic that are free to vary." [2] We could add: "Estimates of parameters can be based upon different amounts of information. The number of independent pieces of information that go into the estimate of a parameter is called the degrees of freedom (df). In general, the degrees of freedom of an estimate is equal to the number of independent scores that go into the estimate minus the number of parameters estimated as intermediate steps in the estimation of the parameter itself."[3]

Please help develop a definition and discussion which is useful to the rest of us. Rlsheehan (talk) 01:57, 19 September 2008 (UTC)

As discussed in Wikipedia:Manual of Style (mathematics)#Article introduction, an introduction for general readers has been added. Rlsheehan (talk) 13:45, 22 September 2008 (UTC)
What on god's earth is the following supposed to mean??

In general, the degrees of freedom of an estimate is equal to the number of independent scores that go into the estimate minus the number of parameters estimated as intermediate steps in the estimation of the parameter itself. —Preceding unsigned comment added by 132.170.163.177 (talk) 23:39, 5 August 2009 (UTC)

## Thanks, I get it!

I once asked a prof who was teaching a grad-level stat class, "so what are degrees of freedom anyway?" He said, "blah blah, the number of cases free to vary." I said, "but what does that mean??" and he just kept on repeating it over and over. I walked out and never went back. After 20 years I finally get it -- the vector space discussion was very helpful.

—Preceding unsigned comment added by 128.97.244.19 (talk) 20:54, 14 January 2009 (UTC)

## Degrees of freedom in qualitative small-n research

The degrees of freedom problem is often advanced as a critique of qualitative, small-n research. Case-study researchers often test a range of independent variables with a very limited number of cases. Therefore, the degrees of freedom, it is argued, are almost inevitably negative. George and Bennett (2005, "Case Studies and Theory Development in the Social Sciences", Cambridge: MIT Press), however, point out that this overlooks the fact that a case study tends to consist of a lot more than just a single empirical observation. Therefore, small-n research allows for the testing of a much larger number of independent variables than commonly assumed.

Would anyone mind if I added this to the article? —Preceding unsigned comment added by Niclas M. (talkcontribs) 16:04, 4 April 2010 (UTC)

## Conservation of DoF.

The article needs to convey the idea of conservation of DoF under mappings. For example, a time series of N values has N d.f. Its Fourier transform (actually a Fourier series) has N/2 spectral estimates; each estimate has A and B coefficients, or amplitude and phase, and thus 2 d.f. The total is again N d.f., conserving DoF.

DoF is closely related to the concepts of information, the dimension of vectors, and formal mechanics. Two degrees of freedom is the most complex class of problem in mechanics that has general solutions.
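The conservation claim can be illustrated numerically with the discrete Fourier transform in numpy (the N = 8 series below is made up):

```python
# N real samples -> N/2 + 1 complex rfft coefficients, but the DC and
# Nyquist terms are purely real, so the count of real d.f. is still N.
import numpy as np

N = 8
series = np.random.default_rng(1).standard_normal(N)
spectrum = np.fft.rfft(series)

assert spectrum.shape[0] == N // 2 + 1   # 5 complex coefficients
assert abs(spectrum[0].imag) < 1e-12     # DC term: real
assert abs(spectrum[-1].imag) < 1e-12    # Nyquist term: real
# real d.f.: 3 interior complex terms (6 numbers) + 2 real endpoints = 8 = N
```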

## Why not start with something much simpler like x+y=4?

I've been teaching this stuff long enough to know that a simple example is the best place to start. So, simply start with x + y = 4: if I choose x to be 1, then y must be 3, or if I choose x to be 2, then y must be 2. In other words, I can choose x, but once x is chosen, y is immediately determined. So I have 1 degree of freedom. The same reasoning can follow with respect to y. — Preceding unsigned comment added by Towsiak (talkcontribs) 05:20, 2 February 2013 (UTC)
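The x + y = 4 example is trivial to code, but it makes the "one free choice" point concrete:

```python
# With the constraint x + y = 4, choosing x determines y: one degree of
# freedom shared between two variables.
def y_given_x(x):
    return 4 - x

assert y_given_x(1) == 3
assert y_given_x(2) == 2
```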

## Linear regression example: simple? Perhaps needs further elaboration...

The second example for understanding degrees of freedom for error in linear regression (see: http://en.wikipedia.org/wiki/Degrees_of_freedom_%28statistics%29#Linear_regression), is good but I think it is missing a crucial step. The relevant part currently reads (as of March 5, 2013):

Then the residuals

$e_i=y_i-(\widehat{a}+\widehat{b}x_i)\,$

are constrained to lie within the space defined by the two equations

$e_1+\cdots+e_n=0,\,$
$x_1 e_1+\cdots+x_n e_n=0.\,$

One says that there are n − 2 degrees of freedom for error.

The problem is that this is the most important part of the explanation but does not in fact explain why the residuals are constrained by these two equations... The fact that the residuals add up to 0 makes sense to me (the first equation). But where does the second equation come from? Why does the sum of the products of the x's and residuals add up to 0? How is that derived? Finally, for the sake of a layman, perhaps it should be explicitly stated why, given two constraining equations, it follows that there are n − 2 degrees of freedom.

I would gladly help make the train of thought of this example easier to follow, but I fear my understanding is not good enough to do it. — Preceding unsigned comment added by Firth m (talkcontribs) 03:42, 5 March 2013 (UTC)

I agree a clarification would be helpful. The origin for the second equation is described in Least_squares#Solving_the_least_squares_problem, where r = e and x_i = \partial e / \partial b. It sums to zero because you're seeking the minimum of the sum of squared residuals (see Maxima_and_minima). Fgnievinski (talk) 06:14, 5 March 2013 (UTC)
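Fgnievinski's reply can also be checked numerically. A sketch with made-up data, using numpy.polyfit for the straight-line fit:

```python
# Verify that least-squares residuals from a straight-line fit satisfy
# both constraint equations quoted above.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])
b, a = np.polyfit(x, y, 1)        # slope b, intercept a (highest degree first)
e = y - (a + b * x)               # residuals

assert abs(e.sum()) < 1e-8        # e_1 + ... + e_n = 0
assert abs((x * e).sum()) < 1e-8  # x_1 e_1 + ... + x_n e_n = 0
```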