# Talk:Covariance matrix

WikiProject Statistics (Rated C-class, High-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as C-Class on the quality scale.
This article has been rated as High-importance on the importance scale.

## Pooled within-group covariance matrix

I have found use for the "pooled within-group covariance matrix." See for example http://people.revoledu.com/kardi/tutorial/LDA/Numerical%20Example.html or "Analyzing Multivariate Data" by Lattin, Carroll, and Green. This page seems a natural place to put a definition for such a thing, but it doesn't exactly fit into the flow of the page. Suggestions? Otherwise I'll just jump in at a future date. dfrankow (talk) 16:54, 29 December 2008 (UTC)
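One common textbook definition, offered here as a starting point rather than as the definition used by either source above, weights each group's sample covariance matrix $S_g$ by its degrees of freedom: $S_{\text{pooled}} = \sum_g (n_g - 1) S_g / (N - G)$, where $n_g$ is the size of group $g$, $N$ is the total number of observations and $G$ is the number of groups. Some authors use a different divisor. A minimal Python sketch under that assumption (the function names and the toy data are made up for illustration):

```python
def mean_vector(rows):
    """Column means of a list of observation rows."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(len(rows[0]))]


def scatter_matrix(rows):
    """Sum of outer products of deviations from the group mean.

    This equals (n_g - 1) times the group's sample covariance matrix.
    """
    m = mean_vector(rows)
    p = len(m)
    return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows)
             for j in range(p)] for i in range(p)]


def pooled_within_group_cov(groups):
    """Pooled covariance: summed group scatter matrices divided by N - G."""
    p = len(groups[0][0])
    total = [[0.0] * p for _ in range(p)]
    n_total = 0
    for rows in groups:
        s = scatter_matrix(rows)
        n_total += len(rows)
        for i in range(p):
            for j in range(p):
                total[i][j] += s[i][j]
    dof = n_total - len(groups)  # N - G degrees of freedom
    return [[total[i][j] / dof for j in range(p)] for i in range(p)]


# Two hypothetical groups of three 2-dimensional observations each.
group_a = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
group_b = [[10.0, 1.0], [12.0, 1.0], [14.0, 4.0]]
pooled = pooled_within_group_cov([group_a, group_b])
print(pooled)
```

Note that the group means are subtracted separately within each group, so a large distance between the two group centres does not inflate the pooled matrix.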

## New problem...

This article starts with a big mess of slashes and sigmas and brackets... not sure if this is just my browser rendering something in a strange manner (though I'm using Firefox, so I expect I'm not the only one seeing this) or something else. I don't know how to fix it either; maybe somebody else does?

—The preceding unsigned comment was added by 129.169.10.56 (talk) 16:32, 5 December 2006 (UTC).

## Explanatory Formula

This formula looks at first sight very complicated. Actually its derivation is quite simple (for simplicity we assume μ to be 0; just replace X by X - μ everywhere if you want):

1. fix a direction in n dimensions (unit vector), let's call it u
2. project your data onto this direction (you get a number for each of your data vectors, or taking all together a set of scalar samples); you perform just a scalar product i.e.: $S(i) = (X(i),u)$
3. compute ordinary variance for this new set of scalar numbers

We are almost finished. Of course, for every unit vector you get (in general) a different value, so you do not have just one number as in the scalar case but a whole bunch of numbers (a continuum) parametrised by the unit vectors in n dimensions (actually only the direction counts: u and -u give the same value). Now comes the big trick: we do not have to keep this infinity of numbers. As you can see below, all the information is contained in the covariance matrix (wow!).

${\rm var} {((u,X))} = E((u,X)^2) = E((u,X)(u,X)) = E(u^\top XX^\top u)$

Now because u is a constant we have:

$E(u^\top XX^\top u) = u^\top E(XX^\top)u$

or

${\rm var} {((u,X))} = u^\top E(XX^\top)u = u^\top \Sigma u$

and we are done... (easy, isn't it :)
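For anyone who would like to check the identity numerically, here is a minimal Python sketch. It is not part of the original comment: the four data points are made up, chosen so that each coordinate has mean 0 (matching the μ = 0 assumption), and expectations are replaced by plain averages over the points.

```python
import math

# Hypothetical data: four 2-dimensional points whose coordinates each sum to 0.
data = [[2.0, 1.0], [-1.0, 0.0], [0.0, -2.0], [-1.0, 1.0]]
n = len(data)
p = len(data[0])

# Sigma = E(X X^T), with the expectation taken as the average over the points.
sigma = [[sum(x[i] * x[j] for x in data) / n for j in range(p)]
         for i in range(p)]


def projected_variance(u):
    """Project every point onto u, then take the ordinary variance."""
    s = [sum(ui * xi for ui, xi in zip(u, x)) for x in data]
    return sum(v * v for v in s) / n  # the projections have mean 0


def quadratic_form(u):
    """Evaluate u^T Sigma u."""
    return sum(u[i] * sigma[i][j] * u[j] for i in range(p) for j in range(p))


for angle in (0.0, 0.3, 1.0, 2.5):
    u = [math.cos(angle), math.sin(angle)]  # unit vector in the plane
    print(angle, projected_variance(u), quadratic_form(u))
```

The two printed columns should agree for every direction, which is the point of the derivation: the single matrix Σ stores the variance along every direction u.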

## Comments moved here to Talk page

I have moved the comments above to this discussion page for several reasons. The assertion that this very simple formula looks "very complicated" seems exceedingly silly and definitely not NPOV. Then whoever wrote it refers to "its derivation". That makes no sense. It is a definition, not an identity or any sort of proposition. What proposition that author was trying to derive is never stated. The writing is a model of unclarity. Michael Hardy 22:52 Mar 12, 2003 (UTC)

### I was attempting to explain covariance matrix

Okay I have written the stuff above and would like to reply...
1) It may be true that for someone familiar with statistics the covariance matrix is immediately understandable... nonetheless, for someone outside the statistics world (like me) it looks complicated...
2) By "derivation" I actually meant motivation, or some hint as to how one can understand the meaning of the covariance matrix and what it can be used for.
If you just throw the formula at people, most of them do not understand its meaning, and it is not at all clear why a formula like this makes any sense...
Of course you can let everyone find out for themselves, but this is just a waste of time since the underlying concept is very simple... I did not try to prove any statement.
3) What counts as clear writing is certainly debatable ;-) Maybe it is not as watertight and correct as yours, but the idea is quite clear, and this is the only thing that counts...

#### Idea is not clear

Except that the idea is not clear without considerable interpretation, and even then I'm not entirely sure of what was meant. I think that what was intended could certainly be said more clearly with far fewer words. I'll make some attempt at this within a few days. Michael Hardy 20:48 Mar 13, 2003 (UTC)

## Another explanation

Okay, I'll try to explain the idea more clearly:

If you have some set of vector measurements, you can consider it as a cloud of points in n dimensions. If you want to find something interesting about your data set, you can look at the data from different directions or, what is essentially the same, perform a projection into 1, 2, or 3 dimensions. But there are many possible projections; which one to take? Life is too short to try them all ;-) One criterion you can apply (not the best one, but it is better than nothing and it works sometimes...) is to look at directions for which the data have large variance (this makes sense if you want to find the most "energetic" components...). I tried to explain that the covariance matrix is a tool (or at least can be interpreted as one) to represent the data variances in all possible directions in an effective and compact way. Once you understand this, it is immediately clear why it is useful to look for the eigenvalues and eigenvectors of the covariance matrix.

I can't wait to see your version of this! ;-)
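To make the last remark concrete, here is a small Python sketch for the two-dimensional case. It is an illustration only: the covariance matrix entries `a`, `b`, `c` are made-up numbers, and the closed-form eigenvalue and principal-axis formulas used are the standard ones for a symmetric 2 × 2 matrix. It scans all directions and compares the variances found with the eigenvalues.

```python
import math

# Hypothetical 2 x 2 covariance matrix [[a, b], [b, c]].
a, b, c = 3.0, 1.0, 2.0


def variance_along(angle):
    """u^T Sigma u for the unit vector u = (cos(angle), sin(angle))."""
    ux, uy = math.cos(angle), math.sin(angle)
    return a * ux * ux + 2.0 * b * ux * uy + c * uy * uy


# Closed-form eigenvalues of a symmetric 2 x 2 matrix.
centre = (a + c) / 2.0
radius = math.hypot((a - c) / 2.0, b)
lam_max, lam_min = centre + radius, centre - radius

# Angle of the eigenvector belonging to lam_max (the principal axis).
theta = 0.5 * math.atan2(2.0 * b, a - c)

# Scan 360 directions, one per degree, and compare with the eigenvalues.
scanned = [variance_along(math.radians(k)) for k in range(360)]
print("largest scanned variance:", max(scanned), "lam_max:", lam_max)
print("smallest scanned variance:", min(scanned), "lam_min:", lam_min)
print("variance along principal axis:", variance_along(theta))
```

No scanned direction should exceed the largest eigenvalue, and the variance along the principal axis should match it, so the eigenvectors answer the "which projection?" question without trying every direction.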

### Rephrased

I'll try my hand at the above:
If you have many measurements of similar things (such as GNP and airplane passengers for several years), such lists of information can be manipulated with linear algebra as vectors of numbers. If graphs are drawn of all useful combinations of these numbers, the result is many separate points on many graphs. Comparisons between the numerous graphs can be done if all the numbers are placed within one graph with as many dimensions as there are data items, thus creating a "cloud" of points in this space.
For example, GNP of one country for several years can be shown on a page as a graph of years by values. That has only two dimensions. Airline passenger numbers can be shown to the side of that graph in a three dimensional display. Adding another measurement, such as electricity consumed, requires a fourth dimension which is often hard to visualise.
Finding interesting information about such combinations can be difficult. One issue to examine is which part of that cloud of points has the greatest difference. The covariance matrix is the result of calculating distances between all the data points in all directions, and producing the direction through the cloud which has the greatest differences. Recording the distances between different types of data within the cloud produces the covariance matrix, where larger numbers suggest greater relationships.
(should eigenvalues and eigenvectors be introduced here, or are those considered to be parts of calculations other than covariance?) SEWilco 18:55, 15 Jan 2004 (UTC)

I have inserted into the article some language that I think addresses your point, which I still think was quite unclear as you wrote it originally. Michael Hardy 19:53 Mar 14, 2003 (UTC)

PS: "One criterion is ....", but some other criteria may exist. (My point here is that in standard English, "criterion" is the singular.) Michael Hardy 19:54 Mar 14, 2003 (UTC)

### Still not clear and lost illustration

Unfortunately I'm not very happy with your version of my explanation.
Explanations of this sort can be found in any book, and I never found them very helpful. You have completely lost the geometric picture, which makes everything clear and simple. To say that the matrix entries are covariances between the variables does not explain anything, and actually makes it more complicated because, as you said, this depends on the basis...

I would prefer to let the people who visit this site decide for themselves which version they find more illuminating... At the moment that is hardly possible because my version is quite hidden :-(

PS:
You are right about "criterion"; I'm not very good at English...
(nevertheless you understand almost everything, what a miracle!)

### Still looks unclear

I can write a more leisurely explanation when I have time, but your version still looks unclear to me as it stands. Michael Hardy 22:43 Mar 17, 2003 (UTC)

## Please explain what is defined

I think that it is better to explain the things one defines. The explanations above helped me understand the topic more than the mathematically correct formulas one finds when looking for "covariance matrix". In my opinion, most of the people who use Wikipedia are interested in both versions, so they should see them in the same place and not hidden in the discussion group.

### What is unclear?

Maybe you could be more specific about "looks unclear to me"... I tried to write as clearly as I can, because I want everyone to understand it with ease. If something is unclear to you, maybe I could explain it better, but at the moment I do not know what is unclear.

People, label yourself in your comments so we know who is talking. Also be a little more specific about what you are pointing at. There are too many "that", "I", and "you" for it to be clear who is talking about what. I suggest the four-tilde signature so the date is included. SEWilco 17:11, 15 Jan 2004 (UTC)

## Yet Another Rephrasing

(SEWilco 08:39, 7 Jul 2004 (UTC)) Maybe something like this will be useful:

A vector is a list of numbers. The variance is the square of the difference between a number and an expected value, such as the variance of two lengths is an area the size of the square of the difference. The covariance matrix has a list of numbers along one side, and the other side has a list of the expected values for each listed value. Each position of the matrix is filled in with the square of the difference between those two numbers. The covariance matrix then contains the variance between each listed number and the expected values of all numbers in the list. This shows how different all numbers are from all the expected values.

## Nonstandard notation?

I've never encountered the usage of $var(\textbf{X})$ for denoting the covariance matrix. I've always used $\textbf{C}_X$ for this (and $\textbf{R}_X$ for the autocorrelation matrix). This is standard notation practice in the field of signal processing. --Fredrik Orderud 12:26, 20 Apr 2005 (UTC)

How about $\operatorname{cov}(X)$? Cburnett 14:24, Apr 20, 2005 (UTC)

I've now changed the notation to "cov", which is in accordance with mathworld. I've also added a separate "signal processing" section containing the different notation used there. --Fredrik Orderud 21:02, 28 Apr 2005 (UTC)

Pardon me, but who wrote the comments immediately above??? Michael Hardy 20:07, 28 Apr 2005 (UTC)

Sorry. It was me, and I forgot to sign. --Fredrik Orderud 21:02, 28 Apr 2005 (UTC)

Standard notation:

$\operatorname{var}(\textbf{X}) = E[(\textbf{X} - E[\textbf{X}])(\textbf{X} - E[\textbf{X}])^{T}]$

ALSO standard notation:

$\operatorname{cov}(\textbf{X}) = E[(\textbf{X} - E[\textbf{X}])(\textbf{X} - E[\textbf{X}])^{T}]$

ALSO standard notation:

$\operatorname{cov}(\textbf{X},\textbf{Y}) = E[(\textbf{X} - E[\textbf{X}])(\textbf{Y} - E[\textbf{Y}])^{T}]$ (the "cross-covariance" between two random vectors)

Unfortunately the first two of these usages jar with each other. The first and third are in perfect harmony. The first notation is found in William Feller's celebrated two-volume book on probability, which everybody is familiar with, so it's surprising that some people are suggesting it first appeared on Wikipedia. It's also found in some statistics texts, e.g., Sanford Weisberg's linear regression text. Michael Hardy 18:05, 28 Apr 2005 (UTC)

## Horrible mess!!

This article was starting to become an exemplar of crackpothood. Someone who apparently didn't like the opening paragraphs, instead of replacing them with other material, simply put the other material above those opening paragraphs, so that the article started over again, saying, in a later paragraph, "In statistics, a covariance matrix is ..." etc., and giving the same elaborate definition again, with stylistic differences. And that eventually became the second of FOUR such iterations, with stylistic differences! Other things were wrong too. Why, for example, was there a "stub" notice?? There should have been a "cleanup" notice right at the top, instead of a "stub" notice at the bottom. Inline TeX often gets misaligned or appears far too big, or both, on many browsers, but it looks as if someone went through and replaced perfectly good-looking non-TeX inline mathematical notation with TeX (e.g., "an n × n matrix" ---> "an $n \times n$ matrix"). (TeX generally looks very good when "displayed", however. And when TeX is used in the normal way, as opposed to its use on Wikipedia, there's certainly no problem with inline math notation.) Using lower-case letters for random variables is jarring, since in many cases one wants to write such things as

$F_X(x) = \Pr(X \le x)$

and it is crucial to be careful about which of the "x"s above are capital and which are lower-case. The cleanup isn't finished yet .... Michael Hardy 19:16, 28 Apr 2005 (UTC)

## Properties

I added the list of properties in the article. There were only two of them stated, and these properties should definitely be in an article about cov and var matrices. --Steffen Grønneberg 14:06, 5 October 2005 (UTC)

Somehow I find it difficult to understand the 5th property:

5. $\operatorname{cov}(\mathbf{X},\mathbf{X}) = \operatorname{cov}(\mathbf{X},\mathbf{Y}) = \operatorname{cov}(\mathbf{Y},\mathbf{X})^\top$

Shouldn't the correct formula be:

5. $\operatorname{cov}(\mathbf{X},\mathbf{Y}) = \operatorname{cov}(\mathbf{Y},\mathbf{X})^\top$

, since no relationship between "X" and "Y" is defined? Does anyone have a reference on this? --Fredrik Orderud 12:01, 11 October 2005 (UTC)

Yep, my bad. Fixed it now. The reference I used is Multivariate Analysis by K. V. Mardia, J. T. Kent, J. M. Bibby. It's in chap. 3. (http://www.amazon.com/gp/product/0124712525/103-2355319-3731041?v=glance&n=283155&n=507846&s=books&v=glance) Is there a place where it's usual to cite this? --Steffen Grønneberg 23:10, 12 October 2005 (UTC)
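For what it's worth, the corrected property follows directly from the definition of the cross-covariance quoted in the notation thread above, with no assumption relating X and Y. A short sketch, using only that definition and the fact that transposition can be moved inside the expectation:

$\operatorname{cov}(\mathbf{Y},\mathbf{X})^\top = \left(E[(\mathbf{Y} - E[\mathbf{Y}])(\mathbf{X} - E[\mathbf{X}])^\top]\right)^\top = E[(\mathbf{X} - E[\mathbf{X}])(\mathbf{Y} - E[\mathbf{Y}])^\top] = \operatorname{cov}(\mathbf{X},\mathbf{Y})$

Setting Y = X in this identity shows that cov(X, X) is a symmetric matrix.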

Before I go make a fool of myself, shouldn't A and B be q x p matrices instead of p x q matrices? --The imp 15:32, 15 March 2006 (UTC)

I think the A and B matrices have the proper dimensions, but I changed the description of $\mathbf{Y}$ to a p x 1 vector (from q x 1) and changed $\mathbf{a}$ to a q x 1 vector (from a p x 1). I did this based on:

Properties 4, 5, 8, etc. that seem to imply that $\mathbf{X}$ and $\mathbf{Y}$ have the same dimensions.
Property 3, where $\mathbf{AX}$ would be a q x 1 vector, so I don't think it makes sense to add $\mathbf{a}$ if $\mathbf{a}$ is a p x 1 vector.

If I'm wrong, my apologies - feel free to correct it back to the original. —Preceding unsigned comment added by 64.22.160.1 (talk) 21:37, 8 May 2008 (UTC)

## Conflict of var and cov?

I don't see how var(X,Y) and cov(X,Y) "conflict". We don't say that gamma(x) and x! or asin and arcsin "conflict" or "jar". It is perfectly OK to have var(X)=cov(X)=cov(X,X). The fact that there is a one-argument function cov doesn't exclude there being a two-argument, related, function. Consider, say, Γ(x)=Γ(x,0). --Macrakis 21:51, 21 December 2005 (UTC)

Hear, hear!! "Conflicting" notation is things like "is 0 in $R^+$" and "where does the $2\pi$ go in the Fourier Transform?". I've toned down the expression of conflict. LachlanA 00:47, 22 January 2007 (UTC)

## Calculation of covariance matrix

It would be nice to have a section on computational methods for calculating covariance matrices. I am unfortunately not competent to write it.... Any volunteers? --Macrakis 21:51, 21 December 2005 (UTC)

    // Sample covariance matrix of the columns of myArray (rows are observations).
    // For n time series there are n variances and n*(n-1)/2 distinct covariances.
    // Matrix and mean() are not defined in this snippet; they are assumed to come
    // from the poster's own library, with mean() returning the column means.
    public Matrix CovarianceMatrix(double[,] myArray)
    {
        // Cov(X, X) = Var(X)
        // Cov(P, Q) = Cov(Q, P)
        // vcvMatrix is symmetric and square, with Var(i) on the leading diagonal.
        // vcvMatrix is positive semi-definite (should I include a safety test?).
        // Cov(P, Q) is NOT unitless; its units are those of P times those of Q.
        int nRows = myArray.GetLength(0);
        int nCols = myArray.GetLength(1);
        if (nRows < 2)
        {
            throw new ArgumentException("At least two rows are needed to divide by nRows - 1.");
        }
        Matrix vcvMatrix = new Matrix(nCols, nCols);
        double[] u = mean(myArray);
        for (int i = 0; i < nCols; i++)      // rows of the vcvMatrix
        {
            for (int j = 0; j < nCols; j++)  // cols of the vcvMatrix
            {
                double temp = 0;
                for (int z = 0; z < nRows; z++)
                {
                    temp += (myArray[z, i] - u[i]) * (myArray[z, j] - u[j]);
                }
                vcvMatrix[i, j] = temp / (nRows - 1);  // unbiased (n - 1) divisor
            }
        }
        if (!vcvMatrix.Symmetric)
        {
            throw new ApplicationException("VCV matrix is not symmetric");
        }
        return vcvMatrix;
    }

Basically, by computation you mean estimation, so this belongs in estimation of covariance matrices or sample mean and sample covariance. Prax54 (talk) 14:43, 10 August 2012 (UTC)

However, it would be helpful to many readers if this article pointed out the difference between a covariance matrix for random variables and a sample covariance matrix (and perhaps a matrix of covariance estimators) and at least gave a link to an article on sample covariance matrices. I think all the articles on statistical items should perform a similar service for their respective topics, e.g. sample variance vs variance of a random variable, sample mean vs mean of a random variable. This would be extremely redundant and unnecessary if we regarded Wikipedia as a big textbook. From that point of view, these distinctions should be made in one introductory chapter. But given that most readers consult individual articles, I think it's a reasonable approach.

Tashiro (talk) 16:43, 30 September 2012 (UTC)
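On the original request for computational methods: besides the two-pass loop above (means first, then products of deviations), a one-pass update in the style of Welford's algorithm is a well-known alternative when the rows arrive as a stream. The following Python sketch is only an illustration of that idea, not anything taken from the article; the function name `streaming_covariance` is made up, and it uses the same n - 1 divisor as the code above.

```python
def streaming_covariance(rows):
    """One-pass sample covariance (n - 1 divisor) of an iterable of rows."""
    n = 0
    mean = None
    comoment = None  # running sum of products of deviations
    for row in rows:
        if mean is None:
            p = len(row)
            mean = [0.0] * p
            comoment = [[0.0] * p for _ in range(p)]
        n += 1
        # Deviation from the mean before and after including this row.
        delta_old = [x - m for x, m in zip(row, mean)]
        mean = [m + d / n for m, d in zip(mean, delta_old)]
        delta_new = [x - m for x, m in zip(row, mean)]
        for i in range(p):
            for j in range(p):
                comoment[i][j] += delta_old[i] * delta_new[j]
    if n < 2:
        raise ValueError("need at least two rows")
    return [[c / (n - 1) for c in r] for r in comoment]


# Three hypothetical observations of two variables.
print(streaming_covariance([[1.0, 2.0], [3.0, 6.0], [5.0, 7.0]]))
```

The design choice here is that the data never need to be held in memory or read twice, at the cost of slightly less obvious arithmetic than the two-pass version.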

## Possible covariance matrices

What are the restrictions on what matrices can be covariance matrices? I guess the matrix has to be symmetric; is any symmetric matrix a possible covariance matrix? --Trovatore 23:11, 19 June 2006 (UTC)

A square matrix with real entries is a covariance matrix if and only if it is non-negative definite. If X is a column vector-valued random variable (with expected value 0), then the expected value of $XX^\top$ is the covariance matrix of the scalar components of X, so it should be clear why that has to be non-negative definite. By the spectral theorem in its finite-dimensional version, every non-negative definite real matrix M has a non-negative definite real square root, which let us call $M^{1/2}$. Then let X be any column vector of the right size whose entries are random variables the variance of each of which is 1 and the covariance between any two of which is 0. Then the covariance matrix of the entries in $M^{1/2}X$ is M. So any non-negative definite matrix is a covariance matrix. Michael Hardy 17:09, 20 June 2006 (UTC)
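The last step can be written out in one line. This is a sketch that assumes the standard rule $\operatorname{cov}(AX) = A \operatorname{cov}(X) A^\top$ for a constant matrix $A$, and uses the facts that the square root of M is symmetric and that the entries of X are uncorrelated with unit variance, so that $\operatorname{cov}(X) = I$:

$\operatorname{cov}(M^{1/2}X) = M^{1/2}\operatorname{cov}(X)(M^{1/2})^\top = M^{1/2} I M^{1/2} = M$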
Suppose you know all diagonal entries and some off-diagonal, and you want to generate a nonnegative-definite symmetric matrix having those entries. Is there an efficient way to generate such a thing, in the large-finite-dimensional case? --Trovatore 22:48, 23 June 2006 (UTC)

## Expected value operator

In this definition, the 'expected value operator' (mu) is used. According to my Excel program's explanation of covariance, mu is just the average. Isn't it simpler to say that mu is just the average value of X rather than the 'expected value'? —The preceding unsigned comment was added by Steve 10-Jan-07 71.121.7.79 (talk) 03:41, 11 January 2007 (UTC).

Yes, but "average" is quite ambiguous. It could be a weighted mean, the geometric mean, the median, or lots of other things. The "expected value" is a standard term for the equally-weighted arithmetic mean. LachlanA 00:30, 22 January 2007 (UTC)

Also, terms like mean and covariance can be used for the estimators as well as for the parameters they estimate, whereas expected value has no such ambiguity. Btyner (talk) 18:58, 13 January 2008 (UTC)

You need to add what the typical E function is that is used in practice. That is, you don't divide by N, but N-1 typically. I frankly think using expectation is a mistake, as it makes this article more difficult for the newbie than it needs to be. Sure, it may be more general, but that doesn't mean more helpful, clear, or useful. —Preceding unsigned comment added by 71.111.251.229 (talk) 05:53, 2 March 2008 (UTC)
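To illustrate the N versus N - 1 point for readers following this thread, here is a minimal Python sketch. The helper `covariance` and its `ddof` parameter are hypothetical names chosen for this example, and the numbers are made up; the only claim is the arithmetic difference between the two divisors.

```python
def covariance(xs, ys, ddof):
    """Average product of deviations from the sample means.

    ddof=0 divides by N (the plain average of the products);
    ddof=1 divides by N - 1 (the usual sample estimator).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - ddof)


# Four hypothetical paired measurements.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 9.0]

cov_population = covariance(xs, ys, ddof=0)  # divide by N
cov_sample = covariance(xs, ys, ddof=1)      # divide by N - 1
print(cov_population, cov_sample)
```

The two results differ only by the factor (N - 1)/N, which matters for small samples and becomes negligible as N grows. Neither is the expected value itself; both are estimates of it computed from data.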

## Introduction

"In statistics and probability theory, the covariance matrix is a matrix of covariances between elements of a vector. It is the natural generalization to higher dimensions of the concept of the variance of a scalar-valued random variable.

If X is a column vector with n scalar random variable components, and $\mu_k$ is the expected value of the kth element of X, i.e., $\mu_k = E(X_k)$, then the covariance matrix is defined as:"

I guess many people, like me, come to visit this article because they want to do some kind of statistical analysis within their studies or other related work. Of course I can only speak for myself, but the explanation that a covariance matrix is "a matrix of covariances" did not really help. Also, the fact that it's a natural generalization to higher dimensions of some concept didn't improve my understanding (and looking at the other comments I'm apparently not alone). Maybe somebody could add an introductory explanation or even a section in the text where this concept is explained for the uninitiated. 84.168.17.109 10:43, 7 February 2007 (UTC)