# Talk:Estimation of covariance matrices

WikiProject Statistics (Rated C-class, High-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

C  This article has been rated as C-Class on the quality scale.
High  This article has been rated as High-importance on the importance scale.
WikiProject Economics (Rated C-class)
This article is within the scope of WikiProject Economics, a collaborative effort to improve the coverage of Economics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
C  This article has been rated as C-Class on the project's quality scale.
WikiProject Systems (Rated C-class)
C  This article has been rated as C-Class on the project's quality scale.
WikiProject Mathematics (Rated C-class, Low-importance)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mathematics rating: C-Class, Low-importance
Field: Applied mathematics

## Biased vs. Unbiased estimate

Shouldn't we also note that the MLE is biased, and include a section on an unbiased estimator?

Perhaps, but I don't see any urgency about it. Most reasonable estimators, including most MLEs, are biased. Michael Hardy 22:00, 2 May 2005 (UTC)

Michael, please see if my piece fits well enough. -- Stas Kolenikov

Guys, please at least add some links to discussions of unbiased estimation. I ran into some strange behavior while doing a project, so I adjusted the diagonal elements to the unbiased variance estimates. It seems to work better. Jackzhp 12:48, 21 October 2006 (UTC)
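As an illustration of the bias discussed in this thread, here is a minimal simulation sketch. It assumes Gaussian data, and the names `Sigma`, `n`, and `reps` are chosen here only for the example. It averages the 1/n (maximum likelihood) and 1/(n-1) estimates over many small samples, so one can see that the former is low by roughly the factor (n-1)/n while the latter is close to the true matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true covariance matrix and sample size, chosen for illustration.
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
n, reps = 5, 20000

# Draw `reps` independent samples, each with n observations of a 2-vector.
X = rng.multivariate_normal(np.zeros(2), Sigma, size=(reps, n))

# Center each sample at its own sample mean.
centered = X - X.mean(axis=1, keepdims=True)

# Scatter matrix of each sample: sum over i of (x_i - xbar)(x_i - xbar)^T.
scatter = np.einsum("rni,rnj->rij", centered, centered)

mean_mle = scatter.mean(axis=0) / n             # average of the 1/n estimator
mean_unbiased = scatter.mean(axis=0) / (n - 1)  # average of the 1/(n-1) estimator

print("average 1/n estimate:\n", mean_mle)
print("average 1/(n-1) estimate:\n", mean_unbiased)
```

With n this small the difference is easy to see. For large n the two estimators are nearly identical, which may be why the distinction is often glossed over.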

## notation

I'd like to point out that the symbol $\Sigma$ starts out denoting a parameter, but then is later used as an estimator. Btyner 19:40, 9 February 2007 (UTC)

## New summary and further proposed changes

I have rewritten the summary, putting the maximum likelihood estimator in perspective.

The use of X and x should be made consistent throughout the article.

I am not sure about the purpose of the statement

Although no one is surprised that the estimator of the population covariance matrix is [closely related to] the sample covariance matrix, the mathematical derivation is perhaps not widely known and is surprisingly subtle and elegant.

for the following reasons:

• The form with 1/n is not what is commonly called sample covariance matrix. For now I have added the text "closely related to" but that weakens the surprise part.
• Subtlety, elegance, and surprise are subjective, and the information content of the statement is zero.
• The argument is nice, but not so subtle or elegant as to be notable. (In my opinion; see the note above about subjectivity.)

Jmath666 14:34, 25 March 2007 (UTC)

I have now removed the statement above as proposed. Jmath666 23:05, 31 March 2007 (UTC)

## linking to the article on "degrees of freedom"

At the beginning of the article, where we explain why we divide by the factor n-1 rather than by n (in the unbiased estimator of the covariance matrix), it is said that the reason is that the mean is not known and is replaced by the sample mean. Of course this is true. More than that, it points to the notion of degrees of freedom, so the passage should link to the article on Degrees of freedom (statistics). What do you think? Talgalili 13:04, 25 August 2007 (UTC)
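For reference, the n-1 factor can be checked in a few lines. The sketch below assumes the observations $x_i$ are independent with common mean $\mu$ and covariance matrix $\Sigma$; the symbol $\mu$ is introduced here only for the derivation.

```latex
\begin{align}
\sum_{i=1}^n (x_i-\overline{x})(x_i-\overline{x})^T
  &= \sum_{i=1}^n (x_i-\mu)(x_i-\mu)^T - n(\overline{x}-\mu)(\overline{x}-\mu)^T, \\
\operatorname{E}\!\left[\sum_{i=1}^n (x_i-\mu)(x_i-\mu)^T\right]
  &= n\Sigma, \\
\operatorname{E}\!\left[n(\overline{x}-\mu)(\overline{x}-\mu)^T\right]
  &= n\cdot\frac{\Sigma}{n} = \Sigma, \\
\operatorname{E}\!\left[\sum_{i=1}^n (x_i-\overline{x})(x_i-\overline{x})^T\right]
  &= (n-1)\Sigma.
\end{align}
```

Replacing $\mu$ by $\overline{x}$ costs exactly one multiple of $\Sigma$ in expectation, which is the sense in which one degree of freedom is used up.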

## confusion

quote: Given a sample consisting of independent observations X1,..., Xn of a random vector $X \in \mathbb{R}^{p\times 1}$ (a p×1 column),

It has been said in wikipedia:Village_pump_(proposals)#Wait!!! that this should be clarified for the benefit of nonexpert readers. I too find it obscure. Is X1 an observation of a random vector X with p components, or is X1 an observation of the first component of a random vector X with n components? Is p=n? If not, what is p? Conventionally, (X1,..., Xn) are the components of the n-vector X. That is apparently not the case here. Please clarify. Bo Jacoby (talk) 08:26, 9 August 2008 (UTC).

I think this is just a typo, so I fixed it. Pdbailey (talk) 03:22, 11 August 2008 (UTC)

Thank you, Pdbailey, for your edit. However, the typo, if it was a typo, is also present in the very first line of the article. Perhaps the covariance matrix is a p×p matrix, and each $X_i$ and $\overline{X}$ are p-vectors? I now tend to think that this is the case. Probably the observations should be written x rather than X.

${1 \over {n-1}}\sum_{i=1}^n (x_i-\overline{x})(x_i-\overline{x})^T,$
where
$\overline{x}={1 \over {n}}\sum_{i=1}^n x_i$
is the sample mean.

Bo Jacoby (talk) 08:14, 11 August 2008 (UTC).

I'm surprised there was confusion about this. Pdbailey was clearly wrong. I'll take a look to see that the article is clear about this. The Wishart matrix here is supposed to be p × p; the sample size is supposed to be n. Michael Hardy (talk) 12:55, 11 August 2008 (UTC)
I'm not; while one can figure it out, at first glance the wording and typesetting are really confusing. There are two possible interpretations of the text. (1) $x_i$ is being used to represent realizations of a random vector with range $\mathbb{R}^{p\times 1}$. Problem: how can a function from an event space to $\mathbb{R}^{p\times 1}$ be a member of $\mathbb{R}^{p\times 1}$? And isn't an observation a scalar (and not a set of observations = a vector)? These suggest (2) X is a realization (and hence a member of the range of the RV), and that $x_i$ is an individual element of X. But why would you estimate the covariance if you knew that the elements of X were independent? Plus, lowercase with subscripts is almost always used to represent scalars.
But that's all just at first glance; I have now spent a few minutes looking at the article (which I should have done before). I can see that the first interpretation is right and that the text/typesetting is just a little ham-handed. What do you think of this?
There are two problems with cleaning this up for me. (1) I'm not used to seeing a vector indexed with a subscript unless it is a member of a matrix. I don't know how to rectify this. (2) I'm not sure whether a capital X or a bold lowercase x would be clearer. Pdbailey (talk) 04:35, 12 August 2008 (UTC)

Given a sample consisting of independent observations X1,..., Xn of a random vector so that, for each vector, $X_i \in \mathbb{R}^{p\times 1}$ (a p×1 column), an unbiased estimator of the (p×p) covariance matrix

$\operatorname{cov}(X) = \operatorname{E}((X-\operatorname{E}(X))(X-\operatorname{E}(X))^T)$

is the sample covariance matrix

${1 \over {n-1}}\sum_{i=1}^n (X_i-\overline{X})(X_i-\overline{X})^T,$

where the vector $\overline{X}$ is given by

$\overline{X}={1 \over {n}}\sum_{i=1}^n X_i$

Okay, Sample_covariance_matrix does this a lot better. Maybe we should just start with that version. Is there any reason not to merge the two articles? Pdbailey (talk) 15:07, 12 August 2008 (UTC)
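To make the dimensions in this thread concrete, here is a small sketch that follows the formula literally: n observations, each a p-vector, give a p×p matrix. The function name `sample_covariance` and the toy data are made up for the example, and the rows-as-observations layout is an assumption of this sketch, not something the article prescribes.

```python
import numpy as np

def sample_covariance(samples):
    """Unbiased sample covariance of n observations of a p-vector.

    `samples` has shape (n, p): one observation per row.
    """
    samples = np.asarray(samples, dtype=float)
    n, p = samples.shape
    if n < 2:
        raise ValueError("need at least two observations")
    mean = samples.sum(axis=0) / n          # sample mean, a p-vector
    total = np.zeros((p, p))
    for x_i in samples:                     # x_i is one p-vector observation
        d = (x_i - mean).reshape(p, 1)      # p x 1 column
        total += d @ d.T                    # p x p outer product
    return total / (n - 1)

# Toy data: n = 4 observations of a p = 2 vector.
samples = np.array([[1.0, 2.0],
                    [3.0, 6.0],
                    [5.0, 4.0],
                    [7.0, 8.0]])
S = sample_covariance(samples)
print(S.shape)  # (2, 2): p x p, regardless of n
```

The explicit loop is only there to mirror the sum of outer products; `np.cov(samples, rowvar=False)` computes the same matrix.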

## Rude opening sentence

I was startled at how abrupt and devoid of context-setting the opening sentence is:

Given a sample consisting of independent observations x1,..., xn of a random vector $X \in \mathbb{R}^{p\times 1}$ (a p×1 column), an unbiased estimator of the (p×p) covariance matrix[.....]

I wondered if I had written that. I looked at the history. I did not write that. I knew that some people start Wikipedia articles like this; I hadn't realized that some people alter Wikipedia articles with proper introductory material so that they read like that. Michael Hardy (talk) 13:03, 11 August 2008 (UTC)

...and now I've restored the introductory paragraph. Michael Hardy (talk) 13:26, 11 August 2008 (UTC)

I have removed the restored para, as the article is not immediately about any of

• the Wishart distribution
• estimation assuming multivariate normality
• maximum likelihood estimation.

The point made was already covered later in the article, but might be given a different placement if emphasis is required. I have added a more general intro and tried to separate-off the bits related to the normal distribution by putting them into a separate section. Melcombe (talk) 15:04, 11 August 2008 (UTC)

## Convergence in the general case

What is known about the general, non-Gaussian case? In what sense does the sample covariance estimate, or converge to, the covariance matrix then? What if the random vectors are not independent but only approximately so? Jmath666 (talk) 05:45, 9 November 2008 (UTC)

Interesting question. One could wonder if something like the Gauss–Markov theorem could be done. Michael Hardy (talk) 20:24, 6 May 2009 (UTC)
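For the independent, identically distributed case at least, the law of large numbers gives consistency whenever second moments are finite, with no Gaussian assumption. The sketch below is only a numerical illustration of that, not an answer to the dependent case: it uses skewed (centered exponential) components mixed by a matrix `A`, where `A` and the sample sizes are arbitrary choices made for the example, and prints the Frobenius-norm error of the sample covariance as n grows.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical mixing matrix; the true covariance of x = A z is A A^T
# because the components of z are independent with unit variance.
A = np.array([[1.0, 0.0],
              [0.5, 1.0]])
true_cov = A @ A.T

def frobenius_error(n):
    """Frobenius-norm error of the sample covariance from n non-Gaussian draws."""
    z = rng.exponential(scale=1.0, size=(n, 2)) - 1.0  # mean 0, variance 1, skewed
    x = z @ A.T                                        # each row is A z_i
    return np.linalg.norm(np.cov(x, rowvar=False) - true_cov)

sizes = [100, 1_000, 10_000, 100_000]
errors = [frobenius_error(n) for n in sizes]
for n, err in zip(sizes, errors):
    print(n, err)
```

The error shrinks as n grows, though for heavy-tailed data the convergence can be much slower than in the Gaussian case.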

## Issues with a new section

I have some issues with the new section recently added. I've commented it out in the article for now.

### Maximum likelihood estimation: general case

The first-order conditions for an MLE of a parameter θ are that the first derivative of the log-likelihood function should be zero at $\theta_\text{MLE}$. Intuitively, the second derivative of the log-likelihood function indicates its curvature: the larger it is in magnitude, the better identified $\theta_\text{MLE}$ is, since the likelihood function will be sharply peaked (inverse-V-shaped) around $\theta_\text{MLE}$. Formally, it can be proved that

$\sqrt{T}(\theta_\text{MLE}-\theta) \rightarrow \mathcal{N}(0,\Omega) \,$

where $\Omega$ can be estimated by

$\left(-\frac{1}{T}\sum_{t=1}^T \frac{\partial^2 \ell_t}{\partial \theta \, \partial \theta '} (\theta_\text{MLE})\right)^{-1}.$

### (end of new section)

In this case "general case" appears not to mention covariance matrices at all, but as applied to covariance matrices being estimated, it seems to say that the sample covariance matrix has approximately a normal distribution with a variance Ω. I'm not sure I know what is meant by the variance of a matrix-valued random variable. I may have seen it defined at some point in the past, but I think if one writes about such a thing here, the article should explain what it is. And the parameter θ would in this case be a positive-definite matrix. How do you compute those partial derivatives with respect to a positive-definite matrix-valued random variable? I don't know for sure whether I've seen such a thing, but I think if that is to be done here, the article should explain what it is.

Maybe the new paragraph belongs in the main maximum likelihood article or one of the other related articles. Michael Hardy (talk) 20:23, 6 May 2009 (UTC)

My interpretation is that this new stuff is about estimating the covariance matrix of an estimate of a parameter vector obtained by maximum likelihood, so not the covariance matrix of a population of which there is a direct sample, as is the apparent intention of the rest of the article so far. Whether the article content should be extended to cover this topic is, I suppose, open. The article Unbiased estimation of standard deviation, which might be thought parallel to this one, presently covers both the population stdev and the stdev of the sample mean. Melcombe (talk) 08:48, 7 May 2009 (UTC)
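To illustrate that reading of the commented-out section, here is a sketch for the simplest vector-parameter case: a univariate normal with θ = (mean, variance), where the per-observation Hessians of the log-likelihood can be written down by hand. The data, seed, and variable names are assumptions made for the example. It only covers a parameter vector, so it does not address the matrix-valued parameter question raised above.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=1.0, scale=2.0, size=500)  # illustrative data
T = x.size

# Maximum likelihood estimates of theta = (mu, v), with v the variance.
mu_hat = x.mean()
v_hat = np.mean((x - mu_hat) ** 2)  # divides by T, not T - 1

# Per-observation Hessian of l_t = -0.5*log(2*pi*v) - (x_t - mu)^2 / (2*v),
# evaluated at the MLE.
r = x - mu_hat
hessians = np.empty((T, 2, 2))
hessians[:, 0, 0] = -1.0 / v_hat                        # d2 l / d mu2
hessians[:, 0, 1] = -r / v_hat**2                       # d2 l / d mu d v
hessians[:, 1, 0] = hessians[:, 0, 1]
hessians[:, 1, 1] = 0.5 / v_hat**2 - r**2 / v_hat**3    # d2 l / d v2

# Omega estimate: inverse of minus the average Hessian.
omega_hat = np.linalg.inv(-hessians.mean(axis=0))
print(omega_hat)
```

In this model the result reduces to a diagonal matrix with entries `v_hat` and `2 * v_hat**2`, which is the estimated asymptotic covariance matrix of the estimator pair, not of the population.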

## Median-unbiased estimation

Donald Andrews and others have written on median-unbiased estimators, with application to time series analysis. Kiefer.Wolfowitz (Discussion) 14:59, 13 April 2011 (UTC)