Talk:Sample mean and sample covariance
Sample mean and sample covariance has been listed as a level-4 vital article in Mathematics. If you can improve it, please do. This article has been rated as Start-Class.
WikiProject Statistics (Rated Start-class, Mid-importance)
This topic is in need of attention from an expert on the subject. The section or sections that need attention may be noted in a message below.
Perhaps someone more knowledgeable than me can add an explanation of what a weighted sample and its covariance actually mean, especially from the probability theory point of view (sample space). I know of 3 contexts: 1. weighted linear regression 2. biased samples, e.g., Efromovich (2004) 3. weighted ensembles, as in particle filters. Jmath666 06:24, 12 March 2007 (UTC)
Should the individual entries for the sample mean have a (1/N) in front of them? Or am I missing something here? --WillBecker 11:05, 2 May 2007 (UTC)
- Thank you, I have fixed that now. You can be bold and make changes yourself. I do not own this (or any other page) even if I made most of the edits here to date. Thanks again for your help. Jmath666 20:49, 2 May 2007 (UTC)
Can we change the notation from x bar to \mu_x? There just seems to be a ton of x's flying around, and it may be easier to follow with mu's in some places. daviddoria (talk) 18:44, 11 September 2008 (UTC)
Samples weighted with a matrix?
- It could be called the sample variance, but that term is also used for the expression with n − 1 in the denominator. It is the maximum likelihood estimate of the population variance if the sample is taken to be from a normal distribution of unknown population variance. Michael Hardy (talk) 16:48, 14 November 2008 (UTC)
Is the formula for the weighted covariance really correct? I'm wondering about the normalizing part... If the formula from the cited page is correct, doesn't the denominator here have the wrong sign? 188.8.131.52 (talk) 11:06, 26 November 2008 (UTC)
- The denominator is positive, all fine there. Let me show what happens in the special case of equal weights:
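The equal-weights check can be sketched in a few lines of Python. This assumes the weighted formula under discussion is the common one with weights normalized to sum to 1 and the factor 1/(1 − Σ w_i²); the thread does not show the exact formula, so treat that form and the function names `weighted_cov` and `unweighted_cov` as assumptions made for illustration.

```python
from fractions import Fraction

def weighted_cov(xs, ys, weights):
    """Weighted sample covariance, assuming the normalized-weights form:
    sum_i w_i (x_i - xbar)(y_i - ybar) / (1 - sum_i w_i^2)."""
    total = sum(weights)
    w = [Fraction(wi) / total for wi in weights]      # normalize so sum(w) == 1
    xbar = sum(wi * x for wi, x in zip(w, xs))        # weighted means
    ybar = sum(wi * y for wi, y in zip(w, ys))
    num = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
    denom = 1 - sum(wi * wi for wi in w)              # the disputed denominator
    return num / denom

def unweighted_cov(xs, ys):
    """Ordinary sample covariance with N - 1 in the denominator."""
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    ybar = Fraction(sum(ys), n)
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)

# Illustrative data (made up for this check)
xs = [1, 2, 4, 7]
ys = [3, 1, 5, 6]
equal = [1, 1, 1, 1]

# With w_i = 1/N the denominator is 1 - N * (1/N)^2 = 1 - 1/N = (N - 1)/N
denominator = 1 - sum(Fraction(1, 4) ** 2 for _ in equal)
```

Under that assumption the denominator is (N − 1)/N for equal weights, which is positive, and the weighted expression reduces to the usual 1/(N − 1) estimate. More generally, for non-negative weights summing to 1, Σ w_i² is below 1 whenever at least two weights are nonzero.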
The page says:
- "The reason the sample covariance matrix has N − 1 in the denominator rather than N is essentially that the population mean E(X) is not known and is replaced by the sample mean \bar{x}. If the population mean E(X) is known, the analogous unbiased estimate
- q_{jk} = \frac{1}{N} \sum_{i=1}^{N} (x_{ij} - E(X_j)) (x_{ik} - E(X_k)),
- using the population mean, has N in the denominator."
This is essentially meaningless, since the direct implication is not spelled out. Why does the population mean cause the denominator to increase by 1? If the causal link is not present, it seems equivalent to saying something like "The denominator is N-1 because Iran is a country" or "because cheese is made from milk." While this seems ridiculous to someone familiar with math, those who are not gather nothing from that statement. --184.108.40.206 (talk) 23:14, 12 October 2011 (UTC)
- Agree; I fixed this by noting that the sample mean is correlated with the sample it's being compared against and referring to Bessel's correction for more details. Eamon Nerbonne (talk) 12:17, 15 October 2011 (UTC)
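For readers who want to see the effect rather than take it on faith, here is a minimal sketch that enumerates every possible sample exactly. The three-value population and the sample size are made up for illustration, sampling is assumed to be independent with replacement, and all names are hypothetical.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 6]   # hypothetical population, each value equally likely
N = 3                    # sample size

mu = Fraction(sum(population), len(population))                    # population mean
sigma2 = sum((v - mu) ** 2 for v in population) / len(population)  # population variance

def expected(estimator):
    """Exact expectation of an estimator over all equally likely samples."""
    samples = list(product(population, repeat=N))
    return sum(estimator(s) for s in samples) / len(samples)

def ss_about_sample_mean(s):
    """Sum of squared deviations from the sample's own mean."""
    xbar = Fraction(sum(s), len(s))
    return sum((x - xbar) ** 2 for x in s)

def ss_about_population_mean(s):
    """Sum of squared deviations from the known population mean."""
    return sum((x - mu) ** 2 for x in s)

e_biased = expected(lambda s: ss_about_sample_mean(s) / N)        # sample mean, divide by N
e_bessel = expected(lambda s: ss_about_sample_mean(s) / (N - 1))  # sample mean, divide by N - 1
e_known = expected(lambda s: ss_about_population_mean(s) / N)     # known mean, divide by N
```

The sample mean is computed from the same observations it is subtracted from, so deviations about it are systematically smaller than deviations about the true mean. Dividing by N − 1 compensates for this exactly; when the true mean is known, no compensation is needed and N is the right divisor.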
Text structure / row vs. column vectors.
Usually, random vectors are presented as column vectors; e.g. see the Random Vector page. This page presents them as row vectors, which is potentially confusing. I think we should swap that.
Also, the page is complex; there are lots of variables and lots of subscripts, some of which are used before they are introduced, and many of which are introduced without clear context. For example, the xij variable is introduced before x, i, and j are each introduced on their own. I think a lot of this complexity can simply be dropped. E.g.:
- The sample mean vector is a row vector whose jth element (j = 1, ..., K) is the average value of the N observations on the jth random variable. Thus the sample mean vector is the average of the row vectors of observations on the K variables:
- ==> change to ==>
- The sample mean vector \bar{x} is the element-wise mean of X's observations. So the jth element of \bar{x} is the average of the jth elements of the observations of X:
We really don't need to repeat the fact that there are N observations and K elements. Whether it's a row or column vector, what exactly the range of the indexes is, and how many variables there are: none of that is central to the point. This text (and lots of other bits) is basically piloting prose to intuitively highlight the subsequent formula. For readers who don't know the topic in detail, it's just confusing; for those who do but just want to look up a detail, it's telling them something they know and making the actual formula harder to find.
So I'd propose using the normal approach of column vectors, to introduce the variables a little more elaborately, and to rephrase text focusing on the intent/intuition behind the formula rather than a precise replacement for the formula. Finally, it'd be nice to add a variant formula using matrix/vector notation rather than elementwise sums; all those subscripts make it look more complicated than it is. Eamon Nerbonne (talk) 12:15, 15 October 2011 (UTC)
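As a rough sketch of what the matrix/vector form could look like, the following treats each observation as a K-element column vector x_i, takes the sample mean as the element-wise average, and builds the covariance matrix as 1/(N − 1) times the sum of outer products (x_i − \bar{x})(x_i − \bar{x})^T. The function names and the data are hypothetical and only illustrate the proposal.

```python
def sample_mean(observations):
    """Element-wise mean of N observations, each a K-element vector."""
    n = len(observations)
    # zip(*observations) groups the jth elements of all observations together
    return [sum(col) / n for col in zip(*observations)]

def sample_cov(observations):
    """K x K sample covariance: (1/(N-1)) * sum_i (x_i - xbar)(x_i - xbar)^T."""
    n = len(observations)
    xbar = sample_mean(observations)
    k = len(xbar)
    q = [[0.0] * k for _ in range(k)]
    for x in observations:
        d = [xj - mj for xj, mj in zip(x, xbar)]   # deviation vector x_i - xbar
        for a in range(k):
            for b in range(k):
                q[a][b] += d[a] * d[b]             # accumulate outer product d d^T
    return [[v / (n - 1) for v in row] for row in q]

# Three observations of a two-element random vector (made-up numbers)
obs = [[1.0, 2.0], [3.0, 6.0], [5.0, 7.0]]
mean = sample_mean(obs)   # [3.0, 5.0]
cov = sample_cov(obs)     # [[4.0, 5.0], [5.0, 7.0]]
```

Written this way, the only indices left are the observation index i and, inside the outer product, the matrix position; the element-wise formula with x_ij is recovered by reading off entry (j, k) of the result.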