

An arithmetic example, especially yielding a covariance value for an example population and then illustrating its use and usefulness would help. (talk) 21:47, 14 July 2010 (UTC)

Unclear line 3

I think the following is unclear; can someone clarify it? (Since it's not clear to me, I doubt I'm the right one to clarify it.) (3) positive definite: Var(X) = Cov(X, X) ≥ 0, and Cov(X, X) = 0 → X is a constant random variable (K). Pt314156 15:38, 27 April 2007 (UTC)


I question the phrase 'Random Variables' - surely if they were truly random variables, there could be no question of any meaningful correlation, and the whole concept would lose all utility - <email removed> 12-July-2005 00:59 Sydney Earth :)

I agree that the phrase 'Random Variables' is a suboptimal naming construct. It tries to create the illusion that Statistics is an exact science, when in fact it should be named 'Observations' to recognize its empirical heritage. The name should honor its real root: analysis of data by means of rules made up for the practical purpose of getting useful information from the data :) (talk) 17:55, 18 December 2011 (UTC)

Please don't be a total crackpot. If you want tutoring or instruction on these concepts, ask someone. Don't use the word "would", as if you were talking about something hypothetical. Admit your ignorance. Simple example: Pick a married couple randomly from a large population. The husband's height is a random variable, the randomness coming from the fact that the couple was chosen randomly. The wife's height, for the same reason, is a random variable. The two are correlated, in that the conditional probability that the wife is more than six feet tall, given that the husband is more than six-and-a-half feet tall, is different from the unconditional probability that the wife is more than six feet tall. See the Wikipedia article on statistical independence of random variables; some pairs of random variables are independent; some are not. Another example: the square of the husband's age is a random variable; the husband's age is another random variable. And they are correlated. As for utility, just look at all of the practical applications of these concepts in statistical analysis. Michael Hardy 21:04, 11 July 2005 (UTC)

not sure about my edit what are ì and í ?

The converse, however, is not true? Do you mean that covariance could be 0 with dependent variables?

Yes. Example: y = x² (where -1 <= x <= 1, with x distributed symmetrically about 0). Cov = 0. Dependent? Yes, it's a function!
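A small worked check of the y = x² example, as a sketch only. The comment above does not say how x is distributed, so this assumes X takes the values -1, 0 and 1 with equal probability; any distribution symmetric about 0 behaves the same way. The names `expect`, `cov_xy` and `p_y1` are illustrative.

```python
from fractions import Fraction

# Assumed distribution: X is -1, 0 or 1, each with probability 1/3.
xs = [Fraction(-1), Fraction(0), Fraction(1)]
p = Fraction(1, 3)

def expect(values):
    """Expected value of equally likely outcomes."""
    return sum(v * p for v in values)

ys = [x ** 2 for x in xs]  # Y = X^2 is fully determined by X

# Cov(X, Y) = E(XY) - E(X) E(Y)
cov_xy = expect([x * y for x, y in zip(xs, ys)]) - expect(xs) * expect(ys)

# Dependence: P(Y = 1 | X = 1) = 1, but the unconditional P(Y = 1) is 2/3.
p_y1 = sum(p for y in ys if y == 1)

print(cov_xy)  # 0
print(p_y1)    # 2/3
```

So the covariance is exactly zero even though knowing X pins down Y completely.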

It is not clear what 'E' means in the equation E(XY).

E(XY) = (1/N)*sum((X_i*Y_i), i=1..N), perhaps there should be the sum definition explicitly stated in this article as well as the expectation value article? --dwee

That formula is correct only in the case in which the number of possible values of X and Y is some finite number N. More generally, the expectation could be an infinite series or an integral. At any rate, E(X) is the expected value of the random variable X. Michael Hardy 01:59, 7 Oct 2004 (UTC)
Regardless of the uses of the expected value in higher-level theoretical statistics, the finite sum version should be included, since it is the formula that most people learn when they are first taught covariance. It is also in common practical usage: unlike in the expected value formula, sample sizes aren't infinite (mostly). Either put both formulas in or just the finite sum formula. Every other source I can find on the internet has the finite sum definition, for example:

-Trotsky — Preceding unsigned comment added by (talk) 20:22, 8 June 2012 (UTC)
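To make the two positions above concrete, here is a hedged sketch covering the finite case only: E(XY) as a probability-weighted sum over a joint distribution, which reduces to the (1/N)*sum form when all N outcomes are equally likely. The joint table and the pairs are made up for illustration, and `expected_product` is a hypothetical helper name.

```python
def expected_product(joint):
    """E(XY) for a finite joint distribution given as {(x, y): probability}."""
    return sum(p * x * y for (x, y), p in joint.items())

# Hypothetical joint distribution with unequal probabilities.
joint = {(0, 0): 0.5, (1, 2): 0.25, (2, 4): 0.25}
e_xy_weighted = expected_product(joint)

# With N equally likely pairs every weight is 1/N, which gives the
# (1/N) * sum(X_i * Y_i) formula quoted above.
pairs = [(1, 2), (2, 3), (3, 7)]
n = len(pairs)
e_xy_equal = sum(x * y for x, y in pairs) / n
e_xy_same = expected_product({pair: 1 / n for pair in pairs})
```

The infinite-series and integral cases mentioned in the reply are outside what this sketch shows.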

Just to note that the second equation is not rendered.

It is on the browser I am using. Michael Hardy 20:26, 6 Jan 2004 (UTC)

the covariance definition for vector-valued random variables looks very sophisticated...

That seems like a completely subjective statement. Michael Hardy 21:51, 3 Jan 2005 (UTC)

what is the actual reason for defining a thing like this ? Why not put the E(X_iY_j) entries into a table,

That's exactly what it says is done! Michael Hardy 21:51, 3 Jan 2005 (UTC)

why a matrix is needed ?

The only difference between a "table" and a matrix is that one has a definition of matrix multiplication. And one often has occasion to multiply these matrices. Michael Hardy 21:51, 3 Jan 2005 (UTC)

your "explanation" does not explain anything as any table can be treated as matrix and multiplied with other tables. This of course does not make much sense in general. So the actual question is : why the definition (and multiplication) of a matrix with entries E(X_iY_j) makes sense ?

Your original question was completely unclear, to say the least. Maybe I'll write something on motivation of this definition at some point. Michael Hardy 20:23, 4 Jan 2005 (UTC)
"For column-vector valued random variables X and Y with respective expected values μ and ν, and n and m scalar components respectively, the covariance is defined to be the n×m matrix"
I don't get it either; how do you get a matrix as the cov(X,Y) when normally it is a scalar? Probably I am not understanding how you are defining X and Y. To me it sounds like the m or n components are just sample values of random variables X and Y? --Chinasaur 02:00, 1 Apr 2005 (UTC)
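One way to read the quoted definition, sketched under assumptions: X and Y are each a single random vector (with n and m components), not lists of sample values, and entry (i, j) of the matrix is the ordinary scalar covariance of component X_i with component Y_j. The outcome data below are invented, and treating the outcomes as equally likely (a plain 1/N average) is an assumption made for the example.

```python
def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Scalar covariance of two equally weighted outcome lists."""
    ma, mb = mean(a), mean(b)
    return mean([(u - ma) * (v - mb) for u, v in zip(a, b)])

# Four equally likely joint outcomes of a 2-vector X and a 3-vector Y.
outcomes_x = [(1.0, 0.0), (2.0, 1.0), (3.0, 0.0), (4.0, 1.0)]
outcomes_y = [(2.0, 5.0, 1.0), (4.0, 5.0, 0.0), (6.0, 5.0, 1.0), (8.0, 5.0, 0.0)]

n, m = len(outcomes_x[0]), len(outcomes_y[0])
x_cols = list(zip(*outcomes_x))  # component-wise views X_1 .. X_n
y_cols = list(zip(*outcomes_y))  # component-wise views Y_1 .. Y_m

# n x m matrix whose (i, j) entry is Cov(X_i, Y_j).
cov_matrix = [[cov(x_cols[i], y_cols[j]) for j in range(m)] for i in range(n)]
```

Each entry is still a scalar covariance; the matrix just collects all n*m of them. The second column is zero here because the second component of Y is constant.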


I'm sorry, but this page makes UTTERLY NO SENSE. Could someone please add a paragraph explaining things for those of us who aren't super mathematicians? 16:24, 25 July 2005 (UTC)


The covariance of two column vectors is stated to generate a matrix. Is there a similar function to covariance which generates a single scalar instead of a matrix, by multiplying the transpose of the first term against the unaltered column vector? Is there a reference where we could find the derivations of these terms? Ben hall 15:09, 16 September 2005 (UTC)

If X and Y are both n × 1 random vectors, so that cov(X,Y) is n × n, then the trace of the covariance may perhaps be what you're looking for. Michael Hardy 18:29, 16 September 2005 (UTC)
Sorry, but I think I may have expressed myself poorly. I was thinking in terms of a list of variables which could be described as a vector, or as a matrix. For example if I have the cartesian coordinates of particles in a box over a period of time, I see how I can find the covariance matrix based on each of the components for each of the particles but I cannot see how I might find a covariance matrix based solely on the motions of the particles with respect to one another (ie if they are moving in the same direction or opposing directions). For this would it be suitable to take the inner product of the differences between the cartesian coordinates and their averages? Also, how could I show that this is a valid approach? Thanks for your suggestions so far. Ben hall 20:02, 17 September 2005 (UTC)


"In probability theory and statistics, the covariance between two real-valued random variables X and Y, with expected values E(X) = μ and E(Y) = ν is defined as:" -- it is unclear that we speak about n sets of variables {X} and {Y}. I suggest starting with "In probability theory and statistics, the covariance between two real-valued random variables X and Y in the given sets {X} and {Y}, with expected values E(X) = μ and E(Y) = ν is defined as:". This is unclear too, but it better explains why you speak about E(X) and E(Y) at all without noting that they are sets of variables, not just 'variables'. Also, speaking of mean values as well as expected values in the same sense would be preferable. I would also suggest adding a link to

Please comment on my comments :) I will start reorganizing the article if there will be no comments within a month. --GrAndrew 13:07, 21 April 2006 (UTC)

Having a PhD in statistics somehow fails to enable me to understand what you're saying. What is this thing you're calling "n"?? Why do you speak of X and Y as being "in the given sets {X} and {Y}", and what in the world does that mean? If you change the sentence to say that, I will certainly revert. The first sentence looks fine to me, and your proposed change to it looks very bad in a number of ways, not only because it's completely cryptic. The covariance is between random variables, not between sets of random variables. And to refer to "n" without saying what it is would be stupid. Michael Hardy 18:20, 21 April 2006 (UTC)

I've just looked at that article on mathworld. It's quite confused. This article is clearer and more accurate. I'm still trying to guess what you mean by saying "they are sets of variables, not just 'variables'". In fact, they are just random variables, not sets of random variables. Michael Hardy 21:01, 21 April 2006 (UTC)

I would guess that he is coming at the problem from a time series analysis point of view and thinking of covariance in terms of a statistic calculated using a time series of sampled data. To someone from that background it can seem confusing to think of covariance calculated for a single variable when, in practice, you calculate it from a set of data. This is not to say I think the article should be changed though perhaps if I found time to add something about calculating sample covariance on data it would clarify. --Richard Clegg 12:45, 24 April 2006 (UTC)

Positive feedback

Just want to say that as a high school student working in a lab, I found this article (especially the first section) to be exceptionally well written. Most higher-level math pages are overly pedantic and throw terminology around far too much, only occasionally linking to equally poorly defined articles. As an outside observer, I think this article should be an example to other mathematical pages. Either that, or I've become far too familiar with this type of subject matter! Thanks for making my work easier to understand!

Thanks to the above - nice to have positive feedback! Johnbibby 19:38, 1 September 2006 (UTC)


After the latest edit, the article includes this:

If X and Y are independent, then their covariance is zero. This follows because under independence,
E(X \cdot Y)=E(X) \cdot E(Y)=\mu\nu,
The converse, however, is not true: it is possible that X and Y are not independent, yet their covariance is zero. This is because although under statistical independence,
E(X \cdot Y)=E(X) \cdot E(Y)=\mu\nu,
the converse is not true.

The second sentence seems to be just a repetition of the first. Why is the addition of the second sentence, repeating the content of the first, an improvement? Michael Hardy 18:15, 21 August 2006 (UTC)

Clarification: On second thought, what I meant was: the part that begins with "This is because..." and ends with "... is not true" seems to repeat what came before it. Michael Hardy 18:24, 21 August 2006 (UTC)

You're right! I introduced some (partial) redundancy - I'll go back to amend it.

{I'm new to Wikipedia & still haven't sorted this 'Talk' thing out yet - so please bear with me while I learn!} Johnbibby 16:59, 22 August 2006 (UTC)

algorithm to estimate

This article should include (an easily understandable) outline of an algorithm to estimate the covariance between two sets of (finite) N measurements of variables X and Y. A note should be included about maximum likelihood vs. unbiased estimators and how to convert between the two.

eg. start with two random variables X and Y, each with N measured values

X = { X_1, X_2, ... , X_N } = { X_n }, n = 1 ... N
Y = { Y_1, Y_2, ... , Y_N }

estimate their means

mu_X = sum(X_i)/N, i = 1 ... N
mu_Y = sum(Y_i)/N, i = 1 ... N

'centre' the values about their estimated means

centreX = { X_i - mu_X }, i = 1 ... N
centreY = { Y_i - mu_Y }, i = 1 ... N

then estimate the Covariance of X and Y

Cov(X, Y) = sum( centreX_i * centreY_i ) / (N - 1), i = 1 ... N (unbiased)
Cov(X, Y) = sum( centreX_i * centreY_i ) / N, i = 1 ... N (maximum likelihood)

I'm not entirely sure that I've got unbiased / maximum likelihood correct, but this paper seems to agree at the bottom of page 1 and top of page 2. 23:34, 30 August 2006 (UTC)
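The steps proposed above can be sketched as runnable code. This is a minimal illustration, not a reference implementation: the function name `sample_covariance` and its `unbiased` flag are made up here, and the data are invented. Dividing by N - 1 gives the unbiased estimate; dividing by N gives the maximum-likelihood estimate under a normal model, which matches the labelling in the comment.

```python
def sample_covariance(xs, ys, unbiased=True):
    """Estimate Cov(X, Y) from N paired measurements.

    unbiased=True divides by N - 1; unbiased=False divides by N
    (the maximum-likelihood estimate under a normal model).
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two pairs")
    # Step 1: estimate the means.
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    # Steps 2 and 3: centre the values and sum the products.
    total = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys))
    return total / (n - 1 if unbiased else n)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 9.0]
cov_unbiased = sample_covariance(xs, ys)
cov_ml = sample_covariance(xs, ys, unbiased=False)
# Converting between the two: cov_ml == cov_unbiased * (N - 1) / N
```

For these four pairs the centred products sum to 11.5, so the two estimates are 11.5/3 and 11.5/4.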

outer product?

Outer_product#Applications mentions that the outer product can be used for computing the covariance and auto-covariance matrices for two random variables. How this is accomplished should be outlined on this page, or that page... somewhere. 00:53, 31 August 2006 (UTC)
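A possible outline of the outer-product route, offered as a sketch under assumptions rather than as the method that page has in mind: the covariance matrix is estimated as the average of the outer products of the centred sample vectors. The helper names `outer` and `cross_covariance` are hypothetical, the data are invented, and the average divides by N (not N - 1).

```python
def outer(u, v):
    """Outer product of two vectors as a nested list (len(u) x len(v))."""
    return [[ui * vj for vj in v] for ui in u]

def cross_covariance(samples_x, samples_y):
    """Average of outer products of centred sample vectors (divides by N)."""
    count = len(samples_x)
    n, m = len(samples_x[0]), len(samples_y[0])
    mean_x = [sum(s[i] for s in samples_x) / count for i in range(n)]
    mean_y = [sum(s[j] for s in samples_y) / count for j in range(m)]
    acc = [[0.0] * m for _ in range(n)]
    for sx, sy in zip(samples_x, samples_y):
        dx = [a - b for a, b in zip(sx, mean_x)]
        dy = [a - b for a, b in zip(sy, mean_y)]
        for i, row in enumerate(outer(dx, dy)):
            for j, value in enumerate(row):
                acc[i][j] += value / count
    return acc

data = [(1.0, 2.0), (2.0, 1.0), (3.0, 6.0)]
auto_cov = cross_covariance(data, data)  # auto-covariance: X with itself
```

Passing the same samples twice gives the auto-covariance matrix, which is symmetric with the component variances on its diagonal.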

unbiased estimation

For variance, there is a biased estimator and an unbiased estimator. Is there an unbiased estimator for covariance? Jackzhp 17:55, 2 February 2007 (UTC)

Yes. See sample covariance matrix. Prax54 (talk) 19:33, 10 August 2012 (UTC)

How about a beginner's definition

Don't get me wrong, I think it's great that Wikipedia has brainiacs that want to include all kinds of details. But how about a really good definition for math newbies? How about a metaphor or an example for folks that only want to understand it enough to complete a conversation and then go back to their relatively mathless life? —The preceding unsigned comment was added by Tghounsell (talk • contribs) 01:09, 20 March 2007 (UTC).

Myu vs v thing

Why are we using a little v thing for the mean of y? Shouldn't we use μy ?? Fresheneesz 07:28, 21 March 2007 (UTC)

   ν is the Greek letter "nu", which comes after μ in the alphabet. 17:47, 29 March 2007 (UTC)

Inner product

The last part of the section on inner product is not clear. Perhaps someone could explain better why random variables is in quotes, for example.

  • "It follows that covariance is an inner product over a vector space of "random variables", with a(X) = (aX) and X + Y = (X + Y). "Random variables" is in quotes because it is not true that X + K is distributed the same as X for any constant K; but as long as these three basic properties of covariance apply, the duals of theorems regarding inner products that depend only on those properties will be valid."

Of course X + K is not distributed the same as X; usually the mean will be different. That doesn't explain why "random variables" is in quotation marks. Maybe it should say it's because a constant K isn't really a random variable? Or just remove the quotation marks? Using quotation marks to indicate vagueness in a math article may not be a good idea; better to state it a different way, correctly. --Coppertwig 12:34, 17 June 2007 (UTC)

OK, I think I fixed it. --Coppertwig 12:46, 17 June 2007 (UTC)

Sample Covariance

Can we put a more prominent link to Sample Covariance? It is important to tell readers right away that if they are actually looking to construct a covariance matrix, they need to see the other page. daviddoria (talk) 18:45, 11 September 2008 (UTC)

Drop \mu and \nu?

It seems to me that abbreviating \mu := E(X) and \nu := E(Y) adds, well, absolutely nothing to the article.

The definition section, for example, would read like this:

The covariance between two real-valued random variables X and Y, with expected values \scriptstyle E(X) and \scriptstyle E(Y) is defined as

\operatorname{Cov}(X, Y) = \operatorname{E}((X - \operatorname{E}(X)) (Y - \operatorname{E}(Y))), \,

where E is the expected value operator, as above. This can also be written:

\operatorname{Cov}(X, Y) = \operatorname{E}(X \cdot Y - \operatorname{E}(X) \cdot Y - X \cdot \operatorname{E}(Y) + \operatorname{E}(X) \cdot \operatorname{E}(Y)), \,
\operatorname{Cov}(X, Y) = \operatorname{E}(X \cdot Y) - \operatorname{E}(X) \cdot \operatorname{E}(Y) - \operatorname{E}(X) \cdot \operatorname{E}(Y) + \operatorname{E}(X) \cdot \operatorname{E}(Y), \,
\operatorname{Cov}(X, Y) = \operatorname{E}(X \cdot Y) - \operatorname{E}(X) \cdot \operatorname{E}(Y). \,

Random variables whose covariance is zero are called uncorrelated.

If X and Y are independent, then their covariance is zero. This follows because under independence,

E(X \cdot Y)=E(X) \cdot E(Y).

Recalling the final form of the covariance derivation given above, and substituting, we get

\operatorname{Cov}(X, Y) = E(X) \cdot E(Y) - E(X) \cdot E(Y) = 0.

The converse, however, is generally not true: Some pairs of random variables have covariance zero although they are not independent. Under some additional assumptions, covariance zero sometimes does entail independence, as for example in the case of multivariate normal distributions.

The units of measurement of the covariance Cov(X, Y) are those of X times those of Y. By contrast, correlation, which depends on the covariance, is a dimensionless measure of linear dependence.
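A quick exact check of the derivation quoted above, as a sketch on an assumed toy sample space (two independent fair dice, all 36 outcomes equally likely). It compares the defining form E((X - E(X))(Y - E(Y))) with the final form E(XY) - E(X)E(Y), and shows the covariance vanishing for the independent pair. All function and variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

def expect(f, outcomes):
    """Expected value of f over equally likely outcomes."""
    return sum(Fraction(f(o)) for o in outcomes) / len(outcomes)

def cov_definition(fx, fy, outcomes):
    """E((X - E(X)) * (Y - E(Y)))"""
    ex, ey = expect(fx, outcomes), expect(fy, outcomes)
    return expect(lambda o: (fx(o) - ex) * (fy(o) - ey), outcomes)

def cov_shortcut(fx, fy, outcomes):
    """E(XY) - E(X) * E(Y)"""
    return (expect(lambda o: fx(o) * fy(o), outcomes)
            - expect(fx, outcomes) * expect(fy, outcomes))

# Two independent fair dice: every (a, b) pair is equally likely.
outcomes = list(product(range(1, 7), repeat=2))

def first(o):
    return o[0]

def second(o):
    return o[1]

def total(o):
    return o[0] + o[1]

independent = cov_shortcut(first, second, outcomes)  # independent pair
dependent = cov_shortcut(first, total, outcomes)     # first die vs. the sum
same = cov_definition(first, total, outcomes)        # defining form agrees
```

Here `independent` is 0 and `dependent` is 35/12, which is the variance of a single die, as expected for Cov(A, A + B) with A and B independent.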

Using these symbols does add absolutely nothing to the article. However it removes some of the clutter in the formulae, making them easier to see and understand ... therefore a good thing I think. Melcombe (talk) 12:10, 15 January 2009 (UTC)

Add \sigma?

It is customary to use \sigma_{XY} to represent covariance(X, Y) and \sigma^2 to represent variance. Contrary to the above author's opinion that use of symbols adds nothing to the article, I believe that adding customary symbol usage will help people coming to this page while viewing material containing those symbols. Tedtoal (talk) 22:38, 12 March 2011 (UTC)

More Citations Please

I'd like to see more citations on this page, particularly having to do with the more abstract treatment toward the end. Wikipedia should act as a portal to more detailed information on the web. I am having a hard time finding treatments of the covariance function, but the person who wrote the information on inner products, Banach spaces, etc. should have the information (or else it shouldn't be there!). Trashbird1240 (talk) 20:24, 13 May 2009 (UTC)

citations and simplifications

I don't do this much, so pardon me if I inadvertently break rules. My copy of Papoulis [1] defines the covariance of two random variables x and y as

\operatorname{Cov}(x, y) = E((x - \mu)(y - \nu))

using \mu = E(x), etc. which simplifies to

\operatorname{Cov}(x, y) = E(xy) - E(x) E(y)

(see p. 152, Papoulis) This seems like a simpler formulation than the one that appears in the article. I also think that the concept of random variables is not sufficiently emphasized and linked. It is really impossible to understand this discussion without understanding what random variables are. In Papoulis they are always shown in boldface, which is quite helpful, but I don't know how to do that here. -- Alanyoder (talk) 03:47, 2 September 2009 (UTC)

Covariance and Covariance matrix

In this article we use \operatorname{Cov}(X, Y) to talk about both the covariance and the covariance matrix. I think that this is confusing. --Belchman (talk) 16:03, 31 October 2009 (UTC)

Covariance operator, etc.

I think the notation currently used in the article is quite confusing. We write Cov(X,Y) for the covariance of two random variables X and Y (which is pretty standard), but later on we also write Cov(x,y) for the “variance-covariance matrix” of a third random variable Z, which somehow isn't even present in the notation, whereas x and y here are just auxiliary quantities demonstrating how the Cov bilinear form acts. So, the covariance operator C: H → H of random variable Z is defined as

C_Z(f) = \operatorname{E}\big[ (Z-\operatorname{E}Z, f) \cdot (Z-\operatorname{E}Z) \big]

Oh and btw, some of the definitions in the text are missing this -\operatorname{E}Z part. Also, the definition of covariance for function-valued random elements follows directly from the definition for the generic Hilbert space H, since random processes are defined on the Hilbert space L² of square-integrable functions (or at least we'd need square integrability for covariance to be defined). // stpasha » 09:31, 25 February 2010 (UTC)

Definition, X,Y need to be integrable

In the current version four definitions are given, but we would have to agree on the first one (in my understanding the most common), since the third and fourth are only equivalent if X and Y as well as their product are integrable. Quiet photon (talk) 18:58, 4 March 2010 (UTC) Actually you could require square-integrability from the start. I will add that. Quiet photon (talk) 19:17, 4 March 2010 (UTC)


I very much doubt what is written in the section "incremental computation". An incremental form of computation exists for the sample covariance. (talk) 08:29, 21 April 2010 (UTC)
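For reference, one commonly used incremental (one-pass) update for the sample covariance can be sketched as follows. This is an assumption about what an incremental form might look like, not a reconstruction of what the article section said; the class name `RunningCovariance` and its attributes are made up for the example.

```python
class RunningCovariance:
    """One-pass (incremental) sample covariance of paired observations."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.co_moment = 0.0  # running sum of (x - mean_x) * (y - mean_y)

    def update(self, x, y):
        self.n += 1
        dx = x - self.mean_x  # deviation from the OLD mean of x
        self.mean_x += dx / self.n
        self.mean_y += (y - self.mean_y) / self.n
        # Old-mean deviation of x times new-mean deviation of y.
        self.co_moment += dx * (y - self.mean_y)

    def sample_covariance(self):
        if self.n < 2:
            raise ValueError("need at least two observations")
        return self.co_moment / (self.n - 1)

running = RunningCovariance()
for x, y in zip([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 9.0]):
    running.update(x, y)
result = running.sample_covariance()
```

Each observation is processed once and only three running quantities are kept, so the data never need to be stored. For these four pairs the result agrees with the two-pass value 11.5/3.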

Missing topic: Covariance in error estimation

What I am completely missing in this article is at least one section about how to use covariance in error estimation. For example, the plotting tool Gnuplot calculates the covariance matrix of a linear fit f(x)=a+b*x to an x-y data sample. The covariance of the parameters a,b may be -0.9...something, while the covariance of the data themselves is +0.6...something. To calculate the covariance of the errors one needs to know how to carry the information from this article over to this "second order" problem: i.e., what is the expectation value of each parameter, how does one define the errors, etc. How does one define a sample of (a,b)? Just by picking random values around the best-fit a,b and then applying the covariance formulae? Unfortunately, gnuplot.pdf says nothing about this. If there is any expert on this class of (highly important) problems, his or her contribution to this article would be highly appreciated.--SiriusB (talk) 11:14, 8 April 2011 (UTC)

I don't think this article can realistically cover how to calculate covariance of parameter estimates for every statistical model. It's more appropriate to cover that in the article for each model. Sounds like you're thinking about simple linear regression which doesn't currently include this (but perhaps should); however, simple linear regression is a special case of ordinary least squares, and that article does include the formula for the variance-covariance matrix under ordinary least squares#Finite sample properties. Qwfp (talk) 19:21, 8 April 2011 (UTC)
No, I am talking about what gnuplot does for each fit: the correlation matrix (which is closely related to the covariance matrix) shows the correlation of the parameters (a,b), not of the x-y data. I would guess that a large fraction of users looking up this article want to know how to get this correlation. Meanwhile, I've found out, but have not yet found a citable source: calculate an array for both parameters, scanned around the confidence ellipse with sufficient resolution. Take all (a,b) tuples with a chi-square corresponding to sigma <3 (or about this), and calculate the covariance matrix the same way described here for (x,y) points, but with the weights given by the corresponding (exclusion) probability (ca. 0.3% for 3 sigma, and larger inside the 3-sigma contour). A more sophisticated method could include Markov chains, or the ellipse axes (for near-Gaussian errors) instead.--SiriusB (talk) 07:52, 11 April 2011 (UTC)
If you're not talking about the correlation or covariance matrix of the estimates of the parameters of a simple linear regression model, I have no idea what you are talking about. Maybe a screenshot or text output sample from gnuplot would help. Qwfp (talk) 08:26, 11 April 2011 (UTC)
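For the simple linear regression case mentioned in this thread, the textbook closed-form covariance matrix of the estimates (a, b) can be sketched as below. This makes no claim about what gnuplot computes internally. It assumes independent errors with equal variance, estimated from the residuals with n - 2 degrees of freedom; the function name and the data are invented for the example.

```python
def fit_line_with_covariance(xs, ys):
    """Least-squares fit y = a + b*x and the estimated covariance of (a, b).

    Assumes independent errors with equal variance, estimated from the
    residuals with n - 2 degrees of freedom.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sigma2 = rss / (n - 2)  # estimated error variance
    var_a = sigma2 * (1.0 / n + mean_x ** 2 / sxx)
    var_b = sigma2 / sxx
    cov_ab = -sigma2 * mean_x / sxx
    return a, b, [[var_a, cov_ab], [cov_ab, var_b]]

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1]
a, b, param_cov = fit_line_with_covariance(xs, ys)
corr_ab = param_cov[0][1] / (param_cov[0][0] * param_cov[1][1]) ** 0.5
```

In this model Cov(a, b) is negative whenever the mean of x is positive, so a strongly negative parameter correlation can coexist with positively correlated data, which is the pattern described at the start of the thread.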

A simpler definition exists

Hello, I don't think this page is all that great for the lay person; as a matter of fact, I think the Definition section over-complicates things. Wolfram MathWorld has us beat here: Anyone object to me incorporating this information into this page? I'm particularly looking for feedback from folks who are significantly more stats-minded than myself. :) Thanks, dmnapolitano (talk) 18:17, 27 September 2011 (UTC)

Anyway: the present definition in MathWorld may seem simpler, but it is not correct. Nijdam (talk) 10:28, 9 June 2012 (UTC)

Geometrical Interpretation

@Tfkhang: I don't understand what you mean to say with:

Geometrically, the covariance can be thought of as the sum of signed rectangular areas, where those lying to the upper right and lower left quadrants relative to the mean of X and Y are positively signed, and those lying elsewhere are negatively signed.

It is not correct anyway. Nijdam (talk) 10:59, 4 December 2011 (UTC)

@Nijdam: I have made a diagram and tried to explain things clearer.

Tfkhang (talk) 00:21, 5 December 2011 (UTC)

Sorry, but this does not contribute to any understanding of covariance. Besides it is definitely own research, and it is not a correct way of describing. If you want to improve the article, with a geometrical interpretation, find a reliable source and discuss your idea here on the talk page first, before changing the article. Nijdam (talk) 10:05, 5 December 2011 (UTC)

@Nijdam: I would appreciate it if you could explain why it "does not contribute to any understanding of covariance", instead of dismissing it.

Tfkhang (talk) 11:36, 5 December 2011 (UTC)

In the first place, if there would be a geometric interpretation, it should not be considered a property.

Let (X,Y) be a paired observation.

>>>if it's about observations, this is about the sample covariance and not the variance

Geometrically, we can think of the covariance of X and Y as the average of the sum of signed rectangular areas induced by the relative position of (X,Y) to the coordinate for the mean of X and Y: (\mu_x, \mu_y).

>>>what are signed areas, and in what way are they induced?

>>>The (sample) covariance is certainly not an average of areas.

>>>The sample covariance is a measure relative to the sample means, not the expected values

This means that, by centering around (\mu_x, \mu_y), rectangles induced by coordinates to the upper right and bottom left would have positive signs, while those induced by coordinates to the upper left and bottom right would have negative signs.

>>>Something like this holds for the relative values, but not for areas.

Nijdam (talk) 15:48, 5 December 2011 (UTC)

@Nijdam: You have some constructive ideas here. I just wish you could engage my writing positively by adding/clarifying points instead of undoing it. At least, this is what I understand what wikipedia is about.

>>>if it's about observations, this is about the sample covariance and not the variance


>>>what are signed areas, and in what way are they induced?

I have a coordinate (x,y), and there's (mu_x, mu_y); I can induce a rectangle with these two coordinates as opposite corners by drawing a horizontal line from either coordinate (towards the other coordinate), and then connecting it to the other coordinate perpendicularly, repeating this until it returns to the original coordinate. Signed areas simply means that the area has a positive or negative sign, depending on where the rectangle is induced. The diagram makes this clear.

>>>The (sample) covariance is certainly not an average of areas.

I just said it could be thought of as such. If a way of thinking is useful pedagogically, it should be made known. I don't think Wikipedia math/stats should end up reading like a Bourbakian treatise.

And I do not agree that this is "own research". The geometry is just begging to be seen from the formula for the sample covariance.

Tfkhang (talk) 09:22, 12 December 2011 (UTC)

There is no geometrical interpretation. Maybe you are referring to a graphical understanding. The main thing a graphical presentation may clarify is the sign of the covariance as an indication of the type of relationship, in the sense that if the data show a positive linear tendency, the majority of the data will be found in the quadrants to the upper right and lower left of the central point. These data contribute positively to the covariance, and the data in the other quadrants contribute negatively. Is this what you want to explain? Nijdam (talk) 10:03, 12 December 2011 (UTC)
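The point both sides seem to share, that each observation contributes a signed product whose sign depends on its quadrant relative to the sample means, can be sketched numerically. The data are made up, the helper name `signed_contributions` is hypothetical, and this applies to the sample covariance (relative to sample means), not to the population quantity.

```python
def signed_contributions(xs, ys):
    """Return the sample means and each point's signed product
    (x - mean_x) * (y - mean_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return mean_x, mean_y, products

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 4.0, 2.0, 3.0, 5.0]
mean_x, mean_y, areas = signed_contributions(xs, ys)

positive = sum(a for a in areas if a > 0)  # upper-right and lower-left points
negative = sum(a for a in areas if a < 0)  # upper-left and lower-right points
sample_cov = (positive + negative) / (len(xs) - 1)
```

Here the point (2, 4) lies to the upper left of the centre (3, 3) and contributes -1, while (1, 1) and (5, 5) contribute +4 each, giving a sample covariance of 7/4.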

Comment from the top

I agree with the comment below. How about a motivation at the start? What is it in principle? Does it have an easily understandable analogy? Is it a scalar number or a vector/matrix? Why use "expected value" when we can say it more simply as "average" (or mean)? And "For random vectors X and Y": what does that mean? Does it mean vectors of random variables (a lot of random variables, so to speak) or is it an array of random values? Later in the text, from the properties onward, it's better. But the beginning is the most important and should be as clear and as easy as possible. (Some notes about precision: the higher the precision you aim for, the more complex it is going to be. "Average" (see above) can be defined in many ways, but the idea stays the same. And what is "expected value"? Is it clearer? No. Is it more precise? Yes, since it has a stricter definition. But does it communicate well? No. It does not communicate well because humans tend to put meaning in words, and in my experience one can "expect" a wide range of possibilities.) — Preceding unsigned comment added by (talk) 16:02, 18 December 2011 (UTC)

This page is not helpful to anyone. If you understand a word it says, you have no reason to read it. If you want to understand how to calculate covariance, this doesn't help. I don't understand why Wikipedia in general must be so terrible at math, finance and statistics. I guess it is because whatever you write, it is incorrect in some special case and thus phrases like "with finite moments" must be added to make it 100% correct, but 100% incomprehensible to most people. — Preceding unsigned comment added by (talk) 16:28, 17 December 2011 (UTC)

How is covariance related to other similar concepts like correlation? (Statistics need a concept-wash, using too many similar concepts in my opinion) (talk) 04:17, 22 December 2011 (UTC)
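As a sketch of the relationship asked about: the Pearson correlation is the covariance divided by the product of the two standard deviations, which removes the units. The height and weight figures below are invented for illustration, and the function names are not taken from any particular library.

```python
import math

def covariance(xs, ys):
    """Sample covariance with the N - 1 divisor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

heights_cm = [150.0, 160.0, 170.0, 180.0]
weights_kg = [52.0, 58.0, 69.0, 77.0]
heights_m = [h / 100.0 for h in heights_cm]

cov_cm = covariance(heights_cm, weights_kg)  # units: cm * kg
cov_m = covariance(heights_m, weights_kg)    # units: m * kg, 100 times smaller
r_cm = correlation(heights_cm, weights_kg)
r_m = correlation(heights_m, weights_kg)     # unchanged by the change of units
```

Changing centimetres to metres rescales the covariance by a factor of 100 but leaves the correlation where it was, which is why correlation is the quantity usually quoted as a measure of linear association.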

Better Formula?

The article states the last formula may cause catastrophic cancellation if used in computing, but the first formula clearly can't be computed directly in that form. So what equation should we use? (talk) 14:34, 29 May 2013 (UTC)
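One possible answer, sketched under assumptions: for sample data the first (definition) form can be computed directly with two passes, first the means and then the average product of deviations. The example below contrasts it with the E(XY) - E(X)E(Y) form on values with a small spread on top of a large offset, where floating-point cancellation shows up. Both functions divide by N, and the names and data are illustrative.

```python
def cov_naive(xs, ys):
    """E(XY) - E(X)E(Y) with 1/N weights; prone to cancellation."""
    n = len(xs)
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    return mean_xy - (sum(xs) / n) * (sum(ys) / n)

def cov_two_pass(xs, ys):
    """First pass: means. Second pass: average product of deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Small spread on top of a large offset; the exact answer is 2/3.
xs = [1e9 + 1, 1e9 + 2, 1e9 + 3]
ys = [1e9 + 1, 1e9 + 2, 1e9 + 3]

naive = cov_naive(xs, ys)
two_pass = cov_two_pass(xs, ys)
```

In double precision the products are near 1e18, where adjacent representable values are 128 apart, so the naive difference cannot land anywhere near 2/3. The two-pass version subtracts the means first and works with deviations of size 1.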

Is this correct?

The following was in the section "Properties", is it correct? (if so - we can add it again)

For sequences x1, ..., xn and y1, ..., ym of random variables, we have

\sigma\left(\sum_{i=1}^n {x_i}, \sum_{j=1}^m{y_j}\right) =    \sum_{i=1}^n{\sum_{j=1}^m{\sigma\left(x_i, y_j\right)}}.\,

Tal Galili (talk) 06:39, 29 September 2013 (UTC)
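The identity follows from the bilinearity of covariance, provided all the covariances involved are finite. As a hedged numerical sanity check (not a proof), here is an exact computation on an assumed toy sample space of three fair coin flips, with deliberately dependent variables. All names are illustrative.

```python
from fractions import Fraction
from itertools import product

def cov(f, g, outcomes):
    """Exact covariance of f and g over equally likely outcomes."""
    n = len(outcomes)
    ef = sum(Fraction(f(o)) for o in outcomes) / n
    eg = sum(Fraction(g(o)) for o in outcomes) / n
    return sum((f(o) - ef) * (g(o) - eg) for o in outcomes) / n

# Sample space: three fair coin flips, all 8 outcomes equally likely.
outcomes = list(product([0, 1], repeat=3))

# n = 2 variables x_i and m = 3 variables y_j, deliberately dependent.
xs = [lambda o: o[0], lambda o: o[0] * o[1]]
ys = [lambda o: o[1], lambda o: o[1] + o[2], lambda o: o[0] - o[2]]

# Left side: covariance of the two sums.
lhs = cov(lambda o: sum(x(o) for x in xs),
          lambda o: sum(y(o) for y in ys), outcomes)
# Right side: double sum of pairwise covariances.
rhs = sum(cov(x, y, outcomes) for x in xs for y in ys)
```

With exact rational arithmetic the two sides agree exactly for this example.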

  1. Papoulis. Probability, Random Variables and Stochastic Processes (McGraw-Hill, 1991)