Talk:Distance correlation

This is the talk page for discussing improvements to the Distance correlation article.
This is not a forum for general discussion of the article's subject.

Statistics Mid‑importance

This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Mid-importance on the importance scale.

Mathematics Low‑priority

This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Low-priority on the project's priority scale.

Problems with the article

I've gone back to the two cited articles by the original authors, and I have some problems relating some things here to there:

1. In "Properties": "(ii) dcov_n = 0 if and only if every observation is the same." First, the same as what? dcov refers to the distance covariance of two different variables. Second, the statement seems unlikely to be true, since it would preclude two non-constant variables from having a sample dcov of zero even if the variables are independent.

(1) It is correct but unclear.

The 2007 paper (page 1244, before Remark 2) says that dvar_n(X) = 0 iff every sample observation is identical. Am I correct that "dcov_n" in the present quote should be changed to "dvar_n"?

It would be correct that way. Statement (ii) is awkward as written; it is better stated in terms of dvar_n.
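To make the dvar_n versus dcov_n distinction concrete, here is a minimal numerical sketch. It assumes the usual double-centered distance-matrix form of the squared sample statistics; the function names `centered_distances`, `dcov2_n` and `dvar2_n` are made up for this illustration and are not taken from the article or the papers.

```python
import numpy as np

def centered_distances(x):
    """Double-centered pairwise distance matrix of a sample (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D sample as n observations of one variable
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    # subtract row and column means, add back the grand mean
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def dcov2_n(x, y):
    """Squared sample distance covariance: mean of the elementwise product."""
    return float((centered_distances(x) * centered_distances(y)).mean())

def dvar2_n(x):
    """Squared sample distance variance, i.e. dcov2_n of a sample with itself."""
    return dcov2_n(x, x)

x = np.array([0.0, 1.0, 3.0, 7.0])   # non-constant sample
const = np.full(4, 2.5)              # every observation identical

print(dvar2_n(const))      # zero: all observations are identical
print(dvar2_n(x))          # positive: the sample varies
print(dcov2_n(x, const))   # zero even though x is not constant
```

The last line is the case that makes the "dcov_n" wording ambiguous: the sample distance covariance vanishes as soon as one of the two samples is constant, whereas dvar_n vanishes only when the single sample it is computed from is constant.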

2. In the section "Definitions#Distance covariance", it says "distance covariance is not the same as the covariance of distances, cov(|X-Y|, |Y-Y’|)". Should this say "cov(|X-X’|, |Y-Y’|)"? As it stands, it's not symmetric.

(2) You are correct; it should be cov(|X-X’|, |Y-Y’|).

3. Still in the section "Definitions#Distance covariance", it says

"The population value of distance covariance [1][2] is

dcov(X,Y) := E|X-X’||Y-Y’| + E|X-X’| E|Y-Y’| - E|X-X’||Y-Y”| - E|X-X”||Y-Y’|

where E denotes expected value, X’ is an independent and identically distributed copy of X, Y’ is an independent and identically distributed copy of Y, finally X” (Y”) has the same distribution as X (Y) and independent not only of X (Y) but also of Y and Y’ (X and X’)."

I have a couple of problems with this:

(a) Should it say that " X” (Y”) is independent not only of X and X’ (Y and Y’) but also ...."?

(b) I can't see how this definition relates to the one in the original papers (2007 and 2009). E.g. the closest thing I can find in the 2007 paper is in regard to the sample dcov, which is given (p. 2776, top and eq. 2.18) as

So I don't even see any mention of X” in the original paper. Duoduoduo (talk) 22:23, 21 December 2010 (UTC)

You want to check the later paper on Brownian Distance Covariance; this result is proved in the second part. You are correct that the equality is stated for population distance covariance. Looks like this section requires clarification.

I'm not the original poster, but I will make corrections and clarifications as soon as possible. Thanks for catching the error. (modified my reply) Mathstat (talk) 23:28, 21 December 2010 (UTC)
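For anyone following items 2 and 3 together, the quoted population formula can be related to the covariance of distances as sketched below. The sketch assumes that (X, Y), (X’, Y’) and (X”, Y”) are i.i.d. pairs with finite first moments, and that the quoted expression is the squared quantity; both assumptions are a reading of the discussion above, not a quotation from the article.

```latex
\begin{align*}
\operatorname{dCov}^2(X,Y)
  &= E|X-X'||Y-Y'| + E|X-X'|\,E|Y-Y'| \\
  &\quad - E|X-X'||Y-Y''| - E|X-X''||Y-Y'| \\
  &= \operatorname{cov}\bigl(|X-X'|,\,|Y-Y'|\bigr)
     - 2\operatorname{cov}\bigl(|X-X'|,\,|Y-Y''|\bigr)
\end{align*}
```

The second line uses E|X-X”||Y-Y’| = E|X-X’||Y-Y”| (relabel the second and third pairs) and E|Y-Y”| = E|Y-Y’|. The extra term is why distance covariance differs from the covariance of distances.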

Notational confusion

@Mathstat: Thanks for trying to clean up this article's notation. Maybe I'm just confused, but I think the difficulty arises in that the original 2007 and 2009 papers use two different meanings for dCov. The 2007 paper says on p. 2772: "The distance covariance (dCov) between random vectors X and Y with finite first moments is the nonnegative number V(X, Y ) defined by V²(X, Y ) = ...." Likewise, the 2009 paper says on pages 1236-7 "the distance covariance (dCov) statistic, derived in the next section, is the square root of V² ...."

But then for a while in the 2009 paper they use a different definition of dCov: on p. 1238 it says "This new notion Cov_U(X, Y ) contains as distinct special cases distance covariance V²(X, Y )...." Six lines later it says "A surprising result develops: the Brownian covariance is equal to the distance covariance" and later in that paragraph it says "we arrive at Cov_W(X, Y ) = V²(X, Y )." But then on p. 1241 it says "The distance covariance (dCov) between random vectors X and Y with finite first moments is the nonnegative number V(X, Y ) defined by V²(X, Y ) = ...", which appears to have been cut and pasted from the above quote in the 2007 paper. On p. 1249 it says "the Brownian covariance of X and Y is defined by W²(X, Y ) = ...", but it appears to mean that it is defined as the square root of this. Then on p. 1250 it says "The surprising coincidence: W = V" implying that both dCov and Brownian covariance are the positive square roots of V² and W².

So I'm confused. I hope you're able to sort all this out so as to use a consistent notation in the Wikipedia article. Duoduoduo (talk) 18:27, 4 February 2011 (UTC)

Yes, as you noticed, the notation in this Wikipedia article was not quite consistent with the notation in the 2007 and 2009 papers, and the recent changes are mainly intended to make it consistent. Concerning other notational matters in the Brownian covariance part: in SR2009, pp. 1248-1249, the Brownian covariance is defined in (3.4) and (3.6). In (3.4) it is stated that Brownian covariance is defined by its square W²(X, Y ), which parallels the definition of distance covariance in both papers. In (3.6), "Brownian covariance is defined by ..." is followed by equation 3.6 for W²(X, Y ). When reading the two pages together it makes sense, but on p. 1249 it would be clearer if it said "Brownian covariance W is defined by ..." or "is defined as the square root of ...", as you wrote here. Your sentence that "The surprising coincidence: W = V" implies that both dCov and Brownian covariance are the positive square roots of V² and W² summarizes it well. Mathstat (talk) 19:28, 4 February 2011 (UTC)
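The convention this thread converges on can be restated compactly. The block below only rearranges what is quoted above from the 2007 and 2009 papers; the script letters are a typesetting choice for this note, and the defining expressions for the squared quantities are deliberately left out because they are elided in the quotations.

```latex
\begin{align*}
\operatorname{dCov}(X,Y) &= \mathcal{V}(X,Y) = \sqrt{\mathcal{V}^2(X,Y)} \ge 0, \\
\mathcal{W}(X,Y) &= \sqrt{\mathcal{W}^2(X,Y)} \ge 0, \\
\mathcal{W}(X,Y) = \mathcal{V}(X,Y)
  &\iff \mathcal{W}^2(X,Y) = \mathcal{V}^2(X,Y).
\end{align*}
```

Under this reading, a passage that equates a covariance-like quantity with V² is working on the squared scale, while "W = V" is the same statement after taking nonnegative square roots.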

Edits to Definitions, and miscellaneous

- Sorry I made major edits without posting here! I'll do so in the future. This article is great, and I'm just thinking of ways to improve it!

- I think Definitions needs to be edited for a more lay audience (i.e., readers who are not theoretical statisticians). Presumably, most readers are familiar with statistics and want to know (1) the intuition behind distance covariance and (2) how to compute it. The current article has rather obscure notation (granted, taken from Szekely and Rizzo, 2009); using "D" to denote a distance matrix and "R" to denote a re-centered distance matrix may be more reader-friendly. Also, the equation defining dCov^2 below "One can show that this is equivalent to the following definition:" is stated without any intuition. It should be moved to a later section for readers who want more details about dcov (i.e., that this equation is derived by starting with a norm of the difference between distributions). My edits (drbabinski) try to clean up the notation and make things more straightforward for a lay reader (although much can still be improved), without removing the previous definitions.

- The picture with the different data sets and a dcorr value is misleading. It is unclear how to interpret dcorr values, and saying a relationship has a larger dcorr than another relationship should be carefully interpreted based on the number of samples and variables. This differs from Pearson correlation, whose value is interpretable.

- Can the "Problems with the article" section below (in Talk) be archived? Are those problems resolved?

Drbabinski (talk) 18:37, 24 January 2018 (UTC)
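As an illustration of the "D" and "R" notation proposed above, here is a minimal sketch of the "how to compute it" part. It assumes Euclidean distances and the usual V-statistic form of the sample quantities; the function names `distance_matrix`, `recenter` and `dcor` are hypothetical and chosen only for this note.

```python
import numpy as np

def distance_matrix(x):
    """D: n-by-n matrix of Euclidean distances between observations."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D sample as n observations of one variable
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

def recenter(D):
    """R: subtract the row and column means of D, then add back the grand mean."""
    return D - D.mean(axis=0, keepdims=True) - D.mean(axis=1, keepdims=True) + D.mean()

def dcor(x, y):
    """Sample distance correlation; returns 0.0 if either sample is constant."""
    Rx = recenter(distance_matrix(x))
    Ry = recenter(distance_matrix(y))
    dcov2 = (Rx * Ry).mean()                      # squared sample dCov
    dvar2 = (Rx * Rx).mean() * (Ry * Ry).mean()   # product of squared sample dVars
    if dvar2 == 0.0:
        return 0.0
    # clip tiny negative rounding error before taking the square root
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2)))

x = np.linspace(-1.0, 1.0, 21)
print(dcor(x, 2 * x + 1))              # linear relationship
print(dcor(x, x ** 2))                 # nonlinear, symmetric about 0
print(np.corrcoef(x, x ** 2)[0, 1])    # Pearson correlation for comparison
```

The parabola is the kind of case shown in the article's picture: Pearson correlation is essentially zero while the sample distance correlation is not. How large a given dcorr value "should" be still depends on the sample size and dimension, as noted in the bullet above.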

I hope you don't mind that I moved your thread to the bottom, which is the right place for it in the WP environment.

I have no objection to your intentions, but I don't consider this talk page bloated enough yet to justify archiving. Purgy (talk) 07:26, 25 January 2018 (UTC)