Talk:Łukaszyk–Karmowski metric

WikiProject Statistics (Rated Start-class, Low-importance)
This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.
This article has been rated as Start-Class on the quality scale and as Low-importance on the importance scale.

WikiProject Mathematics (Rated Start-class, Low-importance)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Field: Probability and statistics

Discussion of term "probability metric"

(Note that this article was originally titled "probability metric")

To the best of my knowledge (and I'm not a probabilist, so my understanding of the nuances of this terminology isn't rock solid) a 'probability metric' is a metric on a space of probability functions. Not on a space of random variables. But regardless of the nomenclature, this page seems to define "probability metric" as the L_1 distance between two random variables under sort of vague and varying unspecified distribution functions. Which is weird. When I have time in the future, I'll try to edit it, but maybe a probability grad student could/should clean it up? Gray 23:59, 2 September 2007 (UTC)

Dear Gray, The 'probability metric' is not an L1 distance of two random variables, as it also covers random vectors. The distribution functions are arbitrary, but that is not vague: it is a consequence of the fact that a given distribution is a property of the random variable or vector whose distance from another variable or vector is being measured. After specifying the distributions and integrating the PM (which may be complicated, e.g. for random vectors whose coefficients have different distributions) you obtain a particular form of the probability metric. Examples of particular PMs for two continuous random variables with normal (NN) and rectangular (RR) distributions are shown.--Guswen 17:31, 3 September 2007 (UTC)
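As an illustration of the NN case mentioned above, here is a minimal numerical sketch (assuming SciPy is available; the function name lk_metric_independent and the parameter values are illustrative only, not taken from the article) that integrates the defining double integral for two independent normal densities:

    # Numerical sketch of the L-K metric for two independent normal variables (NN case).
    # Assumes SciPy; distribution parameters below are arbitrary illustrative values.
    from scipy import integrate
    from scipy.stats import norm

    def lk_metric_independent(f, g, lo=-40.0, hi=40.0):
        # D(X, Y) = double integral of |x - y| f(x) g(y) dx dy for independent X, Y
        integrand = lambda y, x: abs(x - y) * f(x) * g(y)
        value, _ = integrate.dblquad(integrand, lo, hi, lo, hi)
        return value

    f = norm(loc=0.0, scale=1.0).pdf   # X ~ N(0, 1)
    g = norm(loc=2.0, scale=1.0).pdf   # Y ~ N(2, 1)

    print(lk_metric_independent(f, f))  # same distribution: ~ 2*sigma/sqrt(pi) ~ 1.128
    print(lk_metric_independent(f, g))  # different means: strictly larger than |mu_x - mu_y| = 2

For equal means and sigma = 1 this gives about 1.128, i.e. 2σ/√π, in line with the value discussed further down this page.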

As written in the article, it is vague. The notation D_{z,z} is used without being introduced, and is apparently redundant. I also don't see how to reconcile the total variation (for example) with this definition, since the article seems to treat the joint distribution of X and Y as fixed, as well as their marginals. Isn't the total variation a semi-metric on the space of probability measures? Gray 04:30, 4 September 2007 (UTC)

I introduced a brief explanation of the subscript notation. I agree that the proposed notation may need some improvement in the case of a joint probability density function F(x, y) of two dependent random variables or vectors. Yet it explains which distributions were taken as the PM arguments. The PM is not a semi-metric, as it satisfies the triangle inequality condition. --Guswen 18:14, 4 September 2007 (UTC)

Original research?

I am concerned that this article might fall under WP:NOR. The term "probability metric" has been used for quite a few decades, but its meaning is not restricted to a specific metric such as the one provided in this article. It seems that the content is mostly from the only reference, Lukaszyk's paper. --Memming (talk) 15:15, 7 May 2009 (UTC)

A citation search today showed that this paper has not been cited by any other published paper. Melcombe (talk) 12:11, 8 May 2009 (UTC)

I do not share your opinion concerning WP:NOR: "Material for which no reliable source can be found is considered original research. The only way you can show that your edit does not come under this category is to produce a reliable published source that contains that same material." The probability metric was the subject of my PhD thesis at the Krakow University of Technology, and it was also published by Springer Berlin/Heidelberg. Yet it is not my responsibility that my university does not publish PhD theses online and Springer does not offer its publications for free. I shall do my best to provide you with a copy of the article via e-mail. When I named the new operator "probability metric" I did some research and found that this term had not been used before, but I may obviously be wrong. Finally, I wrote this article not to promote myself or my work but to provide general information about this concept, which is no longer being developed. --Guswen (talk) 10:29, 18 May 2009 (UTC)

Thank you for your response. I also work on various metrics on probability measures, and I am aware of excellent works such as Hein and Bousquet (2005) or Martins et al. (2008), both ICML papers that define Hilbertian metrics on probability measures. What concerns me about this article is that, instead of discussing the general concept of metrics on spaces of probability measures, it only considers one specific metric that does not seem to be the most significant one associated with the concept. Research in this area is still quite active and potentially very significant. However, the current article gives a very restrictive presentation, and seems to be drawn mostly from Lukaszyk's paper. I took a look at the paper, and it seems not very significant compared to the works I have mentioned. I would suggest that you rewrite the article so that it covers different forms of probability metrics. Remember that Wikipedia is an encyclopedia; only concepts that are widely accepted and verified should be included. See WP:Notability as well. --Memming (talk) 13:59, 20 May 2009 (UTC)
A related article appears to be Probabilistic metric space ... it would be good to have these articles inter-relate sensibly. Melcombe (talk) 14:37, 20 May 2009 (UTC)
A related class of topics is Statistical distance, where User:Guswen added this article; that is how I came to this article. --Memming (talk) 19:52, 20 May 2009 (UTC)

Dear Memming, Dear Melcombe, Before we go on to further discussion concerning your proposition to rewrite the article so that it covers different forms of probability metrics, kindly note that the probability metric (D) of the article is a generalization (not a new concept) of a metric (cf. the definition of a metric space) to probability distributions, and that D simply becomes a metric (e.g. Euclidean) for Dirac delta distributions. Further (to the best of my knowledge) none of the statistical distances has the same properties as D, which arise directly from its definition and do not require any further publications to state or prove them, i.e.:

  1. 0 < D\left(X, X\right), i.e. D never equals zero (except for two Dirac delta distributions)
  2. D\left(X, Y\right) = D\left(Y, X\right)
  3. D\left(X, Y\right) + D\left(Y, Z\right) \ge D\left(X, Z\right)

To list only some differences:

  1. Kullback-Leibler divergence is not symmetric
  2. Hellinger distance 0 \le H(P,Q) \le 1 while 0 < D  <  +\infty.
  3. total variation and f-divergence are functions, while D is a number
  4. Jensen–Shannon divergence is based on the Kullback-Leibler divergence, thus is also not symmetric
  5. Bhattacharyya distance 0 \le BC \le 1 and need not obey the triangle inequality
  6. d_{p,q} of the Probabilistic metric space is a distribution function, while D is a number

--Guswen (talk) 08:14, 21 May 2009 (UTC)
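A quick numerical sanity check of the three properties listed above, for the independent-normal case (a sketch only; NumPy is assumed, the sample size and parameters are arbitrary, and D(X, X) is read, as in the article, as the distance between two independent copies with the same distribution):

    # Monte Carlo check of the three properties listed above (independent normals).
    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000

    def D(mu_a, s_a, mu_b, s_b):
        # Monte Carlo estimate of D = E|A - B| for independent normals A and B
        a = rng.normal(mu_a, s_a, n)
        b = rng.normal(mu_b, s_b, n)
        return np.mean(np.abs(a - b))

    # 1. D(X, X) > 0 for non-degenerate X (~ 2*sigma/sqrt(pi) ~ 1.128 for sigma = 1)
    print(D(0, 1, 0, 1))
    # 2. symmetry: D(X, Y) ~ D(Y, X)
    print(D(0, 1, 3, 2), D(3, 2, 0, 1))
    # 3. triangle inequality: D(X, Y) + D(Y, Z) >= D(X, Z)
    print(D(0, 1, 3, 2) + D(3, 2, 5, 1) >= D(0, 1, 5, 1))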

As far as I understand, the Jensen-Shannon divergence is symmetric and its square root is a metric. Read Martins, Figueiredo, Aguiar, Smith, and Xing, "Nonextensive entropic kernels", ICML 2008, for example. Also, the f-divergence is not a function; many f-divergences are actually metrics. See Liese and Vajda, "On divergences and informations in statistics and information theory", IEEE Tr. Info. Theory, vol 52, no 10, 2006. --Memming (talk) 12:35, 21 May 2009 (UTC)
You are right (and I was wrong) - the Jensen-Shannon divergence is symmetric, yet it is not a metric in the general sense (0 \le JSD \le 1). --Guswen (talk) 16:29, 24 May 2009 (UTC)
There are several points to consider.
  • If this article uses the term "probability metric" in a sense different from an established meaning, then it needs to be changed (possibly only the title), and there should be an article containing the established meaning. There needs to be a distinction between the term "probability metric" and one particular version of a probability metric, and this could be accomplished by giving this one a suitable name, for example the "mean absolute difference metric". There needs to be a definition of what is meant by the general term "probability metric" (or another name if necessary), which needs to start by saying what things the metric is measuring the distance between. It seems to me that your comparison with some of the other measures omits the important point that this measure deals with the joint distribution of the random variables, while some others just consider the distance between marginal distribution functions. There is probably enough to say about what might be meant by probability metric to make this a separate article, with the present article renamed to whatever an acceptable name is.
  • Is this material "notable"? People have been using mean absolute difference for years ... why/when is it important that this should have the property (or some properties) of a metric?
  • Is this material "notable" in other senses of WP:Notability? Has someone other than the article's writer and the author of the original paper made effective use of the idea? It does not matter that "my university do not publish PhD thesis online and the Springer do not offer their publications for free"... if the references exist, put them in as it does not matter if they are online or not (online is only a convenience). At least it would upgrade the article from just the single basic reference, and it would satisfy some of the basic requirements for wikpaedia.
  • Are there other relevant probability metrics in the sense meant here... is the obvious alternative (the mean square difference) a probability metric in this sense?
I would encourage you to at least put in your references and to put in some in-line citations to indicate which bits relate to these new citations and which to Lukaszyk. Melcombe (talk) 11:07, 21 May 2009 (UTC)
I now see that you have included the mean squared difference ... but this reflects the problem that the article presently starts with a "definition" of what is meant, yet this does not cover all the cases being considered. Melcombe (talk) 11:24, 21 May 2009 (UTC)
I think the main problems are WP:Notability and the general title Probability metric, which is not well defined in the literature. As far as I am concerned, it means any metric on probability measures, and its generally understood meaning has not been specialized to what the article currently suggests. I suggest we remove this article from wikipedia. (BTW, the mean square difference is a special case of f-divergence.) --Memming (talk) 12:42, 21 May 2009 (UTC)
The mean square difference between density functions is not the same as the metric here. I have come across this online source [[1]] which is part of "Modern theory of summation of random variables" by V. M. Zolotarev, VSP International Science Publishers (1997) ISBN 9067642703. This seems to be using "probability metric" in at least as general a sense as here (so as to allow a distance between random variables rather than just a distance between probabilities). However, "probability metric" does seem to be a misleading name for what is being done here with the metrics being considered. Perhaps something like "metrics on random variables" would be better. As for deletion, I think that the topic might have some importance to quantum mechanics even if not in probability theory. Of course, I know nothing about either. At least the writer(s) here have made an attempt to be understandable, even if there are still some difficulties. Let's just push the article in the right direction for now. (The above book is "interesting" in the sense that it seems to cost one of £104, £220 or £435 depending on what online retailer one looks at). Melcombe (talk) 14:12, 21 May 2009 (UTC)

Dear Melcombe, A few days ago I asked the supervisor (promotor) of my PhD to provide some other references (if any) besides my article from 2004. I also agree that the term "probability metric" may be misleading. Initially we used the term "measurement metric", since the metric was applied to the approximation of scattered mechanical data obtained from experiments: photoelasticity, interferometry and strain gauge measurements, which you might consider as random variables with mean values over the "exact location" of the measured quantity. Consequently the Dirac delta distribution can be considered an "exact" or "true" measurement, in which "degenerate" case the metric reduces to the ordinary metric, as expected. Therefore I suggest changing the name to "measurement metric". —Preceding unsigned comment added by Guswen (talk • contribs) 19:21, 22 May 2009 (UTC)

I discussed the name of the article with the supervisor (promotor) of my PhD thesis, Dr. Sc. Wojciech Karmowski, who contributed substantially to the project, and in our opinion such a name is fairly adequate. "Measurement metric" seems too narrow, while "Metric on random variables or random vectors" is too descriptive.--Guswen (talk) 12:44, 25 May 2009 (UTC)

I am glad to see the article renamed from Probability metric to Lukaszyk-Karmowski metric. It resolves one of my points. Thanks for doing so. --Memming (talk) 19:49, 25 May 2009 (UTC)
I am also very glad that we have reached some form of agreement, and once again I admit that it was a mistake to call this operator a probability metric, which simply confuses the reader. I have taken the liberty of adding a reference to the PhD thesis that I mentioned earlier (unfortunately neither online nor in English). I hope, however, that the article now resolves the Onesource and Notability issues. --Guswen (talk) 21:53, 25 May 2009 (UTC)

New question (May 2009)

The article has "This function is not a metric as it does not satisfy the identity of indiscernibles condition of the metric" near the beginning. Is this just overcome by the usual "except on a set of probability zero" type of convention? Melcombe (talk) 09:54, 27 May 2009 (UTC)

I do not think so. A set of probability zero means that the probability of each point in that set is zero, so the points of this set do not "form" a random variable. The identity of indiscernibles condition says that for all elements

d(x, x) = 0

while D(X, X) = 0 if and only if X is a degenerate random variable with zero variance (cf. Almost surely), described by a Dirac delta probability distribution (continuous case) or a degenerate distribution (discrete case).--Guswen (talk) 13:08, 29 May 2009 (UTC)
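For illustration, using the article's defining integral with independent arguments: for X described by a Dirac delta at \mu,
D(X, X) = \int_{-\infty}^\infty \int_{-\infty}^\infty |x-y|\,\delta(x-\mu)\,\delta(y-\mu) \, dx\, dy = |\mu - \mu| = 0,
while for any non-degenerate density f the integrand |x-y|f(x)f(y) is strictly positive on a set of positive measure, so D(X, X) > 0.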

But according to the first definition given, which allows dependent variables and does not assume that any random variables are necessarily independent, D(X, X) = 0 always, because
D(X, X) = E(|X - X|) = 0.
But I see that for much of the article you are assuming independence in the formulae being used.
Melcombe (talk) 16:33, 29 May 2009 (UTC)
I do not agree again. Why E(|X - X|) = 0? X-X is not 0 but some neutral element (let's call it Z_R) in a given set of random variables with respect to addition/subtraction (cf. Algebra of random variables or [2]). If the result of addition/subtraction of two random variables of the same variance σ² is a new random variable of variance 2σ², then such a Z_R element does not exist (in this context), since
X + Z_R \ne X
You're right - independence of random variables makes the situation much easier. But it is not necessary.--Guswen (talk) 18:36, 29 May 2009 (UTC)
I disagree. When two random variables X and Y are related as X = Y, then E|X-Y| = 0. And X-Y would be a random variable (a function from the sample space to the reals) that takes the value 0 all the time. The reference ppt you linked talks only about independent cases. --Memming (talk) 12:59, 31 May 2009 (UTC)
I reconsidered your arguments and must admit that you're (partially) right. According to my ppt reference (cf. also Convolution as a method of addition/subtraction of random variables), X-X would not be a "true random variable" (one that takes 0 all the time) but a random variable of variance 2σ² and mean \mu = \mu_x - \mu_x = 0. But then the mean \mu = E(X-X) = 0.
Kindly note however that direct integration of
\int_{-\infty}^\infty \int_{-\infty}^\infty |x-y|F(x, y) \, dx\, dy,
or
D(X, Y) = \int_{-\infty}^\infty \int_{-\infty}^\infty |x-y|f(x)g(y) \, dx\, dy
for X=Y does not yield 0 but some positive value that is a function of σ.
E.g. for two Gaussian distributions it is
\frac{2\sigma}{\sqrt\pi}
The equation E|X-Y| = 0 has been introduced to the article by Michael Hardy on September 1, 2007 as a simplification of the equations above. I accepted it as true, without further consideration. It seems however that it does not hold true here. —Preceding unsigned comment added by Guswen (talkcontribs) 08:32, 1 June 2009 (UTC)
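As a cross-check of the 2σ/√π value above (a sketch, reading the integral as taken over two independent copies X and X' of the same N(\mu, \sigma^2) distribution): then X - X' \sim N(0, 2\sigma^2), and since E|Z| = \tau\sqrt{2/\pi} for Z \sim N(0, \tau^2),
E|X - X'| = \sqrt{2\sigma^2}\,\sqrt{\frac{2}{\pi}} = \frac{2\sigma}{\sqrt{\pi}}.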
By X=Y, where X and Y denote random variables, I mean that both X and Y are the same function from the sample space to the reals. And in that case, X-Y means the difference (X-Y)(\omega) := X(\omega)-Y(\omega) = 0, and it is a zero function. Hence, obviously, its mean is zero, and its variance is zero as well. I do not agree with you that it will have 2σ² as variance in the Gaussian case. (BTW, the integral formula you wrote is the same as the expectation.) --Memming (talk) 19:02, 1 June 2009 (UTC)
What do you mean by "zero function"? A probability density function f(x) = 0? Why would such a function have a mean of zero and zero variance? f(x) = 0 is not a probability density function, since for the latter it is required that \int_{-\infty}^\infty f(x) \, \mathrm{d}x = 1.
As for the integral formula (which is the same as the expectation?), one may calculate it e.g. for two normal densities of the same mean and variance (using a dummy variable instead of the second x) to obtain \frac{2\sigma}{\sqrt\pi}.
Or am I missing your point?--Guswen (talk) 23:10, 1 June 2009 (UTC)
I'm saying the random variable itself is a zero function. The corresponding CDF would be a Heaviside function, and the pdf would be a Dirac delta distribution at zero. --Memming (talk) 13:27, 3 June 2009 (UTC)
I do not agree again. When subtracting independent random variables, the mean of the difference equals the difference of the means, but the variance of the difference is the sum of the variances, not 0 (cf. e.g. [3] or the ppt reference above with a proof based on CDFs).
Then you are still wrong. X-X is the difference of two dependent random variables (identical random variables), not independent random variables. The sum of variances only applies to independent random variables. If you only want to have this "distance measure" defined for independent random variables, then the article should start with this as the definition ... but you would then not have a measure of distance between random variables, but only a distance between probability distributions. Melcombe (talk) 16:16, 3 June 2009 (UTC)
Why do you consider two random variables of the same distribution, mean and variance to be dependent? Formally they are identical (as 2=2), but consider each of them as the result of some experiment or measurement. Consider e.g. the two measurements I mentioned in the Popular explanation section: the results obtained by the first group of surveyors shall be independent of the results obtained by the second group, even though they may measure the same point (μ_x = μ_y). Therefore:
D(X, X) \ne \int_{-\infty}^\infty \int_{-\infty}^\infty |x-x|f(x)g(x) \, dx\, dx = 0 but
D(X, X) = \int_{-\infty}^\infty \int_{-\infty}^\infty |x-t|f(x)g(t) \, dx\, dt
--Guswen (talk) 01:40, 4 June 2009 (UTC)
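The distinction being argued here can be made concrete with a small sketch (NumPy assumed; purely illustrative): treating Y as literally the same random variable as X gives E|X - Y| = 0, while treating Y as an independent copy with the same distribution gives approximately 2σ/√π.

    # Contrast of the two interpretations discussed above (illustrative only).
    import numpy as np

    rng = np.random.default_rng(1)
    sigma = 1.0
    x = rng.normal(0.0, sigma, 500_000)   # samples of X

    same = np.mean(np.abs(x - x))                                  # Y is X itself (dependent case)
    indep = np.mean(np.abs(x - rng.normal(0.0, sigma, 500_000)))   # Y an independent copy

    print(same)                               # exactly 0.0
    print(indep, 2 * sigma / np.sqrt(np.pi))  # both ~ 1.128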
I don't always assume "two random variables of the same distribution, mean and variance" to be dependent, but it is a case allowed under the general definition presently stated in the article. To remind you, the general definition says:
The Lukaszyk-Karmowski metric D between two continuous random variables X and Y is defined as:
\int_{-\infty}^\infty \int_{-\infty}^\infty |x-y|F(x, y) \, dx\, dy,
where F(x, y) denotes the joint probability density function of random variables X and Y.
It then goes on to say that the random variables "might" be independent. It certainly doesn't say that they are always independent. The little I have seen referring to Karmowski metrics also seems not to be restricted to independence. If the article doesn't actually say what is intended, it should be changed. Melcombe (talk) 09:09, 4 June 2009 (UTC)
You're right - the definition also covers dependent random variables, but the specific examples of the L-K metric and the triangle inequality proof cover only the independent case. I changed the definition according to your suggestion. --Guswen (talk) 10:38, 4 June 2009 (UTC)


Deletion review

The following is a copy of the discussion that led to the deletion of this article ...

Since the article has now been restored, perhaps some attention can be placed on the points raised. One point is: should the article be renamed to just "Karmowski metric"? Melcombe (talk) 11:07, 19 May 2011 (UTC)

... but I see in a Google search that most results use the full "Lukaszyk-Karmowski metric" form. Melcombe (talk) 11:10, 19 May 2011 (UTC)

Maybe it should be named the Lukaszyk-Karmowski pseudometric, to be more precise? 212.87.13.74 (talk) 12:50, 20 May 2011 (UTC)

removed speedy deletion tag

Removing speedy-deletion tag: since the AfD deletion, several citations for independent usage of the concept have been added, enough to meet the concerns raised in that discussion. Melcombe (talk) 09:23, 3 June 2011 (UTC)