Talk:U-statistic

WikiProject Statistics (Rated C-class, Mid-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as C-Class on the quality scale.
This article has been rated as Mid-importance on the importance scale.

Attempt to argue that a U-statistic is not minimum variance

Copied from previous insertion in article...

The preceding paragraph does illustrate an important point, but it reaches the wrong conclusion as regards minimum variance unbiased estimation. Suppose, for example, that the values are iid Cauchy, or more generally any symmetric distribution such as the Laplace distribution or the normal distribution, having tails not heavier than those of the Cauchy distribution. Then the expected value of the sample median of three values exists and is equal to the population median. Likewise, the expected value of the sample median of n>3 values exists and is equal to the population median. So all such statistics have the same expected value, the center of symmetry. The U-statistic f_n(x_1,\ldots, x_n) described above is certainly unbiased for the population median, but, among unbiased estimates of the Cauchy median or Laplace median, it does not have minimum variance. It is worth bearing in mind that if symmetric distributions with very heavy tails are considered, for example with density proportional to (1+x^2)^{-3/4}, then the sample median of three values does not have an expected value, let alone a variance. How then can the U-statistic f_n(x_1,\ldots, x_n) be unbiased for anything, let alone minimum variance unbiased? <end copy>

Response: "among unbiased estimates of the Cauchy median or Laplace median, it does not have minimum variance" — you haven't shown this.

Rebuttal

Unsubstantiated and preposterous claims of this sort (minimum-variance unbiasedness) contribute to the reputation of Wiki as an unreliable source. Since you have reinstated the claim of minimum variance after correction, the onus is on you to prove this or give a reference.

But for the record,...

It is an exercise in elementary calculus to show that the sample median of three Cauchy random variables has finite mean and infinite variance (just do the integration). Although less obvious, the associated U-statistics f_4, f_5,\ldots also have infinite variance for Cauchy samples. One corollary of your claim that the U-statistic f_n is minimum-variance unbiased is that every unbiased estimate of the Cauchy median has infinite variance. But this is contrary to the known fact that the sample median of  n\ge 5 values has finite variance. If you don't believe the math, just check by simulation for Cauchy samples that the sample median for n \ge 5 has smaller variance than the U-statistic f_n.
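The simulation check suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: standard Cauchy samples, n = 7, and f_n taken to be the U-statistic whose kernel is the median of three values. The function names, seed and replication count are hypothetical choices, not from the discussion. Because f_n has infinite variance, its sample variance does not settle down as the number of replications grows, so the printed figures are indicative only.

```python
import itertools
import math
import random
import statistics


def median3_u_statistic(xs):
    """U-statistic with kernel 'median of three': the mean of the
    medians of all 3-element subsets of xs."""
    medians = [sorted(t)[1] for t in itertools.combinations(xs, 3)]
    return statistics.fmean(medians)


def simulate(n=7, reps=4000, seed=0):
    """Return (sample variance of f_n, sample variance of the sample median)
    over `reps` independent standard Cauchy samples of size n."""
    rng = random.Random(seed)  # fixed seed, an arbitrary choice for repeatability
    u_vals, med_vals = [], []
    for _ in range(reps):
        # Inverse-CDF draw from the standard Cauchy distribution.
        xs = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
        u_vals.append(median3_u_statistic(xs))
        med_vals.append(statistics.median(xs))
    return statistics.variance(u_vals), statistics.variance(med_vals)


var_u, var_med = simulate()
print(f"sample variance of f_n:           {var_u:.3f}")
print(f"sample variance of sample median: {var_med:.3f}")
```

Both statistics are computed on the same simulated samples, so the comparison is paired; changing n to any value of at least 5 should tell the same qualitative story.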

For the Laplace family, the state of affairs is even simpler because of the Cramér-Rao inequality. The maximum-likelihood estimator \mbox{median}(x_1,\ldots, x_n) is unbiased and has minimum asymptotic variance, which is strictly smaller than the variance of the U-statistic f_n.

If those examples fail to convince, bear in mind that, in the Gaussian family, the population median is also the population mean \mu. The sample median of three independent values is a random variable whose mean is \mu. But the sample mean \bar x_3 = (x_1 + x_2 + x_3)/3 has the same expectation as the median of three values. The associated U-statistics \bar x_n and f_n are different, both unbiased, and your claim is that BOTH have minimum variance for normal samples. The claim is clearly preposterous.


Melcombe (talk) 13:09, 8 September 2008 (UTC)

It would be good if you format the discussion according to ordinary wiki-practice, so that your bits are properly indicated. As for a reference, see the reference placed in the first para of the article. Melcombe (talk) 08:49, 9 September 2008 (UTC)

Response: I don't see anything relevant in any of the references. The minimum-variance claim is so obviously wrong that it couldn't possibly occur in any reputable source.

Do try to sign your responses as is standard. So you think Cox & Hinkley is not a reputable source? Specifically they say (Cox, D.R., Hinkley, D.V. (1974) Theoretical statistics. Chapman and Hall ISBN 0-412-12420-3, p.200): "The formal motivation for U <a U-statistic> is that it is the average of k(Y1,...,Yr) conditionally on Y(.) <the order statistics>, and, as we shall show in <a later section>, this implies that U has the smallest variance among all unbiased estimates of <the quantity being estimated>, whatever the true distribution..." So obviously in saying "I don't see anything relevant in any of the references" you didn't actually look very far. Melcombe (talk) 14:44, 9 September 2008 (UTC)

Response:

Sorry, the Cox-Hinkley reference did not appear on the main article, so I missed it. Ordinarily, C&H is a reliable source, but in this case I'm afraid they've got it wrong.

Section 8.4 of C&H uses the standard Rao-Blackwell projection, and I have no quarrel with that.

The sentence following (30) in C&H, correctly quoted above, is incorrect, and the same error is repeated after (31) on the same page. This passage confuses the average over sub-samples with the conditional expectation given the sufficient statistic (order statistics). The average over sub-samples is a Hajek projection; the conditional expectation is the Rao-Blackwell projection. Both projections have desirable variance-reducing properties. All is well if these happen to be equal, but in general they are not the same. A correct version of the sentence is as follows.

"The formal motivation for U is that it is a proxy for the conditional expected value of k(...) given the sufficient statistic, i.e. T = E(k\mid S). As we shall show... T has minimum variance among unbiased estimators. The same argument shows that T = E(U \mid S) is also the Rao-Blackwell projection of U, so the variance of U is at least as great as the variance of T."

The Cauchy example described above is such that the U-statistic f_n has infinite variance. And yet there exist unbiased estimates of the median that have finite variance for n\ge 5. The Laplace and Gaussian examples illustrate the same thing, though not to the same degree. I can't imagine more clear-cut counterexamples to the minimum-variance claim.
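The finite-variance and infinite-variance statements for the Cauchy case can be checked from the tails of the order-statistic densities. A sketch, assuming a standard Cauchy sample with density f and distribution function F, and writing g_3 and g_5 (labels introduced here for illustration) for the densities of the median of three and of five values:

```latex
\begin{align*}
f(x) &= \frac{1}{\pi(1+x^2)}, \qquad 1 - F(x) \sim \frac{1}{\pi x} \quad (x \to \infty), \\
g_3(x) &= 6\,F(x)\{1-F(x)\}\,f(x) \sim \frac{6}{\pi^2 x^3}, \\
g_5(x) &= 30\,F(x)^2\{1-F(x)\}^2 f(x) \sim \frac{30}{\pi^3 x^4}.
\end{align*}
```

So x g_3(x) is integrable but x^2 g_3(x) is not (the integral diverges logarithmically), whereas x^2 g_5(x) is integrable. That matches a finite mean and infinite variance for the median of three, and a finite variance for the sample median once n\ge 5.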

OK, now we are at least looking at the same reference. But can we make things still more clear-cut? There is a difference between (i) minimum variance over the class of estimators which are unbiased for a particular family of distributions and (ii) minimum variance over the class of estimators which are unbiased for all distributions. The Gaussian case is a counter-example for claim (i) but not (ii), I think, and possibly the same applies to your other examples. The interpretation of C&H at this point may be ambiguous, but it does say "whatever the true distribution". So do you think that the more specific claim that U-statistics are "minimum variance over the class of estimators which are unbiased for all distributions" is false (and false even if a "finite variance" condition is included if necessary)? Melcombe (talk) 09:33, 10 September 2008 (UTC)

It is a reasonable question. Presumably you have in mind models for which the order statistics are minimal sufficient. Here's the answer as I see it.

In a parametric model with parameter space {F}, with theta = theta(F) as the sub-parameter of interest, the term "minimum variance unbiased estimate of theta" means that the estimate is (a) unbiased E(T; F) = \theta(F) for all parameters F in the family; (b) var(T; F) \le var(T'; F) for all unbiased estimates T' and parameters F in the parameter space. It is immaterial to the definition whether {F} is finite-dimensional or infinite-dimensional.

In general, there is no reason to suppose that a minimum-variance estimate exists: T might dominate T' in one region and the reverse in another region of the parameter space. I don't think there exists a minimum-variance unbiased estimate of \mu(F) = \int x\, dF among iid models with parameter space consisting of all distributions for which the mean exists. If the variance exists, the sample mean is minimum-variance unbiased among linear estimators only.

If the minimal sufficient statistic is the set of order statistics, the Rao-Blackwell projection U = E(h \mid S) of an unbiased statistic h is the associated U-statistic, which is also unbiased. But as we have seen in the Cauchy example, this statistic need not have minimum variance. I'm not sure what the implications of completeness might be, but I don't think condition (b) is satisfied by U-statistics in general. If a statistic is minimum variance unbiased for all distributions F, wouldn't it also be minimum variance unbiased for subsets? (It should be understood that the subset imposes no restriction on the range of theta().)

Incidentally, equation (31) in C&H has a typo in the conditioning event, which is presumably intended to be the sufficient statistic. Even with that fix, I'm not sure that it is correct. —Preceding unsigned comment added by 64.109.248.247 (talk) 14:20, 10 September 2008 (UTC)

I think there may be some confusion about what is being estimated. It is not the parameter of a family of distributions but rather a "population parameter" defined for any distribution, which in the U-statistics literature is defined as the expected value of the "kernel function" (k(.,....) in C&H notation). Considering the simple case of the mean, the U-statistic is the sample mean, and the estimators that it needs to beat to qualify for the second "minimum variance" claim are only those estimators for which the given candidate estimator (as a function of the observations) is unbiased for the population mean for all distributions. You may well be able to find different estimators that beat it for different specific distributions. As for your "Cauchy example" which you say produces a contradiction, the U-statistics based on "median of 1", "median of 3", "median of 5" etc. are estimators which estimate different population parameters and so do not have to be in any particular order of variance according to (this version of) the claim of "minimum variance". Neither does the U-statistic based on the "median of 1" have to beat other estimates of the "Cauchy median", since it is identical to the sample mean and the population parameter being estimated is the population mean and not the population median. The U-statistic for a sample of size N based on the "median of 3" does not have to beat the sample median of the N values, since these estimate different quantities (quantities which are different for at least some distributions).
You say "Presumably you have in mind models for which the order statistics are minimal sufficient" ... I think this is actually the situation in which the C&H reference is working, essentially a distribution-free situation.
Melcombe (talk) 16:57, 10 September 2008 (UTC)
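To make the "kernel function" description concrete, here is a minimal sketch of a U-statistic as the average of a symmetric kernel over all sub-samples of a given size. The helper name `u_statistic` and the data values are made up for illustration. It shows that the "median of 1" kernel reproduces the sample mean, that the kernel (x - y)^2 / 2 reproduces the usual unbiased sample variance, and that the "median of 3" U-statistic is a different function of the data from the sample median.

```python
import itertools
import math
import statistics


def u_statistic(kernel, order, xs):
    """Average a symmetric kernel over all subsets of xs of size `order`."""
    subsets = itertools.combinations(xs, order)
    return statistics.fmean([kernel(*s) for s in subsets])


data = [1.0, 2.0, 4.0, 7.0, 20.0]  # arbitrary illustrative values

# Kernel of order 1 ("median of 1"): the U-statistic is the sample mean.
mean_u = u_statistic(lambda x: x, 1, data)

# Kernel of order 2: the U-statistic is the unbiased sample variance.
var_u = u_statistic(lambda x, y: (x - y) ** 2 / 2, 2, data)

# Kernel of order 3 ("median of 3"): not the same as the sample median.
med3_u = u_statistic(lambda x, y, z: sorted((x, y, z))[1], 3, data)

print(mean_u, statistics.fmean(data))
print(var_u, statistics.variance(data))
print(med3_u, statistics.median(data))
```

Enumerating every subset costs on the order of "n choose order" kernel evaluations, which is fine for small illustrations like this one but not for large samples.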

Response.

I think we are converging, and there is a claim that I can agree with. However, I do not understand the term "population parameter" unless it is defined either as a function theta(F) on the parameter space of a statistical model, or as a function of the tail of the process, which is much the same thing in the iid case. So, in any iid model, the mean, the median, probable error and so on are sub-parameters theta(F). At this point, I think we are saying the same thing in different ways. Ordinarily, the full model "iid with arbitrary F" is unworkable because conditions such as finiteness and uniqueness have to be imposed on the sub-parameter theta(F) or on the variance of the kernel. So any reasonable model that permits estimation of the mean, variance, median or probable error is necessarily a sub-model. The particular choice of sub-model is crucial to the issue now under discussion.

The claim that I hope we can agree on is as follows: If the family of distributions {F} is such that the order statistics are not only minimal sufficient for every sample size n, but also COMPLETE, then the U-statistic is minimum-variance unbiased (at least if the kernel has finite variance). I guess that is the intent, and I hope we can agree on a version of that sort.

If we are interested in estimation of the median, it may be reasonable to consider the "non-parametric" sub-model consisting of symmetric distributions having tails no heavier than 1/x^2. (I am not very keen on this, and I am confident that you are not keen either, but I can see no objection on theoretical grounds. It is commonly done in the robustness world to take advantage of the fact that the median is U-estimable in the sub-model). The order statistics are still minimal sufficient, but not complete. Although it is the Rao-Blackwell projection, the U-statistic associated with the kernel k (median of 3) does not have minimum variance among unbiased estimates of the median. The Cauchy calculation is a finite-dimensional illustration of this point.

I think it is worth making the point that minimal sufficiency of the order statistics is not enough to guarantee minimum variance unbiasedness, even in infinite-dimensional iid models.

As you may gather, I'm not keen on the terms "non-parametric" and "distribution-free". On the one hand, the terms are too negative and too vague. On the other hand, they are contradictory: every model is parametric, and no statistical model is distribution-free. But I'm not going to argue that one.

OK, I hope the process is now close enough to convergence.

128.135.149.5 (talk) 21:05, 10 September 2008 (UTC)signingoff

I have delved a little into the literature. A claim for minimum variance "among all unbiased estimates" is made in Hoeffding (1948) (as referenced in the article), between equations (4.3) and (4.4); he attributes it to Halmos (1946) (The theory of unbiased estimation, Ann Math Stat, 17, 34-43). I haven't seen the latter, so I don't know the approach taken to get the result. However, the claim is certainly not new with C&H. I will amend the article to try to make the context clearer. I disagree with what you say about "distribution-free". Perhaps the sticking point is that you think in terms of needing to have a "model" (which almost implies a parametric model), whereas the essential thing is the analysis undertaken and the assumptions under which it is valid. Lots of non-parametric tests just rely on the assumption of iid (which some would say is a "model", but a non-parametric or distribution-free one). Similarly there are the examples of (i) the central limit theorem and (ii) spectral analysis in time-series studies.
Melcombe (talk) 08:57, 11 September 2008 (UTC)