Talk:Order statistic


I created the page on order statistics. Let us see if we can merge these pages together.


"However, we know from the preceding discussion that the probability that this interval actually contains the population median is..." I have absolutely no idea why the probability should be equal to the magic number shown below this sentence. There has been no preceding discussion that would prove (or at least show) a formula from which the magic number is derived. A result of this importance would definitely require some clarification. In many practical cases, you get data sets from software benchmarks that fail normality tests. Computing confidence intervals based on order statistics is a very important topic, probably worth a separate article, not just one single magic number for one special case of six measured values. — Andrej, 2010-11-30 —Preceding unsigned comment added by Andrejpodzimek (talkcontribs) 15:26, 30 November 2010 (UTC)
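
For reference, the general distribution-free interval for the median works as follows: if m is the population median of a continuous distribution, each observation falls below m with probability 1/2, so the number of observations below m is binomial(n, 1/2) and

P\left(X_{(j)} < m < X_{(k)}\right) = \sum_{i=j}^{k-1} {n \choose i} \left(\frac{1}{2}\right)^n .

For n = 6 and the interval \left(X_{(1)}, X_{(6)}\right) this gives \sum_{i=1}^{5} {6 \choose i} / 2^6 = 62/64 = 31/32 \approx 0.97, which is presumably the "magic number" in question.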

No offence, but given how important this topic is in statistics and how elegant its theory is, the article does an atrocious job. — Miguel 14:07, 2005 May 1 (UTC)

Not done yet, but at least now the article does not just dump a pile of equations on the reader without explanation or context. — Miguel 15:04, 2005 May 1 (UTC)

Old derivation saved for reference

Let X_1, X_2, \ldots, X_n be iid continuously distributed random variables, and let X_{(1)}, X_{(2)}, \ldots, X_{(n)} be the corresponding order statistics. Let f(x) be the probability density function and F(x) the cumulative distribution function of the X_i. Then the probability density of the kth order statistic can be found as follows.

\begin{align}
f_{X_{(k)}}(x) &= {d \over dx} F_{X_{(k)}}(x) = {d \over dx} P\left(X_{(k)} \leq x\right) \\
&= {d \over dx} P(\text{at least } k \text{ of the } n\ X\text{s are} \leq x) \\
&= {d \over dx} P(\geq k \text{ successes in } n \text{ trials}) \\
&= {d \over dx} \sum_{j=k}^n {n \choose j} P(X_1 \leq x)^j \left(1 - P(X_1 \leq x)\right)^{n-j} \\
&= {d \over dx} \sum_{j=k}^n {n \choose j} F(x)^j (1 - F(x))^{n-j} \\
&= \sum_{j=k}^n {n \choose j} \left( j F(x)^{j-1} f(x) (1 - F(x))^{n-j} + F(x)^j (n-j) (1 - F(x))^{n-j-1} (-f(x)) \right) \\
&= \sum_{j=k}^n \left( n {n-1 \choose j-1} F(x)^{j-1} (1 - F(x))^{n-j} - n {n-1 \choose j} F(x)^j (1 - F(x))^{n-j-1} \right) f(x) \\
&= n f(x) \left( \sum_{j=k-1}^{n-1} {n-1 \choose j} F(x)^j (1 - F(x))^{(n-1)-j} - \sum_{j=k}^{n} {n-1 \choose j} F(x)^j (1 - F(x))^{(n-1)-j} \right)
\end{align}

and the sum above telescopes, so that all terms cancel except the first and the last:

= n f(x) \left( {n-1 \choose k-1} F(x)^{k-1} (1 - F(x))^{(n-1)-(k-1)} - \underbrace{{n-1 \choose n} F(x)^n (1 - F(x))^{(n-1)-n}} \right)

and the term over the underbrace is zero, so:

= n f(x) {n-1 \choose k-1} F(x)^{k-1} (1 - F(x))^{(n-1)-(k-1)} = {n! \over (k-1)!\,(n-k)!} F(x)^{k-1} (1 - F(x))^{n-k} f(x).
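
The closed form is easy to sanity-check numerically. A minimal sketch, assuming NumPy and SciPy are available; the normal distribution and the choices n = 5, k = 2 are arbitrary:

 import numpy as np
 from math import factorial
 from scipy import stats
 
 # Monte Carlo check of the k-th order statistic density derived above.
 n, k = 5, 2
 rng = np.random.default_rng(0)
 # Sort each row of many samples and keep the k-th smallest value.
 samples = np.sort(rng.standard_normal((100_000, n)), axis=1)[:, k - 1]
 
 # f_{X_(k)}(x) = n!/((k-1)!(n-k)!) * F(x)^(k-1) * (1-F(x))^(n-k) * f(x)
 x = np.linspace(-3, 3, 61)
 F, f = stats.norm.cdf(x), stats.norm.pdf(x)
 pdf = factorial(n) // (factorial(k - 1) * factorial(n - k)) * F**(k - 1) * (1 - F)**(n - k) * f
 
 hist, edges = np.histogram(samples, bins=60, range=(-3, 3), density=True)
 centers = (edges[:-1] + edges[1:]) / 2
 print(np.max(np.abs(np.interp(centers, x, pdf) - hist)))  # small, up to sampling noise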


Comment by an actuary (knows probability and statistics, not an academic): The section "Distribution of each order statistic of an absolutely continuous distribution" would be clearer if it went like this:

1. Explain that you will derive the CDF and then take its derivative to get the pdf.
2. Derive the CDF.
3. Take the derivative.

The section "Probability distributions of order statistics" should (optimally) reference another article for why F(X) ~ uniform. It is not obvious to newbies.

The interpolatory comments (such as the one about time series) should be distinguished somehow (e.g., with parentheses) so that the reader knows that they are not central to the argument of the article.

---End last guy's comment---

I disagree with your first point. It's easy to get an expression for the CDF, so it's the obvious starting point for a proof. But by far the easiest way to get the sum for the CDF into closed form is by differentiating it, fiddling with it until it's in closed form, then integrating it - which gives you the PDF along the way. It makes for a slightly confusing proof, but the alternatives - either trying to find an expression for the PDF from first principles, or trying to deal with that sum without differentiating it - seem deeply unpleasant. (Plus, at least according to Mathematica, the CDF's closed form seems to involve hypergeometric functions - making it a lot more complicated than the PDF.)

86.3.124.147 (talk) 22:53, 2 April 2008 (UTC)
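
For what it's worth, the CDF does have a standard closed form: the binomial tail sum is a regularized incomplete beta function,

F_{X_{(k)}}(x) = \sum_{j=k}^n {n \choose j} F(x)^j (1 - F(x))^{n-j} = I_{F(x)}(k,\, n-k+1) ,

which may be what Mathematica is expressing in terms of hypergeometric functions.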

Attention needed

I have placed an "expert needed" tag on this article, partly because of the empty sections but mainly because of the relation between the parts that derive the distributions of the order statistics. The first part of what is there might be considered a direct approach and is probably OK as such. But a more advanced approach would be to start from the uniform distribution case, and to derive the more general case from this, which involves less complicated formulae. I am slightly unhappy about the 'du' approach taken for the uniform case. I do think the "uniform" part needs to be finished off by giving an explicit statement of the distribution function, possibly using an incomplete beta function, but certainly as an integral ... from which the density for the general case could then be derived. Melcombe (talk) 08:51, 11 April 2008 (UTC)
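
For whoever takes this up, the explicit statement for the uniform case is standard: if U_1, \ldots, U_n are i.i.d. uniform on (0,1), then U_{(k)} follows a Beta(k, n-k+1) distribution, so

F_{U_{(k)}}(u) = I_u(k,\, n-k+1) = \frac{n!}{(k-1)!\,(n-k)!} \int_0^u t^{k-1} (1-t)^{n-k}\, dt ,

and the density for the general case follows by substituting u = F(x).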

Figure

The caption of the figure for the exponential example ("Probability distributions for the n = 5 order statistics of an exponential distribution with \theta = 3") needs clarification.

  1. I assume the order statistics are for a sample of n = 5 exponential random variables.
  2. What is \theta? Is it \lambda, the parameter used on the exponential distribution page? It looks more like 1/\lambda. —Preceding unsigned comment added by LachlanA (talkcontribs)
Perhaps the best way to avoid this question is to replace the plot with one referred to the standard exponential distribution (i.e., with unit scale). The plot should also use notation consistent with that used in the article. It's a rasterized picture, and so needs replacement anyway. I'll see what I can come up with. Lovibond (talk) 15:07, 21 May 2013 (UTC)
I've redrawn the pdfs, changing the distribution to have unit scale and hazard rate. The picture is now vector, as well. I've updated the caption to reflect the change of scale, as well as clarify things a little (I hope!), identifying the functions as pdfs, rather than simply distributions. Lovibond (talk) 20:53, 21 May 2013 (UTC)
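
For anyone wanting to reproduce or tweak the figure, a minimal plotting sketch assuming matplotlib and SciPy; the axis range and labels are my own choices:

 import numpy as np
 from math import factorial
 from scipy import stats
 import matplotlib.pyplot as plt
 
 # pdfs of the n = 5 order statistics of the standard (unit-scale) exponential
 n = 5
 x = np.linspace(0, 6, 400)
 F, f = stats.expon.cdf(x), stats.expon.pdf(x)
 for k in range(1, n + 1):
     c = factorial(n) // (factorial(k - 1) * factorial(n - k))
     plt.plot(x, c * F**(k - 1) * (1 - F)**(n - k) * f, label=f"k = {k}")
 plt.xlabel("x")
 plt.ylabel("probability density")
 plt.legend()
 plt.show()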

Equation error?

I don't believe the last equation in the "Dealing with discrete variables" section is correct. If the equation above it, in terms of p_1, p_2 and p_3, is correct, then the last equation should have (1 - F(x) + f(x))^j as the first element of the second term in the summation, i.e. the expansion of (p_2 + p_3)^j, rather than the existing (1 - f(x))^j. —Preceding unsigned comment added by 129.188.33.26 (talk) 16:22, 4 August 2010 (UTC)
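
For reference, writing p_1 = P(X < x), p_2 = P(X = x) = f(x) and p_3 = P(X > x), the substitution above checks out:

p_2 + p_3 = 1 - p_1 = 1 - \left(F(x) - f(x)\right) = 1 - F(x) + f(x) ,

so the expansion of (p_2 + p_3)^j should indeed read (1 - F(x) + f(x))^j rather than (1 - f(x))^j, assuming the equation in terms of p_1, p_2, p_3 is correct.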

Probabilistic Analysis

Before I say anything: I am fairly new to order statistics, so bear with me, as some of my comments are likely due to a lack of experience with them. However, I would consider myself an excellent representative of the kind of person who comes to this article in search of a better understanding of order statistics.

Basically, I think the "probabilistic analysis" section is very confusing.

Firstly, the last subsection here is more measure-theoretic than probabilistic. Secondly, it makes some fairly detailed claims about the substitution to be used, but makes no effort to describe how the formula of interest is derived. In my opinion, this formula,

f_{X_{(k)}}(x)\,dx = {n!\over(k-1)!(n-k)!}[F_X(x)]^{k-1}[1-F_X(x)]^{n-k}f_X(x)\,dx

is what most readers seek a better understanding of, but its derivation is left out. A proof, some intuition, or at least a reference to where they might be found would really be great. Moreover, the section about the uniform distribution is not motivated. I see that someone else has commented that some property exists that "might not be obvious to newbies"; well, that's me! Please elaborate.

I propose instead that the section start with a simpler formula. One idea would be to reference, e.g., Wackerly, Mendenhall & Scheaffer (2008), "Mathematical Statistics with Applications", Duxbury, 7th edition, theorem 6.5, p. 336:

The pdf of the kth order statistic is given by

f_{X_{(k)}}(x) = \frac{n!}{(k-1)!(n-k)!}[F_X(x)]^{k-1}[1-F_X(x)]^{n-k}f_X(x)

To me, the formula makes a lot of sense intuitively:

f_{X_{(k)}}(x) = \Pr[k-1 \text{ obs below } x] \cdot \Pr[1 \text{ obs close to } x] \cdot \Pr[n-k \text{ obs above } x] \cdot \#\{\text{ways we can do this}\}

(I don't know if it's completely wrong, but) maybe a motivation like this would be more useful to most readers? At least some form of introduction to the whole thing about the uniform. And some form of conclusion: what does that section prove? What have we shown?

 — Preceding unsigned comment added by Superpronker (talkcontribs) 13:53, 1 June 2011 (UTC) 
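
The heuristic above can be made precise with a multinomial count: of the n observations, k-1 must fall below x, one must fall in the infinitesimal interval [x, x + dx], and n-k must fall above x, and the number of ways to assign the observations to these three groups is the multinomial coefficient:

f_{X_{(k)}}(x)\,dx \approx \frac{n!}{(k-1)!\,1!\,(n-k)!}\, [F_X(x)]^{k-1}\, [f_X(x)\,dx]\, [1 - F_X(x)]^{n-k} .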

About Order Statistic of Uniform Distribution

"why F(X) ~ uniform" please refer to "Probability Integral Transformation" — Preceding unsigned comment added by BChenyu (talkcontribs) 16:18, 5 March 2012 (UTC)

Expectation of Order Statistics

What is the expectation of the nth order statistic, E[X_{(n)}]? Or, for that matter, E[X_{(1)}], or that of any kth order statistic? — Preceding unsigned comment added by 199.119.232.221 (talk) 01:47, 26 November 2012 (UTC)

Useful to know, so I don't blame you for asking! Unfortunately, the answer depends upon the distribution. If your RV is continuous, obtain the density function of the kth order statistic. Once you have that, computing the expectation is in theory simple (though, in practice, it might not be!). Lovibond (talk) 21:10, 21 May 2013 (UTC)
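
Two standard closed-form cases, for what they're worth: for a sample of n uniform (0,1) variables and for a sample of n standard exponential variables,

E\left[U_{(k)}\right] = \frac{k}{n+1}, \qquad E\left[X_{(k)}\right] = \sum_{i=n-k+1}^{n} \frac{1}{i} .

In general, one integrates x against the density of the kth order statistic given earlier on this page.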

Error in section "The joint distribution of the order statistics of an absolutely continuous distribution"?

I don't believe the equation that gives the joint pdf of two order statistics j and k. In particular, for uniform [0,1] random variables with k = n and j = n-1, it doesn't seem to reduce to the equation given earlier (I get a 2 in the denominator which isn't present in the other equation).

128.151.210.203 (talk) 17:32, 8 July 2013 (UTC)
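
For comparison, the standard joint density of X_{(j)} and X_{(k)} for j < k (see, e.g., David & Nagaraja, "Order Statistics") is

f_{X_{(j)},X_{(k)}}(u,v) = \frac{n!}{(j-1)!\,(k-j-1)!\,(n-k)!}\, F(u)^{j-1}\, [F(v)-F(u)]^{k-j-1}\, [1-F(v)]^{n-k}\, f(u)\, f(v), \qquad u < v .

For uniform [0,1] variables with j = n-1 and k = n this reduces to n(n-1)\,u^{n-2} for u < v, with no 2 in the denominator, so the article's equation is worth rechecking against this form.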

Extension to section: Probability distributions of order statistics

General formulas are known for the cumulative distribution functions of the smallest and largest samples, provided the samples are i.i.d. A formulation for the maximum of n random variables (for both the continuous and discrete cases) is given here (http://www.math.ucsd.edu/~gptesler/283/slides/longrep_f13-handout.pdf). A very similar formulation works for the minimum.

Let Y_1, Y_2, \ldots, Y_n be i.i.d. random variables. The maximum is given by:


Y_{\max} = \max(Y_1, Y_2, \ldots, Y_n)

meaning the CDF of Y_{\max} is given by


\begin{align}
F_{Y_{\max}}(y) &= P(Y_{\max} \le y) \\
&= P(Y_1 \le y, Y_2 \le y, \ldots, Y_n \le y) \\
&= F_{Y_1}(y) \cdot F_{Y_2}(y) \cdots F_{Y_n}(y) \\
&= F_Y(y)^n
\end{align}


Similarly, the minimum is given by:
Y_{\min} = \min(Y_1, Y_2, \ldots, Y_n)

meaning the CDF of Y_{\min} is given by


\begin{align}
1 - F_{Y_{\min}}(y) &= P(Y_{\min} > y) \\
&= P(Y_1 > y, Y_2 > y, \ldots, Y_n > y) \\
&= (1 - F_{Y_1}(y)) \cdot (1 - F_{Y_2}(y)) \cdots (1 - F_{Y_n}(y)) \\
&= (1 - F_Y(y))^n \\
F_{Y_{\min}}(y) &= 1 - (1 - F_Y(y))^n
\end{align}

A similar trick can be used to prove the general formula for the probability density/mass function. Mouse7mouse9 00:02, 14 December 2013 (UTC)
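
Both formulas are easy to check empirically. A minimal sketch assuming NumPy and SciPy; the exponential distribution and the values of n, reps and y are arbitrary choices:

 import numpy as np
 from scipy import stats
 
 # Empirical check of F_max(y) = F(y)^n and F_min(y) = 1 - (1 - F(y))^n
 # for n i.i.d. standard exponentials.
 n, reps, y = 5, 200_000, 1.0
 rng = np.random.default_rng(0)
 samples = rng.exponential(size=(reps, n))
 
 F = stats.expon.cdf(y)
 print((samples.max(axis=1) <= y).mean(), F**n)            # should be close
 print((samples.min(axis=1) <= y).mean(), 1 - (1 - F)**n)  # should be close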