Khmaladze transformation
In statistics, the Khmaladze transformation is a mathematical tool used in constructing convenient goodness of fit tests for hypothetical distribution functions. More precisely, suppose <math>X_1,\ldots,X_n</math> are i.i.d., possibly multi-dimensional, random observations generated from an unknown probability distribution. A classical problem in statistics is to decide how well a given hypothetical distribution function <math>F</math>, or a given hypothetical parametric family of distribution functions <math>\{F_\theta : \theta\in\Theta\}</math>, fits the set of observations. The Khmaladze transformation allows us to construct goodness of fit tests with desirable properties. It is named after Estate V. Khmaladze.
Consider the sequence of empirical distribution functions <math>F_n</math> based on a sequence of i.i.d. random variables <math>X_1,\ldots,X_n</math>, as <math>n</math> increases. Suppose <math>F</math> is the hypothetical distribution function of each <math>X_i</math>. To test whether the choice of <math>F</math> is correct or not, statisticians use the normalized difference,

: <math>v_n(x)=\sqrt{n}\,[F_n(x)-F(x)].</math>
This <math>v_n</math>, as a random process in <math>x</math>, is called the empirical process. Various functionals of <math>v_n</math> are used as test statistics. The change of variable <math>v_n(x)=u_n(t)</math>, <math>t=F(x)</math>, transforms <math>v_n</math> into the so-called uniform empirical process <math>u_n</math>. The latter is an empirical process based on independent random variables <math>U_i=F(X_i)</math>, which are uniformly distributed on <math>[0,1]</math> if the <math>X_i</math>s do indeed have distribution function <math>F</math>.
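This change of variable can be illustrated numerically. The following is a minimal sketch (illustrative only): it assumes, as an arbitrary choice, that the hypothesised <math>F</math> is the standard normal distribution function, and uses NumPy and SciPy to check that the transformed observations <math>U_i=F(X_i)</math> are uniform on <math>[0,1]</math>.

<syntaxhighlight lang="python">
# Minimal sketch: the probability integral transform U_i = F(X_i).
# Assumption (for illustration only): F is the standard normal distribution function.
import numpy as np
from scipy.stats import norm, kstest

rng = np.random.default_rng(0)
x = rng.normal(size=1000)    # sample drawn from the hypothesised F
u = norm.cdf(x)              # U_i = F(X_i)

# If the X_i really have distribution function F, the U_i are uniform on [0, 1].
print(u.min(), u.max())      # all values lie in (0, 1)
print(kstest(u, "uniform"))  # typically shows no evidence against uniformity
</syntaxhighlight>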
This fact was discovered and first utilized by Kolmogorov (1933), Wald and Wolfowitz (1936) and Smirnov (1937) and, especially after Doob (1949) and Anderson and Darling (1952), it led to the standard rule to choose test statistics based on <math>v_n</math>. That is, test statistics <math>\psi(v_n,F)</math> are defined (which possibly depend on the <math>F</math> being tested) in such a way that there exists another statistic <math>\varphi(u_n)</math>, derived from the uniform empirical process, such that <math>\psi(v_n,F)=\varphi(u_n)</math>. Examples are

: <math>\sup_x|v_n(x)|=\sup_t|u_n(t)|,\quad \sup_x\frac{|v_n(x)|}{a(F(x))}=\sup_t\frac{|u_n(t)|}{a(t)}</math>

and

: <math>\int_{-\infty}^{\infty} v_n^2(x)\,dF(x)=\int_0^1 u_n^2(t)\,dt.</math>

For all such functionals, their null distribution (under the hypothetical <math>F</math>) does not depend on <math>F</math>, and can be calculated once and then used to test any <math>F</math>.
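This distribution-freeness can be checked by simulation. The sketch below is illustrative only; the two hypothesised distributions (standard normal and standard exponential) and the helper name <code>ks_sup</code> are arbitrary choices. It estimates the null distribution of <math>\sup_x|v_n(x)|</math> under each hypothesis and shows that the Monte Carlo critical values agree.

<syntaxhighlight lang="python">
# Sketch: the null distribution of sup_x |v_n(x)| does not depend on F.
# Two arbitrary choices of F (standard normal, standard exponential) are simulated.
import numpy as np
from scipy.stats import norm, expon

def ks_sup(sample, cdf):
    """sqrt(n) * sup_x |F_n(x) - F(x)|, evaluated at the jump points of F_n."""
    n = len(sample)
    t = cdf(np.sort(sample))
    return np.sqrt(n) * max((np.arange(1, n + 1) / n - t).max(),
                            (t - np.arange(n) / n).max())

rng = np.random.default_rng(1)
n, reps = 200, 2000
stat_norm = [ks_sup(rng.normal(size=n), norm.cdf) for _ in range(reps)]
stat_expo = [ks_sup(rng.exponential(size=n), expon.cdf) for _ in range(reps)]

# Critical values agree up to Monte Carlo error, so they need only be computed once.
print(np.quantile(stat_norm, [0.90, 0.95, 0.99]))
print(np.quantile(stat_expo, [0.90, 0.95, 0.99]))
</syntaxhighlight>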
However, it is only rarely that one needs to test a simple hypothesis, in which a fixed <math>F</math> is completely specified. Much more often, one needs to verify parametric hypotheses, where the hypothetical <math>F=F_{\theta_n}</math> depends on some parameters <math>\theta_n</math> which the hypothesis does not specify and which have to be estimated from the sample <math>X_1,\ldots,X_n</math> itself.
Although the estimators <math>\hat\theta_n</math> most commonly converge to the true value of <math>\theta</math>, it was discovered (Kac, Kiefer and Wolfowitz (1955) and Gikhman (1954)) that the parametric, or estimated, empirical process

: <math>\hat v_n(x)=\sqrt{n}\,[F_n(x)-F_{\hat\theta_n}(x)]</math>

differs significantly from <math>v_n</math>, and that the transformed process <math>\hat u_n(t)=\hat v_n(x)</math>, <math>t=F_{\hat\theta_n}(x)</math>, has a distribution for which the limit distribution, as <math>n\to\infty</math>, depends on the parametric form of <math>F_\theta</math> and on the particular estimator <math>\hat\theta_n</math> and, in general, within one parametric family, on the value of <math>\theta</math>.
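The effect of estimating the parameters can be seen in a small simulation. The sketch below is illustrative only: it takes a normal location-scale family as the parametric hypothesis, estimates the parameters by the sample mean and standard deviation, and uses the same hypothetical <code>ks_sup</code> helper as in the previous sketch.

<syntaxhighlight lang="python">
# Sketch: with estimated parameters the supremum statistic no longer follows the
# simple-hypothesis (distribution-free) null distribution.  Illustrative only;
# the parametric family here is normal, with mean and variance estimated from the sample.
import numpy as np
from scipy.stats import norm

def ks_sup(sample, cdf):
    """sqrt(n) * sup_x |F_n(x) - F(x)| (same helper as in the previous sketch)."""
    n = len(sample)
    t = cdf(np.sort(sample))
    return np.sqrt(n) * max((np.arange(1, n + 1) / n - t).max(),
                            (t - np.arange(n) / n).max())

rng = np.random.default_rng(2)
n, reps = 200, 2000
simple, estimated = [], []
for _ in range(reps):
    x = rng.normal(size=n)
    simple.append(ks_sup(x, norm.cdf))                  # F completely specified
    mu, sigma = x.mean(), x.std(ddof=1)                 # parameters estimated from the sample
    estimated.append(ks_sup(x, lambda s: norm.cdf(s, mu, sigma)))

# The estimated-parameter statistic is stochastically much smaller, so the
# simple-hypothesis critical values would be far too conservative here; its null
# distribution also changes with the family and with the estimator used.
print(np.quantile(simple, [0.90, 0.95, 0.99]))
print(np.quantile(estimated, [0.90, 0.95, 0.99]))
</syntaxhighlight>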
From the mid-1950s to the late 1980s, much work was done to clarify the situation and understand the nature of the process <math>\hat v_n</math>.
In 1981, and then in 1987 and 1993, E. V. Khmaladze suggested replacing the parametric empirical process <math>\hat v_n</math> by its martingale part <math>w_n</math> only,

: <math>\hat v_n(x)-K_n(x)=w_n(x),</math>

where <math>K_n(x)</math> is the compensator of <math>\hat v_n(x)</math>. Then the following properties of <math>w_n</math> were established (a numerical sketch in a simple special case follows the list below):
- Although the form of <math>K_n</math>, and therefore of <math>w_n</math>, depends on <math>F_{\hat\theta_n}</math>, as a function of both <math>x</math> and <math>n</math>, the limit distribution of the time-transformed process
: <math>\omega_n(t)=w_n(x),\quad t=F_{\hat\theta_n}(x),</math>
is that of standard Brownian motion on <math>[0,1]</math>, i.e., it is again standard and independent of the choice of <math>F_{\hat\theta_n}</math>.
- The relationship between <math>\hat v_n</math> and <math>w_n</math>, and between their limits, is one to one, so that statistical inference based on <math>\hat v_n</math> or on <math>w_n</math> is equivalent, and in <math>w_n</math>, nothing is lost compared to <math>\hat v_n</math>.
- The construction of the innovation martingale <math>w_n</math> could be carried over to the case of vector-valued <math>X_1,\ldots,X_n</math>, giving rise to the definition of the so-called scanning martingales in <math>\mathbb R^d</math>.
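In the simplest special case of a fully specified (simple) hypothesis on <math>[0,1]</math>, the martingale part of the uniform empirical process has the explicit form <math>w_n(t)=u_n(t)+\int_0^t \frac{u_n(s)}{1-s}\,ds</math>, and its limit is standard Brownian motion rather than the Brownian bridge. The sketch below is illustrative only: it approximates the integral by a Riemann sum on a grid and checks the limiting variances; the parametric case, which requires a compensator built from the score functions of <math>F_\theta</math>, is not reproduced here.

<syntaxhighlight lang="python">
# Illustrative sketch: martingale part of the uniform empirical process under a
# simple (fully specified) hypothesis, w_n(t) = u_n(t) + int_0^t u_n(s)/(1-s) ds.
# The parametric Khmaladze transformation needs a compensator built from the
# score functions of F_theta and is not shown here.
import numpy as np

rng = np.random.default_rng(3)
n, reps = 500, 2000
grid = np.linspace(0.0, 0.9, 400)   # stop short of t = 1, where 1/(1 - s) blows up
dt = grid[1] - grid[0]

u_end, w_end = [], []
for _ in range(reps):
    sample = np.sort(rng.uniform(size=n))
    edf = np.searchsorted(sample, grid, side="right") / n   # empirical distribution function
    u_n = np.sqrt(n) * (edf - grid)                         # uniform empirical process u_n(t)
    w_n = u_n + np.cumsum(u_n / (1.0 - grid)) * dt          # add the compensating integral
    u_end.append(u_n[-1])
    w_end.append(w_n[-1])

# At t = 0.9:  Var u_n(t) -> t(1 - t) = 0.09  (Brownian bridge limit),
#              Var w_n(t) -> t        = 0.90  (Brownian motion limit).
print(np.var(u_end), np.var(w_end))
</syntaxhighlight>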
Although known, the transformation remained unused for a long time. Later, the work of researchers such as Koenker, Stute, Bai, Koul, Koening, and others made it popular in econometrics and other fields of statistics.[citation needed]
See also
References
- Khmaladze, E. V. (1981). "Martingale Approach in the Theory of Goodness-of-fit Tests". Theor. Prob. Appl. 26 (2): 240–257. doi:10.1137/1126027.
- Khmaladze, E. V. (1993). "Goodness of fit Problems and Scanning Innovation Martingales". Annals of Statistics. 21 (2): 798–829. JSTOR 2242262.
- Koul, H. L.; Swordson, E. (2011). "Khmaladze transformation". International Encyclopedia of Statistical Science. Springer. pp. 715–718. doi:10.1007/978-3-642-04898-2_325.