|WikiProject Statistics||(Rated B-class, Low-importance)|
|WikiProject Mathematics||(Rated B-class, Low-importance)|
Which percentile to use
I'm not sure about the line "where Ta is the 100(1 − (p/2))th percentile of Student's t-distribution..." for a 100p% prediction interval. For example, for a 90% prediction interval that would be the 55th percentile, which doesn't sound right - or am I missing something?
Perhaps it should instead read "where Ta is the 100(1 − (α/2))th percentile of Student's t-distribution..." for a 100(1-α)% interval, also replacing p by 1-α in the line above (i.e. α is the error rate in the prediction, whereas p was the success rate). For a 90% prediction interval (α=0.1) that would mean using the 95th percentile, which sounds more reasonable.
For possible support for this formulation see http://www.amstat.org/publications/jse/secure/v8n3/preston.cfm which defines α in the same way and uses the 100(α/2)th and 100(1-(α/2))th percentiles of a general distribution. Also http://www.math.umd.edu/~jjm/tpredictionintervals.pdf, which uses the 100(α/2)th percentile of the t-distribution - I assume that the choice of 100(α/2)th or 100(1-(α/2))th percentile depends on how your t-distribution tables are written.
Alternatively, the definition of p as a success rate in the article could be retained by referring to the 100((1+p)/2)th percentile of the t-distribution, in which case the error rate α would not need to be introduced.
Richard J Price 10:54, 22 March 2007 (UTC)
In agreement with the table on the Student's t page, if T_a is the 100((1+p)/2)th percentile, then P(T < T_a) = (1+p)/2 = (1+1−α)/2 = 1 − α/2, with T Student-t distributed, which is the correct error rate for a two-sided interval (see confidence interval). Gummif (talk) 00:49, 3 August 2013 (UTC)
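The percentile bookkeeping in this thread can be checked numerically. A minimal sketch in Python, using the standard normal in place of Student's t (the standard library has no t quantile function); the percentile arithmetic is identical for either distribution:

```python
from statistics import NormalDist

p = 0.90                    # desired prediction-interval coverage
alpha = 1 - p               # error rate, as in Richard's formulation
percentile = (1 + p) / 2    # = 1 - alpha/2 = 0.95, as Gummif notes

Z = NormalDist()            # stand-in for the t-distribution
z = Z.inv_cdf(percentile)   # 95th percentile, about 1.645 for the normal

# Two-sided coverage of [-z, z] is exactly p:
coverage = Z.cdf(z) - Z.cdf(-z)
print(percentile, round(z, 3), round(coverage, 3))
```

So for a 90% interval one indeed uses the 95th percentile, not the 55th, whichever of the two equivalent formulations (1 − α/2 or (1+p)/2) is adopted.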
Could we please get another example, with a population variable such as apple width or orange peel thickness, instead of a bunch of abstract equations? Thanks in advance. 22.214.171.124 21:41, 19 April 2007 (UTC)
- I’ve elaborated and given some simpler and clearer examples, notably the simple non-parametric estimation – hope it’s clearer now!
- —Nils von Barth (nbarth) (talk) 17:21, 19 April 2009 (UTC)
Why exactly is this stated --- "In Bayesian statistics, one can compute (Bayesian) prediction intervals from the posterior probability of the random variable, as a credible interval. In theoretical work, credible intervals are not often calculated for the prediction of future events, but for inference of parameters – i.e., credible intervals of a parameter, not for the outcomes of the variable itself"? --- It's quite common in practice to create a posterior predictive distribution which gives you an interval for the actual outcome of the variable itself. 126.96.36.199 (talk) 19:23, 30 April 2011 (UTC)
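The commenter's point can be illustrated with the simplest conjugate case. A hedged sketch assuming a normal likelihood with known variance and a normal prior on the mean (all numbers and names are illustrative, not from the article): the posterior predictive distribution yields an interval for the future outcome itself, not merely for the parameter.

```python
from statistics import NormalDist

# Assumed setup: y_i ~ N(mu, sigma^2) with sigma known, prior mu ~ N(m0, s0^2)
sigma, m0, s0 = 1.0, 0.0, 10.0
data = [2.1, 1.9, 2.4, 2.0, 2.2]
n = len(data)

# Conjugate posterior for mu: N(m_post, s_post^2)
prec = 1 / s0**2 + n / sigma**2
m_post = (m0 / s0**2 + sum(data) / sigma**2) / prec
s_post = prec ** -0.5

# Posterior predictive for a new y: N(m_post, sigma^2 + s_post^2)
# -> a credible interval for the *outcome itself*, not just the parameter
pred = NormalDist(m_post, (sigma**2 + s_post**2) ** 0.5)
lo, hi = pred.inv_cdf(0.05), pred.inv_cdf(0.95)
print(round(lo, 2), round(hi, 2))
```

The predictive interval is necessarily wider than the credible interval for the parameter alone, since it adds the sampling variance σ² on top of the posterior uncertainty.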
unclear on scope
The article could possibly be clarified by relating a prediction interval to a tolerance interval. The intro currently uses language that a prediction interval is not normally appropriate for, although terminology in this area can be a bit inconsistent:
- an estimate of an interval in which future observations will fall, with a certain probability, given what has already been observed
If not read carefully, this could imply that a prediction interval is an interval bounding n% of all future samples from a process, which would be equivalent to n% population coverage, which is not typically what a prediction interval gives you (except on average). However I'm not entirely sure where to go in clarifying this article. I've started by expanding tolerance interval instead. --188.8.131.52 (talk) 14:14, 26 August 2011 (UTC)
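The distinction above can be checked by simulation. A hedged sketch: a normal-theory prediction interval, recomputed from each fresh sample, covers one future draw about p of the time on average, while the fraction of the population any single realization covers varies from sample to sample (z is used in place of t for n = 100, a close approximation):

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(0)
z = NormalDist().inv_cdf(0.95)         # for a 90% two-sided interval
n, reps, hits = 100, 2000, 0

for _ in range(reps):
    sample = [random.gauss(0, 1) for _ in range(n)]
    m, s = mean(sample), stdev(sample)
    half = z * s * (1 + 1 / n) ** 0.5  # prediction-interval half-width
    future = random.gauss(0, 1)        # one future observation
    hits += (m - half) < future < (m + half)

print(hits / reps)                     # close to 0.90 on average
```

A tolerance interval, by contrast, is calibrated so that a stated fraction of the population is covered with a stated confidence in each realization, which is the guarantee the intro's wording could be misread as promising.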
The source of confusion is clearly explained by Melcombe on the project page. A prediction interval [L, U] is an interval such that for a future observation X it holds: P(L < X < U) has a given value p. For the standard score Z of X it therefore gives:
:P((L − μ)/σ < Z < (U − μ)/σ) = p
Determining the quantile z such that P(−z < Z < z) = p then yields L = μ − zσ and U = μ + zσ.
- I still think it's necessary to mention standard score in the article. Let's continue the issue at that project page: Wikipedia_talk:WikiProject_Statistics#Standard_score. Mikael Häggström (talk) 18:57, 13 May 2012 (UTC)
If the mean and variance are known, then it is not a prediction interval but a tolerance interval
Maybe this is just about semantics, but if you agree then we should remove the example for "Known mean, known variance" and just link this case to Tolerance interval; what do you think?
- There should be a link to [[tolerance interval]], but the material should stay, as the structure of the intervals needs to be compared across the cases where the parameters need to be estimated or not. However, overall the univariate normal example is somewhat long and written at a textbook level, so perhaps this part can be reduced. Melcombe (talk) 07:03, 18 July 2012 (UTC)
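For the fully known case the interval requires no estimation at all, which is why it behaves like a tolerance interval. A small illustrative sketch (μ and σ are made-up values; the normal quantile comes from the standard library):

```python
from statistics import NormalDist

mu, sigma, p = 5.0, 2.0, 0.90
z = NormalDist().inv_cdf((1 + p) / 2)    # 100((1+p)/2)th percentile
lo, hi = mu - z * sigma, mu + z * sigma  # [mu - z*sigma, mu + z*sigma]

# With known parameters this interval covers exactly p of the population
# in every realization - the tolerance-interval reading discussed above.
exact = NormalDist(mu, sigma).cdf(hi) - NormalDist(mu, sigma).cdf(lo)
print(round(lo, 3), round(hi, 3), round(exact, 3))
```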
On known mean, unknown variance
In this case, for a normal population we have:
:<math>n\hat\sigma^2/\sigma^2 = \sum_{i=1}^n (X_i-\mu)^2/\sigma^2</math> is chi-squared distributed with n degrees of freedom;
:<math>(X_{n+1}-\mu)/\hat\sigma</math> is t-distributed with n degrees of freedom, i.e. the statistic <math>T = (X_{n+1}-\mu)/\hat\sigma</math> follows Student's t-distribution with n degrees of freedom.
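The t_n pivot above can be sanity-checked by simulation. A sketch with known mean μ = 0, using the textbook value t ≈ 1.812 for the 95th percentile with 10 degrees of freedom (hard-coded, since the Python standard library has no t quantile function):

```python
import random

random.seed(1)
n, mu = 10, 0.0
t_crit = 1.812            # ~95th percentile of t with n = 10 d.o.f.
reps, hits = 4000, 0

for _ in range(reps):
    xs = [random.gauss(mu, 1) for _ in range(n)]
    # known-mean variance estimate: divide by n, not n-1
    sigma_hat = (sum((x - mu) ** 2 for x in xs) / n) ** 0.5
    future = random.gauss(mu, 1)
    # (future - mu) / sigma_hat ~ t_n, so this covers ~90% of the time
    hits += abs(future - mu) < t_crit * sigma_hat

print(hits / reps)
```

Note there is no sqrt(1 + 1/n) factor here: with the mean known, the only estimation uncertainty comes from σ̂.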
I don't get it!
When forecasting a growth curve (x_1, x_2, ..., x_n), we have P(x_i < x_{i+1}) > P(x_i > x_{i+1}).
In fact, P(x_i < x_{i+1}) = 1 − e, where e is of the order of magnitude of the error on the data.
Looking in my textbook, I see that the best estimate for … has an expectation of … and standard deviation ….
This implies that the error on the forecast estimate is minimum for … and widens as … increases. It also implies that the confidence interval for the best estimator of … is always wider than the confidence interval for ….