# Talk:Probit

WikiProject Statistics (Rated Start-class, Low-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

WikiProject Mathematics (Rated Start-class, Low-importance)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Field: Probability and statistics

## Percentile

Shouldn't the graph's x-axis be in percentile and not in probability? The range of the CDF is percentile (0,1) = (0th percentile, 100th percentile), whereas the range of the normal distribution is probability (0,1) = (0%,100%). Since the probit is the inverse of the CDF, its domain is the CDF's range, which is in percentile, not probability. Thoreaulylazy 15:20, 3 October 2006 (UTC)

The range of the CDF in the graph is probability, right? However, this is an article about the probit, not the normal CDF, so it should probably remain a map from [0,1] to [-Inf, Inf]. --Pdbailey 00:49, 4 October 2006 (UTC)
I agree that Probit is a map from [0,1] to [-Inf,Inf], the problem is that [0,1] can be interpreted as 0% to 100% or 0th percentile to 100th percentile. The correct interpretation should be percentile not probability. An article about the probit function should be consistent with an article on the CDF function because they are inverses of each other. That is, the domain and range for CDF becomes the range and domain for Probit. The complication on whether to use "percentile" or "probability" stems from the fact that a percentile is a form of probability, "the probability that a random sample from the set is BELOW". That is, it is equivalent to say (a) a student John is ranked 70th in a class of 100, as it is to say (b) a random student from the class has a 70% probability of having a lower rank than John. Most statisticians, however, would simply say "John is at 70th percentile" -- no statistician would ever say "John is at 70% probability" (although they might say, if they're trying to be unnecessarily wordy, "a random student from the class has a 70% probability of being below John"). Thus, knowing the mean test score (M) and the standard deviation (D), we can infer John's score as M + D*Probit(.7). The domain of the Probit function is clearly percentile (.7 = 70th percentile). -- Thoreaulylazy 18:41, 7 December 2006 (UTC)
Refer also to the articles on rankit and Q-Q plot. DFH 19:23, 29 January 2007 (UTC)
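
The calculation M + D*Probit(.7) from the comment above can be sketched in Python. This is only an illustration: the mean of 70 and standard deviation of 10 are made-up values, `score_at_percentile` is a hypothetical helper name, and the standard library's `NormalDist.inv_cdf` stands in for the probit function.

```python
from statistics import NormalDist


def probit(p):
    """Inverse CDF of the standard normal distribution, for 0 < p < 1."""
    return NormalDist().inv_cdf(p)


def score_at_percentile(mean, sd, p):
    """Score at quantile p, assuming scores are normally distributed."""
    return mean + sd * probit(p)


# Hypothetical class: mean score M = 70, standard deviation D = 10.
john = score_at_percentile(70.0, 10.0, 0.7)
print(round(john, 2))  # about 75.24, since probit(0.7) is roughly 0.524
```

Feeding the result back through the normal CDF, `NormalDist(70.0, 10.0).cdf(john)`, returns 0.7 again, which is the sense in which the probit and the CDF are inverses.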

## This is a bit heavy

Couldn't this article have a simpler summary? The lingo used in this article suggests to me that anybody who understood it would probably know what a probit was already anyway. —The preceding unsigned comment was added by 195.194.178.251 (talk) 09:46, 12 January 2007 (UTC).

I'd say "a bit" heavy is an understatement. The very first phrase of this article contains at least four terms I do not understand, and I did get some Statistics 101 in my days... Yoe (talk) 02:18, 11 August 2008 (UTC)

In my opinion the article is great, especially the part about computation in R. I hope to see more articles with useful stuff like that. 212.75.100.130 (talk) 18:50, 21 May 2009 (UTC)

## Needs More History and Motivation

1. There is a problem with identifying your readers, and writing appropriately. One reader might be an EPA manager with one course in statistics, who has heard that the scientists are using "LD50s" which are computed using something called "probit analysis." Another may be a statistician who thinks "wonder what Wikipedia has on probit."

2. Looks like this is mostly about math. You probably want to think about subject-matter motivation. Important areas of application include psychometrics, econometrics, and toxicology. A conventional toxicological explanation is based on a lognormal distribution of tolerances, where the tolerance of a single organism is the dose just high enough to elicit a particular type of "response" (not necessarily death).

3. I think it would be good to include some more about statistical inference, including some interesting history. The linear relation between the probit of the expected response and the predictor is very often taken by the statistically naive as an indication that inference is a case of ordinary linear regression. In fact, maximum likelihood estimation is pretty standard. Historically, it seems you could just about say that probit is the original "generalized linear model." The scoring algorithm used in glm comes from an appendix to Bliss's paper, written by R.A. Fisher.

4. Possible links are bioassay, toxicology, generalized linear models, R.A. Fisher.

Dfarrar 04:37, 20 February 2007 (UTC)
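
To illustrate point 3, here is a minimal sketch of maximum likelihood fitting of a one-predictor probit model by Fisher scoring (equivalently, iteratively reweighted least squares). It is not the code of any particular package: the function name `probit_fisher_scoring` is hypothetical, and the dose-response data are synthetic and noise-free, generated from assumed parameters (-1.0, 0.5) so that the fit can be checked against them.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def probit_fisher_scoring(x, responders, totals, iterations=25):
    """Fit P(response) = Phi(a + b*x) to grouped binomial data.

    Each iteration is a weighted least-squares regression of a
    "working response" z on x, which is how Fisher scoring is
    usually carried out for a generalized linear model.
    """
    a, b = 0.0, 0.0
    for _ in range(iterations):
        sw = swx = swxx = swz = swxz = 0.0
        for xi, ri, ni in zip(x, responders, totals):
            eta = a + b * xi                 # linear predictor
            mu = _STD_NORMAL.cdf(eta)        # fitted response probability
            dens = _STD_NORMAL.pdf(eta)      # derivative of mu w.r.t. eta
            w = ni * dens * dens / (mu * (1.0 - mu))  # Fisher weight
            z = eta + (ri / ni - mu) / dens  # working response
            sw += w
            swx += w * xi
            swxx += w * xi * xi
            swz += w * z
            swxz += w * xi * z
        # Solve the 2x2 weighted normal equations for (a, b).
        det = sw * swxx - swx * swx
        a = (swxx * swz - swx * swxz) / det
        b = (sw * swxz - swx * swz) / det
    return a, b


# Synthetic data: expected responder counts under assumed parameters.
true_a, true_b = -1.0, 0.5
doses = [0.0, 1.0, 2.0, 3.0, 4.0]
totals = [50.0] * len(doses)
responders = [n * _STD_NORMAL.cdf(true_a + true_b * d)
              for d, n in zip(doses, totals)]

a_hat, b_hat = probit_fisher_scoring(doses, responders, totals)
print(a_hat, b_hat)  # should be very close to the assumed -1.0 and 0.5
```

The weights change at every iteration because they depend on the current fit, which is why this is not ordinary linear regression on probit-transformed proportions.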

Maybe I'm mistaken, but I understood that probit stood for "PROBability of Insect Toxicity" and not "PROBability unIT"? The original probit model was to allow the useful linear model structure to be mapped into the binary outcome "bug lives" or "bug dies" by estimating the probability of the response state rather than estimating an inapplicable linear response. Anyone care to comment? 75.146.224.18 (talk) 00:33, 29 August 2009 (UTC)

Bliss's 1934 paper in Science simply says "These arbitrary probability units have been termed 'probits' and are given above in an abbreviated table". He gives no further explanation of the motivation for the term. "Probability of Insect Toxicity" gives 0 hits on Google. Smells like a backronym to me. --Qwfp (talk) 13:26, 29 August 2009 (UTC)

## what goes here vs. what goes under probit model

I think this should be limited pretty much to defining the probit function and alluding to the important uses. Additional detail on probit modeling should probably go under probit model. I hope my additions conform to that suggestion. I suggest finding some links related to the material I just put in on diagnosing deviation from normality. Thanks for the nice details on Bliss's work. Dfarrar 03:51, 31 March 2007 (UTC)

I'm not sure why this deserves an entry. The inverse of a CDF is not exactly special. All the interesting stuff regards the probit model. BTW, the recent entry about diagnosing non-normality is better covered by Q-Q plot, or at least is just a special case of Q-Q plots. Pdbailey 17:28, 1 April 2007 (UTC)

I could live with merging this into probit modeling. Thanks for pointing out the QQ plot article. Dfarrar 00:29, 2 April 2007 (UTC)

It seems there should be a reference to D.J. Finney's contribution to probit analysis. Below is a quote from page 6 of Probit Analysis, by D.J. Finney, 2nd edition, Cambridge, At The University Press, 1962.

"The statistical treatment of quantal assay data has been much aided by the development of Probit Analysis. This method, which is usually attributed to Gaddum (1933) and Bliss (1934a, b; 1935a, b) though it has, in fact, a much longer history (& 14), has now been widely adopted as the standard method of reducing the data to simple terms."

Richard Daum, retired, at one time employed as an analytical statistician by US Dept of Agriculture.

Richard, Could you provide the text of reference 14? Also, why do you think that belongs here and not at probit model? Pdbailey 16:18, 8 April 2007 (UTC)

14? (I cannot duplicate the original icon.) It references a section (?), page 42, in Chapter 2, subtitled "History of the Probit Method", of D.J. Finney's Probit Analysis. As for your question of why D.J. Finney's Probit Analysis belongs here rather than at probit model: it seemed that Finney's Probit Analysis work fit more appropriately here.

## error function

I think this entire page is a slightly silly grab bag of topics, but pointing out that the error function (a function defined by the CDF of the normal) is the inverse of the inverse of the normal's CDF seems a little beyond the pale to me. Pdbailey 21:17, 16 May 2007 (UTC)

I tentatively disagree with point 1 and agree on point 2. Unless someone comes forward to explain the significance of the erf thing, I think we should take it out.
I think the page can be used for certain things, especially connected with the history of use in toxicology, that might get in the way of your previously suggested combined article on modeling of dichotomous data. For example, I have just reviewed two probit programs. The coefficient estimates are equal except that the intercept terms differ by a value of 5. The article serves the interest of toxicologists by providing information on why such things can happen. Most related material is better placed under probit model or some other regression-oriented article. If you want it worked into your combined article, suggest a plan. If I am not mistaken, you emphasize econometrics. Dfarrar 18:19, 23 May 2007 (UTC)
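
One possible explanation for an intercept difference of exactly 5, offered as an assumption because the two programs are not named above: older toxicological practice adds 5 to the normal quantile so that tabulated probits are positive, while other software uses the plain inverse normal CDF. The function names below are hypothetical and only sketch the two conventions.

```python
from statistics import NormalDist


def probit_plain(p):
    """Probit as the inverse standard normal CDF."""
    return NormalDist().inv_cdf(p)


def probit_plus_five(p):
    """Assumed older convention: normal quantile shifted up by 5."""
    return 5.0 + NormalDist().inv_cdf(p)


# The two scales differ by a constant, so a straight line fitted on
# either scale has the same slope and intercepts that differ by 5.
for p in (0.1, 0.5, 0.9):
    print(p, probit_plain(p), probit_plus_five(p))
```

Under this assumption, a fitted line a + b*x on the plain scale corresponds to (a + 5) + b*x on the shifted scale, which matches the pattern of equal coefficients with intercepts 5 apart.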

## here is the deleted erf() stuff

I think the erf stuff is fairly technical and needs some generally accessible statement of its significance. Here it is:

"The probit function may be expressed in terms of the inverse of the error function:

$\Phi^{-1}(p)=\sqrt{2}\,\operatorname{erf}^{-1}(2p-1)$
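
The identity can be checked numerically. Python's standard library has `math.erf` but no inverse error function, so this sketch tests the equivalent rearranged form erf(Φ⁻¹(p)/√2) = 2p − 1; the helper name `identity_residual` is hypothetical.

```python
import math
from statistics import NormalDist


def probit(p):
    """Inverse CDF of the standard normal distribution, for 0 < p < 1."""
    return NormalDist().inv_cdf(p)


def identity_residual(p):
    """erf(probit(p) / sqrt(2)) - (2p - 1); zero if the identity holds."""
    return math.erf(probit(p) / math.sqrt(2.0)) - (2.0 * p - 1.0)


for p in (0.025, 0.5, 0.7, 0.975):
    # Each residual should be zero up to floating-point rounding.
    print(p, identity_residual(p))
```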