WikiProject Statistics (Rated Start-class, High-importance)
Can I get a bit more background? What is the probit model? What does it do? What was it developed for? How was it developed? As is, the article provides minimal explanatory text before launching into near gibberish. Theblindsage (talk) 08:19, 26 November 2013 (UTC)
Citation for Bliss
The citations for Bliss's papers are already in the article. I added McCullagh & Nelder, which claims this was developed by Bliss, and then removed the cite needed tag. Baccyak4H (Yak!) 17:39, 11 June 2007 (UTC)
- Where is the cite in M&N (i.e. what page?) Pdbailey 02:18, 12 June 2007 (UTC)
- The same book (p 43) says that the method of solving the probit was in an appendix to the same article, written by Fisher. How can one be so sure that Fisher didn't influence Bliss? Pdbailey 13:10, 12 June 2007 (UTC)
- I read that ref to imply Fisher's contribution was using the scoring method in the context of the model itself. However, without immediate access to the paper, that is just my best educated interpretation/guess. If reading the paper in its entirety suggests Fisher could get co-credit (or even most or all of the credit) for the model itself, this article should reflect that. If you can get a copy, that would be great. 17:50, 12 June 2007 (UTC)
This is a pretty dense little article. I think a gentler intro should be developed. But more specifically now, I take issue with some wordings that are ambiguous or potentially a bit haughty:
- "Because the response is a series of binomial results, the likelihood is often assumed to follow the binomial distribution.": What does that mean? I myself understand how there is an assumption of normally distributed unobservable errors in the latent variable formulation of this model, but how or where is there a binomial distribution assumption in the model?
- I believe because: in the true relationship (as in, with no errors), the latent variable, Y*, falls between negative infinity and positive infinity, but usually close to one. Still in the true relationship, Y would always be one or zero for any Y*, based upon whether Y* is positive or negative. Thus, adding "normally distributed unobservable errors" to Y*, there is a probability p that Y will be 0 or 1 based upon how far Y* is from 0 and the standard deviation of the errors. Accordingly, for some true value Y*0, the probability of observing 0 for Y is equal to the normalcdf of 0 given a mean of Y*0 and a standard deviation s. In the sample, there will often be only one observation for each value of Y* observed, so for each Y* there is one Bernoulli trial. However, as it is theoretically possible to get many observations which yield the same Y*, the distribution of observable Y would be binomial, as a binomial distribution is constructed of many Bernoulli trials. Assuming homoscedastic errors, p is only related to Y* and something like a single binomial distribution can be inferred from all values of Y* in the results. Can someone comment on this? 188.8.131.52 (talk) 04:16, 27 April 2011 (UTC)
- "The parameters β are typically estimated by maximum likelihood." Actually, how else can they be understood or estimated? I have the impression that there are different ways that this or any other maximum likelihood model's parameters can be found, numerically. But it is a maximum likelihood model, so what is meant by the suggestion that this is "typically" but not always a maximum likelihood model? I am clearly missing something, or the language is imprecise.
- "While easily motivated without it, the probit model can be generated by a simple latent variable model." Easily motivated by whom / how? I object to the "easily" word.
- "Then it is easy to show that" should be changed to "Then it can be shown that". Easy is subjective, and I think it comes across wrong to general readers of the encyclopedia, the vast majority of whom will not find anything easy about showing that. doncram (talk) 23:40, 21 May 2009 (UTC)
- What do you mean by saying that it's a "maximum likelihood model"? There's nothing in the model itself that says anything about maximum likelihood, and one can readily imagine methods other than maximum likelihood for estimating the parameters. For example, if one has a prior probability distribution of the parameters, then one could use the posterior expected values of the parameters as estimates. You are right to say that you're clearly missing something. I don't think the language at that point is imprecise. Michael Hardy (talk) 00:43, 22 May 2009 (UTC)
- Hmm, thanks for responding, that helps me a bit. As a reader, I am really already invested in understanding it as a maximum likelihood model. Given data, I can't really absorb how you could (and why you would) choose any other method of estimating the model, besides trying to figure out what are the parameters that are most likely to have resulted in the observed data (given an assumption of normal errors in the latent variable model). You suggest that I could also want to take into account a prior distribution. But then, I absorb that only as a broader maximum likelihood problem: there was previous data that is summarized in some informed priors, and then there is some new data. I don't exactly know how to do this necessarily, but I would want to use a maximum likelihood approach to combine the priors and new data to come up with new estimates. I wonder then: Is there a non-MLE based approach (which would also have a Bayesian perspective extension)? Is there some non-MLE approach to estimating the parameters of the model that has ever been used for practical purposes? In a regular linear regression context, I do understand other alternatives, but here I do not. doncram (talk) 01:23, 22 May 2009 (UTC)
- I don't see any "messed up" math displays. They're all working normally for me.
- Posterior expected values are not "extended" maximum likelihood estimates. It is true that you would still use the likelihood function, but the estimates would not always correspond to points that maximize it. "most likely to have resulted in the observed data" is an often heard but misleading phrase. MLEs are not the parameter values that are most likely, given the data; they are the values that make the data more likely than the data would have been with any other parameter values. Let's say you want to estimate the frequency with which car accidents happen on a certain highway. You observe it for an hour. No accidents happen. Then the maximum likelihood estimate of the frequency is zero. Why, then, might you want to use any other estimate? Or consider the German tank problem: tanks are numbered 1, 2, 3, ..., N and you want to estimate N. A sample of 20 tanks gives various numbers in the range from 1 to 9541. The largest number observed is 9541. What, then, do you take to be an estimate of the total number of tanks? The MLE is exactly 9541. But does it not seem likely that the very highest number, corresponding to the exact number of tanks, is not in the sample of 20, and therefore that number is probably higher than 9541? That does not conflict with the fact that the data you actually observed are more probable if the total number of tanks is 9541 than if it is bigger than that. Why, then, might you want to use any other estimate? Michael Hardy (talk) 02:16, 22 May 2009 (UTC)
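To put numbers on the German tank example above, here is a minimal sketch comparing the MLE with one common alternative. The function name is hypothetical, and the alternative shown (sample maximum plus the average gap, minus one) is the standard bias-corrected estimator for sampling without replacement, chosen here for illustration; it is not something proposed in this thread.

```python
def german_tank_estimates(max_serial, sample_size):
    """Return (MLE, bias-corrected estimate) of the total number of tanks N.

    max_serial  -- largest serial number seen in the sample
    sample_size -- number of tanks sampled (without replacement)
    """
    # The MLE is the sample maximum: no smaller N is possible, and any
    # larger N makes the observed sample less probable.
    mle = max_serial
    # Bias-corrected alternative: m + m/k - 1, which pushes the estimate
    # above the sample maximum by roughly the average gap between serials.
    corrected = max_serial + max_serial / sample_size - 1
    return mle, corrected


# Figures from the comment above: 20 tanks sampled, largest serial 9541.
mle, corrected = german_tank_estimates(9541, 20)
print("MLE:", mle)
print("Bias-corrected estimate:", corrected)
```

The corrected estimate sits above the sample maximum, which matches the intuition that the true top serial number is probably not among the 20 sampled.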
Berkson's minimum chi-square method
In the section of the same name in the article, what the heck is going on? What is the beta value that is being estimated? The article now reads
Its advantage is the presence of a closed-form formula for the estimator, and possibility to carry out analysis even when individual observations are not available, only their aggregated counts (for example in the analysis of voting behavior).
- A possible reference for Berkson’s method is «Amemiya, Takeshi (1985). Advanced Econometrics. Harvard University Press. ISBN 0-674-00560-0.». The β being estimated is exactly the same β that appears in the definition of the model: P[Y=1|X] = Φ(X′β). The presence of a closed-form solution is indeed an advantage, as it is generally more difficult to implement the maximization routine. As for the “applicability to tabular data”, I’m not sure about that; better to find the original article. ... stpasha » talk » 07:50, 4 August 2009 (UTC)
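For readers trying to see where the closed form comes from, the following is a rough sketch of the grouped-data idea as it is usually presented: transform each group's observed success fraction with the inverse normal CDF, then run a weighted least-squares regression of those transformed values on the covariate. The function name, the restriction to one covariate plus an intercept, and the example counts are all assumptions made for illustration; they are not taken from the article or from Amemiya.

```python
from statistics import NormalDist

_std = NormalDist()  # standard normal: provides cdf, pdf and inv_cdf


def berkson_probit(x, r, n):
    """Sketch of a minimum chi-square fit of P[Y=1|x] = Phi(b0 + b1*x).

    x -- covariate value for each group
    r -- number of successes in each group
    n -- number of trials in each group
    Requires 0 < r[t] < n[t] in every group, otherwise the inverse
    normal CDF of the observed fraction is not finite.
    """
    z, w = [], []
    for rt, nt in zip(r, n):
        p = rt / nt                      # observed success fraction
        q = _std.inv_cdf(p)              # empirical probit of the fraction
        z.append(q)
        # Weight = 1 / approximate variance of the empirical probit.
        w.append(nt * _std.pdf(q) ** 2 / (p * (1 - p)))

    # Closed-form weighted least squares of z on (1, x).
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    zbar = sum(wi * zi for wi, zi in zip(w, z)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxz = sum(wi * (xi - xbar) * (zi - zbar) for wi, xi, zi in zip(w, x, z))
    b1 = sxz / sxx
    b0 = zbar - b1 * xbar
    return b0, b1


# Made-up aggregated counts: only group totals are needed, not individual rows.
b0, b1 = berkson_probit(x=[0.0, 1.0, 2.0, 3.0], r=[5, 15, 30, 44], n=[50, 50, 50, 50])
print("intercept:", b0, "slope:", b1)
```

Note that the function only ever touches per-group totals, which is the sense in which the method works without individual observations; no iterative maximization is involved.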
Dr. Winkelmann's comment on this article
Dr. Winkelmann has reviewed this Wikipedia page, and provided us with the following comments to improve its quality:
"A probit model is a popular specification for an ordinal or a binary response model. "
Drop reference to "ordinal". This is confusing.
Keep consistent notation: Transpose is first denoted by "T", later by " ' ".
The maximum likelihood estimator need not exist if there is perfect separation.
There should be a subsection on how to interpret probit results: non-constant marginal, or partial, effects: they go to zero as Pr approaches 1 or 0.
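As a small illustration of the last point, the marginal effect of a regressor x_j in a probit model is φ(X′β)·β_j, where φ is the standard normal density, so it depends on where the index X′β sits. The sketch below uses a hypothetical helper name and an arbitrary coefficient of 1.0 and simply tabulates the effect at a few index values.

```python
from statistics import NormalDist

_std = NormalDist()  # standard normal distribution


def probit_marginal_effect(index, beta_j):
    """Marginal effect dP[Y=1|X]/dx_j = phi(X'beta) * beta_j at a given index X'beta."""
    return _std.pdf(index) * beta_j


# Arbitrary illustrative coefficient; the index values are chosen to span
# fitted probabilities from near 0 to near 1.
beta_j = 1.0
for index in (-3.0, -1.0, 0.0, 1.0, 3.0):
    prob = _std.cdf(index)                       # fitted P[Y=1|X]
    effect = probit_marginal_effect(index, beta_j)
    print(f"index={index:+.1f}  Pr={prob:.4f}  marginal effect={effect:.4f}")
```

The effect is largest at an index of zero (Pr = 0.5) and shrinks toward zero in both tails, which is why a single probit coefficient cannot be read as a constant change in probability.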
We hope Wikipedians on this talk page can take advantage of these comments and improve the quality of the article accordingly.
We believe Dr. Winkelmann has expertise on the topic of this article, since he has published relevant scholarly research:
- Reference : Rainer Winkelmann, 2009. "Copula-based bivariate binary response models," SOI - Working Papers 0913, Socioeconomic Institute - University of Zurich.