# Talk:Logistic regression

WikiProject Statistics (Rated C-class, High-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.


## proposed merge

I think this article should be merged with logit. Pdbailey 03:19, 20 April 2006 (UTC)

Interesting question: the logit link function is the inverse of the logistic function, which also has its own article (that talks about epidemiology, etc.). However, I think the inverse function is only really used in logistic regression, so the merge does make sense. -- hike395 14:31, 20 April 2006 (UTC)

NO! This suggestion is totally wrong! logit and logistic and logistic regression are different things, and should not be mixed up merely because they have close relations. For example, I was initially interested in the logistic function, then thought logit can be used for other purposes, but not logistic regression. And I don't need to know logistic regression at all to use the logistic function. --Pren 14:43, 24 April 2006 (UTC)

Pren, thanks for the input. Can you please expand on why you think the two should not be merged? I'm specifically interested in what purpose you had for using the logit function that was not associated with logistic regression. Thanks a lot for including your input. Pdbailey 16:45, 24 April 2006 (UTC)

I agree with Pren. The logit function is just a function, easily described by a formula. Logistic regression is a mathematical procedure in applied mathematics, that makes use of the logit function. Of course both articles should link to each other, but they are different things, objects of a different category and complexity. The logit function is interesting in itself. --zeycus 11:25, 25 April 2006 (UTC)

Pren and Zeycus, have you read the logit entry and the wikipedia page on [[wp:mm|merges]]? It looks to me like these pages meet the second or third criteria for merging, which are:
• 'There are two or more pages on related subjects that have a large overlap. Wikipedia is not a dictionary; there doesn't need to be a separate entry for every concept in the universe. For example, "Flammable" and "Non-flammable" can both be explained in an article on Flammability.'
• 'If a page is very short and cannot or should not be expanded terribly much, it often makes sense to merge it with a page on a broader topic.'
Certainly the portion of the logit page that is on logistic regression can be merged with this page and then removed from that page. Once that is done, the logit page is one paragraph long and is either a stub or should be deleted. This raises the question, are we then making the wikipedia a dictionary by including it? If there is some real substantive material about logit that does not fold well into other areas, probably not -- it should probably stay. But I don't think the logit is as important a function as, say, the gamma function. I take as evidence of its unimportance that it does not appear in "Handbook of Mathematical Functions." Just saying that it is a function distinct from the regression that it is often used for does not seem sufficient. Pdbailey 14:42, 25 April 2006 (UTC)
Thank you, Pdbailey, I read what you suggested and I see your point. So now the question seems subjective to me; I don't dare to defend either of the options. --zeycus 16:41, 26 April 2006 (UTC)
I tend towards inclusionism -- I recommend that we delete the overlapping part of logit, but then leave the rest alone: it may expand in the future to include history or other applications, who knows? -- hike395 17:31, 26 April 2006 (UTC)
Hike395, -isms aside, can you identify what is included in logit that should not be included in this page? I'm not sure I see anything. Pdbailey 04:59, 27 April 2006 (UTC)
In mathematics, especially as applied in statistics, the logit (pronounced with a long "o" and a soft "g", IPA /loʊdʒɪt/) of a number p between 0 and 1 is
${\displaystyle {\rm {logit}}(p)=\log \left({\frac {p}{1-p}}\right)=\log(p)-\log(1-p).}$
Plot of logit in the range 0 to 1, base is e
The logit function is the inverse of the "sigmoid", or "logistic" function. If p is a probability then p/(1 − p) is the corresponding odds, and the logit of the probability is the logarithm of the odds; similarly the difference between the logits of two probabilities is the logarithm of the odds-ratio, thus providing an additive mechanism for combining odds-ratios.
-- hike395 05:40, 27 April 2006 (UTC)
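For concreteness, the logit/odds-ratio relationship quoted above is easy to check numerically. A short Python sketch (the function names here are just illustrative, not from any article text):

```python
import math

def logit(p):
    """Log-odds of a probability p strictly between 0 and 1."""
    return math.log(p / (1 - p))

def logistic(z):
    """Inverse of the logit: maps any real z back into (0, 1)."""
    return 1 / (1 + math.exp(-z))

# The two functions invert each other
assert abs(logistic(logit(0.25)) - 0.25) < 1e-12

# The difference of two logits is the log of the odds ratio:
# odds(0.8) = 4, odds(0.5) = 1, so the odds ratio is 4
assert abs((logit(0.8) - logit(0.5)) - math.log(4)) < 1e-12
print("checks pass")
```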

Okay, I can see why (given your wikipedia philosophy) you want to keep that article, and I think it's just subjective at this point. I'll just say now that, short of another voice, we can use your proposed text. That said, let me tell you why I disagree that there is value in having that article separate: I think that it might make new users think that it is all wikipedia has to say on logistic regression. After all, search logit on google and you get that page. If you still disagree, again, I'll concede the point and I'll update both articles after a few days for others to throw in their two cents. Pdbailey 14:20, 27 April 2006 (UTC)

Yes, I can see your point, it's valid. How about we append
The logit function is an important part of logistic regression: for more information, please see that article.
Would that take care of your objection? -- hike395 03:33, 28 April 2006 (UTC)

I am reading 'Gatrell, A.C. (2002) Geographies of Health: an Introduction, Oxford: Blackwell.' today and it discusses the 'logistic regression model' in a health geography context of cases and controls. It appears to be mentioned in a lot of academic literature; why is wikipedia trying to call it something else? Supposed 19:22, 9 May 2006 (UTC)

This article would be more useful if an example could be given of how logistic regression is used in statistical analysis. For example, it would be great if someone could use actual data to describe how logistic regression makes X concept more clear. Zminer 01:58, 15 May 2006 (UTC)

I know it is a long time since this issue was discussed, but if I may, can I say that I am very happy that this page was not merged with the logistic function. I specifically searched for 'logit' before I found out that it was the inverse of the logistic function, as I needed a basic knowledge for my PhD viva. I'm pleased to say that I passed and can attribute some of my useful revision to WP. I have since contributed to the page myself. This is a nice example of what Wikipedia is about - accessible, useful knowledge that we can all build upon. Thanks guys. Davwillev 15:54, 25 July 2007 (UTC)

A happy middle ground might be to include some more background material on logistic regression. What about why it is used, when it is used? It would not help me at all to have it merged with another (to me) obscure statistical term, rather it would help me to develop the page so I can understand it! SallaCT 12:47, 18 August 2007 (UTC)

Actually, this is a sad middle ground because the text you are interested in isn't present. The fact that two highly related terms aren't merged undoubtedly contributed to that. Pdbailey 22:24, 19 August 2007 (UTC)

### decision

No merge was performed; it's been quite a while since this was open. Pdbailey 03:56, 21 August 2007 (UTC)

## Mistake?

What does i, = 1, ..., n mean? What does the comma after i stand for? -- Neoforma 12:54, 13 July 2006 (UTC)

Was just about to answer the wrong question. Yup, that's a mistake. — cBuckley (TalkContribs) 17:42, 13 July 2006 (UTC)

## binomial distributed errors?

Since when does the logit model have binomially distributed errors? These must be standard logistic distributed (mean 0, s = 1).

I improved the wording of this part to be more accurate. Have a look. Baccyak4H (talk) 17:42, 22 November 2006 (UTC)

Along a similar line, the article read that the dependent variable was Bernoulli distributed; I updated this to binomially distributed because the binomial is a generalization of the Bernoulli to more than one trial. Perhaps this further clarifies things. Pdbailey 01:39, 5 March 2007 (UTC)

I agree in principle that binomial is correct and more general than Bernoulli, so in some sense preferred. However, the article refers to the Yis equalling 1, which means the context here is considering any binomial Y as rather several Bernoulli Ys. So I would leave the description as Bernoulli, unless the math notation were rewritten to reflect the binomial. And come to think of it, how would one even do that? Baccyak4H (Yak!) 17:54, 11 June 2007 (UTC)
Baccyak4H, (1) I think we should be as general as possible; for now, let's just note that the example is worked for a specific case of the binomial distribution. (2) If you don't know how, I'd suggest that you read Generalized Linear Models by McCullagh and Nelder (1989), table 2.1 on page 30. This shows how the binomial fits in the exponential family form, which allows for the fitting technique used in the book. BTW, I don't think I just did an RV on this page, but if I did, I'm sorry and you can undo it pending the conclusion of this conversation. Pdbailey 02:23, 12 June 2007 (UTC)
You did mention elsewhere the page could use a rewrite :). Without even looking at M&N, just thinking about GLMs made me realize one could reformulate in terms of expectations of the binomial, as in E(Yi)/ni, rather than probabilities. The article still needs work though.
I would note that the binomial/Bernoulli debate has gone back and forth in the article history. In a perfect world, it would stay binomial, but in that world the text would be consistent with binomial too. Let's see what we can do. Baccyak4H (Yak!) 13:14, 12 June 2007 (UTC)

## rewrite tag

This article is a series of barely strung together thoughts, most of which are half there. Most of which probably should be or are already done better in another page. As an example, the concept of a link function and its interpretation is covered much better in Generalized_linear_model. The applications section comes second and doesn't ever explain what "lift" is, but it appears to be the effect of the link function. Why is it surprising that a link has an effect? Why include this? Why have more than one link to GLM? I could go on. Pdbailey 18:29, 29 May 2007 (UTC)

We both have put some good work in, and it does read better. The big thing in my eyes is that the example is not of logistic regression but rather just of a calculation of odds. It could use a better example. Baccyak4H (Yak!) 17:42, 14 June 2007 (UTC)

## Remove Jarrow Turnbull model?

Is there a reason to include the Jarrow Turnbull Model section in this page? Is there a reason that the logistic regression has to be used for this model and not just a binomial regression in general? Would anyone object to removing this section and moving it to a "see also". Pdbailey 15:29, 13 June 2007 (UTC)

I am not familiar with that model in general. Its article reads even more poorly than this one does, so it is little help for me. If it is really sometimes done in ways which are not strictly logistic regression (e.g., other links), but are all analysis of binomial (Bernoulli) defaults, then go ahead and move it. I am going to have a look at the overview section... Baccyak4H (Yak!) 15:46, 13 June 2007 (UTC)
Move it where? Pdbailey 16:54, 13 June 2007 (UTC)
Sorry, meant remove to "see also", which you did. It's starting to look a lot better... Baccyak4H (Yak!) 17:04, 13 June 2007 (UTC)

While it's mentioned that the beta-coefficients can be obtained via maximum likelihood estimation (ostensibly by taking the log-likelihood function and then taking derivatives with respect to the coefficients), how about actually writing up a simple example for obtaining the coefficients and values for p? Fully-solved examples are remarkably helpful to us neophytes.

## sympathy for the novice?

This page is utterly incomprehensible for the novice who just wants a basic idea of what logistic regression analysis *does*. The rigorous math is fine but before diving into it it would be nice to give a more comprehensible introduction and maybe a real world example that might illuminate the topic a bit.

The point above is extremely relevant. Most people do not have a firm understanding of Applied Mathematics or Statistics in general. Quite a surprise that none of the contributing authors has ventured into making their knowledge understandable for the lay person. The ability to teach or communicate concepts to others is a distinction between an expert and an apprentice. Johnbushiii 18:41, 20 August 2007 (UTC)

I think they must be long gone, and this page, like so many of the GLM related pages, has almost no editors. Any change requires a huge amount of thought to get things going in the right direction and to hang together. It might be just beyond wikipedia to support these articles. Pdbailey 03:36, 21 August 2007 (UTC)
I agree, although I'd point out that it is hard to find third party references to it (second party is easy, journals and the like, but that's different). Perhaps Ed Tufte's proposal of rethinking the O-ring data leading up to the Challenger disaster might be a good start. Let me look it up. Baccyak4H (Yak!) 18:02, 13 September 2007 (UTC)
Hmm, no, that example was not logistic regression (although it could be if I did some OR). I hope to take another look or two to improve the article. Baccyak4H (Yak!) 18:18, 13 September 2007 (UTC)
Wikipedia's statistics articles are usually excellent. This is the poorest one I've seen. It sounds like it was written by a student who had just learned the concept formally, and didn't really understand it yet. 131.107.0.73 23:03, 15 November 2007 (UTC)
Generally I've found that statistics articles don't say very much (although a few of them do) and consequently are incomprehensible, in contrast to math articles generally, which explicitly define the concepts they're about and consequently are comprehensible (except when they're on a topic in some area of math in which you don't know the basic definitions). Michael Hardy 23:21, 15 November 2007 (UTC)
This is my suggestion for a re-written introduction: 1. Regression models are a group of statistical methods used to describe the relationship between multiple risk factors and an outcome. 2. Logistic regression is a type of regression model that is used when the outcome is binary or dichotomous (that is, the outcome can only take one of two possible values, like lived/died or failed/succeeded).
This clearly explains what logistic regression is commonly used for, and tells the reader briefly when it is used. The current introduction simply does not provide enough context for the lay reader. We could also add a section at the end with links to articles describing other regression models, like linear regression. --Gak (talk) 02:02, 16 December 2007 (UTC)
As a novice, I find most wikipedia articles on statistics useless. An encyclopedia article should present basic information, and direct users to more detailed information at other entries. Someone has written a very fine statistics textbook, in wiki-form, that is useless to laymen and novices alike. Theblindsage (talk) 08:14, 26 November 2013 (UTC)

## Logistic regression for the layman

Here follows my proposed explanation for the layman. I will post this on 6 Feb if there are no revisions or objections.

Figure 1. The logistic function, with z on the horizontal axis and f(z) on the vertical axis.

An explanation of logistic regression begins with an explanation of the logistic function:

${\displaystyle f(z)={\frac {1}{1+e^{-z}}}}$

A graph of the function is shown in figure 1. The "input" is z and the "output" is f(z). The logistic function is useful because it can take as input any value from negative infinity to positive infinity, whereas the output is confined to values between 0 and 1. The variable z represents the exposure to some set of risk factors, while f(z) represents the probability of a particular outcome, given that set of risk factors. The variable z is a measure of the total contribution of all the risk factors used in the model.
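The bounded-output property described above is straightforward to verify numerically. A minimal Python sketch (the function name is mine, for illustration only):

```python
import math

def logistic(z):
    """f(z) = 1 / (1 + e^(-z)); any real input, output strictly in (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Inputs from far below zero to far above all map into (0, 1)
for z in (-20.0, -2.0, 0.0, 2.0, 20.0):
    assert 0 < logistic(z) < 1   # always a valid probability
print(logistic(0))  # → 0.5 (the curve crosses one half at z = 0)
```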

The variable z is usually defined as

${\displaystyle z=\beta _{0}+\beta _{1}x_{1}+\beta _{2}x_{2}+\beta _{3}x_{3}+\cdots +\beta _{k}x_{k},}$

where ${\displaystyle \beta _{0}}$ is called the "intercept" and ${\displaystyle \beta _{1}}$, ${\displaystyle \beta _{2}}$, ${\displaystyle \beta _{3}}$, and so on, are called the "regression coefficients" of ${\displaystyle x_{1}}$, ${\displaystyle x_{2}}$, ${\displaystyle x_{3}}$ respectively. The intercept is the value of z when the value of all the other risk factors is zero (i.e., the value of z in someone with no risk factors). Each of the regression coefficients describes the size of the contribution of that risk factor. A positive regression coefficient means that that risk factor increases the probability of the outcome, while a negative regression coefficient means that that risk factor decreases the probability of that outcome; a large regression coefficient means that that risk factor strongly influences the probability of that outcome; while a near-zero regression coefficient means that that risk factor has little influence on the probability of that outcome.

Logistic regression is a useful way of describing the relationship between one or more risk factors (e.g., age, sex, etc.) and an outcome such as death (which only takes two possible values: dead or not dead).

The application of a logistic regression may be illustrated using a fictitious example of death from heart disease. This simplified model uses only three risk factors (age, sex and cholesterol) to predict the 10-year risk of death from heart disease. This is the model that we fit:

${\displaystyle \beta _{0}=-5.0{\text{ (the intercept)}}}$
${\displaystyle \beta _{1}=+2.0}$
${\displaystyle \beta _{2}=-1.0}$
${\displaystyle \beta _{3}=+1.2}$
${\displaystyle x_{1}={\text{ age in decades, less 5.0}}}$
${\displaystyle x_{2}={\text{ sex, where 0 is male and 1 is female}}}$
${\displaystyle x_{3}={\text{ cholesterol level, in mmol/dl less 5.0}}}$

Which means the model is

${\displaystyle {\text{Risk of death}}={\frac {1}{1+e^{-z}}}{\text{, where }}z=-5.0+2.0x_{1}-1.0x_{2}+1.2x_{3}}$

In this model, increasing age is associated with an increasing risk of death from heart disease (z goes up by 2.0 for every 10 years over the age of 50), female sex is associated with a decreased risk of death from heart disease (z goes down by 1.0 if the patient is female), and increasing cholesterol is associated with an increasing risk of death (z goes up by 1.2 for each 1 mmol/dl increase in cholesterol).

We wish to use this model to predict Mr Petrelli's risk of death from heart disease: he is 50 years old and his cholesterol level is 7.0 mmol/dl. Mr Petrelli's risk of death is therefore ${\displaystyle ={\frac {1}{1+e^{-z}}}{\text{, where }}z=-5.0+(+2.0)(5.0-5.0)+(-1.0)(0)+(+1.2)(7.0-5.0).}$

Which means that by this model, Mr Petrelli's risk of dying from heart disease in the next 10 years is 0.07 (or 7%). --Gak (talk) 06:49, 1 February 2008 (UTC)
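For readers who want to reproduce the arithmetic, here is a small Python sketch of the fictitious model above (the helper name `risk` and its argument coding are mine, for illustration only):

```python
import math

# Coefficients of the fictitious heart-disease model above
b0, b1, b2, b3 = -5.0, 2.0, -1.0, 1.2

def risk(age_years, is_female, cholesterol_mmol_dl):
    """10-year risk of death under the toy model (helper name is mine)."""
    x1 = age_years / 10 - 5.0            # age in decades, less 5
    x2 = 1.0 if is_female else 0.0       # sex: 1 = female, 0 = male
    x3 = cholesterol_mmol_dl - 5.0       # cholesterol less 5.0
    z = b0 + b1 * x1 + b2 * x2 + b3 * x3
    return 1 / (1 + math.exp(-z))

# Mr Petrelli: 50 years old, male, cholesterol 7.0 mmol/dl
print(round(risk(50, False, 7.0), 2))  # → 0.07
```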

Old example section removed because there is already an example given in the new layman's section.
The old example section is reproduced here:

Let p(x) be the probability of success when the value of the predictor variable is x. Then let

${\displaystyle p(x)={\frac {1}{1+e^{-(B_{0}+B_{1}x)}}}={\frac {e^{B_{0}+B_{1}x}}{1+e^{B_{0}+B_{1}x}}}.}$

Algebraic manipulation shows that

${\displaystyle {\frac {p(x)}{1-p(x)}}=e^{B_{0}+B_{1}x},}$

where ${\displaystyle {\frac {p(x)}{1-p(x)}}}$ is the odds in favor of success. If we take, say p(50) = 2/3, then

${\displaystyle {\frac {p(50)}{1-p(50)}}={\frac {\frac {2}{3}}{1-{\frac {2}{3}}}}=2.}$

So when x = 50, a success is twice as likely as a failure. Or, it can be simply said that the odds are 2 to 1.

--Gak (talk) 01:12, 12 February 2008 (UTC)
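A quick numerical check of the odds arithmetic in the old example above (Python, for illustration):

```python
# If p(50) = 2/3, the odds in favor of success are p / (1 - p) = 2,
# i.e. a success is twice as likely as a failure (odds of 2 to 1).
p = 2 / 3
odds = p / (1 - p)
print(round(odds, 6))  # → 2.0
```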

One request: It seems this section was recently taken out. As a student trying to grasp this statistical technique, I found this section to be one of the best lay explanations I had read in statistics. The example offered an intuitive way to help grasp the material. The outline for setting up the model for the variable z and the example that followed was well done. While the formal mathematical definition should always be included, I guarantee that most of the people that visited this page in the past got what they needed in the lay explanation section. It should somehow be included again in the main page with as close to the wording above as possible. Cgall (talk) 17:22, 24 September 2008 (UTC)cgall

## For the layman???

You MUST be joking. I don't think I am a dolt. However I am not a mathematician nor a statistician; I am a professional translator (also a linguist and also a contributor to Wikipedia but in language-related articles and such). I looked up this article today because I NEED to know, in a very basic LAYMAN's sort of way, what logistic regression is, what it is about, and ideally (for my purposes) an intelligible explanation of how it works which provides a model of the language that ought to be used when explaining this to someone. That would help me to get my language right in my translation, where I need to translate just such an explanation, in one short paragraph, that forms part of a 170-page report written for a readership that is not expected to know anything about statistics. While that is just what I need from this article, it is also roughly what I expect to find in such an article and roughly what I believe would be expected and found useful by many other Wikipedia users. This fails totally to provide any of that. It is useless to me. Wikipedia has helped me out time and time again which is why I consider it one of my most valuable tools for work. But it wouldn't be if all articles were like this one. I'm sorry to be so negative, but you really do need to get your act together. --A R King (talk) 17:23, 2 March 2008 (UTC)

To be a little bit less negative and try to help some of you guys out there come down to earth, I thought it might be useful if I gave you a snippet of the article I'm working on in the English translation I've done (which may be improvable), so you can see how many light-years separate one kind of discourse from another:

Logistic regression analysis is a technique for identifying the variables that best predict a given event or situation according to a model or equation produced by the analysis itself. In the present case, we used this analysis to find the variables that best determine the occurrence or non-occurrence of certain levels of Basque language use among pupils. This kind of analysis has one strict condition: the variable that is to be predicted must be dichotomic, that is, there can only be two possible values, such as A-or-B, yes-or-no, etc....

--A R King (talk) 17:32, 2 March 2008 (UTC)

I'm glad to see there's been previous discussion of the readability of this article, and some suggestions. I think there are improvements to make. I like the general approach of explaining what logistic regression is useful for, first.

Also I have problems with the current exposition, which starts off in the first sentence with "logistic regression is a model used for prediction of the probability of occurrence of an event by fitting data to a logistic curve." I think that is not useful; there is no way to explain simply how fitting a logit model is like fitting data to a logistic curve. Speaking of curve-fitting as done here would be appropriate only if you were literally doing curve-fitting that can be visualized: curve-fitting is an exercise in finding the parameters of a specific curve that best fits specified data, in the same way that an ordinary least squares regression finds the straight line closest-fitting (by sum of squared deviations) to data that can be plotted. Logistic regression, instead, is a maximum likelihood technique, there are no obvious curves in plots of data to be fit by a logit model, and it would be very hard to convey how logit regression is curve-fitting.

I have added a small public domain dataset to FitzPatrick 1932 article and may use that in demonstrating logit regression here and/or in a bankruptcy prediction article which i am developing. As this develops, comments would be welcomed. doncram (talk) 17:56, 5 September 2008 (UTC)

## How to estimate parameters?

There's nothing in this article about how to actually do logistic regression, i.e., estimate the parameters, except one sentence: "The unknown parameters βj are usually estimated by maximum likelihood". This is a pretty huge omission. Surely we ought to add a section on how to do this. It would probably describe minimizing the cross-entropy function derived from the likelihood function, as in, e.g., section 6.7 of [Christopher M. Bishop, "Neural Networks for Pattern Recognition"]. RVS (talk) 19:44, 26 January 2009 (UTC)

All generalized linear models are fit in (approximately) the same way. There is one speedup available for logit, but agreed that we should mention that this is where this information is. PDBailey (talk) 00:21, 27 January 2009 (UTC)
The generalized linear models page only says "The unknown parameters, β, are typically estimated with maximum likelihood, maximum quasi-likelihood, or Bayesian techniques." RVS (talk) 20:00, 30 January 2009 (UTC)
The method is described in detail in Chapter 2 of McCullagh & Nelder if you would like to add it, I think it would be a great thing to add since GLM are basically similar because of the unified fitting technique. PDBailey (talk) 00:19, 31 January 2009 (UTC)
It could be mentioned that parameter estimation is relatively easy by Newton-Raphson or any other search method, because the log-likelihood function is globally concave. There's a footnote source on the global concavity that i could dig up. So there is no possibility of getting trapped in a local maximum. I think the Newton-Raphson approach is different from the method described in McCullagh & Nelder, not sure what that is, and I am curious what the speedup available for logit, as opposed to other GLM models, is, by the way. Probably it is the McCullagh & Nelder approach that is implemented in Splus software, which fails to converge for certain datasets, from my experience. doncram (talk) 03:03, 31 January 2009 (UTC)
Doncram, the method is equivalent to Newton-Raphson with a method of approximating the Jacobian. It is globally concave only if X is full rank, this tends to be the problem with non-convergent solutions in R (and S plus?). PDBailey (talk) 15:34, 31 January 2009 (UTC)
More than 2 years later: FYI, i meant that for certain valid datasets, where X is full rank, SPlus software nonetheless fails. My experience was a number of years ago, and I reported it on an Splus email list but believe it will not have been changed. My datasets involved some observations where the estimated probability values would have been very close to one and contained little useful information on the relative size of parameters, but were valid observations and should not have crashed the software. The datasets less these observations would estimate fine, and would predict P(y=1|X) = 1 or close thereto for the given X of the omitted observations. The full datasets estimated fine in SAS, with no need to identify and remove these valid observations. Maybe the Splus approach would calculate both P(y=1|X) and P(y=0|X) for these observations, and would explode when it found P(y=0|X) = 0, unexpectedly. --doncram 17:31, 24 July 2011 (UTC)
I recently found code for a Matlab/Octave function that seems to work well for estimating the parameters of a multiple logistic regression. It uses Newton-Raphson iteration and takes about fifteen lines of code to do the job. It is in the public domain as it was written at one of the National Labs. I also have a write-up of the methods used including the actual maximum likelihood computation. It uses a gradient and the Hessian to compute the successive approximations of the logistic regression weights. The discussions of the logit I have found in the literature mostly do not suggest how to proceed to estimate the parameters, and I think it would be a service to clarify how that actually works. I have the write-up in open office. Do you guys know how to transform that to a Wiki page? Pseudopigraphia (talk) 02:57, 11 May 2009 (UTC)
As long as it is public domain, you could just copy / write it into a temporary working page as a subpage of this Talk page, say to Talk:Logistic regression/Estimation, and we could try working on it there. I think it might be unusual to include code written in one programming language in a general article about a statistical method, but I for one would be interested in seeing if we could do that well. doncram (talk) 05:07, 11 May 2009 (UTC)
Pseudopigraphia, are you still reading? --doncram 17:31, 24 July 2011 (UTC)
There are many other wiki articles that have examples in particular programming languages, so the precedent is well established. Python is used frequently as is Matlab/Octave. Both Octave and Python are freely available open source language projects that run on all major operating systems, so I would not be hesitant to publish algorithms in them. I will post to a test page when I figure out the formatting. Pseudopigraphia (talk) 19:51, 11 May 2009 (UTC)
I notice that as of 2009/6/24, the generalized linear models page now has a section on "Fitting", so I'd say this is pretty well covered now. I have nothing against adding more detail on the algorithm, though. RVS (talk) 22:18, 1 September 2009 (UTC)
Agreed with OP. The article needs to say something about how to actually do the regression. The simple example on the page right now just shows you how to plug numbers into a formula. There's no need for any code or pseudo-code, just a description -- for example, something like the normal equations. If there's no closed form, then at least formulate the problem, say it can be solved by, e.g., Newton-Raphson, and provide the derivative. And a description of why the logistic function is useful (I assume because it's smooth): why not use something like tanh? Lavaka (talk) 02:01, 28 July 2010 (UTC)
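To make the estimation discussion above concrete, here is a rough Python sketch of Newton-Raphson for a one-predictor logistic regression — an illustration of the idea being discussed, not a proposal for article text. It relies on the global concavity of the log-likelihood mentioned earlier; all names are mine:

```python
import math

def fit_logistic(xs, ys, iters=25):
    """Fit p(x) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson on the
    log-likelihood, which is globally concave for full-rank data."""
    b0 = b1 = 0.0
    for _ in range(iters):
        # Accumulate the gradient g and (negative definite) Hessian h
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            g0 += y - p                # d loglik / d b0
            g1 += (y - p) * x          # d loglik / d b1
            w = p * (1 - p)
            h00 -= w
            h01 -= w * x
            h11 -= w * x * x
        # Newton step: solve H d = -g for the 2x2 Hessian H
        det = h00 * h11 - h01 * h01
        b0 += (-g0 * h11 + g1 * h01) / det
        b1 += (g0 * h01 - g1 * h00) / det
    return b0, b1

# Toy data (not linearly separable, so the MLE is finite)
b0, b1 = fit_logistic([0, 1, 2, 3, 4, 5], [0, 0, 1, 0, 1, 1])
print(b1 > 0)  # the fitted slope is positive for this upward-trending data
```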

## Connection to Support Vector Machines and Adaboost

Support Vector Machines and Adaboost are only slight variations on the idea of logistic regression. They fit nicely into the logistic regression framework, and this is a very enlightening/easy way to view them. However, I don't understand this point of view well yet myself. I think this would be good to include in this article, given the importance of SVMs and Adaboost.Singularitarian (talk) 09:03, 2 February 2010 (UTC)

What exactly is the perceived relation to SVMs? Logistic Regression is a probabilistic model. SVM is a maximum margin method. There may be perceived similarities, but their goals, models, and implementations are quite different.

Adaboost could be implemented with Logistic Regression as the weak classifier, but there are alternatives. Jfolson (talk) 15:22, 11 March 2010 (UTC)

The point of view I'm referring to is described in a presentation by Hastie called Support Vector Machines, Kernel Logistic Regression, and Boosting. It seems to be a rather enlightening viewpoint. --Singularitarian (talk) 22:14, 1 April 2010 (UTC)

## Redirected from Maximum entropy classifier?

It seems like this is a mistake. I can hardly even find a mention of entropy on this page. Either this article is missing a section, or the link should go somewhere else. Anyone know the answer?

Sukisuki (talk) 16:56, 10 April 2010 (UTC)

I came here to the talk page to ask the very same question. 66.191.103.200 (talk) 15:13, 30 September 2010 (UTC)

## Realistic Example?

It seems the numbers in the example are very unrealistic. If I assume that Nathan is 55 years old instead of 50, the risk of death increases from 0.07 to 0.9994. --93.198.2.131 (talk) 10:46, 7 November 2010 (UTC)
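The jump the comment describes is only possible if the fitted age coefficient is implausibly large. Working backwards on the logit scale (a sketch; the example's actual coefficients are not restated here):

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

# If five extra years of age move the fitted risk from 0.07 to 0.9994,
# the implied age coefficient on the logit scale is:
slope = (logit(0.9994) - logit(0.07)) / 5.0
print(round(slope, 2))  # about 2.0: each extra year multiplies the odds by e^2, roughly 7.4
```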

## Model accuracy section

I don't think this section belongs here. It describes only one of several methods, which is applicable to any learning method, not just to logistic regression. There are other similar methods, e.g. cross-validation and bootstrapping. And cross-validation is the most commonly used method, in my opinion. It would be best to describe all these methods in a separate article on model selection, and leave a link to it here. -- X7q (talk) 16:48, 22 December 2010 (UTC)

Not only is cross-validation the most common, but the method described in the section is just a very naive and simplistic cross-validation technique. Unfortunately the cross-validation reference has been removed for no good reason CarrKnight (talk) 16:40, 24 March 2011 (UTC)

Sorry, I don't follow you; what cross-validation reference has been removed? --Qwfp (talk) 17:34, 24 March 2011 (UTC)

## Intro suggestion

From time to time I find people in my lab doing the wrong thing with logistic or linear regression, and I want to point them to the right page on Wikipedia so they can learn which regression to choose according to their data. But the first paragraph here is not very outsider-friendly.

Could I suggest explicitly saying, in the first paragraph of this article, that logistic regression is used to evaluate dichotomous outcomes, i.e. when your response has only two categories? It says it "is used for prediction of the probability of occurrence of an event", which is very clear when you already know what logistic regression is, or are able to translate "occurrence of an event" into "yes/no, disease/no_disease". You probably do this quickly once you get used to speaking in these terms. Unfortunately, non-statistician researchers (with only basic statistics knowledge, like ANOVA and linear models) who need a statistical test for their data will not, on first reading this passage, quickly grasp that logistic regression is used for analyzing binary outcomes.

And probably something with this kind of approach would be nice to have in an intro. That way it would also work for non-statisticians and would be appreciated by students and people from other fields needing to do some statistical analysis. — Preceding unsigned comment added by Pablomarin (talkcontribs) 12:58, 21 September 2011 (UTC)

I say, be bold and go for it. 018 (talk) 23:28, 21 September 2011 (UTC)

## Wrong notation in latent variable model section?

Hi, I'm pretty sure the notation should be fixed in the "As a latent-variable model" section.

The formula is given as:

${\displaystyle Y_{i}^{\ast }={\boldsymbol {\beta }}\cdot \mathbf {X} _{i}+\varepsilon \,}$

But I believe it should be:

${\displaystyle Y_{i}^{\ast }={\boldsymbol {\beta }}\cdot \mathbf {X} _{i}+\varepsilon _{i}\,}$

That is, ${\displaystyle \varepsilon }$ should be ${\displaystyle \varepsilon _{i}}$.

Similarly, later below, in the "As a two-way latent-variable model" section,

{\displaystyle {\begin{aligned}Y_{i}^{0\ast }&={\boldsymbol {\beta }}_{0}\cdot \mathbf {X} _{i}+\varepsilon _{0}\,\\Y_{i}^{1\ast }&={\boldsymbol {\beta }}_{1}\cdot \mathbf {X} _{i}+\varepsilon _{1}\,\\\end{aligned}}}

{\displaystyle {\begin{aligned}Y_{i}^{0\ast }&={\boldsymbol {\beta }}_{0}\cdot \mathbf {X} _{i}+\varepsilon _{i}^{0}\,\\Y_{i}^{1\ast }&={\boldsymbol {\beta }}_{1}\cdot \mathbf {X} _{i}+\varepsilon _{i}^{1}\,\\\end{aligned}}}

That is, ${\displaystyle \varepsilon _{k}}$ should be ${\displaystyle \varepsilon _{i}^{k}}$

You know, while we're at it, it may help to always write ${\displaystyle k}$ as a superscript and ${\displaystyle i}$ as a subscript, so that it is clear that ${\displaystyle \beta _{0}}$ really means ${\displaystyle \beta _{k}}$ with ${\displaystyle k=0}$ rather than ${\displaystyle \beta _{i}}$ with ${\displaystyle i=0}$.

Anyway, I didn't want to just change things without first having another pair of eyes look over this. But my suggested change does seem correct, e.g. compare with the notation in http://en.wikipedia.org/wiki/Discrete_choice (Consumer Utility Section) where ${\displaystyle U_{ni}=\beta z_{ni}+\varepsilon _{ni}}$.

Thanks. John745 (talk) 23:48, 7 April 2012 (UTC)

## intro way too long

Why does the introduction define what a logarithm is, and so on? This is way too much information (and the information is too basic), and the paragraphs are too long. This needs to be simplified. If you want to add a section for the extreme math-phobic, then put it somewhere else, not in the intro. The intro should be neither too nontechnical nor too technical; rather, assume the reader is familiar with concepts defined in parent categories (like regression). Lavaka (talk) 14:42, 31 May 2012 (UTC)

---

I agree; the intro has actually spurred me to create an account and join the statistics project group. There are many things I would change, but the first three paragraphs are pretty easy candidates because they are at times incorrect (LR doesn't necessarily assume Gaussian errors) and verbose.

To motivate the use of logistic regression, we will discuss why it is frequently found preferable to linear regression for the analysis of a dichotomous criterion. The first reason involves linearity. The conditional mean of a dichotomous criterion must lie between zero and one, so its relationship to the predictors is not linear but sigmoid, or S-shaped.[2] Linear regression does not incorporate this constraint within its model assumptions; because of its linearity, its mean is theoretically unbounded, and it becomes possible for the predicted probability of the criterion to be less than zero or greater than one. Such values are not theoretically permissible for modeling a probability.[3]
Second, conducting linear regression with a dichotomous criterion violates the assumption that the error term is homoscedastic.[7] Homoscedasticity is the assumption that the variance of the criterion is constant at all levels of the predictor(s). This assumption is always violated when the criterion is binomially distributed, for example. Although non-constant variance can be remedied within a linear regression model, for example by using the method of weighted least squares, it is implicitly dealt with in the logistic regression model.
Third, conducting linear regression with a dichotomous variable violates the assumption that error is normally distributed, because the criterion has only two values.[3] Given that a dichotomous criterion violates these assumptions of linear regression, conducting linear regression with a dichotomous criterion may lead to errors in inference; at the very least, interpretation of the outcome will not be straightforward.[3] It is worth noting, however, that linear regression can be implemented with discrete error models.

I'm still not a great fan of that to be honest, but I think it does highlight some of the key features of logistic regression. I believe the original author was not fully aware of what can and cannot be done with linear regression. (Jack.w.rae) 07:53 18th Sept 2012.

## Common mistake about distributional assumptions in linear regression

" Third, conducting linear regression with a dichotomous variable violates the assumption that error is normally distributed because the criterion has only two values.[3] Given that a dichotomous criterion violates these assumptions of linear regression, conducting linear regression with a dichotomous criterion may lead to errors in inference and at the very least, interpretation of the outcome will not be straightforward.[3]"

This wrong, and incorrectly sourced, statement describes a very common misconception: dichotomous dependent variables do NOT violate the assumptions required for linear regression techniques to work, and a priori there is no reason to assume that a dichotomous dependent variable somehow induces correlation in the residuals of the model. I would thus delete this sentence entirely. Appeals to incorrect distributional assumptions should not motivate this Wikipedia write-up. — Preceding unsigned comment added by 156.145.113.40 (talk) 18:53, 13 November 2012 (UTC)

## Definition

I'm not sure about the following sentence in the "Definition" section.

(...)The first formula illustrates that the probability of being a case is equal to the odds of the exponential function of the linear regression equation.

in particular, I'm arguing against the use of the term "odds" here.

62.16.237.33 (talk) 16:34, 26 January 2013 (UTC)
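The objection seems well founded: the exponential of the linear predictor gives the odds, and the probability is odds/(1 + odds), so "the probability ... is equal to the odds" cannot be right. A quick numeric check with a hypothetical linear-predictor value:

```python
import math

linear_predictor = 1.0             # hypothetical value of beta . x
odds = math.exp(linear_predictor)  # exp of the linear predictor gives the odds
p = odds / (1.0 + odds)            # the probability is odds / (1 + odds)
print(round(odds, 3), round(p, 3)) # the two quantities are clearly not equal
```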

## Understanding?

This article describes the mechanics of logistic regression, not the logic of logistic regression. It seems to me that it has evolved to be recognisable (faithful might be a better word) to those who work with logistic regression, but completely opaque to neophytes. I cannot understand it and I'm really trying. I suspect the author(s) do not really understand logistic regression.

To demonstrate an understanding of a topic an author must show how it originated, what problem it was initially designed to solve, how it fits in with other simple concepts (e.g. the idea of regression being that 'traits in children tend towards (regress) to those of parents') etc. Thereafter, the author may develop it to its current form including all the mathematical bells and whistles. Currently, this article painstakingly describes abstractions that are, for all intents and purposes, irrelevant to the idea of logistic regression.

It deserves, nay demands, an overhaul.

PKK — Preceding unsigned comment added by Polariseke (talkcontribs) 19:46, 18 July 2013 (UTC)

## "Cells" in the discussion of the maximum likelihood method?

In the section titled "Maximum likelihood estimation", the following text is found:

Sparseness in the data refers to having a large proportion of empty cells (cells with zero counts). Zero cell counts are particularly problematic with categorical predictors. [...]

This does not make sense to me, since there are no obvious "cells" in the maximum likelihood method. Maybe this section should be moved to the chi-squared section below? --Jochen (talk) 16:10, 10 July 2014 (UTC)
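"Cells" here presumably refers to the contingency table formed by cross-classifying the categorical predictors with the outcome, not to anything in the likelihood itself. A hypothetical sketch of how a zero cell arises and why it matters:

```python
from collections import Counter

# Hypothetical data: a three-level categorical predictor and a binary
# outcome. Level "c" never occurs with outcome 1, so that cell of the
# predictor-by-outcome table is empty.
group   = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
outcome = [ 0,   1,   1,   0,   0,   1,   0,   0,   0 ]

cells = Counter(zip(group, outcome))
for g in ("a", "b", "c"):
    for y in (0, 1):
        print((g, y), cells[(g, y)])
# The ("c", 1) count is zero; the maximum-likelihood estimate of the
# dummy-variable coefficient for level "c" then diverges to -infinity,
# which is why zero cell counts are problematic with categorical predictors.
```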

## Conditional logistic regression

I recently created a page on conditional logistic regression. I was recommended on the talk page of that article to instead include this as a sub-section to the logistic regression page. I am no statistician, so would appreciate opinion on whether this should (a) be included as a sub-section here; (b) should have its own page; (c) is already covered in WP under some other term.Jimjamjak (talk) 14:04, 20 February 2015 (UTC)

I will make the article on conditional logistic regression. It is a version of logistic regression that has a specific field of application. I will add a link and short description in the extensions section. Felixbalazard (talk) 10:44, 3 November 2016 (UTC) I did. Felixbalazard (talk) 14:41, 4 November 2016 (UTC)

This article in principle contains a lot of solid information about logistic regression, but it is severely lacking both in clarity (mostly because of long-winded and vague descriptions) and in organization.

Clarity:

First and foremost, the description of logistic regression in the introduction is essentially useless even if you already know what it is. We're just talking about a basic linear model composed with the logistic function: probability of the event occurring = F(sum of constants times variables). It's frustrating how difficult it is to extract that basic point from the first half of the article. There seems to have been an enormous (and in my opinion failed) effort to describe everything in words, which seems problematic for a subject that is essentially purely mathematical. Just as an example, this passage is far too verbose:

"The logit of success is then fitted to the predictors using linear regression analysis. The predicted value of the logit is converted back into predicted odds via the inverse of the natural logarithm, namely the exponential function. Thus, although the observed dependent variable in logistic regression is a zero-or-one variable, the logistic regression estimates the odds, as a continuous variable, that the dependent variable is a success (a case). In some applications the odds are all that is needed. In others, a specific yes-or-no prediction is needed for whether the dependent variable is or is not a case; this categorical prediction can be based on the computed odds of a success, with predicted odds above some chosen cutoff value being translated into a prediction of a success."

Organization:

Fields and example applications should be moved to the end of the article. The Basics section should just be deleted or written again from scratch. The formal mathematical specification should come much earlier (perhaps with some simplification since it uses a lot of unnecessary formalism). The section on fitting should surely come after the model is specified.

Iellwood (talk) 00:54, 18 July 2015 (UTC)

WP:Be bold! Qwfp (talk) 11:27, 18 July 2015 (UTC)
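For comparison, the content of the long passage quoted above can be stated in a few lines: the model is P(success) = F(beta . x) with F the logistic function, the odds are exp(beta . x), and a yes/no prediction applies a cutoff to the probability. A sketch using the fitted coefficients from the article's hours-of-study example:

```python
import math

def predicted_probability(beta, x):
    """P(success) = logistic function of the linear predictor beta . x."""
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Fitted coefficients from the article's hours-of-study example
# (intercept first), evaluated for a student who studied 2 hours.
beta = [-4.0777, 1.5046]
x = [1.0, 2.0]  # leading 1.0 is the intercept term

p = predicted_probability(beta, x)
odds = p / (1.0 - p)              # equivalently exp(beta . x)
prediction = 1 if p > 0.5 else 0  # yes/no prediction via a 0.5 cutoff
print(round(p, 2), prediction)    # about 0.26 probability of passing: predicted fail
```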

## Begun cleanup

I have begun a cleanup of this article. PeterLFlomPhD (talk) 23:53, 20 July 2015 (UTC)

## Figure

I think the general figure for regression with a continuous dependent variable in the right box is misleading. Logistic regression is about a categorical dependent variable. Even the figure of (binary) classification as in https://en.wikipedia.org/wiki/Statistical_classification would be more appropriate. Anne van Rossum (talk) 09:35, 16 August 2015 (UTC)

## Is more cleanup needed?

Several of us have done substantial work on this article. Is more cleanup needed? If so, which parts are still unclear? If not, should the notice be removed? PeterLFlomPhD (talk) 20:43, 17 August 2015 (UTC)

Nice work by you and the others in improving this article! Reading through it again, it looks fairly well organized in terms of progression from simple to complex. For section organization, Model suitability seems just tacked on the end. The section doesn't really have much to do with logistic regression in particular, but has good information and Type I and II errors are a basic part of evaluating goodness of fit for LR. Would it be better placed in the Evaluating goodness of fit section? Except for this wart, I'd support removal of the cleanup tag; the tagging editor can always come back with more specific criticisms if such are needed. --Mark viking (talk) 21:00, 17 August 2015 (UTC)
Thanks! I think that section should probably go in Evaluating model performance. But I'm willing to be convinced otherwise. -- PeterLFlomPhD (talk) 21:46, 17 August 2015 (UTC)
The section Evaluating model performance would be a fine destination, too. I defer to the editors actually doing the work :-) --Mark viking (talk) 22:12, 17 August 2015 (UTC)

## Mistake in initial example

Resolved

The logistic regression example in "Fields and example applications" seems to be wrong. I can't reproduce the results, given the data. Here is the Mathematica script that I'm using to replicate the results:

data = {{0.5, 0}, {0.75, 0}, {1.0, 0}, {1.25, 0}, {1.5, 0}, {1.75, 1}, {2.0, 0}, {2.25, 1}, {2.5, 0}, {2.75, 1}, {3.0, 0}, {3.25, 1}, {3.5, 0}, {4.0, 1}, {4.25, 1}, {4.5, 1}, {4.75, 1}, {5.0, 1}, {5.5, 1}};
logit = LogitModelFit[data, x, x];
Normal[logit]


This script suggests that the intercept should be -3.817, and the "hours" quantity 1.436, rather than the stated -4.0777 and 1.5046, respectively. I don't know what any of the other quantities in the example mean, so I have no idea if they're right or not. Can somebody who knows how to do these (likely simple) calculations check the numbers, please? — Preceding unsigned comment added by Jolyonb (talkcontribs) 06:18, 14 January 2016 (UTC)

The example seems to be correct as I've just reproduced the results using R's glm() function. I'm not familiar with Mathematica, sorry. Perhaps you could check that you're using the software correctly? Tayste (edits) 07:51, 14 January 2016 (UTC)
> dta = data.frame(Hours=c(2:7,7:14,16:20,22)/4,Pass=c(rep(0,6),rep(1:0,4),rep(1,6)))
> glm1 = glm(Pass~Hours,binomial,dta)
Call:
glm(formula = Pass ~ Hours, family = binomial, data = dta)

Deviance Residuals:
Min         1Q     Median         3Q        Max
-1.705574  -0.573569  -0.046544   0.454702   1.820076

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.07771    1.76098 -2.3156  0.02058 *
Hours        1.50465    0.62872  2.3932  0.01670 *

Here's my code, above. Tayste (edits) 08:04, 14 January 2016 (UTC)
Thanks for looking at this. I've checked the Mathematica code; I believe it's behaving correctly. The deviance residuals are comparable to yours, as are the standard errors, z and P values. I'm guessing the difference is because Mathematica and R are optimizing slightly differently. I believe Mathematica is optimizing using a maximum likelihood approach (I obtained the same numbers through a custom python script using this approach). Perhaps R is using something slightly different? Edit: I've checked, and R uses "iteratively reweighted least squares", while Mathematica uses "maximum likelihood". The different residual weightings assigned by the two methods will yield slightly different fits. Jolyonb (talk) 21:24, 23 January 2016 (UTC)
Just to say Stata (version 13) gives the same results as R. I've no idea what Mathematica is doing; the fitting algorithm shouldn't affect the result to any non-negligible extent (at least not unless the likelihood is very flat, which this isn't). Qwfp (talk) 20:50, 16 February 2016 (UTC)
It is possible to create reasonable looking cost functions for the logistic regression optimization problem that are non-convex, which could lead to multiple minima. Otherwise, I agree: if there is a global minimum to the cost function, then IRLS should find the same minimum as other ML based optimization schemes. --Mark viking (talk) 23:03, 16 February 2016 (UTC)
Just spotted the problem: Jolyonb isn't using the same data. His Mathematica script above only has one entry with 1.75 hours, whereas the table in the example in the article (and the data used by Tayste and me) contain two entries with 1.75 hours, one pass and one fail. There's a moral here somewhere... Qwfp (talk) 15:55, 17 February 2016 (UTC)
Sigh. Thank you Qwfp (talk · contribs). I swear that that was the first thing I checked, but apparently I'm blind. With that additional data point, Mathematica yields identical results to what is on the page. Mea culpa. Jolyonb (talk) 18:54, 17 February 2016 (UTC)

## Why capital F for denoting logistic function?

Can someone please give the rationale for using capital F for the logistic function and little g for the logit function? In the context of distributions, I'm used to capital letters being used for the CDF to distinguish it from a pmf... is this somehow related? — Preceding unsigned comment added by Ihadanny (talkcontribs) 12:25, 14 February 2016 (UTC)
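A likely rationale (an assumption on my part, not sourced from the article): the logistic function is the CDF of the standard logistic distribution, and capital letters conventionally denote CDFs, so F fits that convention, while lowercase g for the logit marks it as an ordinary link function. A numeric check that F behaves like a CDF (its derivative matches the logistic density):

```python
import math

def F(x):
    """Logistic function; also the CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-x))

def f(x):
    """Density of the standard logistic distribution."""
    return math.exp(-x) / (1.0 + math.exp(-x)) ** 2

# Central-difference derivative of F matches the density f, consistent
# with F being a cumulative distribution function.
x, h = 0.7, 1e-6
numeric = (F(x + h) - F(x - h)) / (2 * h)
print(round(numeric, 6), round(f(x), 6))
```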

## Issue Tags

It looks like two issue tags were recently added (as of March 10th):

However, no description of the particular issues is given. Mr._Guye, what specific improvements did you have in mind? Crazy2be (talk) 05:28, 27 September 2016 (UTC)

Sorry! to Crazy2be. I didn't like the style of the writing in areas, and I think I overreacted. I have removed those templates and made some fixes. Thanks for dealing with this civilly. --Mr. Guye (talk) 17:04, 2 October 2016 (UTC)

## false equality

I'm just browsing, so didn't want to make this change, but if I'm right, can someone else: The intro says "It is also called a qualitative response/discrete choice model in the terminology of economics." That implies these are equivalent terms. They are not: Shouldn't this say "It is also one example (along with, for instance, probit_regression) of a ...". — Preceding unsigned comment added by Theredsprite (talkcontribs) 01:28, 11 January 2017 (UTC)

Good point, thanks. I revised the sentence to "It is an example of a qualitative response/discrete choice model in the terminology of economics." I don't mind if anyone else wants to revise it differently. --doncram 02:37, 11 January 2017 (UTC)

## example choice

The existing example, "Probability of passing an exam versus hours of study", serves reasonably well in the article to make the topic accessible. Its statement of data, its graphic, and its simple interpretation of analysis results are all good.

But the sample is made up, I suspect, and is textbook-like (not in a good way) and not encyclopedic. It has preachy connotations: if you study more, you will pass the exam. I doubt that the number of hours of study of 20 students was actually measured. In an actual exam in a real course, many of the top grades will be from students who did not study at all, based on my experience/observations. The model is ridiculously simplistic, ignoring factors such as student skill levels that might be measured by variables such as students' grade level, students' maturity/age, number of times the student has already taken the same exam, etc. The example smells false to me.

The plausibility of the example could be salvaged if further stuff is made up, such as asserting the students were all starting at the same skill level, it was new material to them all, and that measurement of their study time was obtained as part of the study in some way that is explainable. My experience/observations are my own, from a certain culture, which is not universal...perhaps there is some scenario where data like presented would be plausible.

But I think we should do better in example choice, and use a similarly sized sample from real life. Possibly from the history of logistic regression's development, or take some other important and/or interesting real life example. What data did David Cox (stated to be developer of logistic regression) use, for example? Is there any pithy example from cancer research? --doncram 10:44, 16 January 2017 (UTC)