Hosmer–Lemeshow test

From Wikipedia, the free encyclopedia


The Hosmer–Lemeshow test is a statistical test for goodness of fit for logistic regression models. It is used frequently in risk prediction models. The test assesses whether the observed event rates match the expected event rates in subgroups of the model population. The Hosmer–Lemeshow test specifically forms these subgroups as the deciles of fitted risk values. Models for which expected and observed event rates in subgroups are similar are called well calibrated.

The Hosmer–Lemeshow test statistic is given by:

H = Σ_{g=1}^{G} [ (O1g − E1g)² / E1g + (O0g − E0g)² / E0g ]

Here O1g, E1g, O0g, E0g, Ng, and πg denote the observed Y=1 events, expected Y=1 events, observed Y=0 events, expected Y=0 events, total observations, and predicted risk for the gth risk decile group, and G is the number of groups. The expected counts follow from the fitted risks: E1g = Ng·πg and E0g = Ng·(1 − πg). The test statistic asymptotically follows a chi-squared distribution with G − 2 degrees of freedom. The number of risk groups may be adjusted depending on how many distinct fitted risk values the model produces; this helps to avoid singular decile groups.
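The computation above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name and the quantile-grouping strategy (equal-sized groups of risk-sorted observations via numpy's array_split) are assumptions, and real implementations differ in how they break ties at decile boundaries.

```python
import numpy as np
from scipy.stats import chi2


def hosmer_lemeshow(y_true, y_prob, G=10):
    """Sketch of the Hosmer-Lemeshow test: sort observations by
    predicted risk, split them into G roughly equal-sized groups,
    and compare observed vs. expected event counts per group."""
    order = np.argsort(y_prob)
    y_true = np.asarray(y_true)[order]
    y_prob = np.asarray(y_prob)[order]
    # Split the risk-sorted indices into G groups (assumed grouping rule)
    groups = np.array_split(np.arange(len(y_prob)), G)
    H = 0.0
    for idx in groups:
        n = len(idx)
        o1 = y_true[idx].sum()   # observed Y=1 events in the group
        e1 = y_prob[idx].sum()   # expected Y=1 events: sum of fitted risks
        o0 = n - o1              # observed Y=0 events
        e0 = n - e1              # expected Y=0 events
        H += (o1 - e1) ** 2 / e1 + (o0 - e0) ** 2 / e0
    # Asymptotic reference distribution: chi-squared with G - 2 df
    p_value = chi2.sf(H, G - 2)
    return H, p_value


# Illustration: outcomes drawn from the stated risks are, by
# construction, well calibrated, so H should look like a chi-squared
# draw and the p-value should usually be unremarkable.
rng = np.random.default_rng(0)
p = rng.uniform(0.05, 0.95, 2000)  # fitted risks (simulated)
y = rng.binomial(1, p)             # outcomes generated from those risks
H, pval = hosmer_lemeshow(y, p)
```

A small p-value indicates that observed and expected event rates diverge in some decile groups, i.e. evidence of poor calibration.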
