Chi-square automatic interaction detection
Chi-square automatic interaction detection (CHAID) is a decision tree technique based on adjusted significance testing (Bonferroni testing). The technique was developed in South Africa and published in 1980 by Gordon V. Kass, who had completed a PhD thesis on the topic. CHAID can be used for prediction, in a similar fashion to regression analysis (this version of CHAID was originally known as XAID), as well as for classification and for detecting interactions between variables. CHAID is a formal extension of the AID (Automatic Interaction Detection) and THAID (THeta Automatic Interaction Detection) procedures developed in the United States in the 1960s and 1970s, which were in turn extensions of earlier research, including work carried out in the UK in the 1950s.
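The idea of choosing a split by adjusted significance testing can be illustrated with a minimal sketch. It is not Kass's full procedure: it omits the category-merging step, and the Bonferroni multiplier is simply supplied by the caller, whereas in CHAID it depends on the predictor type and on how categories were merged. The predictor names, counts and multipliers below are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Closed form for even degrees of freedom.
        series = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * series
    # Closed form for odd degrees of freedom.
    series = sum(half ** (i - 0.5) / math.gamma(i + 0.5)
                 for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series


def chi2_test(table):
    """Pearson chi-squared statistic and p-value for a table of counts.

    Rows are predictor categories, columns are outcome categories.
    Assumes no row or column is entirely zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, chi2_sf(stat, df)


def bonferroni(p_value, multiplier):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * multiplier)


# Hypothetical data: (contingency table, assumed Bonferroni multiplier).
# Columns are (responded, did not respond).
candidates = {
    "age_band": ([[10, 20], [20, 10]], 1),
    "region": ([[12, 18], [15, 15], [18, 12]], 3),
}

adjusted = {name: bonferroni(chi2_test(table)[1], m)
            for name, (table, m) in candidates.items()}

# The predictor with the smallest adjusted p-value is the candidate for the split.
best_predictor = min(adjusted, key=adjusted.get)
```

In this sketch a node would be split on `best_predictor` only if its adjusted p-value also falls below a chosen significance threshold; otherwise the node would be left as a leaf.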
In practice, CHAID is often used in direct marketing to select groups of consumers and to predict how their responses to some variables affect other variables, although other early applications were in medical and psychiatric research.
As with other decision trees, CHAID's output is highly visual and easy to interpret. Because it uses multiway splits by default, it needs rather large sample sizes to work effectively: with small samples, the respondent groups can quickly become too small for reliable analysis.
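The effect of multiway splits on group size can be shown with simple arithmetic. The sketch below assumes, purely for illustration, that every split divides a node evenly; real trees split unevenly, so some groups end up smaller still. The function name and the figures are hypothetical.

```python
def average_leaf_size(n_samples, branches_per_split, depth):
    """Average number of cases per leaf after `depth` levels of even splits."""
    return n_samples / branches_per_split ** depth


# 1,000 respondents, three levels deep: four-way splits versus binary splits.
multiway = average_leaf_size(1000, 4, 3)
binary = average_leaf_size(1000, 2, 3)
```

With four-way splits the average group is one eighth the size of the binary-split group at the same depth, which is why multiway trees run out of data sooner.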
One important advantage of CHAID over alternatives such as multiple regression is that it is non-parametric.
- Chi-squared distribution
- Latent class model
- Structural equation modeling
- Market segment
- Decision tree learning
- Multiple comparisons
- Ignatov, D.Yu.; Ignatov, A.D. (2017). "Decision Stream: Cultivating Deep Decision Trees". IEEE ICTAI: 905–912. doi:10.1109/ICTAI.2017.00140.
- Belson, William A.; Matching and prediction on the principle of biological classification, Applied Statistics, Vol. 8 (1959), pp. 65–75
- Morgan, John A.; & Sonquist, James N.; Problems in the analysis of survey data and a proposal, Journal of the American Statistical Association, Vol. 58 (1963), pp. 415–434
- Press, Laurence I.; Rogers, Miles S.; & Shure, Gerald H.; An interactive technique for the analysis of multivariate data, Behavioral Science, Vol. 14 (1969), pp. 364–370
- Kass, Gordon V.; An Exploratory Technique for Investigating Large Quantities of Categorical Data, Applied Statistics, Vol. 29, No. 2 (1980), pp. 119–127
- Hawkins, Douglas M.; & Kass, Gordon V.; Automatic Interaction Detection, in Hawkins, Douglas M. (ed), Topics in Applied Multivariate Analysis, Cambridge University Press, Cambridge, 1982, pp. 269–302
- Hooton, Thomas M.; Haley, Robert W.; Culver, David H.; White, John W.; Morgan, W. Meade; & Carroll, Raymond J.; The Joint Associations of Multiple Risk Factors with the Occurrence of Nosocomial Infections, American Journal of Medicine, Vol. 70, (1981), pp. 960–970
- Brink, Susanne; & Van Schalkwyk, Dirk J.; Serum ferritin and mean corpuscular volume as predictors of bone marrow iron stores, South African Medical Journal, Vol. 61, (1982), pp. 432–434
- McKenzie, Dean P.; McGorry, Patrick D.; Wallace, Chris S.; Low, Lee H.; Copolov, David L.; & Singh, Bruce S.; Constructing a Minimal Diagnostic Decision Tree, Methods of Information in Medicine, Vol. 32 (1993), pp. 161–166
- Magidson, Jay; The CHAID approach to segmentation modeling: chi-squared automatic interaction detection, in Bagozzi, Richard P. (ed); Advanced Methods of Marketing Research, Blackwell, Oxford, GB, 1994, pp. 118–159
- Hawkins, Douglas M.; Young, S. S.; & Rosinko, A.; Analysis of a large structure-activity dataset using recursive partitioning, Quantitative Structure-Activity Relationships, Vol. 16, (1997), pp. 296–302
- Antipov, Evgeny; & Pokryshevskaya, Elena; Applying CHAID for logistic regression diagnostics and classification accuracy improvement, Journal of Targeting, Measurement and Analysis for Marketing, Vol. 18 (2010), pp. 109–117
- Luchman, J.N.; CHAID: Stata module to conduct chi-square automated interaction detection, Available for free download, or type within Stata: ssc install chaid.
- Luchman, J.N.; CHAIDFOREST: Stata module to conduct random forest ensemble classification based on chi-square automated interaction detection (CHAID) as base learner, Available for free download, or type within Stata: ssc install chaidforest.