Random naive Bayes

Random naive Bayes extends the naive Bayes classifier by adopting the random forest principles: random input selection, bagging (i.e. bootstrap aggregating), and random feature selection.[1]

Naive Bayes classifier

The naive Bayes classifier is a probabilistic classifier that simplifies Bayes' theorem by naively assuming the attributes are conditionally independent given the class. Although this assumption biases the posterior probabilities, the ordering of those probabilities is largely preserved, so naive Bayes achieves classification performance comparable to that of classification trees and neural networks.[2] Notwithstanding naive Bayes' popularity, owing to its simplicity combined with high accuracy and speed, its conditional independence assumption rarely holds in practice. There are two main approaches to alleviating this naivety (the decision rule that results from the assumption is sketched after the list below):

  1. Selecting attribute subsets in which attributes are conditionally independent (cf. Selective Bayesian Classifier [3]).
  2. Extending the structure of Naive Bayes to represent attribute dependencies (cf. Averaged One-Dependence Estimators (AODE) [4]).
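
Under the independence assumption the joint likelihood factorizes into per-attribute terms, so the decision rule reduces to maximizing the class prior times a product of one-dimensional likelihoods. The following minimal from-scratch sketch (not taken from the article; Gaussian per-attribute likelihoods are an illustrative assumption) shows that rule:

    import numpy as np

    class GaussianNaiveBayes:
        # Minimal naive Bayes: per-class priors plus per-attribute Gaussian
        # likelihoods, combined under the independence assumption.
        def fit(self, X, y):
            self.classes = np.unique(y)
            self.priors = np.array([np.mean(y == c) for c in self.classes])
            self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
            self.vars = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
            return self

        def predict_log_proba(self, X):
            # log P(c | x) = log P(c) + sum_j log P(x_j | c) + const
            log_lik = -0.5 * (np.log(2 * np.pi * self.vars[:, None, :])
                              + (X[None, :, :] - self.means[:, None, :]) ** 2
                              / self.vars[:, None, :]).sum(axis=2)
            joint = np.log(self.priors)[:, None] + log_lik
            return (joint - np.logaddexp.reduce(joint, axis=0)).T  # shape (n, classes)

        def predict(self, X):
            return self.classes[self.predict_log_proba(X).argmax(axis=1)]

scikit-learn's GaussianNB implements the same rule with more attention to numerical stability.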

Random naive Bayes' alleviation of the class conditional independence assumption

Random naive Bayes adopts the first approach by randomly selecting a subset of attributes within which the attributes are assumed to be conditionally independent; naive Bayes' performance may benefit from this random feature selection (a minimal helper is sketched below). Analogous to AODE, random naive Bayes builds an ensemble, but unlike AODE, the ensemble combines zero-dependence classifiers.
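
One component of such an ensemble can be built by training naive Bayes on only m randomly chosen attributes. This helper is hypothetical: the function name and the use of scikit-learn's GaussianNB as the base model are assumptions, not the article's specification.

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    def fit_subspace_nb(X, y, m, rng=None):
        # Train naive Bayes on a random m-attribute subspace, so the
        # independence assumption only has to hold within that subset.
        rng = rng or np.random.default_rng(0)
        feats = rng.choice(X.shape[1], size=m, replace=False)
        return GaussianNB().fit(X[:, feats], y), feats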

Random naive Bayes and random forest

Random naive Bayes (Random NB) generalizes the random forest idea to naive Bayes: it is a bagged classifier combining a forest of B naive Bayes models. The bth naive Bayes is estimated on a bootstrap sample S_b with m randomly selected features. To classify an observation, the input vector is put down each of the B naive Bayes models in the forest, and each model returns posterior class probabilities. Unlike random forest, the predicted class of the ensemble is determined by adjusted majority voting rather than plain majority voting, because each naive Bayes delivers continuous posterior probabilities rather than a single vote. As in random forests, the importance of each feature is estimated on the out-of-bag (oob) data.
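
A compact sketch of this procedure follows, under two labeled assumptions: scikit-learn's GaussianNB stands in for the unspecified base estimator, and "adjusted majority voting" is read as averaging the continuous posterior probabilities across the B models.

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    def fit_random_nb(X, y, B=100, m=None, seed=0):
        rng = np.random.default_rng(seed)
        n, d = X.shape
        m = m or max(1, int(np.sqrt(d)))  # sqrt(d) mirrors the random forest default
        forest = []
        for _ in range(B):
            rows = rng.integers(0, n, size=n)             # bootstrap sample S_b
            feats = rng.choice(d, size=m, replace=False)  # m randomly selected features
            forest.append((GaussianNB().fit(X[rows][:, feats], y[rows]), feats))
        return forest

    def predict_random_nb(forest, X):
        # Average posteriors instead of counting hard votes; assumes every
        # bootstrap sample contained every class, so the columns align.
        probs = np.mean([nb.predict_proba(X[:, feats]) for nb, feats in forest], axis=0)
        return forest[0][0].classes_[probs.argmax(axis=1)]

Out-of-bag importance could then be estimated, as in random forests, by permuting one feature at a time on the rows each model did not see and measuring the resulting drop in accuracy.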

Notes

  1. ^ Breiman, 2001
  2. ^ Langley, Iba and Thomas, 1992
  3. ^ Langley and Sage, 1994
  4. ^ Webb et al. 2005
