|This article does not cite any references or sources. (August 2009)|
The generalization error of a machine learning model is a function that measures how well a learning machine generalizes to unseen data. It is measured as the distance between the error on the training set and the test set and is averaged over the entire set of possible training data that can be generated after each iteration of the learning process. It has this name because this function indicates the capacity of a machine that learns with the specified algorithm to infer a rule (or generalize) that is used by the teacher machine to generate data based only on a few examples.
The theoretical model assumes a probabilistic distribution of the examples, and a function giving the exact target. The model can also include noise in the example (in the input and/or target output). The generalization error is usually defined as the expected value of the square of the difference between the learned function and the exact target (mean-square error). In practical cases, the distribution and target are unknown; statistical estimates are used.
The generalization error of a perceptron is the probability of the student perceptron to classify an example differently from the teacher and is given by the overlap of the student and teacher synaptic vectors and is a function of their scalar product.
- Bishop, C.M. (1995), Neural Networks for Pattern Recognition, Oxford: Oxford University Press, especially section 6.4.
- Finke, M., and Müller, K.-R. (1994), "Estimating a-posteriori probabilities using stochastic network models," in Mozer, Smolensky, Touretzky, Elman, & Weigend, eds., Proceedings of the 1993 Connectionist Models Summer School, Hillsdale, NJ: Lawrence Erlbaum Associates, pp. 324-331.
- Geman, S., Bienenstock, E. and Doursat, R. (1992), "Neural Networks and the Bias/Variance Dilemma", Neural Computation, 4, 1-58.
- Husmeier, D. (1999), Neural Networks for Conditional Probability Estimation: Forecasting Beyond Point Predictions, Berlin: Springer Verlag, ISBN 185233095.
- McCullagh, P. and Nelder, J.A. (1989) Generalized Linear Models, 2nd ed., London: Chapman & Hall.
- Moody, J.E. (1992), "The Effective Number of Parameters: An Analysis of Generalization and Regularization in Nonlinear Learning Systems", in Moody, J.E., Hanson, S.J., and Lippmann, R.P., Advances in Neural Information Processing Systems 4, 847-854.
- Ripley, B.D. (1996) Pattern Recognition and Neural Networks, Cambridge: Cambridge University Press.
- Rohwer, R., and van der Rest, J.C. (1996), "Minimum description length, regularization, and multimodal data," Neural Computation, 8, 595-609.
- Rojas, R. (1996), "A short proof of the posterior probability property of classifier neural networks," Neural Computation, 8, 41-43.
- White, H. (1990), "Connectionist Nonparametric Regression: Multilayer Feedforward Networks Can Learn Arbitrary Mappings," Neural Networks, 3, 535-550. Reprinted in White (1992).
- White, H. (1992a), "Nonparametric Estimation of Conditional Quantiles Using Neural Networks," in Page, C. and Le Page, R. (eds.), Proceedings of the 23rd Sympsium on the Interface: Computing Science and Statistics, Alexandria, VA: American Statistical Association, pp. 190-199. Reprinted in White (1992b).
- White, H. (1992b), Artificial Neural Networks: Approximation and Learning Theory, Blackwell.
|This applied mathematics-related article is a stub. You can help Wikipedia by expanding it.|