Generalization error

The generalization error of a machine learning model is a measure of how well a learning machine generalizes to unseen data. It is measured as the difference between the error on the training set and the error on the test set, averaged over the entire set of possible training sets that can be generated at each iteration of the learning process. The quantity has this name because it indicates the capacity of a machine trained with the specified algorithm to infer, or generalize, the rule that the teacher machine uses to generate data, based only on a few examples.
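
A minimal Monte Carlo sketch of this quantity, assuming a synthetic linear "teacher" with Gaussian noise and an ordinary least-squares "student" (the data-generating process, dimensions, noise level and sample sizes below are illustrative assumptions, not taken from the article). The gap between test and training error is averaged over many independently drawn training sets:

    import numpy as np

    rng = np.random.default_rng(0)
    d, n_train, n_test, n_repeats = 5, 30, 10_000, 200
    teacher_w = rng.normal(size=d)                 # fixed rule the "teacher" uses to generate targets

    def make_data(n):
        X = rng.normal(size=(n, d))
        y = X @ teacher_w + 0.1 * rng.normal(size=n)   # noisy target outputs
        return X, y

    X_test, y_test = make_data(n_test)             # large held-out set stands in for unseen data
    gaps = []
    for _ in range(n_repeats):                     # average over many possible training sets
        X_tr, y_tr = make_data(n_train)
        w_hat, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)   # least-squares "student"
        train_err = np.mean((X_tr @ w_hat - y_tr) ** 2)
        test_err = np.mean((X_test @ w_hat - y_test) ** 2)
        gaps.append(test_err - train_err)

    print("estimated generalization gap:", np.mean(gaps))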

The theoretical model assumes a probability distribution over the examples and a function giving the exact target; the model can also include noise in the examples (in the input and/or the target output). The generalization error is usually defined as the expected value of the squared difference between the learned function and the exact target (the mean squared error). In practical cases the distribution and the target function are unknown, so statistical estimates are used instead.
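
In symbols (the notation is added here and is not part of the original text): writing \rho(x, y) for the joint distribution of inputs and (possibly noisy) target outputs and f for the learned function, the definition above reads

    I[f] = \mathbb{E}_{(x,y)\sim\rho}\left[(f(x)-y)^2\right] = \int (f(x)-y)^2 \, \rho(x,y) \, dx \, dy ,

and, since \rho and the target are unknown in practice, I[f] is estimated on a finite sample (x_1, y_1), \ldots, (x_m, y_m) not used for training:

    \hat{I}[f] = \frac{1}{m} \sum_{i=1}^{m} \left(f(x_i) - y_i\right)^2 .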

The performance of a machine learning algorithm is assessed by plotting the generalization error values through the learning process; such plots are called learning curves.
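
As an illustration, a short sketch that records such a curve for gradient descent on a least-squares problem (the synthetic data, learning rate and number of steps are assumptions made for this example, not taken from the article):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    d, n_train, n_test = 5, 30, 5_000
    teacher_w = rng.normal(size=d)

    def make_data(n):
        X = rng.normal(size=(n, d))
        return X, X @ teacher_w + 0.1 * rng.normal(size=n)

    X_tr, y_tr = make_data(n_train)
    X_te, y_te = make_data(n_test)

    w = np.zeros(d)
    lr, steps = 0.05, 200
    train_curve, test_curve = [], []
    for _ in range(steps):                                   # one gradient step per iteration
        grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / n_train      # gradient of the training MSE
        w -= lr * grad
        train_curve.append(np.mean((X_tr @ w - y_tr) ** 2))
        test_curve.append(np.mean((X_te @ w - y_te) ** 2))   # estimate of the generalization error

    plt.plot(train_curve, label="training error")
    plt.plot(test_curve, label="test error")
    plt.xlabel("iteration")
    plt.ylabel("mean squared error")
    plt.legend()
    plt.show()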

The generalization error of a perceptron is the probability that the student perceptron classifies an example differently from the teacher; it is determined by the overlap of the student and teacher synaptic vectors and is a function of their scalar product.
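
The article does not state the formula explicitly; in the standard teacher–student analysis, with inputs drawn from an isotropic distribution and student and teacher synaptic vectors w_S and w_T (notation added here), the probability of disagreement is the angle between the two vectors divided by \pi:

    \epsilon_g = \frac{1}{\pi} \arccos R , \qquad R = \frac{\mathbf{w}_S \cdot \mathbf{w}_T}{\lVert \mathbf{w}_S \rVert \, \lVert \mathbf{w}_T \rVert} ,

so the error vanishes as the normalized overlap R approaches 1.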

See also

  • Bias-variance dilemma
  • The problem of induction
  • Generalization (logic)
  • Hasty generalization
