Learning curve (machine learning)

From Wikipedia, the free encyclopedia
Figure: Learning curve showing training score and cross-validation score.

In machine learning, a learning curve (or training curve) plots the optimal value of a model's loss function on a training set alongside the same loss function evaluated on a validation data set, using the parameters that produced the optimal training value. It is a tool for finding out how much a machine learning model benefits from adding more training data and whether the estimator suffers more from a variance error or a bias error. If both the validation score and the training score converge to a value that is too low as the size of the training set increases, the model will not benefit much from more training data.[1]

Learning curves are useful for many purposes, including comparing different algorithms,[2] choosing model parameters during design,[3] adjusting optimization to improve convergence, and determining the amount of data used for training.[4]

In the machine learning domain, the term covers two kinds of curve that differ in their x-axis: the experience of the model is graphed either as the number of training examples used for learning or as the number of iterations used in training the model.[5]

Formal definition

One model of machine learning is producing a function, $f(x)$, which given some information, $x$, predicts some variable, $y$, from training data $X_\text{train}$ and $Y_\text{train}$. It is distinct from mathematical optimization because $f$ should predict well for $x$ outside of $X_\text{train}$.

We often constrain the possible functions to a parameterized family $\{f_\theta(x) : \theta \in \Theta\}$ so that the function is generalizable[6] and so that certain properties are true, either to make finding a good $f$ easier, or because we have some a priori reason to think they are true.[6]: 172 

Given that it is not possible to produce a function that perfectly fits our data, it is then necessary to produce a loss function $L(f_\theta(X), Y)$ to measure how good our prediction is. We then define an optimization process which finds a $\theta$ which minimizes $L(f_\theta(X), Y)$, referred to as $\theta^*(X, Y)$.
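As a concrete illustration of these ingredients, the following is a minimal sketch rather than a canonical implementation. It assumes a one-dimensional linear family, mean squared error as the loss, and a closed-form least-squares fit as the optimization process. The names `predict`, `loss`, and `fit`, and the synthetic data, are hypothetical choices made for this example only.

```python
import random


def predict(theta, x):
    """One member of the assumed family: f_theta(x) = theta[0] + theta[1] * x."""
    return theta[0] + theta[1] * x


def loss(theta, xs, ys):
    """Mean squared error of f_theta on the pairs (xs, ys)."""
    return sum((predict(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(xs, ys):
    """Optimization process: closed-form least-squares minimizer of `loss`."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # All inputs identical: the best line is the constant mean of y.
        return (mean_y, 0.0)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return (mean_y - slope * mean_x, slope)


# Synthetic training data (an assumption): y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]

theta_star = fit(xs, ys)
print("fitted parameters:", theta_star)
print("training loss:", loss(theta_star, xs, ys))
```

Because the data are noisy, the training loss at the fitted parameters stays above zero, which is why a loss function is needed to compare candidate functions at all.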

Training curve for amount of data

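Then if our training data is $\{x_1, x_2, \dots, x_n\}, \{y_1, y_2, \dots, y_n\}$ and our validation data is $\{x_1', x_2', \dots, x_m'\}, \{y_1', y_2', \dots, y_m'\}$, a learning curve is the plot of the two curves

  1. $i \mapsto L(f_{\theta^*(X_i, Y_i)}(X_i), Y_i)$
  2. $i \mapsto L(f_{\theta^*(X_i, Y_i)}(X'), Y')$

where $X_i = \{x_1, x_2, \dots, x_i\}$ and $Y_i = \{y_1, y_2, \dots, y_i\}$ are the first $i$ training examples, $X'$ and $Y'$ are the validation data, $L$ is the loss function, and $\theta^*(X_i, Y_i)$ is the parameter value that the optimization process finds on $X_i, Y_i$.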
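The two curves can be computed by refitting the model on growing prefixes of the training data. The sketch below assumes a one-dimensional linear model fitted by least squares, mean squared error as the loss, and evaluation on the full validation set at every size. The function names and the synthetic data are hypothetical and serve only as an illustration.

```python
import random


def fit(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return (mean_y, 0.0)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return (mean_y - b * mean_x, b)


def mse(theta, xs, ys):
    """Mean squared error of the line theta = (a, b) on (xs, ys)."""
    a, b = theta
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def learning_curve_by_size(train_x, train_y, val_x, val_y, start=2):
    """Refit on the first i training examples for each i and record both losses."""
    sizes, train_losses, val_losses = [], [], []
    for i in range(start, len(train_x) + 1):
        x_i, y_i = train_x[:i], train_y[:i]
        theta = fit(x_i, y_i)                            # parameters fitted on the prefix
        sizes.append(i)
        train_losses.append(mse(theta, x_i, y_i))        # curve 1: training loss
        val_losses.append(mse(theta, val_x, val_y))      # curve 2: validation loss
    return sizes, train_losses, val_losses


# Synthetic data (an assumption): y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)


def sample(n):
    xs = [rng.uniform(0, 5) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]
    return xs, ys


train_x, train_y = sample(40)
val_x, val_y = sample(200)
sizes, train_losses, val_losses = learning_curve_by_size(train_x, train_y, val_x, val_y)

for k in (0, len(sizes) // 2, len(sizes) - 1):
    print(sizes[k], train_losses[k], val_losses[k])
```

With only two training examples the line passes through both points, so the training loss starts near zero while the validation loss reflects how poorly that line generalizes. Plotting `train_losses` and `val_losses` against `sizes` gives the learning curve.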

Training curve for number of iterations

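Many optimization processes are iterative, repeating the same step until the process converges to an optimal value. Gradient descent is one such algorithm. If $\theta_i^*$ denotes the approximation of the optimal $\theta$ after $i$ steps, a learning curve is the plot of

  1. $i \mapsto L(f_{\theta_i^*(X, Y)}(X), Y)$
  2. $i \mapsto L(f_{\theta_i^*(X, Y)}(X'), Y')$

where $X, Y$ are the training data, $X', Y'$ are the validation data, and $L$ is the loss function.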
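For the iteration-based curve, the losses are recorded after every optimization step instead of after every refit. The following sketch assumes plain batch gradient descent on mean squared error for a one-dimensional linear model. The learning rate, step count, starting point, function names, and synthetic data are all assumptions chosen for illustration.

```python
import random


def predict(theta, x):
    """Assumed model family: f_theta(x) = theta[0] + theta[1] * x."""
    return theta[0] + theta[1] * x


def mse(theta, xs, ys):
    """Mean squared error of f_theta on (xs, ys)."""
    return sum((predict(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_gradient(theta, xs, ys):
    """Gradient of the mean squared error with respect to theta."""
    n = len(xs)
    residuals = [predict(theta, x) - y for x, y in zip(xs, ys)]
    g0 = 2.0 * sum(residuals) / n
    g1 = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
    return g0, g1


def learning_curve_by_iteration(train_x, train_y, val_x, val_y, steps=200, lr=0.02):
    """Run gradient descent; entry k of each list is the loss after k steps."""
    theta = (0.0, 0.0)  # arbitrary starting point
    train_losses = [mse(theta, train_x, train_y)]
    val_losses = [mse(theta, val_x, val_y)]
    for _ in range(steps):
        g0, g1 = mse_gradient(theta, train_x, train_y)
        theta = (theta[0] - lr * g0, theta[1] - lr * g1)
        train_losses.append(mse(theta, train_x, train_y))  # curve 1: training loss
        val_losses.append(mse(theta, val_x, val_y))        # curve 2: validation loss
    return train_losses, val_losses


# Synthetic data (an assumption): y = 2x + 1 plus Gaussian noise, x in [0, 5].
rng = random.Random(1)


def sample(n):
    xs = [rng.uniform(0, 5) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]
    return xs, ys


train_x, train_y = sample(40)
val_x, val_y = sample(200)
train_losses, val_losses = learning_curve_by_iteration(train_x, train_y, val_x, val_y)

for k in (0, 10, 100, 200):
    print(k, train_losses[k], val_losses[k])
```

The learning rate of 0.02 is small enough for inputs in the range [0, 5] that the training loss does not increase from one step to the next. A larger rate could make the curve oscillate or diverge, which is one of the things such a plot helps to diagnose.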

References

  1. scikit-learn developers. "Validation curves: plotting scores to evaluate models". scikit-learn 0.20.2 documentation. Retrieved February 15, 2019.
  2. Madhavan, P.G. (1997). "A New Recurrent Neural Network Learning Algorithm for Time Series Prediction" (PDF). Journal of Intelligent Systems. p. 113, Fig. 3.
  3. "Machine Learning 102: Practical Advice". Tutorial: Machine Learning for Astronomy with Scikit-learn.
  4. Meek, Christopher; Thiesson, Bo; Heckerman, David (Summer 2002). "The Learning-Curve Sampling Method Applied to Model-Based Clustering". Journal of Machine Learning Research. 2 (3): 397. Archived from the original on 2013-07-15.
  5. Sammut, Claude; Webb, Geoffrey I., eds. (28 March 2011). Encyclopedia of Machine Learning (1st ed.). Springer. p. 578. ISBN 978-0-387-30768-8.
  6. Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron (2016-11-18). Deep Learning. MIT Press. p. 108. ISBN 978-0-262-03561-3.