Hyperparameter optimization

From Wikipedia, the free encyclopedia
  (Redirected from Grid search)
Jump to: navigation, search

In the context of machine learning, hyperparameter optimization or model selection is the problem of choosing a set of hyperparameters for a learning algorithm, usually with the goal of obtaining good generalisation.[1] Hyperparameter optimization contrasts with actual learning problems, which are also often cast as optimization problems, but optimize a loss function on the training set alone. In effect, learning algorithms learn parameters that model/reconstruct their inputs well, while hyperparameter optimization is to ensure the model does not overfit its data by tuning, e.g., regularization.

Contents

Algorithms for hyperparameter optimization [edit]

Grid search [edit]

The de facto standard way of performing hyperparameter optimization is grid search, which is simply an exhaustive searching through a manually specified subset of the hyperparameter space of a learning algorithm. A grid search algorithm must be guided by some performance metric, typically measured by cross-validation on the training set[2] or evaluation on a held-out validation set.

Since the parameter space of a machine learner may include real-valued or unbounded value spaces for certain parameters, manually set bounds and discretization may be necessary before applying grid search.

For example, a typical soft-margin SVM classifier equipped with an RBF kernel has at least two hyperparameters that need to be tuned for good performance on unseen data: a regularization constant C and a kernel hyperparameter γ. Both parameters are continuous, so to perform grid search, one selects a finite set of "reasonable" values for each, say

C \in \{10, 100, 1000\}
\gamma \in \{0.1, 0.2, 0.5, 1.0\}

Grid search then trains an SVM with each pair (C, γ) in the cross-product of these two sets and evaluates their performance on a held-out validation set (or by internal cross-validation on the training set, in which case multiple SVMs are trained per pair). Finally, the grid search algorithm outputs the settings that achieved the highest score in the validation procedure.

Grid search suffers from the curse of dimensionality, but is often embarrassingly parallel because typically the hyperparameter settings it evaluates are independent of each other.[1]

Alternatives [edit]

Since grid searching is an exhaustive and therefore potentially expensive method, several alternatives have been proposed. In particular, random search has been found to be more effective in high-dimensional spaces than exhaustive search.[1]

External links [edit]

References [edit]

  1. ^ a b c Bergstra, James; Bengio, Yoshua (2012). "Random Search for Hyper-Parameter Optimization". J. Machine Learning Research 13: 281—305. 
  2. ^ Chin-Wei Hsu, Chih-Chung Chang and Chih-Jen Lin (2010). A practical guide to support vector classification. Technical Report, National Taiwan University.