Algorithm selection

From Wikipedia, the free encyclopedia
Jump to navigation Jump to search

Algorithm selection (sometimes also called per-instance algorithm selection or offline algorithm selection) is a meta-algorithmic technique to choose an algorithm from a portfolio on an instance-by-instance basis. It is motivated by the observation that on many practical problems, algorithms have different performances. That is, while one algorithm performs well on some instances, it performs poorly on others and vice versa for another algorithm. If we can identify when to use which algorithm, we can get the best of both worlds and improve overall performance. This is what algorithm selection aims to do. The only prerequisite for applying algorithm selection techniques is that there exists (or that there can be constructed) a set of complementary algorithms.

Definition[edit]

Given a portfolio of algorithms , a set of instances and a cost metric , the algorithm selection problem consists of finding a mapping from instances to algorithms such that the cost across all instances is optimized.[1][2]

Examples[edit]

Boolean satisfiability problem (and other hard combinatorial problems)[edit]

A well-known application of algorithm selection is the Boolean satisfiability problem. Here, the portfolio of algorithms is a set of (complementary) SAT solvers, the instances are Boolean formulas, the cost metric is for example average runtime or number of unsolved instances. So, the goal is to select a well-performing SAT solver for each individual instance. In the same way, algorithm selection can be applied to many other -hard problems (such as mixed integer programming, CSP, AI planning, TSP, MAXSAT, QBF and answer set programming). Competition-winning systems in SAT are SATzilla,[3] 3S[4] and CSHC[5]

Machine learning[edit]

In machine learning, algorithm selection is better known as meta-learning. The portfolio of algorithms consists of machine learning algorithms (e.g., Random Forest, SVM, DNN), the instances are data sets and the cost metric is for example the error rate. So, the goal is to predict which machine learning algorithm will have a small error on each data set.

Instance features[edit]

The algorithm selection problem is mainly solved with machine learning techniques. By representing the problem instances by numerical features , algorithm selection can be seen as a multi-class classification problem by learning a mapping for a given instance .

Instance features are numerical representations of instances. For example, we can count the number of variables, clauses, average clause length for Boolean formulas,[6] or number of samples, features, class balance for ML data sets to get an impression about their characteristics.

Static vs. probing features[edit]

We distinguish between two kinds of features:

  1. Static features are in most cases some counts and statistics (e.g., clauses-to-variables ratio in SAT). These features ranges from very cheap features (e.g. number of variables) to very complex features (e.g., statistics about variable-clause graphs).
  2. Probing features (sometimes also called landmarking features) are computed by running some analysis of algorithm behavior on an instance (e.g., accuracy of a cheap decision tree algorithm on an ML data set, or running for a short time a stochastic local search solver on a Boolean formula). These feature often cost more than simple static features.

Feature costs[edit]

Depending on the used performance metric , feature computation can be associated with costs. For example, if we use running time as performance metric, we include the time to compute our instance features into the performance of an algorithm selection system. SAT solving is a concrete example, where such feature costs cannot be neglected, since instance features for CNF formulas can be either very cheap (e.g., to get the number of variables can be done in constant time for CNFs in the DIMACs format) or very expensive (e.g., graph features which can cost tens or hundreds of seconds).

It is important to take the overhead of feature computation into account in practice in such scenarios; otherwise a misleading impression of the performance of the algorithm selection approach is created. For example, if the decision which algorithm to choose can be made with prefect accuracy, but the features are the running time of the portfolio algorithms, there is no benefit to the portfolio approach. This would not be obvious if feature costs were omitted.

Approaches[edit]

Regression approach[edit]

One of the first successful algorithm selection approaches predicted the performance of each algorithm and selecting the algorithm with the best predicted performance for a new instance .[3]

Clustering approach[edit]

A common assumption is that the given set of instances can be clustered into homogeneous subsets and for each of these subsets, there is one well-performing algorithm for all instances in there. So, the training consists of identifying the homogeneous clusters via an unsupervised clustering approach and associating an algorithm with each cluster. A new instance is assigned to a cluster and the associated algorithm selected.[7]

A more modern approach is cost-sensitive hierarchical clustering[5] using supervised learning to identify the homogeneous instance subsets.

Pairwise cost-sensitive classification approach[edit]

A common approach for multi-class classification is to learn pairwise models between every pair of classes (here algorithms) and choose the class that was predicted most often by the pairwise models. We can weight the instances of the pairwise prediction problem by the performance difference between the two algorithms. This is motivated by the fact that we care most about getting predictions with large differences correct, but the penalty for an incorrect prediction is small if there is almost no performance difference. Therefore, each instance for training a classification model vs is associated with a cost .[8]

Requirements[edit]

Clustering of SAT solvers from SAT12-INDU ASlib scenario according to the correlation coefficient of spearman.
Shapley values for complementary analysis on SAT12-INDU ASlib Scenario[9]

The algorithm selection problem can be effectively applied under the following assumptions:

  • The portfolio of algorithms is complementary with respect to the instance set , i.e., there is no single algorithm that dominates the performance of all other algorithms on (see figures to the right for examples on complementary analysis).
  • In some application, the computation of instance features is associated with a cost. For example, if the cost metric is running time, we have also to consider the time to compute the instance features. In such cases, the cost to compute features should not be larger than the performance gain through algorithm selection.

Application domains[edit]

Algorithm selection is not limited to single domains but can be applied to any kind of algorithm if the above requirements are satisfied. Application domains include:

For an extensive list of literature about algorithm selection, we refer to a literature overview.

Variants of algorithm selection[edit]

Online selection[edit]

Online algorithm selection in Hyper-heuristic refers to switching between different algorithms during the solving process. In contrast, (offline) algorithm selection is an one-shot game where we select an algorithm for a given instance only once.

Computation of schedules[edit]

An extension of algorithm selection is the per-instance algorithm scheduling problem, in which we do not select only one solver, but we select a time budget for each algorithm on a per-instance base. This approach improves the performance of selection systems in particular if the instance features are not very informative and a wrong selection of a single solver is likely.[10]

Selection of parallel portfolios[edit]

Given the increasing importance of parallel computation, an extension of algorithm selection for parallel computation is parallel portfolio selection, in which we select a subset of the algorithms to simultaneously run in a parallel portfolio.[11]

External links[edit]

References[edit]

  1. ^ J. Rice: The Algorithm Selection Problem. Advances in Computers 15: 65–118 (1976)
  2. ^ B. Bischl, P. Kerschke, L. Kotthoff, M. Lindauer, Y. Malitsky, A. Frechétte, H. Hoos, F. Hutter, K. Leyton-Brown, K. Tierney and J. Vanschoren. ASlib: A Benchmark Library for Algorithm Selection In: Artificial Intelligence Journal (AIJ) (2016)
  3. ^ a b L. Xu and F. Hutter and H. Hoos and K. Leyton-Brown (2008). "SATzilla: Portfolio-based Algorithm Selection for SAT". Journal of Artificial Intelligence Research. 32.
  4. ^ S. Kadioglu; Y. Malitsky; A. Sabharwal; H. Samulowitz; M. Sellmann: (2011). "Algorithm Selection and Scheduling". Proceedings of CP.
  5. ^ a b Y. Malitsky; A. Sabharwal; H. Samulowitz; M. Sellmann: (2013). "Algorithm Portfolios Based on Cost-Sensitive Hierarchical Clustering". Proceedings of IJCAI.
  6. ^ E. Nudelman; K. Leyton-Brown; H. Hoos; A. Devkar; Y. Shoham (2004). "Understanding Random SAT: Beyond the Clauses-to-Variables Ratio". Proccedings of CP.
  7. ^ S. Kadioglu; Y. Malitsky; M. Sellmann; K. Tierney (2010). "ISAC – Instance-Specific Algorithm Configuration". Proceedings of the European Conference on Artificial Intelligence.
  8. ^ L. Xu; F. Hutter; J. Shen; H. Hoos; K. Leyton-Brown (2012). "SATzilla2012: Improved Algorithm Selection Based on Cost-sensitive Classification Models". Proceedings of the SAT Challenge 2012: Solver and Benchmark Descriptions.
  9. ^ A. Frechette, L. Kotthoff, T. Michalak, T. Rahwan, H. Hoos and K. Leyton{-}Brown (2016). "Using the Shapley Value to Analyze Algorithm Portfolios". Proceedings of the International Conference on AAAI.
  10. ^ M. Lindauer, R. Bergdoll; F. Hutter (2016). "An Empirical Study of Per-Instance Algorithm Scheduling". Proceedings of the International Conference on Learning and Intelligent Optimization.
  11. ^ M. Lindauer and H. Hoos and F. Hutter (2015). "From Sequential Algorithm Selection to Parallel Portfolio Selection". Proceedings of the International Conference on Learning and Intelligent Optimization.