Inductive bias

The inductive bias (also known as learning bias) of a learning algorithm is the set of assumptions that the learner uses to predict outputs for inputs that it has not encountered.^[1] Inductive bias is anything that makes the algorithm learn one pattern instead of another (e.g. step functions in decision trees instead of continuous functions in a linear regression model). Learning is the process of apprehending useful knowledge by observing and interacting with the world.^[2] It involves searching a space of solutions for one that is expected to explain the data better or to achieve higher rewards. In many cases, however, multiple solutions are equally good.^[3] An inductive bias allows a learning algorithm to prioritize one solution (or interpretation) over another, independent of the observed data.^[4]
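The contrast between step functions and continuous functions can be made concrete with a small sketch. The data, the function names `fit_line` and `fit_stump`, and the depth-1 tree are assumptions chosen for illustration. Both learners see the same four points and give different answers for an input outside the training range.

```python
# Toy data (made up for illustration): y = 2x on four training points.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 4.0, 6.0]

def fit_line(xs, ys):
    """Least-squares line. Bias: the target is a linear function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_stump(xs, ys):
    """Depth-1 regression tree. Bias: the target is a step function."""
    pts = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pts)):
        thr = (pts[i - 1][0] + pts[i][0]) / 2  # candidate split point
        left = [y for x, y in pts if x <= thr]
        right = [y for x, y in pts if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - lm) ** 2 for y in left)
               + sum((y - rm) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm

line = fit_line(xs, ys)
stump = fit_stump(xs, ys)
print(line(10.0))   # 20.0: the line extrapolates the trend
print(stump(10.0))  # 5.0: the stump repeats the mean of its right leaf
```

Neither prediction is dictated by the data alone. Each follows from the form of hypothesis the learner is willing to consider.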

In machine learning, one aims to construct algorithms that are able to learn to predict a certain target output. To achieve this, the learning algorithm is presented with training examples that demonstrate the intended relation between input and output values. The learner is then supposed to approximate the correct output, even for examples that were not shown during training. Without any additional assumptions, this problem cannot be solved, since unseen situations might have an arbitrary output value. The necessary assumptions about the nature of the target function are subsumed under the term inductive bias.^[1]^[5]
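A small enumeration illustrates why the problem cannot be solved without assumptions. The training set below is hypothetical. The sketch lists every Boolean function on three bits that agrees with five training examples and counts what those functions say about one unseen input.

```python
from itertools import product

inputs = list(product([0, 1], repeat=3))  # all 8 possible inputs

# Hypothetical training set: 5 of the 8 inputs with their labels.
train = {
    (0, 0, 0): 0,
    (0, 0, 1): 1,
    (0, 1, 0): 1,
    (0, 1, 1): 0,
    (1, 0, 0): 1,
}
unseen = (1, 1, 1)

# A Boolean function on 3 bits is a table assigning 0 or 1 to each input.
consistent = []
for outputs in product([0, 1], repeat=len(inputs)):
    table = dict(zip(inputs, outputs))
    if all(table[x] == y for x, y in train.items()):
        consistent.append(table)

votes = [h[unseen] for h in consistent]
print(len(consistent), votes.count(0), votes.count(1))  # 8 4 4
```

The functions that fit the training data split evenly on the unseen input, so the data alone gives no reason to prefer either label. Any preference has to come from an inductive bias.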

A classical example of an inductive bias is Occam's razor, which assumes that the simplest consistent hypothesis about the target function is the best. Here, consistent means that the learner's hypothesis yields correct outputs for all of the examples that have been given to the algorithm.
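As a minimal sketch of this preference, assume that hypotheses are polynomials and that "simplest" means "lowest degree". Both are illustrative assumptions, as is the function name `occam_polynomial`. For integer values observed at x = 0, 1, 2, ..., the lowest-degree polynomial that passes through every point can be found with a finite-difference table.

```python
def occam_polynomial(ys):
    """Return (degree, next_value) for the lowest-degree polynomial that
    passes exactly through (0, ys[0]), (1, ys[1]), ..., using finite
    differences. next_value is its prediction at x = len(ys)."""
    rows = [list(ys)]
    # Keep differencing until a row is constant; the number of
    # differencing steps is the degree of the simplest consistent fit.
    while any(d != rows[-1][0] for d in rows[-1]):
        last = rows[-1]
        rows.append([b - a for a, b in zip(last, last[1:])])
    degree = len(rows) - 1
    # Extending the constant row by one entry and summing back up the
    # table gives the next value of the polynomial.
    next_value = sum(row[-1] for row in rows)
    return degree, next_value

print(occam_polynomial([1, 1, 1]))     # (0, 1)
print(occam_polynomial([0, 2, 4, 6]))  # (1, 8)
print(occam_polynomial([0, 1, 4, 9]))  # (2, 16)
```

A polynomial of degree len(ys) - 1 also fits any such data set, and so do infinitely many polynomials of higher degree. Choosing the lowest degree is an assumption that the data do not supply.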

Approaches to a more formal definition of inductive bias are based on mathematical logic. Here, the inductive bias is a logical formula that, together with the training data, logically entails the hypothesis generated by the learner. However, this strict formalism fails in many practical cases, where the inductive bias can only be given as a rough description (e.g. in the case of artificial neural networks), or not at all.
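One common way to write this condition is given below. The symbols are notation introduced here for illustration, not taken from the text above. X is the set of possible inputs, D the training data, L(x_i, D) the output the learner assigns to x_i after training on D, and B the inductive bias.

```latex
\forall x_i \in X:\quad \bigl(B \land D \land x_i\bigr) \vdash L(x_i, D)
```

Read this way, B is whatever has to be added to the training data so that each of the learner's predictions follows deductively rather than inductively.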

Types

The following is a list of common inductive biases in machine learning algorithms.

Maximum conditional independence: if the hypothesis can be cast in a Bayesian framework, try to maximize conditional independence. This is the bias used in the Naive Bayes classifier.
Minimum cross-validation error: when trying to choose among hypotheses, select the hypothesis with the lowest cross-validation error. Although cross-validation may seem to be free of bias, the "no free lunch" theorems show that cross-validation must be biased, for example by assuming that there is no information encoded in the ordering of the data.
Maximum margin: when drawing a boundary between two classes, attempt to maximize the width of the boundary. This is the bias used in support vector machines. The assumption is that distinct classes tend to be separated by wide boundaries.
Minimum description length: when forming a hypothesis, attempt to minimize the length of the description of the hypothesis.
Minimum features: unless there is good evidence that a feature is useful, it should be deleted. This is the assumption behind feature selection algorithms.
Nearest neighbors: assume that cases that are near each other in feature space tend to belong to the same class. Given a case for which the class is unknown, guess that it belongs to the same class as the majority in its immediate neighborhood. This is the bias used in the k-nearest neighbors algorithm.
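As an illustration of one entry in the list above, the nearest-neighbors bias can be sketched in a few lines. The points, the labels, and the function name `knn_predict` are made up for the example, and Euclidean distance is an assumed choice of metric.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Two hypothetical clusters in a 2-D feature space.
train = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((1.1, 0.9), "b"),
]

print(knn_predict(train, (0.15, 0.15)))  # a
print(knn_predict(train, (0.95, 1.0)))   # b
```

The learner never tests whether nearby points really share a class. It assumes so, and the choice of distance metric and of k are part of that assumption.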

Shift of bias

Although most learning algorithms have a static bias, some algorithms are designed to shift their bias as they acquire more data.^[6] This does not avoid bias, since the bias shifting process itself must have a bias.
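The following sketch is a loose illustration of the idea rather than a reconstruction of any published algorithm. The three hypothesis spaces and the rule for moving between them are assumptions. The learner starts with a strong bias (constant functions only) and weakens it only when no hypothesis in the current space is consistent with the data seen so far.

```python
from itertools import product

INPUTS = list(product([0, 1], repeat=2))  # all 2-bit inputs

def constants():
    return [("const 0", lambda x: 0), ("const 1", lambda x: 1)]

def single_literals():
    return [("x0", lambda x: x[0]), ("not x0", lambda x: 1 - x[0]),
            ("x1", lambda x: x[1]), ("not x1", lambda x: 1 - x[1])]

def all_tables():
    # Every Boolean function on 2 bits, as a lookup table.
    spaces = []
    for outs in product([0, 1], repeat=len(INPUTS)):
        table = dict(zip(INPUTS, outs))
        spaces.append((f"table {outs}", table.__getitem__))
    return spaces

# Ordered from strongest bias (smallest space) to weakest.
SPACES = [constants, single_literals, all_tables]

def learn(examples):
    """Return (level, name) of the first consistent hypothesis, shifting
    to a weaker bias only when the current space has none."""
    for level, space in enumerate(SPACES):
        for name, h in space():
            if all(h(x) == y for x, y in examples):
                return level, name
    raise ValueError("no consistent hypothesis; the labels conflict")

# XOR examples arriving one at a time.
data = [((0, 0), 0), ((1, 0), 1), ((0, 1), 1), ((1, 1), 0)]
print(learn(data[:1]))  # (0, 'const 0')
print(learn(data[:2]))  # (1, 'x0')
print(learn(data))      # (2, 'table (0, 1, 1, 0)')
```

The order of `SPACES` and the rule "shift only on inconsistency" are themselves fixed in advance, which is the sense in which the shifting process carries its own bias.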

References

[1] Mitchell, T. M. (1980), The need for biases in learning generalizations, CBM-TR 5-110, New Brunswick, New Jersey, USA: Rutgers University, CiteSeerX 10.1.1.19.5466
[2] Battaglia, Peter W.; Hamrick; Bapst; Sanchez-Gonzalez; Zambaldi; Malinowski; Tacchetti; Raposo; Santoro; Faulkner (2018). "Relational inductive biases, deep learning, and graph networks". arXiv:1806.01261 [cs.LG].
[3] Goodman, Nelson (1955). "The new riddle of induction". Fact, Fiction, and Forecast. Harvard University Press. pp. 59–83. ISBN 978-0-674-29071-6.
[4] Mitchell, Tom M (1980). "The need for biases in learning generalizations" (PDF). Rutgers University Technical Report CBM-TR-117: 184–191.
[5] DesJardins, M.; Gordon, D. F. (1995), "Evaluation and selection of biases in machine learning", Machine Learning, 20 (1–2): 5–22, doi:10.1007/BF00993472
[6] Utgoff, P. E. (1984), Shift of bias for inductive concept learning, New Brunswick, New Jersey, USA: Doctoral dissertation, Department of Computer Science, Rutgers University, ISBN 9780934613002


