Version space learning

A version space in concept learning or induction is the subset of all hypotheses that are consistent with the observed training examples.[1] This set contains all hypotheses that have not been eliminated as a result of being in conflict with observed data.
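For a small hypothesis space, this definition can be applied directly by enumeration. The following is a minimal sketch, not part of the original formulation: it assumes a hypothetical two-attribute domain (`DOMAINS`) and conjunctive hypotheses written as tuples in which `"?"` matches any value. The names `covers` and `version_space` are illustrative only.

```python
from itertools import product

# Hypothetical attribute domains, chosen only for illustration.
DOMAINS = [("sunny", "rainy"), ("warm", "cold")]

def covers(h, x):
    """True if hypothesis h matches instance x ('?' matches any value)."""
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))

def version_space(examples):
    """Return every hypothesis that agrees with all labelled examples."""
    hypotheses = product(*[values + ("?",) for values in DOMAINS])
    return [h for h in hypotheses
            if all(covers(h, x) == label for x, label in examples)]

examples = [(("sunny", "warm"), True),   # positive example
            (("rainy", "cold"), False)]  # negative example
vs = version_space(examples)
print(vs)
```

With these two examples, three of the nine candidate hypotheses survive: `("sunny", "warm")`, `("sunny", "?")` and `("?", "warm")`. Enumeration of this kind is only feasible for very small hypothesis spaces.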

The Version Space algorithm

In settings where there is a generality-ordering on hypotheses, it is possible to represent the version space by two sets of hypotheses: (1) the most specific consistent hypotheses, and (2) the most general consistent hypotheses, where "consistent" indicates agreement with observed data.

The most specific hypotheses (i.e., the specific boundary SB) cover the observed positive training examples and as little of the remaining feature space as possible. If reduced any further, these hypotheses would exclude a positive training example and hence become inconsistent. These minimal hypotheses essentially constitute a (pessimistic) claim that the true concept is defined just by the positive data already observed. Thus, if a novel (never-before-seen) data point is observed, it should be assumed to be negative. (I.e., if data has not previously been ruled in, then it is ruled out.)

The most general hypotheses (i.e., the general boundary GB) cover the observed positive training examples, but also cover as much of the remaining feature space as possible without including any negative training examples. If enlarged any further, these hypotheses would include a negative training example and hence become inconsistent. These maximal hypotheses essentially constitute an (optimistic) claim that the true concept is defined just by the negative data already observed. Thus, if a novel (never-before-seen) data point is observed, it should be assumed to be positive. (I.e., if data has not previously been ruled out, then it is ruled in.)
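The two boundaries can be used together to label a new data point. The sketch below is an illustration under stated assumptions rather than a standard API: hypotheses are conjunctive tuples in which `"?"` matches any value, and the boundary sets `SB` and `GB` are given by hand for a hypothetical two-attribute domain. A point covered by every member of SB is positive under every consistent hypothesis; a point covered by no member of GB is negative under every consistent hypothesis; anything in between is where the pessimistic and optimistic readings disagree.

```python
def covers(h, x):
    """True if hypothesis h matches instance x ('?' matches any value)."""
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))

def classify(SB, GB, x):
    """Label x using only the boundary sets (illustrative helper)."""
    if all(covers(s, x) for s in SB):
        return "positive"   # even the most specific hypotheses accept x
    if not any(covers(g, x) for g in GB):
        return "negative"   # even the most general hypotheses reject x
    return "unknown"        # SB rules x out, GB rules it in

# Boundaries assumed for illustration (one positive and one negative example seen).
SB = [("sunny", "warm")]
GB = [("sunny", "?"), ("?", "warm")]

print(classify(SB, GB, ("sunny", "warm")))  # positive
print(classify(SB, GB, ("rainy", "cold")))  # negative
print(classify(SB, GB, ("sunny", "cold")))  # unknown
```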

Thus, during learning, the version space (which is itself a set, possibly infinite, containing all consistent hypotheses) can be represented by just its lower and upper bounds (the maximally specific and maximally general hypothesis sets, respectively), and learning operations can be performed just on these representative sets.
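The boundary-only update can be sketched as follows. This is a simplified illustration of the candidate elimination idea, assuming noise-free data, conjunctive hypotheses over discrete attributes, `"?"` as a wildcard, and `None` entries for the initial hypothesis that covers nothing. All function names and the example domain are hypothetical.

```python
def covers(h, x):
    """True if hypothesis h matches instance x ('?' matches any value)."""
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))

def more_general_or_equal(h1, h2):
    """True if h1 covers every instance that h2 covers."""
    if None in h2:          # h2 covers nothing at all
        return True
    return all(a == "?" or a == b for a, b in zip(h1, h2))

def generalize(s, x):
    """Minimal generalization of s that covers the positive instance x."""
    return tuple(xv if sv is None else (sv if sv == xv else "?")
                 for sv, xv in zip(s, x))

def specialize(g, x, domains):
    """Minimal specializations of g that exclude the negative instance x."""
    return [g[:i] + (v,) + g[i + 1:]
            for i, values in enumerate(domains) if g[i] == "?"
            for v in values if v != x[i]]

def candidate_elimination(examples, domains):
    n = len(domains)
    S = {(None,) * n}       # specific boundary: starts by covering nothing
    G = {("?",) * n}        # general boundary: starts by covering everything
    for x, positive in examples:
        if positive:
            G = {g for g in G if covers(g, x)}
            S = {generalize(s, x) for s in S}
            S = {s for s in S if any(more_general_or_equal(g, s) for g in G)}
        else:
            S = {s for s in S if not covers(s, x)}
            candidates = set()
            for g in G:
                if not covers(g, x):
                    candidates.add(g)
                    continue
                candidates.update(
                    h for h in specialize(g, x, domains)
                    if any(more_general_or_equal(h, s) for s in S))
            # Keep only the maximally general candidates.
            G = {g for g in candidates
                 if not any(h != g and more_general_or_equal(h, g)
                            for h in candidates)}
    return S, G

domains = [("sunny", "rainy"), ("warm", "cold")]   # hypothetical attributes
examples = [(("sunny", "warm"), True),
            (("rainy", "cold"), False),
            (("sunny", "cold"), True)]
S, G = candidate_elimination(examples, domains)
print(S, G)
```

In this run both boundaries end up as the single hypothesis `("sunny", "?")`, meaning the version space has converged to one consistent hypothesis. If either set becomes empty, no hypothesis in the assumed space is consistent with the data.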

Historical background

The notion of version spaces was introduced by Mitchell as a framework for understanding the basic problem of supervised learning within the context of solution search. The basic "candidate elimination" search method that accompanies the version space framework is not a popular learning algorithm, for various reasons including its sensitivity to noise (Russell & Norvig 2003). Nevertheless, some practical implementations have been developed (e.g., Sverdlik & Reynolds 1992, Hong & Tsang 1997, Dubois & Quafafou 2002).

References

[1] Mitchell, Tom M. (1982). "Generalization as search". Artificial Intelligence. 18 (2): 203–226. doi:10.1016/0004-3702(82)90040-6.
[2] Rendell, Larry (1986). "A general framework for induction and a study of selective induction". Machine Learning. 1 (2): 177–226. doi:10.1007/BF00114117.

Dubois, Vincent (2002). "Concept learning with approximation: Rough version spaces". Rough Sets and Current Trends in Computing: Proceedings of the Third International Conference, RSCTC 2002. Malvern, Pennsylvania. pp. 239–246.
Hong, Tzung-Pai (1997). "A generalized version space learning algorithm for noisy and uncertain data". IEEE Transactions on Knowledge and Data Engineering. 9 (2): 336–340. doi:10.1109/69.591457.
Mill, John Stuart (1843/2002). A System of Logic, Ratiocinative and Inductive: Being a Connected View of the Principles of Evidence and the Methods of Scientific Investigation. Honolulu, HI: University Press of the Pacific.
Mitchell, Tom M. (1997). Machine Learning. Boston: McGraw-Hill.
Russell, Stuart J.; Norvig, Peter (2003), Artificial Intelligence: A Modern Approach (2nd ed.), Upper Saddle River, New Jersey: Prentice Hall, ISBN 0-13-790395-2
Sverdlik, W. (1992). "Dynamic version spaces in machine learning". Proceedings, Fourth International Conference on Tools with Artificial Intelligence (TAI '92). Arlington, VA. pp. 308–315.
