The growth function, also called the shatter coefficient or the shattering number, measures the richness of a set family or class of functions. It is especially used in statistical learning theory, where it is used to study properties of statistical learning methods.
The term 'growth function' was coined by Vapnik and Chervonenkis in their 1968 paper, where they also proved many of its properties.[1]
It is a basic concept in machine learning.[2][3]
Let $H$ be a set family (a set of sets) and $C$ a set. Their intersection is defined as the following set-family:

$H \cap C := \{h \cap C \mid h \in H\}$

The intersection-size (also called the index) of $H$ with respect to $C$ is $|H \cap C|$. If a set $C$ has $m$ elements then the index is at most $2^m$. If the index is exactly $2^m$ then the set $C$ is said to be shattered by $H$, because $H \cap C$ contains all the subsets of $C$, i.e.:

$|H \cap C| = 2^{|C|}$

The growth function measures the size of $H \cap C$ as a function of $|C|$. Formally:

$\mathrm{Growth}(H,m) := \max_{C : |C| = m} |H \cap C|$
Equivalently, let $H$ be a hypothesis-class (a set of binary functions) and $C$ a set with $m$ elements. The restriction of $H$ to $C$ is the set of binary functions on $C$ that can be derived from $H$:[3]: 45

$H_C := \{(h(x_1), \ldots, h(x_m)) \mid h \in H\}$, where $x_1, \ldots, x_m$ are the elements of $C$.

The growth function measures the size of $H_C$ as a function of $|C|$:[3]: 49

$\mathrm{Growth}(H,m) := \max_{C : |C| = m} |H_C|$
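For finite classes, this definition can be checked by exhaustive search. The following Python sketch (function names are illustrative, not from the literature) enumerates the restrictions $H_C$ over every $m$-element subset of a small domain, using the threshold functions $x \mapsto \mathbf{1}[x \ge t]$ as the hypothesis class:

```python
from itertools import combinations

def growth(hypotheses, domain, m):
    """Brute-force Growth(H, m): the maximum, over all m-element
    subsets C of the domain, of the number of distinct labelings
    (restrictions) the hypotheses induce on C."""
    best = 0
    for C in combinations(domain, m):
        restrictions = {tuple(h(x) for x in C) for h in hypotheses}
        best = max(best, len(restrictions))
    return best

# Threshold functions 1[x >= t] on a small integer domain.
domain = range(6)
thresholds = [lambda x, t=t: int(x >= t) for t in range(7)]
print(growth(thresholds, domain, 3))  # prints 4, i.e. m + 1 for m = 3
```

Thresholds are a finite analogue of the half-lines in example 1 below, so the count $m+1$ matches that example.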
1. The domain is the real line $\mathbb{R}$.
The set-family $H$ contains all the half-lines (rays) from a given number to positive infinity, i.e., all sets of the form $\{x \in \mathbb{R} \mid x > x_0\}$ for some $x_0 \in \mathbb{R}$.
For any set $C$ of $m$ real numbers, the intersection $H \cap C$ contains $m+1$ sets: the empty set, the set containing the largest element of $C$, the set containing the two largest elements of $C$, and so on. Therefore: $\mathrm{Growth}(H,m) = m+1$.[1]: Ex.1 The same is true whether $H$ contains open half-lines, closed half-lines, or both.
2. The domain is the segment $[0,1]$.
The set-family $H$ contains all the open sets.
For any finite set $C$ of $m$ real numbers, the intersection $H \cap C$ contains all possible subsets of $C$. There are $2^m$ such subsets, so $\mathrm{Growth}(H,m) = 2^m$.[1]: Ex.2
4. The domain is the real line $\mathbb{R}$. The set-family $H$ contains all the real intervals, i.e., all sets of the form $\{x \in \mathbb{R} \mid x_0 < x < x_1\}$ for some $x_0, x_1 \in \mathbb{R}$. For any set $C$ of $m$ real numbers, the intersection $H \cap C$ contains all runs of between $0$ and $m$ consecutive elements of $C$. The number of such runs is $\frac{m(m+1)}{2} + 1$, so $\mathrm{Growth}(H,m) = \frac{m(m+1)}{2} + 1$.
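The run count in this example can be verified numerically. A small Python sketch (illustrative code, not from the source) enumerates the subsets of an $m$-point set that an interval can cut out, i.e. the runs of consecutive points plus the empty set:

```python
def interval_growth(m):
    """Number of distinct subsets of m ordered points that a real
    interval can cut out: every run of consecutive points, plus
    the empty set."""
    points = list(range(m))  # any m distinct reals give the same count
    cuts = {frozenset()}     # an interval can miss all points
    for i in range(m):
        for j in range(i, m):
            cuts.add(frozenset(points[i:j + 1]))  # run i..j
    return len(cuts)

for m in range(1, 7):
    assert interval_growth(m) == m * (m + 1) // 2 + 1
```

The double loop counts $\binom{m+1}{2} = \frac{m(m+1)}{2}$ non-empty runs; adding the empty set gives the count of runs stated in the example.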
Trivially, $\mathrm{Growth}(H,m) \le 2^m$ for every family $H$, since $H \cap C$ is a set of subsets of $C$; i.e., the growth function has an exponential upper bound.
We say that a set-family $H$ shatters a set $C$ if their intersection contains all possible subsets of $C$, i.e. $|H \cap C| = 2^{|C|}$.
If $H$ shatters a set of size $m$, then $\mathrm{Growth}(H,m) = 2^m$, which is the upper bound.
The growth function falls into exactly one of two cases: either it is exponential, i.e. $\mathrm{Growth}(H,m) = 2^m$ for all $m$, or it is eventually polynomial in $m$. The VC dimension of $H$ is defined according to these two cases:
In the polynomial case, $\mathrm{VCDim}(H)$ = the largest integer $d$ for which $\mathrm{Growth}(H,d) = 2^d$.
In the exponential case, $\mathrm{VCDim}(H) = \infty$.
So $\mathrm{VCDim}(H) \ge d$ if-and-only-if $\mathrm{Growth}(H,d) = 2^d$.
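For finite classes on a finite domain, the VC dimension can be found by brute force. The sketch below (helper names are hypothetical) tests, for each size $d$, whether some $d$-element subset of the domain is shattered:

```python
from itertools import combinations

def shatters(hypotheses, C):
    """True if the hypotheses realize all 2^|C| labelings of C."""
    labelings = {tuple(h(x) for x in C) for h in hypotheses}
    return len(labelings) == 2 ** len(C)

def vc_dimension(hypotheses, domain):
    """Largest d such that some d-element subset of the domain is
    shattered (0 if even singletons cannot be shattered)."""
    d = 0
    for size in range(1, len(domain) + 1):
        if any(shatters(hypotheses, C) for C in combinations(domain, size)):
            d = size
    return d

domain = list(range(6))
thresholds = [lambda x, t=t: int(x >= t) for t in range(7)]
intervals = [lambda x, a=a, b=b: int(a <= x <= b)
             for a in range(6) for b in range(6)]
print(vc_dimension(thresholds, domain))  # 1: no pair gets the labeling (1, 0)
print(vc_dimension(intervals, domain))   # 2: no triple gets (1, 0, 1)
```

The two printed values agree with examples 1 and 4: thresholds have growth function $m+1$, which equals $2^m$ only at $m=1$, while intervals have growth function $\frac{m(m+1)}{2}+1$, which equals $2^m$ only up to $m=2$.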
The growth function can be regarded as a refinement of the concept of VC dimension. The VC dimension only tells us whether $\mathrm{Growth}(H,m)$ is equal to or smaller than $2^m$, while the growth function tells us exactly how $\mathrm{Growth}(H,m)$ changes as a function of $m$.
Another connection between the growth function and the VC dimension is given by the Sauer–Shelah lemma:[3]: 49
If $\mathrm{VCDim}(H) = d$, then:

for all $m$: $\mathrm{Growth}(H,m) \le \sum_{i=0}^{d} \binom{m}{i}$

In particular,

for all $m > d+1$: $\mathrm{Growth}(H,m) \le \left(\frac{em}{d}\right)^{d} = O(m^{d})$

so when the VC dimension is finite, the growth function grows polynomially with $m$.

This upper bound is tight, i.e., for all $m$ there exists $H$ with VC dimension $d$ such that:[2]: 56

$\mathrm{Growth}(H,m) = \sum_{i=0}^{d} \binom{m}{i}$
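The two bounds are easy to check numerically. The following Python sketch (illustrative) compares the binomial sum against both the polynomial envelope $(em/d)^d$ and the trivial exponential bound $2^m$:

```python
from math import comb, e

def sauer_bound(m, d):
    """Sauer-Shelah bound: sum_{i=0}^{d} C(m, i)."""
    return sum(comb(m, i) for i in range(d + 1))

d = 2
for m in range(d + 2, 20):
    assert sauer_bound(m, d) <= (e * m / d) ** d  # polynomial envelope
    assert sauer_bound(m, d) < 2 ** m             # below the trivial bound
```

For $d = 2$ the binomial sum is $1 + m + \frac{m(m-1)}{2}$, a quadratic in $m$, which is why the assertions hold for every $m$ in the range.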
Let $\Omega$ be a set on which a probability measure $\Pr$ is defined.
Let $H$ be a family of subsets of $\Omega$ (= a family of events).
Suppose we choose a set $S_m$ that contains $m$ elements of $\Omega$,
where each element is chosen at random according to the probability measure $\Pr$, independently of the others (i.e., with replacement). For each event $h \in H$, we compare the following two quantities:
Its relative frequency in $S_m$, i.e., $|h \cap S_m| / m$;
Its probability $\Pr[h]$.
We are interested in the difference, $D(h) := \left| \frac{|h \cap S_m|}{m} - \Pr[h] \right|$. This difference satisfies the following upper bound:

$\Pr\left[\forall h \in H: D(h) \le \varepsilon\right] \ge 1 - 8 \cdot \mathrm{Growth}(H,m) \cdot e^{-m \varepsilon^{2}/32}$

In words: the probability that for all events in $H$, the relative-frequency is near the probability, is lower-bounded by an expression that depends on the growth-function of $H$.
A corollary of this is that, if the growth function is polynomial in $m$ (i.e., there exists some $n$ such that $\mathrm{Growth}(H,m) \le m^n$), then the above probability approaches 1 as $m \to \infty$. I.e., the family $H$ enjoys uniform convergence in probability.
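Uniform convergence can be observed empirically. The Python sketch below (a simulation; the sample sizes and grid are arbitrary choices) draws uniform samples on $[0,1]$ and tracks the largest gap, over a grid of threshold events $[t, 1]$, between each event's relative frequency and its true probability $1 - t$:

```python
import random

def max_deviation(m, thresholds, rng):
    """Largest gap, over the threshold events [t, 1], between the
    empirical frequency in an m-point uniform sample on [0, 1]
    and the true probability 1 - t."""
    sample = [rng.random() for _ in range(m)]
    return max(abs(sum(x >= t for x in sample) / m - (1 - t))
               for t in thresholds)

rng = random.Random(0)
thresholds = [i / 100 for i in range(101)]
small = max(max_deviation(100, thresholds, rng) for _ in range(20))
large = max(max_deviation(10000, thresholds, rng) for _ in range(20))
print(small, large)  # the worst-case gap shrinks as m grows
```

The threshold events have growth function $m + 1$ (example 1 above), which is polynomial in $m$, so the uniform deviation shrinks as the bound predicts.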
[1] Vapnik, V. N.; Chervonenkis, A. Ya. (1971). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Theory of Probability & Its Applications. 16 (2): 264. doi:10.1137/1116025.
This is an English translation, by B. Seckler, of the Russian paper: "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Dokl. Akad. Nauk. 181 (4): 781. 1968.
The translation was reproduced as:
Vapnik, V. N.; Chervonenkis, A. Ya. (2015). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Measures of Complexity. p. 11. doi:10.1007/978-3-319-21852-6_3. ISBN 978-3-319-21851-9.
[2] Mohri, Mehryar; Rostamizadeh, Afshin; Talwalkar, Ameet (2012). Foundations of Machine Learning. Cambridge, Massachusetts: MIT Press. ISBN 9780262018258. Especially Section 3.2.
[3] Shalev-Shwartz, Shai; Ben-David, Shai (2014). Understanding Machine Learning – From Theory to Algorithms. Cambridge University Press. ISBN 9781107057135.