Jump to content

Shattered set

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Igny (talk | contribs) at 12:19, 26 June 2007 (remove the incorrect sentence: the terminology was introduced in 1960s by Vapnik/Chervonenkis). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

The concept of shattering of a set of points plays an important role in Vapnik-Chervonenkis theory, also known as VC-theory. Shattering and VC-theory are used in the study of empirical processes as well as in statistical computational learning theory.

Definition

Suppose we have a class C of sets and a given set A. C is said to shatter A if, for each subset T of A, there is some element U of C such that

Equivalently, C shatters A when the power set P(A) is the set { UA | UC }.

For example, the class of all discs in the plane (two-dimensional space) cannot shatter every set of four points, yet the class of all convex sets in the plane shatters every finite set on the (unit) circle. (For the collection of all convex sets, connect the dots!)

We employ the letter to refer to "class" or "collection" of sets, as in a Vapnik-Chervonenkis class (VC-class). The set is often assumed to be finite, because, in empirical processes, we are interested in the shattering of finite sets of data points.

Shatter coefficient

To quantify the richness of a collection of sets, we use the concept of shattering coefficients (also known as shatter coefficients or growth function). For a collection of sets Ω, Ω being any space, often a probability space, we define the nth shattering coefficient of as

where "card" denotes the cardinality, that is the number of elements of a set.

is equal to the largest number of subsets of any set of n points that can be formed by intersecting with the sets in collection .

It is obvious that

1. for all n.
2. If , that means there is a set of cardinality n, which can be shattered by C.
3. If for some then for all

The third property means that if C can not shatter any set of cardinality N then it can not shatter sets of larger cardinalities.

Vapnik-Chervonenkis class

The VC dimension of class C is defined as

or, alternatively,

Note that . If for any n there is a set of cardinality n which can be shattered by C, then for all n and the VC dimension of this class is infinite.

Class with finite VC dimension is called Vapnik-Chervonenkis class or VC class. A class C is uniformly Glivenko-Cantelli class if and only if it is a VC class.

References

  • Wencour, R.S., R.M. Dudley (1981), "Some special Vapnik-Chervonenkis classes", Discrete Math, 33, 1981, 313-318.
  • Steele, J.M. (1975), "Combinatorial Entropy and Uniform Limit Laws," Stanford University Ph.d Thesis (Mathematics)
  • Steele, J.M. (1978), "Empirical discrepancies and subadditive processes," Annals of Probability, 6, 118--227.