The VC dimension.

Machine Learning Classification

A quick explanation of the VC dimension.

Author: Andrea Bonvini

Published: Feb. 13, 2021


When talking about binary classification, a hypothesis is a function that maps an input from the entire input space to a label:

$$h : \mathcal{X} \rightarrow \{-1, +1\}$$

The number of hypotheses $|\mathcal{H}|$ can be infinite.

A dichotomy, instead, is a hypothesis restricted to the sample: it maps an input from the sample points to a label:

$$h : \{x_1, x_2, \dots, x_N\} \rightarrow \{-1, +1\}$$

The number of dichotomies $|\mathcal{H}(x_1, x_2, \dots, x_N)|$ is at most $2^N$, where $N$ is the sample size.

E.g. for a sample size $N = 3$ we have at most $2^3 = 8$ possible dichotomies:

        x1 x2 x3
1       -1 -1 -1
2       -1 -1 +1
3       -1 +1 -1
4       -1 +1 +1
5       +1 -1 -1 
6       +1 -1 +1
7       +1 +1 -1
8       +1 +1 +1
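To make the difference between "all possible labelings" and "dichotomies a class can actually produce" concrete, here is a minimal Python sketch. It assumes a toy hypothesis class of 1D "positive rays", $h_a(x) = +1$ if $x > a$ else $-1$ (this class and the three sample points are purely illustrative choices, not something used elsewhere in the post), and checks which of the $2^3 = 8$ labelings that class can realize:

```python
import itertools
import numpy as np

# Three fixed sample points on the real line (arbitrary choice).
points = np.array([0.5, 1.5, 2.5])

# Toy hypothesis class: "positive rays" h_a(x) = +1 if x > a else -1,
# approximated by sweeping the threshold a over a fine grid.
thresholds = np.linspace(-1.0, 4.0, 501)
realizable = {tuple(np.where(points > a, 1, -1)) for a in thresholds}

# All 2^3 = 8 possible labelings (dichotomies) of the three points.
for labels in itertools.product([-1, 1], repeat=3):
    status = "realizable" if labels in realizable else "not realizable"
    print(labels, "->", status)

print(f"{len(realizable)} of 8 dichotomies are realizable by positive rays")
```

Positive rays can realize only 4 of the 8 possible labelings here; how many dichotomies a class can achieve, in the best case, is exactly what the growth function quantifies.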

The growth function counts the most dichotomies that can be obtained on any $N$ points:

$$m_{\mathcal{H}}(N) = \max_{x_1, \dots, x_N \in \mathcal{X}} |\mathcal{H}(x_1, \dots, x_N)|$$

This translates into choosing any $N$ points and laying them out in any fashion in the input space. Determining $m_{\mathcal{H}}(N)$ is equivalent to looking for the layout of the $N$ points that yields the most dichotomies.

The growth function satisfies:

$$m_{\mathcal{H}}(N) \leq 2^N$$
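As a sanity check on this inequality, here is a small sketch (again assuming the toy positive-ray class, with random layouts standing in for the max over all layouts) that counts dichotomies for growing $N$; for positive rays the count grows like $N + 1$, far below $2^N$:

```python
import numpy as np

rng = np.random.default_rng(0)

def num_dichotomies(points):
    """Count distinct labelings produced on `points` by positive rays
    h_a(x) = +1 if x > a else -1."""
    xs = np.sort(points)
    # One representative threshold per "gap": below all points, between
    # consecutive points, and above all points.
    candidates = np.concatenate(([xs[0] - 1.0], (xs[:-1] + xs[1:]) / 2, [xs[-1] + 1.0]))
    return len({tuple(np.where(points > a, 1, -1)) for a in candidates})

for N in range(1, 7):
    # Approximate the max over layouts by trying several random layouts.
    m = max(num_dichotomies(rng.uniform(-1.0, 1.0, size=N)) for _ in range(100))
    print(f"N={N}: m_H(N) = {m}   (2^N = {2 ** N})")
```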

This can be applied to the perceptron. For example, when $N = 4$, we can lay out the points so that they are easily separated. However, given a layout, we must then consider all possible configurations of labels on the points. One of them assigns one class to the top and bottom points and the other class to the left and right points: here the perceptron breaks down, because no line can separate that labeling. The same holds for the configuration with the two classes swapped (left/right points blue and top/bottom red). Since these two configurations cannot be represented, $m_{\mathcal{H}}(4) = 2^4 - 2 = 14$. For this reason, we have to expect that for perceptrons $m_{\mathcal{H}}(4)$ can't be $2^4$.
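A quick way to verify the $m_{\mathcal{H}}(4) = 14$ claim is to brute-force all 16 labelings of a diamond-shaped layout and test each one for linear separability. The sketch below does this with an LP feasibility check via `scipy.optimize.linprog` (the diamond coordinates and the margin-1 formulation are just convenient choices for illustration):

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Four points laid out as a diamond: left, right, top, bottom.
X = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])

def linearly_separable(X, y):
    """Check, via an LP feasibility problem, whether some line w.x + b = 0
    satisfies y_i * (w.x_i + b) >= 1 for every point."""
    n, d = X.shape
    # Variables: [w_1, ..., w_d, b]; constraints: -y_i * (w.x_i + b) <= -1.
    A_ub = -y[:, None] * np.hstack([X, np.ones((n, 1))])
    res = linprog(c=np.zeros(d + 1), A_ub=A_ub, b_ub=-np.ones(n),
                  bounds=[(None, None)] * (d + 1))
    return res.success

separable = sum(
    linearly_separable(X, np.array(labels))
    for labels in itertools.product([-1, 1], repeat=4)
)
print(f"{separable} of {2 ** 4} labelings are linearly separable")  # expected: 14
```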

The VC (Vapnik–Chervonenkis) dimension of a hypothesis set $\mathcal{H}$, denoted by $d_{VC}(\mathcal{H})$, is the largest value of $N$ for which $m_{\mathcal{H}}(N) = 2^N$; in other words, it is "the most points $\mathcal{H}$ can shatter".

We can say that the VC dimension is one of many measures that characterize the expressive power, or capacity, of a hypothesis class.

You can think of the VC dimension as “how many points can this model class memorize/shatter?” (a ton? BAD! not so many? GOOD!).

With respect to learning, the effect of the VC dimension is that if the VC dimension is finite, then the final hypothesis $g$ will generalize:

$$d_{VC}(\mathcal{H}) < \infty \;\Longrightarrow\; g \in \mathcal{H} \text{ will generalize}$$

The key observation here is that this statement is independent of:

- the learning algorithm,
- the input distribution,
- the target function.

The only things that factor into this are the training examples, the hypothesis set, and the final hypothesis.

The VC dimension of a linear classifier in $d$ dimensions (i.e. a line in 2D, a plane in 3D, etc.) is $d + 1$: a line can shatter at most $2 + 1 = 3$ points, a plane can shatter at most $3 + 1 = 4$ points, and so on.

Proof: here
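Without reproducing the proof, a rough empirical check is possible: randomly search for layouts of $N$ points in 2D that a linear classifier shatters. Random search can only confirm shattering, never rule it out, so the sketch below is a sanity check rather than a proof; still, it agrees with $d_{VC} = 2 + 1 = 3$:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

def linearly_separable(X, y):
    """LP feasibility check: does some line satisfy y_i * (w.x_i + b) >= 1 for all i?"""
    n, d = X.shape
    A_ub = -y[:, None] * np.hstack([X, np.ones((n, 1))])
    res = linprog(c=np.zeros(d + 1), A_ub=A_ub, b_ub=-np.ones(n),
                  bounds=[(None, None)] * (d + 1))
    return res.success

def shatters(X):
    """True if a 2D linear classifier realizes all 2^N labelings of the rows of X."""
    return all(linearly_separable(X, np.array(y))
               for y in itertools.product([-1, 1], repeat=len(X)))

for N in range(1, 5):
    # Random search for a layout of N points that is shattered.
    found = any(shatters(rng.uniform(-1, 1, size=(N, 2))) for _ in range(50))
    print(f"N={N}: shattered layout found -> {found}")
# Expected: True for N = 1, 2, 3 and False for N = 4, consistent with d_VC = 2 + 1 = 3.
```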

How many randomly drawn examples suffice to guarantee error of at most $\epsilon$ with probability at least $(1 - \delta)$?

$$N \geq \frac{1}{\epsilon}\left(4\log_2\left(\frac{2}{\delta}\right) + 8\,VC(\mathcal{H})\log_2\left(\frac{13}{\epsilon}\right)\right)$$
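Plugging illustrative numbers into this bound gives a feel for the scale (the choices $VC(\mathcal{H}) = 3$, $\epsilon = 0.1$, $\delta = 0.05$ below are arbitrary):

```python
import math

def sample_complexity(eps, delta, vc):
    """Evaluate (1/eps) * (4*log2(2/delta) + 8*vc*log2(13/eps))."""
    return (4 * math.log2(2 / delta) + 8 * vc * math.log2(13 / eps)) / eps

# Illustrative numbers: a 2D perceptron (VC = 3), 10% error, 95% confidence.
print(math.ceil(sample_complexity(eps=0.1, delta=0.05, vc=3)))  # roughly 1900 examples
```

So on the order of a couple of thousand examples suffice for these settings; bounds of this kind are typically loose in practice.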

PAC bound using the VC dimension:

$$L_{\text{true}}(h) \leq L_{\text{train}}(h) + \sqrt{\frac{VC(\mathcal{H})\left(\ln\frac{2N}{VC(\mathcal{H})} + 1\right) + \ln\frac{4}{\delta}}{N}}$$
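To see how the complexity term of this bound shrinks with the sample size, here is a small sketch that evaluates it for a few values of $N$ (again with the illustrative choices $VC(\mathcal{H}) = 3$ and $\delta = 0.05$):

```python
import math

def vc_bound_gap(N, vc, delta):
    """Complexity term of the PAC bound:
    sqrt((vc * (ln(2N/vc) + 1) + ln(4/delta)) / N)."""
    return math.sqrt((vc * (math.log(2 * N / vc) + 1) + math.log(4 / delta)) / N)

# Illustrative: VC(H) = 3 (2D perceptron), delta = 0.05.
for N in [100, 1_000, 10_000, 100_000]:
    print(N, round(vc_bound_gap(N, vc=3, delta=0.05), 3))
# The gap decreases roughly like sqrt(ln(N) / N).
```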


    Citation

    For attribution, please cite this work as

    Bonvini (2021, Feb. 14). Last Week's Potatoes: The VC dimension.. Retrieved from https://lastweekspotatoes.com/posts/2021-07-22-the-vc-dimension/

    BibTeX citation

    @misc{bonvini2021the,
      author = {Bonvini, Andrea},
      title = {Last Week's Potatoes: The VC dimension.},
      url = {https://lastweekspotatoes.com/posts/2021-07-22-the-vc-dimension/},
      year = {2021}
    }