Ugly duckling theorem

The Ugly Duckling theorem is an argument asserting that classification is impossible without some sort of bias. It is named after Hans Christian Andersen's story "The Ugly Duckling" because it shows that, all things being equal, an ugly duckling is just as similar to a swan as two swans are to each other. It is a theorem only in a very informal sense. It was proposed by Satosi Watanabe in 1969.[1]

Basic idea

Watanabe came to realize that all objects share an unquantifiable number of properties, which makes any classification biased. Murphy and Medin (1985) give an example of two putative classified things, plums and lawnmowers:

"Suppose that one is to list the attributes that plums and lawnmowers have in common in order to judge their similarity. It is easy to see that the list could be infinite: Both weigh less than 10,000 kg (and less than 10,001 kg), both did not exist 10,000,000 years ago (and 10,000,001 years ago), both cannot hear well, both can be dropped, both take up space, and so on. Likewise, the list of differences could be infinite… any two entities can be arbitrarily similar or dissimilar by changing the criterion of what counts as a relevant attribute."[2]

Unless some properties are considered more salient, or "weighted" as more important than others, everything will appear equally similar; hence Watanabe (1986) wrote: "any objects, in so far as they are distinguishable, are equally similar".[3] However, since there is an unlimited number of properties to choose from, which properties to select or deselect remains an arbitrary choice. This makes classification biased. Watanabe named this the "Ugly Duckling theorem" because a swan is as similar to a duckling as to another swan (there are no constraints or fixes on what constitutes similarity).

Mathematical formula

Suppose there are n things in the universe, and one wants to put them into classes or categories. One has no preconceived ideas or biases about what sorts of categories are "natural" or "normal" and what are not. So one has to consider all the possible classes that could be, that is, all the possible ways of making sets out of the n objects. There are 2^n such ways, the size of the power set of n objects. One might hope to use this to measure the similarity between two objects by counting how many classes they have in common. However, one cannot: if any possible class may be formed, every two objects agree on membership (both in the class or both out of it) in exactly the same number of classes, namely 2^{n-1}, half the total number of classes. To see this, imagine each class represented by an n-bit string (or binary-encoded integer), with a zero for each element not in the class and a one for each element in the class. There are 2^n such strings.

As all possible choices of zeros and ones are present, any two bit positions will agree exactly half the time. One may pick two elements and reorder the bits so they are the first two, and imagine the numbers sorted lexicographically. The first 2^n/2 numbers will have bit #1 set to zero, and the second 2^n/2 will have it set to one. Within each of those blocks, the top 2^n/4 will have bit #2 set to zero and the other 2^n/4 will have it set to one, so the two bits agree on two blocks of 2^n/4, or on half of all the cases, no matter which two elements one picks. So if we have no preconceived bias about which categories are better, everything is then equally similar (or equally dissimilar). The number of predicates simultaneously satisfied by two non-identical elements is constant over all such pairs (2^{n-2}), and the number on which two such elements agree, by both satisfying or both failing the predicate, is 2^{n-1}, the same as the number of those satisfied by one. Thus, some kind of inductive bias is needed to make judgements, i.e. to prefer certain categories over others.
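
The counting argument can be checked by brute force for small n. The following is a minimal sketch in Python, not part of Watanabe's presentation; the function name agreement_counts is introduced here only for illustration. It encodes each class as an n-bit integer, as described above, and counts for every pair of objects both the classes on which they agree and the classes containing both.

```python
from itertools import combinations

def agreement_counts(n):
    """Map each pair of objects (i, j) to (classes they agree on, classes containing both)."""
    results = {}
    for i, j in combinations(range(n), 2):
        agree = 0
        both = 0
        for mask in range(2 ** n):        # each mask encodes one class as an n-bit string
            in_i = (mask >> i) & 1        # 1 if object i belongs to this class
            in_j = (mask >> j) & 1        # 1 if object j belongs to this class
            if in_i == in_j:
                agree += 1
            if in_i and in_j:
                both += 1
        results[(i, j)] = (agree, both)
    return results

counts = agreement_counts(4)
print(set(counts.values()))  # {(8, 4)}
```

For n = 4 every pair gives the same result: agreement on 8 = 2^{n-1} classes and joint membership in 4 = 2^{n-2} classes, so no pair stands out as more similar than any other.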

Boolean functions

Let x_1, x_2, \dots, x_n be a set of vectors of k booleans each. The ugly duckling is the vector which is least like the others. Given the booleans, this can be computed using Hamming distance.
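
As an illustration, the following Python sketch picks out an ugly duckling from a small set of Boolean vectors. It assumes that "least like the others" means having the largest total Hamming distance to the other vectors, which is one possible reading; the function names and the sample data are hypothetical.

```python
def hamming(u, v):
    """Number of coordinates in which two equal-length vectors differ."""
    return sum(a != b for a, b in zip(u, v))

def ugly_duckling(vectors):
    """Index of the vector with the largest total Hamming distance to the others."""
    totals = [sum(hamming(v, w) for w in vectors) for v in vectors]
    return max(range(len(vectors)), key=totals.__getitem__)

# Hypothetical data: four vectors of k = 3 Booleans each.
birds = [(1, 1, 0), (1, 1, 0), (1, 0, 0), (0, 0, 1)]
print(ugly_duckling(birds))  # 3
```

With these three features the last vector is the clear outlier, but the result depends entirely on which features were chosen.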

However, the choice of Boolean features to consider could have been somewhat arbitrary. Perhaps there were features derivable from the original features that were important for identifying the ugly duckling. The set of Booleans in the vector can be extended with new features computed as Boolean functions of the k original features. The only canonical way to do this is to extend it with all possible Boolean functions. The resulting completed vectors have 2^{2^k} features, one for each Boolean function of k variables. The Ugly Duckling theorem states that there is no ugly duckling because any two completed vectors will either be equal or differ in exactly half of the features.
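
The statement can be verified exhaustively for small k. The sketch below, in Python, represents each Boolean function of k variables by its truth table (a tuple of 2^k output bits) and evaluates all of them on a given vector; the names complete and differing are introduced here for illustration only.

```python
from itertools import product

def complete(vector):
    """Extend a k-bit vector with the value of every Boolean function of k inputs."""
    k = len(vector)
    inputs = list(product((0, 1), repeat=k))   # the 2^k possible input vectors
    row = inputs.index(tuple(vector))          # truth-table row selected by this vector
    # Each truth table (2^k output bits) is one Boolean function; read off its value.
    return [table[row] for table in product((0, 1), repeat=2 ** k)]

def differing(u, v):
    """Number of completed features on which u and v differ."""
    return sum(a != b for a, b in zip(complete(u), complete(v)))

k = 2
vectors = list(product((0, 1), repeat=k))
print(len(complete(vectors[0])))  # 16
print({differing(u, v) for u in vectors for v in vectors if u != v})  # {8}
```

For k = 2 each completed vector has 16 features, and every pair of distinct vectors differs in exactly 8 of them, regardless of how many of the original two features they shared.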

Proof. Let x and y be two vectors. If they are the same, then their completed vectors must also be the same, because any Boolean function of x will agree with the same Boolean function of y. If x and y are different, then there exists a coordinate i where the i-th coordinate of x differs from the i-th coordinate of y. Now the completed features contain every Boolean function on k Boolean variables, each exactly once. Viewing these Boolean functions as polynomials in k variables over GF(2), segregate the functions into pairs (f,g) where f contains the i-th coordinate as a linear term and g is f without that linear term. For every such pair (f,g), the values of f and g differ by the i-th coordinate, on which x and y disagree, so x and y agree on exactly one of the two functions: if they agree on one, they must disagree on the other, and vice versa. Since the pairs partition all the functions, x and y differ in exactly half of the completed features. (This proof is believed to be due to Watanabe.)
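
The pairing step can be written out as a short calculation over GF(2). The notation here is an assumption made for illustration: t denotes the input vector of the functions, t_i its i-th coordinate, and x_i, y_i the i-th coordinates of x and y.

```latex
\begin{align*}
f(t) &= g(t) + t_i \\
f(x) + f(y) &= \bigl(g(x) + g(y)\bigr) + (x_i + y_i) \\
            &= \bigl(g(x) + g(y)\bigr) + 1 \qquad \text{since } x_i \neq y_i
\end{align*}
```

Over GF(2), f(x) + f(y) = 0 exactly when x and y agree on f, so the last line says that they agree on f precisely when they disagree on g. The 2^{2^k} functions split into 2^{2^k - 1} such pairs, which gives 2^{2^k - 1} differing features.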


A solution to the Ugly Duckling theorem would be to introduce a constraint on how similarity is measured by limiting the properties involved in classification, say between A and B. However, Medin et al. (1993) point out that this does not actually resolve the arbitrariness or bias problem, since the respect in which A is similar to B "varies with the stimulus context and task, so that there is no unique answer, to the question of how similar is one object to another".[4][5] For example, "a barberpole and a zebra would be more similar than a horse and a zebra if the feature striped had sufficient weight. Of course, if these feature weights were fixed, then these similarity relations would be constrained". Yet the property "striped" as a weight "fix" or constraint is itself arbitrary, meaning: "unless one can specify such criteria, then the claim that categorization is based on attribute matching is almost entirely vacuous".
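
The barberpole example can be made concrete with a toy weighted feature-matching measure. The Python sketch below is not the model used by Medin et al.; the feature lists, the weights, and the function weighted_similarity are all hypothetical and chosen only to show how the ranking flips when one weight is changed.

```python
def weighted_similarity(a, b, weights):
    """Sum the weights of the features on which a and b agree."""
    return sum(w for feature, w in weights.items() if a[feature] == b[feature])

# Hypothetical feature assignments, chosen only for illustration.
zebra      = {"striped": True,  "animal": True,  "four_legged": True}
horse      = {"striped": False, "animal": True,  "four_legged": True}
barberpole = {"striped": True,  "animal": False, "four_legged": False}

equal_weights = {"striped": 1, "animal": 1, "four_legged": 1}
stripes_heavy = {"striped": 5, "animal": 1, "four_legged": 1}

# Equal weights: the zebra is closer to the horse.
print(weighted_similarity(zebra, horse, equal_weights),
      weighted_similarity(zebra, barberpole, equal_weights))   # 2 1
# Heavy weight on "striped": the zebra is closer to the barberpole.
print(weighted_similarity(zebra, horse, stripes_heavy),
      weighted_similarity(zebra, barberpole, stripes_heavy))   # 2 5
```

Nothing in the measure itself says which weighting is correct, which is the arbitrariness the passage describes.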

Stamos (2003) has attempted to solve the Ugly Duckling theorem by showing that some judgments of overall similarity are non-arbitrary in the sense that they are useful:

"Presumably, people's perceptual and conceptual processes have evolved that information that matters to human needs and goals can be roughly approximated by a similarity heuristic... If you are in the jungle and you see a tiger but you decide not to stereotype (perhaps because you believe that similarity is a false friend), then you will probably be eaten. In other words, in the biological world stereotyping based on veridical judgments of overall similarity statistically results in greater survival and reproductive success."[6]

  1. Watanabe, Satosi (1969). Knowing and Guessing: A Quantitative Study of Inference and Information. New York: Wiley. pp. 376–377.
  2. Murphy, G. L.; Medin, D. L. (1985). "The Role of Theories in Conceptual Coherence". Psychological Review. 92 (3): 289–316.
  3. Watanabe, S. (1986). "Epistemological Relativity". Annals of the Japan Association for Philosophy of Science. 7 (1): 1–14.
  4. Medin, D. L.; Goldstone, R. L.; Gentner, D. (1993). "Respects for similarity". Psychological Review. 100 (2): 254–278.
  5. The philosopher Nelson Goodman (1972) came to the same conclusion: "But importance is a highly volatile matter, varying with every shift of context and interest, and quite incapable of supporting the fixed distinctions that philosophers so often seek to rest upon it".
  6. Stamos, D. N. (2003). The Species Problem. Lexington Books. p. 344.