Canonical form

From Wikipedia, the free encyclopedia
Algorithmic anagram test using multisets as canonical forms: The strings "madam curie" and "radium came" are given as C arrays. Each one is converted into a canonical form by sorting. Since both sorted strings literally agree, the original strings were anagrams of each other.
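The anagram test described in the caption can be sketched in a few lines of Python. This is only an illustration: the function names `canonical_form` and `are_anagrams` are made up for this example, and Python strings stand in for the C arrays mentioned above.

```python
def canonical_form(text: str) -> str:
    """Return the characters of text in sorted order.

    Sorting discards the original order and keeps only the multiset of
    characters, so two strings share this form exactly when they are anagrams.
    """
    return "".join(sorted(text))


def are_anagrams(a: str, b: str) -> bool:
    """Compare two strings by comparing their canonical forms."""
    return canonical_form(a) == canonical_form(b)


print(are_anagrams("madam curie", "radium came"))  # True
print(are_anagrams("madam curie", "radium core"))  # False
```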

In mathematics and computer science, a canonical, normal, or standard form of a mathematical object is a standard way of presenting that object as a mathematical expression. The distinction between "canonical" and "normal" forms varies by subfield. In most fields, a canonical form specifies a unique representation for every object, while a normal form simply specifies its form, without the requirement of uniqueness.

The canonical form of a positive integer in decimal representation is a finite sequence of digits that does not begin with zero.

More generally, for a class of objects on which an equivalence relation is defined, a canonical form consists in the choice of a specific object in each class. For example, Jordan normal form is a canonical form for matrix similarity, and the reduced row echelon form is a canonical form when one considers as equivalent a matrix and its left product by an invertible matrix.
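As a small illustration of the second example, the sketch below computes the reduced row echelon form (the variant of row echelon form that is unique) with exact rational arithmetic, and checks that a matrix A and its left product PA by an invertible matrix P reach the same form. The helper names `rref` and `matmul` and the sample matrices are assumptions chosen for this example.

```python
from fractions import Fraction


def rref(matrix):
    """Return the reduced row echelon form of a matrix given as a list of rows."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == len(rows):
            break
        # Look for a row at or below pivot_row with a non-zero entry in this column.
        pivot = next(
            (r for r in range(pivot_row, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        lead = rows[pivot_row][col]
        rows[pivot_row] = [x / lead for x in rows[pivot_row]]
        # Clear the column in every other row.
        for r in range(len(rows)):
            if r != pivot_row and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    return rows


def matmul(p, a):
    """Plain matrix product p * a for lists of rows."""
    return [
        [sum(Fraction(p[i][k]) * a[k][j] for k in range(len(a))) for j in range(len(a[0]))]
        for i in range(len(p))
    ]


A = [[1, 2, 3], [2, 4, 7]]
P = [[2, 1], [1, 1]]  # determinant 1, hence invertible

print(rref(A) == rref(matmul(P, A)))  # True
```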

In computer science, and more specifically in computer algebra, when representing mathematical objects in a computer, there are usually many different ways to represent the same object. In this context, a canonical form is a representation such that every object has a unique representation. Thus, the equality of two objects can easily be tested by testing the equality of their canonical forms. However, canonical forms frequently depend on arbitrary choices (like ordering the variables), and this introduces difficulties for testing the equality of two objects resulting from independent computations. Therefore, in computer algebra, normal form is a weaker notion: a normal form is a representation such that zero is uniquely represented. This allows testing for equality by putting the difference of two objects in normal form.
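The weaker notion can be illustrated with a toy representation that is not taken from any particular computer algebra system: fractions stored as unreduced pairs (numerator, denominator). In the sketch below, the hypothetical `normal_form` only guarantees that zero is written as (0, 1); it does not reduce fractions, so it is a normal form but not a canonical one. Equality is still decidable by normalizing the difference.

```python
def normal_form(frac):
    """Normal form for a pair (num, den) with den != 0: only zero is made unique."""
    num, den = frac
    return (0, 1) if num == 0 else (num, den)


def subtract(a, b):
    """Difference of two fractions, left unreduced."""
    return (a[0] * b[1] - b[0] * a[1], a[1] * b[1])


def equal(a, b):
    """Test equality by checking whether the difference normalizes to zero."""
    return normal_form(subtract(a, b)) == (0, 1)


half, two_quarters = (1, 2), (2, 4)
print(normal_form(half) == normal_form(two_quarters))  # False: the forms differ
print(equal(half, two_quarters))                       # True: the difference is zero
```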

Canonical form can also mean a differential form that is defined in a natural (canonical) way.

In computer science, data that has more than one possible representation can often be canonicalized into a completely unique representation called its canonical form. Putting something into canonical form is canonicalization.[1]


Suppose we have some set S of objects, with an equivalence relation R. A canonical form is given by designating some objects of S to be "in canonical form", such that every object under consideration is equivalent to exactly one object in canonical form. In other words, the canonical forms in S represent the equivalence classes, once and only once. To test whether two objects are equivalent, it then suffices to test their canonical forms for equality. A canonical form thus provides a classification theorem and more, in that it not only classifies every class but also gives a distinguished (canonical) representative.

Formally, a canonicalization with respect to an equivalence relation R on a set S is a mapping c: S → S such that for all s, s1, s2 ∈ S:

  1. c(s) = c(c(s))   (idempotence),
  2. s1 R s2 if and only if c(s1) = c(s2)   (decisiveness), and
  3. s R c(s)   (representativeness).

Property 3 is redundant: by property 1, c(s) = c(c(s)), so applying property 2 with s1 = s and s2 = c(s) gives s R c(s).
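The three properties can be checked mechanically on a small example. In the sketch below, which is only an illustration, S is a finite sample of integer pairs (a, b) with b ≠ 0, the relation R says that two pairs denote the same rational number, and c reduces a pair to lowest terms with a positive denominator. The names `equivalent`, `c` and `S` are chosen for this example.

```python
from itertools import product
from math import gcd


def equivalent(s1, s2):
    """R: the pairs (a, b) and (c, d) denote the same rational, i.e. a*d == c*b."""
    return s1[0] * s2[1] == s2[0] * s1[1]


def c(s):
    """Canonicalization: lowest terms with a positive denominator."""
    num, den = s
    g = gcd(num, den)
    if den < 0:
        g = -g  # dividing by a negative g makes the denominator positive
    return (num // g, den // g)


# A finite sample of S: all pairs with entries in -4..4 and non-zero denominator.
S = [(n, d) for n, d in product(range(-4, 5), repeat=2) if d != 0]

assert all(c(s) == c(c(s)) for s in S)                                       # idempotence
assert all(equivalent(s1, s2) == (c(s1) == c(s2)) for s1 in S for s2 in S)   # decisiveness
assert all(equivalent(s, c(s)) for s in S)                                   # representativeness
print("all three properties hold on the sample")
```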

In practical terms, one wants to be able to recognize the canonical forms. There is also a practical, algorithmic question to consider: how to pass from a given object s in S to its canonical form s*? Canonical forms are generally used to make operating with equivalence classes more effective. For example, in modular arithmetic, the canonical form for a residue class is usually taken as the least non-negative integer in it. Operations on classes are carried out by combining these representatives and then reducing the result to its least non-negative residue. The uniqueness requirement is sometimes relaxed, allowing the forms to be unique up to some finer equivalence relation, like allowing reordering of terms (if there is no natural ordering on terms).
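The modular arithmetic example can be sketched as follows. The function names are assumptions made for this illustration; the sketch relies on the fact that Python's `%` operator already returns the least non-negative residue when the modulus is positive.

```python
def residue(x: int, n: int) -> int:
    """Canonical representative of the class of x modulo n (n > 0)."""
    return x % n  # for n > 0 the result always lies in range(n)


def add_mod(a: int, b: int, n: int) -> int:
    """Add two classes: combine the representatives, then reduce again."""
    return residue(residue(a, n) + residue(b, n), n)


def mul_mod(a: int, b: int, n: int) -> int:
    """Multiply two classes: combine the representatives, then reduce again."""
    return residue(residue(a, n) * residue(b, n), n)


print(residue(-3, 7))      # 4
print(add_mod(5, 6, 7))    # 4
print(mul_mod(-3, 10, 7))  # 5
```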

A canonical form may simply be a convention, or a deep theorem.

For example, polynomials are conventionally written with the terms in descending powers: it is more usual to write x² + x + 30 than x + 30 + x², although the two forms define the same polynomial. By contrast, the existence of Jordan canonical form for a matrix is a deep theorem.


Note: in this section, "up to" some equivalence relation E means that the canonical form is not unique in general, but that if one object has two different canonical forms, they are E-equivalent.

Linear algebra

Objects | A is equivalent to B if | Normal form | Notes
--- | --- | --- | ---
Normal matrices over the complex numbers | A = U*BU for some unitary matrix U | Diagonal matrices (up to reordering) | This is the spectral theorem
Matrices over the complex numbers | A = UBV* for some unitary matrices U and V | Diagonal matrices with real non-negative entries (in descending order) | Singular value decomposition
Matrices over an algebraically closed field | A = P⁻¹BP for some invertible matrix P | Jordan normal form (up to reordering of blocks) |
Matrices over an algebraically closed field | A = P⁻¹BP for some invertible matrix P | Weyr canonical form (up to reordering of blocks) |
Matrices over a field | A = P⁻¹BP for some invertible matrix P | Frobenius normal form |
Matrices over a principal ideal domain | A = P⁻¹BQ for some invertible matrices P and Q | Smith normal form | The equivalence is the same as allowing invertible elementary row and column transformations
Finite-dimensional vector spaces over a field K | A and B are isomorphic as vector spaces | Kⁿ, n a non-negative integer |

Classical logic

Functional analysis

Objects | A is equivalent to B if | Normal form
--- | --- | ---
Hilbert spaces | If A and B are both separable Hilbert spaces of infinite dimension, then A and B are isometrically isomorphic. | ℓ²(I) sequence spaces (up to exchanging the index set I with another index set of the same cardinality)
Commutative C*-algebras with unit | A and B are isomorphic as C*-algebras | The algebra of continuous functions on a compact Hausdorff space, up to homeomorphism of the base space.

Number theory


Objects | A is equivalent to B if | Normal form
--- | --- | ---
Finitely generated R-modules with R a principal ideal domain | A and B are isomorphic as R-modules | Primary decomposition (up to reordering) or invariant factor decomposition


  • The equation of a line: Ax + By = C, with A² + B² = 1 and C ≥ 0
  • The equation of a circle: (x − h)² + (y − k)² = r²
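As an illustration of the first item, the sketch below rescales an arbitrary line equation ax + by = c so that the coefficients satisfy A² + B² = 1 and C ≥ 0. The function name `normalize_line` is an assumption for this example, and the results are floating-point approximations. When C = 0 the sign of (A, B) is still not determined by these two conditions, so an extra convention would be needed there.

```python
import math


def normalize_line(a: float, b: float, c: float):
    """Rescale a*x + b*y = c so that A*A + B*B = 1 and C >= 0."""
    norm = math.hypot(a, b)
    if norm == 0:
        raise ValueError("a and b must not both be zero")
    a, b, c = a / norm, b / norm, c / norm
    if c < 0:
        # Multiplying the whole equation by -1 describes the same line.
        a, b, c = -a, -b, -c
    return a, b, c


# Two different equations of the same line give (approximately) the same triple.
print(normalize_line(3, 4, 10))
print(normalize_line(-6, -8, -20))
```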

By contrast, there are alternative forms for writing equations. For example, the equation of a line may be written as a linear equation in point-slope and slope-intercept form.

Mathematical notation

Standard form is used by many mathematicians and scientists to write extremely large numbers in a more concise and understandable way.
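Standard form in this sense (scientific notation) writes a non-zero number as m × 10^e with 1 ≤ |m| < 10. A minimal sketch, assuming Python's built-in `e` format specifier is an acceptable way to obtain the mantissa and exponent; the helper name `standard_form` is made up for this example.

```python
def standard_form(x: float, digits: int = 3) -> str:
    """Format a non-zero x as 'm × 10^e', keeping `digits` decimals in m."""
    mantissa, exponent = f"{x:.{digits}e}".split("e")
    return f"{mantissa} × 10^{int(exponent)}"


print(standard_form(299792458))       # 2.998 × 10^8
print(standard_form(0.000000123, 2))  # 1.23 × 10^-7
```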

Set theory

Game theory

Proof theory

Rewriting systems

Lambda calculus

  • A lambda term is in beta normal form if no beta reduction is possible; lambda calculus is a particular case of an abstract rewriting system. In the untyped lambda calculus, for example, the term (λx. x x)(λx. x x) does not have a normal form. In the typed lambda calculus, every well-formed term can be rewritten to its normal form.

Dynamical systems

Graph theory

In graph theory, a branch of mathematics, graph canonization is the problem of finding a canonical form of a given graph G. A canonical form is a labeled graph Canon(G) that is isomorphic to G, such that every graph that is isomorphic to G has the same canonical form as G. Thus, from a solution to the graph canonization problem, one could also solve the problem of graph isomorphism: to test whether two graphs G and H are isomorphic, compute their canonical forms Canon(G) and Canon(H), and test whether these two canonical forms are identical.
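For very small graphs, a canonical form can be found by brute force. The sketch below is an illustration rather than a practical algorithm: it tries every relabelling of the vertices 0..n-1 and keeps the lexicographically smallest sorted edge list, which takes factorial time. The function name `canon` and the sample graphs are assumptions for this example.

```python
from itertools import permutations


def canon(n, edges):
    """Brute-force canonical form of an undirected graph on vertices 0..n-1.

    Returns (n, edge_list), where edge_list is the smallest sorted edge list
    over all relabellings of the vertices.
    """
    best = None
    for perm in permutations(range(n)):
        relabelled = sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges)
        if best is None or relabelled < best:
            best = relabelled
    return n, best


G = (4, [(0, 1), (1, 2), (2, 3)])  # path 0-1-2-3
H = (4, [(2, 0), (0, 3), (3, 1)])  # the same path with different labels
K = (4, [(0, 1), (0, 2), (0, 3)])  # star with centre 0, not isomorphic to a path

print(canon(*G) == canon(*H))  # True
print(canon(*G) == canon(*K))  # False
```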

Differential forms

Canonical differential forms include the canonical one-form and canonical symplectic form, important in the study of Hamiltonian mechanics and symplectic manifolds.


In computing, the reduction of data to any kind of canonical form is commonly called data normalization.

For instance, database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. In the field of software security, a common vulnerability is unchecked malicious input. The mitigation for this problem is proper input validation. Before input validation can be performed, the input must be normalized: any encoding (for instance HTML encoding) is removed and the input data is reduced to a single common character set.
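A minimal sketch of normalizing input before validating it, using only the Python standard library. The order of the decoding steps and the validation rule in `is_safe` are assumptions chosen for illustration; the sketch performs a single decoding pass and is not a complete defence against malicious input.

```python
import html
import unicodedata
from urllib.parse import unquote


def normalize_input(raw: str) -> str:
    """Undo percent-encoding and HTML entities, then apply Unicode NFKC."""
    text = unquote(raw)          # "%3C" becomes "<"
    text = html.unescape(text)   # "&lt;" becomes "<"
    return unicodedata.normalize("NFKC", text)  # fullwidth "＜" becomes "<"


def is_safe(raw: str) -> bool:
    """Hypothetical validation rule: reject input containing '<' once normalized."""
    return "<" not in normalize_input(raw)


print(is_safe("hello"))                   # True
print(is_safe("%3Cscript%3E"))            # False
print(is_safe("&lt;script&gt;"))          # False
print(is_safe("\uff1cscript\uff1e"))      # False
```

Validating the raw strings directly would miss all three encoded variants, since none of them contains a literal "<" before normalization.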

Other forms of data, typically associated with signal processing (including audio and imaging) or machine learning, can be normalized in order to provide a limited range of values.
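One common way to obtain a limited range of values is min-max scaling, which maps the data linearly onto the interval [0, 1]. The sketch below assumes this particular technique, which the text above does not name; mapping constant input to all zeros is likewise a choice made for this example.

```python
def min_max_normalize(values):
    """Rescale a non-empty sequence of numbers linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


print(min_max_normalize([10, 20, 15, 30]))  # [0.0, 0.5, 0.25, 1.0]
```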



  1. The term 'canonization' is sometimes incorrectly used for this.

