Abuse of notation

From Wikipedia, the free encyclopedia
Jump to: navigation, search

In mathematics, abuse of notation occurs when an author uses a mathematical notation in a way that is not formally correct but that seems likely to simplify the exposition or suggest the correct intuition (while being unlikely to introduce errors or cause confusion). Abuse of notation should be contrasted with misuse of notation, which should be avoided.

A related concept is abuse of language or abuse of terminology, when not notation but a term is misused. Abuse of language is an almost synonymous expression that is usually used for non-notational abuses. For example, while the word representation properly designates a group homomorphism from a group G to GL(V), where V is a vector space, it is common to call V "a representation of G". A common abuse of language consists in identifying two mathematical objects that are different but canonically isomorphic. For example, identifying a constant function and its value or identifying to \mathbb R^3 the Euclidean space of dimension three equipped with a Cartesian coordinate system.

The latter uses may achieve clarity in the new area in an unexpected way, but it may borrow arguments from the old area that do not carry over, creating a false analogy.


Common examples occur when speaking of compound mathematical objects. For example, a topological space consists of a set X (called the underlying set of the topological space) and a topology \mathcal{T}, and two topological spaces (X, \mathcal{T}) and (X, \mathcal{T}'), even with the same underlying set X, can be quite different if they have different topologies. Nevertheless, it is common to refer to such a space simply as X when there is no danger of confusion—that is, when it is implicitly clear what topology is being considered. Similarly, one often refers to a group (G, \star) as simply G when the group operation is clear from context.

Functional notation[edit]

One encounters, in many textbooks, sentences such as "Let f(x) be a function ...". This is an abuse of notation, as the name of the function is f, and f(x) denotes normally the value of the function f for the element x of its domain. The correct phrase would be "Let f be a function of the variable x ..." or "Let xf(x) be a function ..." This abuse of notation is widely used, as it simplifies the formulation, and the systematic use of a correct notation becomes quickly pedantic.

A similar abuse of notation occurs in sentences such as "Let us consider the function x2 + x + 1.". In fact x2 + x + 1 is not a function. The function is the operation that associates x2 + x + 1 to x, that is xx2 + x + 1. Nevertheless, this abuse of notation is widely used as, generally, it is not confusing. However, in most computer algebra systems, expressions are distinct from functions, and the habit of this abuse of notation leads many beginners in computer algebra to make erroneous computations.

Equivalence classes[edit]

Referring to an equivalence class of an equivalence relation by x instead of [x] is an abuse of notation. Formally, if a set X is partitioned by an equivalence relation ~, then for each xX, the equivalence class {yX | y ~ x} is denoted [x]. But in practice, if the remainder of the discussion is focused on equivalence classes rather than individual elements of the underlying set, it is common to drop the square brackets in the discussion.

For example, in modular arithmetic, a finite group of order n can be formed by partitioning the integers via the equivalence relation x ~ y if and only if xy (mod n). The elements of that group would then be [0], [1], …, [n - 1], but in practice they are usually just denoted 0, 1, …, n - 1.

Another example is the space of (classes of) measurable functions over a measure space, or classes of Lebesgue integrable functions, where the equivalence relation is equality "almost everywhere".


In standard analysis, some algebraic manipulations on the Leibniz notation for the derivative \frac{dy}{dx} are an abuse of notation. It is often convenient to treat the expression \frac{dy}{dx} as a fraction. For example, it leads to the correct formula for differentiation of composed functions (the chain rule): \frac{dy}{dx}=\frac{dy}{du}\frac{du}{dx}. Another example is the separation of variables in solving differential equations, in which one can rewrite the equation \frac{dy}{dx}={g(x) \over h(y)} as h(y) dy = {g(x)dx}, and then integrate.

A related abuse of notation occurs when integrals like \int {1 \over x}\,dx are written as \int {dx \over x}, as if dx were a term multiplied into the integral's 1 \over x argument.

These manipulations can be made rigorous in the theory of differential forms.

Del operator[edit]

The del operator, denoted by \nabla, is a tuple of partial derivative operators posing as a vector. This suggests notations such as \nabla f for gradient, \nabla \cdot \vec v for divergence, and \nabla \times \vec v for curl. The notation is extremely convenient because \nabla does behave like a vector most of the time. But it can be regarded as an abuse because \nabla doesn't commute with vectors, and so doesn't satisfy all properties of vectors.

(A contrary view is that notation is not abused if one does not think of \nabla as a vector. The vector-like notations are simply specially defined uses of the dot and cross.)

Cross product[edit]

The determinant of a 3×3 matrix may be computed by "expanding along the first row" as follows:

\det \begin{bmatrix}
a_1 & a_2 & a_3 \\
b_1 & b_2 & b_3 \\
c_1 & c_2 & c_3
\end{bmatrix} = a_1 \det \begin{bmatrix}
b_2 & b_3 \\
c_2 & c_3\end{bmatrix}- a_2 \det \begin{bmatrix}
b_1 & b_3 \\
c_1 & c_3\end{bmatrix}+ a_3 \det \begin{bmatrix}
b_1 & b_2 \\
c_1 & c_2\end{bmatrix}

The cross product of the vectors (a1, a2, a3) and (b1, b2, b3) is given similarly by

\det \begin{bmatrix}
a_2 & a_3 \\
b_2 & b_3\end{bmatrix}\mathbf{i} - \det \begin{bmatrix}
a_1 & a_3 \\
b_1 & b_3\end{bmatrix}\mathbf{j}+ \det \begin{bmatrix}
a_1 & a_2 \\
b_1 & b_2 \end{bmatrix} \mathbf{k}

Thus the cross product may be computed by writing the "symbolic determinant"

\mathbf{a}\times\mathbf{b}=\det \begin{bmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k} \\
a_1 & a_2 & a_3 \\
b_1 & b_2 & b_3 \\

and expanding along the first row by rote, ignoring the fact that the matrix is not truly a matrix over the real or complex numbers (or whatever field the matrix entries belong to), and that thus the resulting computation does not compute an ordinary determinant. This is technically an abuse of notation, but is useful as a mnemonic to remember the formula for cross product and is very helpful in calculations.[1]

Function application over set[edit]

John Harrison (1996)[2] cites "the use of f(x) to represent both application of a function f to an argument x, and the image under f of a subset, x, of f's domain". (Note that the last x is usually written differently, e.g. as X, in order to distinguish an element x of the domain from a subset X.)

Exponentiation as repetition[edit]

Exponentiation is repeated multiplication, and multiplication is frequently denoted by juxtaposition of operands, with no operator at all. The suggested superscript notation for other associative operations denoted by juxtaposition follows:

Cartesian product as associative[edit]

The Cartesian product is often seen as associative, with:

(E \times F) \times G = E \times (F \times G) = E \times F \times G

This of course cannot be rigorously true: if x \in E, y \in F and z \in G, the identity ((x, y), z) = (x, (y, z)) would imply that (x, y) = x and z = (y, z), and so ((x, y), z) = (x, y, z) would mean nothing.

This notion can be made rigorous in category theory, using the idea of a natural isomorphism.

Trigonometric functions[edit]

In some countries it is common to denote the square of the value of \sin(x) as \sin^2(x), and the inverse function as \sin^{-1}(x). In his article on notation in the Edinburgh Encyclopedia Charles Babbage complains at length of this abuse of notation, and suggests two alternatives for the notation f^n(x)

  • Function composition, i.e. f^2(x) = f(f(x)) and f^{-1}(x) is the inverse.
  • Powers of the value, i.e. f^2(x) = (f(x))^2 and f^{-1}(x)=\frac{1}{f(x)} is the reciprocal.

Babbage argues strongly for the former, and also that the square of the value should be notated as \sin x^2\ , but beware: Babbage intends (\sin x)^2\ even though what he wrote is easily confused with \sin(x^2)\ (the only non-confusing way to avoid this abuse of notation is to always include the parentheses).

To press his example further, Babbage investigates what the function \sin^2(x) = \sin(\sin x) is like, and also \sin^{\frac{1}{2}}(x), which is the function which, when composed with itself, equals \sin(x), the functional square root.

Big O notation[edit]

With Big O notation, we say that some term f(x) "is" O(g(x)) (given some function g). Example: "Runtime of the algorithm is O(n^2)" or in symbols "T(n) = O(n^2)". Intuitively this notation groups functions according to their growth respective to some parameter; as such, the notation is abusive in two respects: It abuses "=", and it invokes terms that are real numbers instead of function terms. It would be appropriate to use the set membership notation and thus write f(n)\in O(g(n)) instead of f(n)=O(g(n)). This would allow common set operations like O(n\cdot\log n) \subset O(n^2), O(2^n) \cup O(n^2), and it would clarify that the relation is not symmetric in contrast to what the "=" symbol suggests. Some argue for "=" because, as an extension of the abuse, it could be useful to overload relation symbols such as < and ≤, such that, for example, f < O(g) means that f's real growth is less than g. But this further abuse is not necessary if, following Knuth one distinguishes between O and the closely related o and Θ notations.

Concerning the use of terms for scalar numbers instead of functions, one encounters the following troubles:

  1. One cannot cleanly define what f(n)\in O(g(n)) may mean, due to the fact the O notation is about growth of functions, but to the left hand and the right hand side of the relation, there are scalar values, and one cannot decide whether the relation holds if one looks at particular function values.
  2. The abused O notation is bound to one variable, and the identity of that variable can be ambiguous: for instance, in O(n^m) one of the variables might be a parameter which is not in scope of the O.

That is, one might mean O(2^m), since n was the parameter that was assigned 2, or one might mean O(n^3), since m was the parameter substituted by 3 here.

Even O(c) might be the same as O(1), since c might be a parameter, not the concerned function variable.

The carelessness regarding the use of function terms might be caused by the rarely used functional notations, like Lambda notation. One would have to write (n\mapsto n\cdot\log n) \in O(n\mapsto n^2) and O(n\mapsto n\cdot\log n) \subset O(n\mapsto n^2). The correct O notation can be easily extended to complexity functions which map tuples to complexities; this lets one formulate a statement like: "the graph algorithm needs time proportional to the logarithm of the number of edges and to the number of vertices" by T_{\mbox{graph}}\in O((v,e)\mapsto v\cdot\log e), which is not possible with the abused notation.

One can also state theorems like O(f) is a convex cone, and use that for formal reasoning.

Equality vs. isomorphism[edit]

Another common abuse of notation is to blur the distinction between equality and isomorphism. For instance, in the construction of the real numbers from Dedekind cuts of rational numbers, the rational number r is identified with the set of all rational numbers less than r, even though the two are obviously not the same thing (as one is a rational number and the other is a set of rational numbers). However, this ambiguity is tolerated, because the set of rational numbers and the set of Dedekind cuts of the form {x: x<r} have the same structure. It is through this abuse of notation that Q is regarded as a subset of R.

Dirac delta function[edit]

The Dirac delta function cannot be interpreted as a function in classical analysis. However it is often treated as one, for example when calculating convolutions. Treating the Dirac delta "function" as a function lets the user save traditional limit notation, saving its visual clutter.

Values of a random variable[edit]

In probability theory, conventional methods of indicating the probability of a value of a random variable are abuse of notation in two ways: Writing P(x) instead of P(X = x) leaves out the identity of the random variable (here X), which can be confusing out of context. However, even when writing P(X = x), there is a mismatch of types: the expression X=x is an equation and from a type theory point of view has type boolean; that is, it evaluates to either "true" T or "false" F. The domain of the P function here is not \{T,F\}, though; instead P should be logically thought of as taking two arguments: a random variable X and a subset of that random variable X's sample space \Omega. This is important: if one were to implement P in a computer algebra system one would need to give it two arguments (and not only one boolean one), just like an implementation of the summation symbol \sum_{i=min}^{max}f(i) is really a function of the form F(f,i,min,max), not F(f,truthvalue,max). So a logically more appropriate notation could be P(X, \{x\}) (the second argument here is the set of values we consider for X) or (borrowing from analysis, since the value set contains only the single element x in this case) P(\left.X\right|_{x}), but everybody writes P(X = x) or (abbreviated) P(x).

There is a good reason for such widespread so-called abuse: Notational abuse is a matter of perspective. Despite the arguably suggestive manner in which it is written, the notation P(\phi) does not (and is not meant to) mean applying some function to some value. Instead, the meaning is that P(\phi) takes the entire expression \phi as input --- not evaluated --- and expands into a particular, longer, expression in a (nominally) simpler language. Specifically, the notation can be defined by expanding to measure theory and set-builder notation as in (roughly):

P(\phi) = P_\mu(\phi) = P_{\Omega,\mu}(\phi) = \frac{\mu(\{w \mid \phi(w) \text{ and } w\in\Omega\})}{\mu(\{w \mid w\in\Omega\})}

In words: To compute the probability of a formula being true, build the set of all possible worlds in which the formula is true, measure that set, and finally divide that by the measure of the set of all possible worlds. There are, naturally, a number of other, better, ways to define the notation. That which matters here is just to recognize that the notation is no more abusive than some abbreviation ultimately resting on top of set-builder notation. (Whether we consider set-builder notation to be rigorous is another matter entirely.)

Regarding the computer science perspective: P(\phi) can be --- directly --- implemented on a computer as a macro. (The abbreviations can be supported by default parameters, fields, closures, environments, global variables, and so forth.) That implementation is awkward in applicative-order evaluation, as initially sketched, but simple in normal-order evaluation, as just sketched, directly indicates that the concept is primarily about syntax.

So regarding P(X=x), while it can be called abusive, it can also be said to exemplify proper use of notation: it is a primitive of the language of probability theory (so is "notation"), that has been shown to rigorously reduce to the language of set theory (so is "proper").

A perhaps uncontroversial example of abuse in probability theory is to take P(X) as meaning the marginal distribution of random variable X, and, at the same time, to declare that P(X=x) means a number. At face value this seems legitimate, and it could perhaps be kept that way, but for the fact that probability theorists permit any sort of expression inside the P(). So, what would P(Z) mean, where Z is a non-basic random variable (deterministically) defined by Z=(X=x)? That is, Z is true when random variable X equals our favorite value, x, and in all other cases is false.

Given that Z=(X=x) then one concludes that P(Z)=P(X=x) ought to hold. However, the left-hand side is supposed to mean a distribution, while the right hand side is supposed to mean a number. Distributions and numbers are not, of course, equal to one another, so contradiction ensues if we try to rigorously support both conventions at the same time.

The resolution is to call one convention the definition and the other the abuse. If we take P(X=x) meaning a number as the abuse, then the abuse is more specifically that we implicitly typecast a marginal distribution over a Boolean random variable down to its probability of being true. If we take P(X) meaning an entire distribution as the abuse, then the abuse is more specifically that we implicitly surround the expression with quantifiers ranging over all possible values of X (so as to form its entire marginal distribution one entry at a time).


The term "abuse of language" frequently appears in the writings of Nicolas Bourbaki:[3]

We have made a particular effort always to use rigorously correct language, without sacrificing simplicity. As far as possible we have drawn attention in the text to abuses of language, without which any mathematical text runs the risk of pedantry, not to say unreadability.

— Bourbaki (1988)

For example:

Let E be a set. A mapping f of E × E into E is called a law of composition on E. [...] By an abuse of language, a mapping of a subset of E × E into E is sometimes called a law of composition not everywhere defined on E.

— Bourbaki (1988)

In other words, it is an abuse of language to refer to partial functions from E × E to E as "functions from E × E to E that are not everywhere defined." To clarify this, it makes sense to compare the following two sentences.

  1. A partial function from A to B is a function f: A' → B, where A' is a subset of A.
  2. A function not everywhere defined from A to B is a function f: A' → B, where A' is a subset of A.

If one were to be extremely pedantic, one could say that even the term "partial function" could be called an abuse of language, because a partial function is not a function. (Whereas a continuous function is a function that is continuous.) But the use of adjectives (and adverbs) in this way is standard English practice, although it can occasionally be confusing. Some adjectives, such as "generalized", can only be used in this way (e.g., a magma is a generalized group.)

The words "not everywhere defined", however, form a relative clause. Since in mathematics relative clauses are rarely used to generalize a noun, this might be considered an abuse of language. As mentioned above, this does not imply that such a term should not be used; although in this case perhaps "function not necessarily everywhere defined" would give a better idea of what is meant, and "partial function" is clearly the best option in most contexts.

Using the term "continuous function not everywhere defined" after having defined only "continuous function" and "function not everywhere defined" is not an example of abuse of language. In fact, as there are several reasonable definitions for this term, this would be an example of woolly thinking or a cryptic writing style.


The terms "abuse of language" and "abuse of notation" depend on context. Writing "f: AB" for a partial function from A to B is almost always an abuse of notation, but not in a category theoretic context, where f can be seen as a morphism in the category of sets and partial functions.

See also[edit]


  1. ^ Stewart, James (2007). Multivariable Calculus (6th ed.). Brooks/Cole. pp. 822–823. ISBN 0-495-01163-0. 
  2. ^ Harrison, John (1996). "2.2 Criticism and reconstruction". Formalized Mathematics. Technical Reports 36. Turku Centre for Computer Science. ISBN 951-650-813-8. 
  3. ^ Bourbaki, Nicolas (1988). Algebra I: Chapters 1-3. Elements of Mathematics. Springer. 

External links[edit]