# Abuse of notation

In mathematics, abuse of notation occurs when an author uses a mathematical notation in a way that is not formally correct but that seems likely to simplify the exposition or suggest the correct intuition (while being unlikely to introduce errors or cause confusion). Abuse of notation should be contrasted with misuse of notation, which should be avoided.

A related concept is abuse of language or abuse of terminology, in which a term rather than a notation is misused. The expression is almost synonymous with abuse of notation, but is usually reserved for non-notational abuses. For example, while the word representation properly designates a group homomorphism from a group G to GL(V), where V is a vector space, it is common to call V "a representation of G". A common abuse of language consists in identifying two mathematical objects that are different but canonically isomorphic, for example identifying a constant function with its value, or identifying the Euclidean space of dimension three equipped with a Cartesian coordinate system with $\mathbb R^3$.

Such identifications may bring unexpected clarity to a new area, but they may also carry over arguments from the old area that do not apply there, creating a false analogy.

## Examples

Common examples occur when speaking of compound mathematical objects. For example, a topological space consists of a set $X$ (called the underlying set of the topological space) and a topology $\mathcal{T}$, and two topological spaces $(X, \mathcal{T})$ and $(X, \mathcal{T}')$, even with the same underlying set $X$, can be quite different if they have different topologies. Nevertheless, it is common to refer to such a space simply as $X$ when there is no danger of confusion—that is, when it is implicitly clear what topology is being considered. Similarly, one often refers to a group $(G, \star)$ as simply $G$ when the group operation is clear from context.

### Equivalence classes

A very common form of abuse of notation is that often used when a set is partitioned into disjoint subsets (equivalence classes) by an equivalence relation. Formally, if a set X is partitioned by an equivalence relation ~, then for a given x ∈ X, the (equivalence) class {y ∈ X | y ~ x} would be denoted [x]. But in practice, if the remainder of the discussion is focused on equivalence classes rather than individual elements of the underlying set, it is common to drop the square brackets in the discussion. For example, in modular arithmetic, a finite group of size n can be formed by partitioning the integers via the equivalence relation x ~ y iff x ≡ y (mod n). The elements of that group would then formally be listed as {[0], [1], ..., [n-1]}, but in practice they are usually just called 0, 1, ..., n-1.

Another instance is the space of (classes of) measurable functions over a measure space, or classes of Lebesgue integrable functions, where the equivalence relation is equality "almost everywhere".
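The modular-arithmetic example can be sketched in Python. This is only an illustration: the helper `equivalence_class` and the finite `universe` standing in for the integers are assumptions made here, not part of the text above. It shows the formal classes alongside the representatives that usually replace them.

```python
def equivalence_class(x, n, universe):
    """Return [x]: every y in `universe` with y congruent to x mod n."""
    return frozenset(y for y in universe if (y - x) % n == 0)

n = 3
universe = range(-6, 7)  # finite stand-in for the integers (an assumption)

# Formal view: the group elements are the classes [0], [1], [2].
classes = {equivalence_class(x, n, universe) for x in universe}

# Abused view: each class is named by its representative in 0..n-1.
representatives = sorted(min(y for y in c if y >= 0) for c in classes)

# Adding representatives and then taking the class gives a well-defined
# result: 2 + 2 = 4 lies in the class usually just called 1.
sum_class = equivalence_class(2 + 2, n, universe)
```

Dropping the brackets is harmless here because every operation of interest gives the same class whichever representative is chosen.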

### Derivative

In standard analysis, some algebraic manipulations on the Leibniz notation for the derivative $\frac{dy}{dx}$ are an abuse of notation. It is often convenient to treat the expression $\frac{dy}{dx}$ as a fraction. For example, it leads to the correct formula for differentiation of composed functions (the chain rule): $\frac{dy}{dx}=\frac{dy}{du}\frac{du}{dx}$. Another example is the separation of variables in solving differential equations, in which one can rewrite the equation $\frac{dy}{dx}={g(x) \over h(y)}$ as $h(y) dy = {g(x)dx}$ and then integrate.
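As a worked instance of the separation-of-variables manipulation, take $g(x) = x$ and $h(y) = y$ (a choice made here purely for illustration), so that $\frac{dy}{dx} = \frac{x}{y}$. Treating $\frac{dy}{dx}$ as a fraction gives

$y\,dy = x\,dx$

and integrating both sides,

$\int y\,dy = \int x\,dx \quad\Longrightarrow\quad \frac{y^2}{2} = \frac{x^2}{2} + C$

Differentiating the result implicitly gives $y\frac{dy}{dx} = x$, which recovers the original equation. The formal manipulation therefore produced a correct answer in this case, even though $dy$ and $dx$ were never given an independent meaning.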

A related abuse of notation occurs when integrals like $\int {1 \over x}\,dx$ are written as $\int {dx \over x}$, as if $dx$ were a term multiplied into the integral's $1 \over x$ argument.

### Del operator

The del operator, denoted by $\nabla$, is a tuple of partial derivative operators posing as a vector. This suggests notations such as $\nabla f$ for gradient, $\nabla \cdot \vec v$ for divergence and $\nabla \times \vec v$ for curl. The notation is extremely convenient because $\nabla$ does behave like a vector most of the time. But it can be regarded as an abuse because $\nabla$ doesn't commute with vectors, and so doesn't satisfy all properties of vectors.

(A contrary view is that notation is not abused if one does not think of $\nabla$ as a vector. The vector-like notations are simply specially defined uses of the dot and cross.)

### Cross product

The determinant of a 3×3 matrix may be computed by "expanding along the first row" as follows:

$\det \begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{bmatrix} = a_1 \det \begin{bmatrix} b_2 & b_3 \\ c_2 & c_3\end{bmatrix}- a_2 \det \begin{bmatrix} b_1 & b_3 \\ c_1 & c_3\end{bmatrix}+ a_3 \det \begin{bmatrix} b_1 & b_2 \\ c_1 & c_2\end{bmatrix}$

The cross product of the vectors $(a_1, a_2, a_3)$ and $(b_1, b_2, b_3)$ is given similarly by

$\det \begin{bmatrix} a_2 & a_3 \\ b_2 & b_3\end{bmatrix}\mathbf{i} - \det \begin{bmatrix} a_1 & a_3 \\ b_1 & b_3\end{bmatrix}\mathbf{j}+ \det \begin{bmatrix} a_1 & a_2 \\ b_1 & b_2 \end{bmatrix} \mathbf{k}$

Thus the cross product may be computed by writing the "symbolic determinant"

$\mathbf{a}\times\mathbf{b}=\det \begin{bmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ \end{bmatrix}$

and expanding along the first row by rote, ignoring the fact that the matrix is not truly a matrix over the real or complex numbers (or whatever field the matrix entries belong to), and that thus the resulting computation does not compute an ordinary determinant. This is technically an abuse of notation, but is useful as a mnemonic to remember the formula for cross product and is very helpful in calculations.[1]
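The mnemonic can be sketched in Python as a rote first-row expansion. The function names `det2` and `cross` are chosen here for illustration, and the coefficients of $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$ are simply returned as a tuple.

```python
def det2(a, b, c, d):
    """Determinant of the 2x2 matrix [[a, b], [c, d]]."""
    return a * d - b * c

def cross(a, b):
    """Cross product via first-row expansion of the symbolic determinant."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (
        det2(a2, a3, b2, b3),     # coefficient of i
        -det2(a1, a3, b1, b3),    # coefficient of j (note the minus sign)
        det2(a1, a2, b1, b2),     # coefficient of k
    )

print(cross((1, 0, 0), (0, 1, 0)))  # (0, 0, 1)
```

The code never builds a matrix containing $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$. Only the three 2×2 minors are genuine determinants, which is one way to see why the symbolic determinant is a mnemonic rather than an ordinary determinant.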

### Function application over set

John Harrison (1996)[2] cites "the use of f(x) to represent both application of a function f to an argument x, and the image under f of a subset, x, of f's domain". (Note that the last x is usually written differently, e.g. as X, in order to distinguish an element x of the domain from a subset X.)
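A programming language typically forces the two meanings apart. In the Python sketch below, the squaring function `f` and the helper name `image` are illustrative assumptions; applying `f` to an element and taking the image of a subset are different operations.

```python
def f(x):
    """An example function on the integers (illustrative choice)."""
    return x * x

def image(f, X):
    """Image of the subset X under f, which the abused notation writes f(X)."""
    return {f(x) for x in X}

value = f(-2)                           # application to an element
subset_image = image(f, {-2, -1, 0, 1, 2})  # image of a subset
```

Here `value` is the number 4, whereas `subset_image` is the set `{0, 1, 4}`. Writing both as $f(\cdot)$ relies on the reader to tell elements from subsets.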

### Exponentiation as repetition

Exponentiation is repeated multiplication, and multiplication is frequently denoted by juxtaposition of operands, with no operator at all. The suggested superscript notation for other associative operations denoted by juxtaposition follows:

• String repetition: "ab³c" = "abbbc".
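Python happens to offer a close analogue, with the difference that it writes concatenation as `+` rather than juxtaposition, and repetition as `*` rather than a superscript. A minimal sketch:

```python
# "b" * 3 repeats the string three times; + concatenates.
s = "a" + "b" * 3 + "c"
```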

### Cartesian product as associative

The Cartesian product is often seen as associative, with:

$(E \times F) \times G = E \times (F \times G) = E \times F \times G$

This of course cannot be rigorously true: if $x \in E$, $y \in F$ and $z \in G$, the identity $((x, y), z) = (x, (y, z))$ would imply that $(x, y) = x$ and $z = (y, z)$, and $((x, y), z) = (x, y, z)$ would mean nothing.

This notion can be made rigorous in category theory, using the idea of a natural isomorphism.
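A small Python sketch makes the distinction concrete. The sets `E`, `F`, `G` and the helpers `assoc` and `flatten` are illustrative assumptions. The three products contain differently shaped elements, but explicit reassociation maps turn one into another.

```python
from itertools import product

E, F, G = [1, 2], ["a"], [True, False]

left = list(product(product(E, F), G))    # elements of the form ((x, y), z)
right = list(product(E, product(F, G)))   # elements of the form (x, (y, z))
flat = list(product(E, F, G))             # elements of the form (x, y, z)

def assoc(t):
    """Reassociate ((x, y), z) to (x, (y, z))."""
    (x, y), z = t
    return (x, (y, z))

def flatten(t):
    """Flatten ((x, y), z) to (x, y, z)."""
    (x, y), z = t
    return (x, y, z)

same_as_lists = left == right                        # False: not equal
bijective_image = [assoc(t) for t in left] == right  # True: but isomorphic
```

The products are not equal as collections of tuples, yet `assoc` and `flatten` are bijections between them. This is the sense in which the "equality" is really an identification along a canonical map.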

### Trigonometric functions

In some countries it is common to denote the square of the value of $\sin(x)$ as $\sin^2(x)$, and the inverse function as $\sin^{-1}(x)$. In his article on notation in the Edinburgh Encyclopedia, Charles Babbage complains at length of this abuse of notation and suggests two alternatives for the notation $f^n(x)$:

• Function composition, i.e. $f^2(x) = f(f(x))$ and $f^{-1}(x)$ is the inverse.
• Powers of the value, i.e. $f^2(x) = (f(x))^2$ and $f^{-1}(x)=\frac{1}{f(x)}$ is the reciprocal.

Babbage argues strongly for the former, and also that the square of the value should be notated as $\sin x^2$, but beware: Babbage intends $(\sin x)^2$ even though what he wrote is easily confused with $\sin(x^2)$ (the only non-confusing way to avoid this abuse of notation is to always include the parentheses).

To press his example further, Babbage investigates what the function $\sin^2(x) = \sin(\sin x)$ is like, and also $\sin^{\frac{1}{2}}(x),$ which is the function which, when composed with itself, equals $\sin(x)$, the functional square root.
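The competing readings can be compared numerically. The following Python sketch uses an arbitrarily chosen sample point `x = 0.5` and illustrative variable names, and computes each reading separately.

```python
import math

x = 0.5  # arbitrary sample point (an assumption for illustration)

composed = math.sin(math.sin(x))   # sin^2 read as composition
squared = math.sin(x) ** 2         # sin^2 read as a power of the value
of_square = math.sin(x ** 2)       # sin(x^2), a third, different quantity

inverse = math.asin(x)             # sin^{-1} read as the inverse function
reciprocal = 1 / math.sin(x)       # sin^{-1} read as the reciprocal
```

At this point the three "squares" are pairwise different, as are the two readings of $\sin^{-1}$, so the choice of convention changes the value and not merely the appearance of a formula.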

### Big O notation

With Big O notation, we say that some term $f(x)$ "is" $O(g(x))$ (given some function $g$). Example: "Runtime of the algorithm is $O(n^2)$" or in symbols "$T(n) = O(n^2)$". Intuitively, this notation groups functions according to their growth with respect to some parameter; as such, the notation is abusive in two respects: it abuses "=", and it invokes terms that are real numbers instead of function terms. It would be appropriate to use the set membership notation and thus write $f(n)\in O(g(n))$ instead of $f(n)=O(g(n))$. This would allow common set operations like $O(n\cdot\log n) \subset O(n^2)$ and $O(2^n) \cup O(n^2)$, and it would clarify that the relation is not symmetric, in contrast to what the "=" symbol suggests. Some argue for "=" because, as an extension of the abuse, it could be useful to overload relation symbols such as < and ≤, such that, for example, $f < O(g)$ means that the real growth of $f$ is less than that of $g$. But this further abuse is not necessary if, following Knuth, one distinguishes between O and the closely related o and Θ notations. Concerning the use of terms for scalar numbers instead of functions, one encounters the following troubles.

1. You cannot cleanly define what $f(n)\in O(g(n))$ may mean, because the O notation is about the growth of functions, whereas the left-hand and right-hand sides of the relation contain scalar values, and you cannot decide whether the relation holds by looking at particular function values.
2. The abused O notation is bound to one variable, and the identity of that variable can be ambiguous: for instance, in $O(n^m)$ one of the variables might be a parameter which is not in scope of the $O$.

That is, you might mean $O(2^m)$, since $n$ was the parameter that you assigned 2, or you might mean $O(n^3)$, since $m$ was the parameter substituted by 3 here.

Even $O(c)$ might be the same as $O(1)$, since $c$ might be a parameter, not the concerned function variable.

The carelessness regarding the use of function terms may stem from the fact that functional notations, such as lambda notation, are rarely used. You would have to write $(n\mapsto n\cdot\log n) \in O(n\mapsto n^2)$ and $O(n\mapsto n\cdot\log n) \subset O(n\mapsto n^2)$. The correct O notation can be easily extended to complexity functions which map tuples to complexities; this lets you formulate a statement like "the graph algorithm needs time proportional to the logarithm of the number of edges and to the number of vertices" by $T_{\text{graph}}\in O((v,e)\mapsto v\cdot\log e)$, which is not possible with the abused notation.

You can also state theorems like $O(f)$ is a convex cone, and use that for formal reasoning.
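The function-valued reading can be sketched in Python, where the arguments are functions rather than numbers. The helper `witnesses_big_o` and its parameters are assumptions made for illustration. It only tests a proposed witness pair $(c, n_0)$ on a finite range, and so it can refute a witness but cannot prove membership in $O(g)$.

```python
import math

def witnesses_big_o(f, g, c, n0, n_max):
    """Check |f(n)| <= c * |g(n)| for every integer n with n0 <= n <= n_max.

    A finite check of one proposed witness (c, n0); it is not a proof,
    since membership in O(g) is a statement about all sufficiently large n.
    """
    return all(abs(f(n)) <= c * abs(g(n)) for n in range(n0, n_max + 1))

n_log_n = lambda n: n * math.log(n)   # the function n -> n log n
n_squared = lambda n: n ** 2          # the function n -> n^2

forward = witnesses_big_o(n_log_n, n_squared, c=1, n0=2, n_max=10_000)
backward = witnesses_big_o(n_squared, n_log_n, c=10, n0=2, n_max=10_000)
```

Here `forward` is `True` and `backward` is `False`, which is consistent with the relation not being symmetric. Passing `n_log_n` itself, rather than a value such as `n_log_n(5)`, mirrors writing $(n\mapsto n\cdot\log n) \in O(n\mapsto n^2)$.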

### Equality vs. isomorphism

Another common abuse of notation is to blur the distinction between equality and isomorphism. For instance, in the construction of the real numbers from Dedekind cuts of rational numbers, the rational number $r$ is identified with the set of all rational numbers less than $r$, even though the two are obviously not the same thing (as one is a rational number and the other is a set of rational numbers). However, this ambiguity is tolerated, because the set of rational numbers and the set of Dedekind cuts of the form $\{x : x < r\}$ have the same structure. It is through this abuse of notation that $\mathbb{Q}$ is regarded as a subset of $\mathbb{R}$.

### Dirac delta function

The Dirac delta function cannot be interpreted as a function in classical analysis. However, it is often treated as one, for example when calculating convolutions. Treating the Dirac delta "function" as a function lets the user avoid the traditional limit notation and its visual clutter.

### Values of a random variable

In probability theory, conventional methods of indicating the probability of a value of a random variable are abuses of notation in two ways. First, writing $P(x)$ instead of $P(X = x)$ leaves out the identity of the random variable (here $X$), which can be confusing out of context. Second, even when writing $P(X = x)$, there is a mismatch of types: the expression $X=x$ is an equation and, from a type theory point of view, has type Boolean; that is, it evaluates to either "true" or "false". The domain of the $P$ function here is not $\{T,F\}$, though; instead $P$ should be logically thought of as taking two arguments: a random variable $X$ and a subset of that random variable's sample space $\Omega$. This is important: if one were to implement $P$ in a computer algebra system, one would need to give it two arguments (and not only one Boolean one), just as an implementation of the summation symbol $\sum_{i=min}^{max}f(i)$ is really a function of the form $F(f,i,min,max)$, not $F(f,truthvalue,max)$. So a logically more appropriate notation could be $P(X, \{x\})$ (the second argument here is the set of values we consider for $X$) or (borrowing from analysis, since the value set contains only the single element $x$ in this case) $P(\left.X\right|_{x})$, but everybody writes $P(X = x)$ or (abbreviated) $P(x)$.

There is a good reason for such widespread so-called abuse: Notational abuse is a matter of perspective. Despite the arguably suggestive manner in which it is written, the notation $P(\phi)$ does not (and is not meant to) mean applying some function to some value. Instead, the meaning is that $P(\phi)$ takes the entire expression $\phi$ as input --- not evaluated --- and expands into a particular, longer, expression in a (nominally) simpler language. Specifically, the notation can be defined by expanding to measure theory and set-builder notation as in (roughly):

$P(\phi) = P_\mu(\phi) = P_{\Omega,\mu}(\phi) = \frac{\mu(\{w \mid \phi(w) \text{ and } w\in\Omega\})}{\mu(\{w \mid w\in\Omega\})}$

In words: To compute the probability of a formula being true, build the set of all possible worlds in which the formula is true, measure that set, and finally divide that by the measure of the set of all possible worlds. There are, naturally, a number of other, better, ways to define the notation. That which matters here is just to recognize that the notation is no more abusive than some abbreviation ultimately resting on top of set-builder notation. (Whether we consider set-builder notation to be rigorous is another matter entirely.)

Regarding the computer science perspective: $P(\phi)$ can be implemented directly on a computer as a macro. (The abbreviations can be supported by default parameters, fields, closures, environments, global variables, and so forth.) That the implementation is awkward under applicative-order evaluation, as initially sketched, but simple under normal-order evaluation, as just sketched, directly indicates that the concept is primarily about syntax.
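In a language with applicative-order evaluation such as Python, one way to approximate the macro reading is to pass the formula as an unevaluated function of the outcome. The sketch below makes several assumptions not found in the text above: a finite sample space, a measure given by per-outcome weights `mu` that defaults to counting, and a two-dice example.

```python
from fractions import Fraction

def P(phi, omega, mu=None):
    """mu({w in omega : phi(w)}) / mu(omega) for a finite sample space.

    `phi` is passed as a function of the outcome w, so it is not
    evaluated until P applies it to each possible world.
    """
    if mu is None:
        mu = lambda w: 1  # counting measure: every outcome weighs the same
    favourable = sum(mu(w) for w in omega if phi(w))
    total = sum(mu(w) for w in omega)
    return Fraction(favourable, total)

# Two fair dice; the random variable X is the sum of the two faces.
omega = [(a, b) for a in range(1, 7) for b in range(1, 7)]
X = lambda w: w[0] + w[1]

p_seven = P(lambda w: X(w) == 7, omega)
```

Here `p_seven` equals `Fraction(1, 6)`. Writing `P(X == 7, omega)` would instead evaluate the comparison first and hand `P` a single Boolean, which is the type mismatch described earlier. The `lambda` plays the role of the unevaluated expression $\phi$.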

So regarding $P(X=x)$, while it can be called abusive, it can also be said to exemplify proper use of notation: it is a primitive of the language of probability theory (so is "notation"), that has been shown to rigorously reduce to the language of set theory (so is "proper").

A perhaps uncontroversial example of abuse in probability theory is to take $P(X)$ as meaning the marginal distribution of random variable $X$, and, at the same time, to declare that $P(X=x)$ means a number. At face value this seems legitimate, and it could perhaps be kept that way, but for the fact that probability theorists permit any sort of expression inside the $P()$. So, what would $P(Z)$ mean, where $Z$ is a non-basic random variable (deterministically) defined by $Z=(X=x)$? That is, $Z$ is true when random variable $X$ equals our favorite value, $x$, and in all other cases is false.

Given that $Z=(X=x)$, one concludes that $P(Z)=P(X=x)$ ought to hold. However, the left-hand side is supposed to mean a distribution, while the right-hand side is supposed to mean a number. Distributions and numbers are not, of course, equal to one another, so contradiction ensues if we try to rigorously support both conventions at the same time.

The resolution is to call one convention the definition and the other the abuse. If we take $P(X=x)$ meaning a number as the abuse, then the abuse is more specifically that we implicitly typecast a marginal distribution over a Boolean random variable down to its probability of being true. If we take $P(X)$ meaning an entire distribution as the abuse, then the abuse is more specifically that we implicitly surround the expression with quantifiers ranging over all possible values of $X$ (so as to form its entire marginal distribution one entry at a time).

## Bourbaki

The term "abuse of language" frequently appears in the writings of Nicolas Bourbaki:[3]

"We have made a particular effort always to use rigorously correct language, without sacrificing simplicity. As far as possible we have drawn attention in the text to abuses of language, without which any mathematical text runs the risk of pedantry, not to say unreadability." (Bourbaki 1988)

For example:

"Let E be a set. A mapping f of E × E into E is called a law of composition on E. [...] By an abuse of language, a mapping of a subset of E × E into E is sometimes called a law of composition not everywhere defined on E." (Bourbaki 1988)

In other words, it is an abuse of language to refer to partial functions from E × E to E as "functions from E × E to E that are not everywhere defined." To clarify this, it makes sense to compare the following two sentences.

1. A partial function from A to B is a function f: A' → B, where A' is a subset of A.
2. A function not everywhere defined from A to B is a function f: A' → B, where A' is a subset of A.

If one were to be extremely pedantic, one could say that even the term "partial function" could be called an abuse of language, because a partial function is not a function. (Whereas a continuous function is a function that is continuous.) But the use of adjectives (and adverbs) in this way is standard English practice, although it can occasionally be confusing. Some adjectives, such as "generalized", can only be used in this way. (e.g., a magma is a generalized group.)

The words "not everywhere defined", however, form a relative clause. Since in mathematics relative clauses are rarely used to generalize a noun, this might be considered an abuse of language. As mentioned above, this does not imply that such a term should not be used; although in this case perhaps "function not necessarily everywhere defined" would give a better idea of what is meant, and "partial function" is clearly the best option in most contexts.

Using the term "continuous function not everywhere defined" after having defined only "continuous function" and "function not everywhere defined" is not an example of abuse of language. In fact, as there are several reasonable definitions for this term, this would be an example of woolly thinking or a cryptic writing style.

## Subjectivity

The terms "abuse of language" and "abuse of notation" depend on context. Writing "f: A → B" for a partial function from A to B is almost always an abuse of notation, but not in a category-theoretic context, where f can be seen as a morphism in the category of partial functions.