# Legendre transformation

*Figure: the function $f(x)$ is defined on the interval $[a,b]$; the difference $px-f(x)$ attains its maximum at $x'$, and the value of $f^*(p)$ is $px'-f(x')$.*

In mathematics, the Legendre transformation or Legendre transform, named after Adrien-Marie Legendre, is an involutive transformation on the real-valued convex functions of one real variable. Its generalisation to convex functions of affine spaces is sometimes called the Legendre-Fenchel transformation. It is commonly used in thermodynamics and to derive the Hamiltonian formalism of classical mechanics out of the Lagrangian formulation.

## Definition

Specifically, let $I\subset\mathbb R$ be an interval, and $f:I\rightarrow \mathbb R$ a convex function; then its Legendre transform is the function

$f^*:I^*\rightarrow \mathbb R$

with domain of definition

$I^*=\{x^*:\sup_{x\in I}(x^*x-f(x))<\infty\}$

and action

$f^*(x^*) = \sup_{x\in I}\bigl(x^*x-f(x)\bigr),\quad x^*\in I^*$.

Here "sup" represents the supremum. $f^*$ is sometimes called the convex conjugate function of $f$.

The Legendre transformation is involutive: $I^*$ is an interval and $f^*$ is convex on it, so that its Legendre transform is well defined, and $f^{**}=f$ on $I\subset I^{**}$, provided $f$ is lower semicontinuous (otherwise the equality can fail at the endpoints of $I$). Note that $I$ and $I^{**}$ may differ at most on their boundaries.
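As a rough numerical illustration of the definition, the supremum can be approximated by maximizing $x^*x-f(x)$ over a finite grid of sample points. The sketch below assumes a hypothetical helper `legendre_transform` and an arbitrary grid on $[-10,10]$; because only finitely many $x$ are sampled, it returns a lower bound on $f^*(x^*)$ and cannot detect points outside $I^*$. It is tested on $f(x)=x^4/4$, whose transform is $f^*(p)=\tfrac34|p|^{4/3}$.

```python
def legendre_transform(f, p, xs):
    """Approximate f*(p) = sup_x (p*x - f(x)) by maximizing over the sample points xs.

    A finite grid can only under-estimate the supremum, and it cannot tell
    whether p lies outside the domain I* (where the supremum is infinite).
    """
    return max(p * x - f(x) for x in xs)


# Grid on [-10, 10] with spacing 0.001 (an arbitrary illustrative choice).
xs = [i / 1000 for i in range(-10000, 10001)]


def quartic(x):
    return x ** 4 / 4


# For f(x) = x^4/4 the exact transform is f*(p) = (3/4)|p|^(4/3).
for p in (-1.0, 8.0):
    approx = legendre_transform(quartic, p, xs)
    exact = 0.75 * abs(p) ** (4 / 3)
    print(p, approx, exact)
```

The grid resolution limits accuracy: the error is roughly proportional to the square of the spacing near the maximizer.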

For historical reasons (rooted in analytic mechanics), the conjugate variable is often denoted $p$ instead of $x^*$. If the convex function $f$ is defined on the whole line and is everywhere differentiable, then $f^*(p)=\sup_x(px-f(x))$ can be interpreted as the negative of the y-intercept of the tangent line to the graph of $f$ that has slope $p$. In particular, the value of $x$ that attains the maximum has the property that

$f^\prime(x) = p.$

That is, the derivative of the function $f$ becomes the argument to the function $f^*$. In other words, $f^*$ satisfies the functional equation

$f^*(f'(x)) = x f'(x) - f(x).$

The Legendre transformation is an application of the duality relationship between points and lines. The functional relationship specified by $f$ can be represented equally well as a set of $(x,y)$ points or as a set of tangent lines specified by their slope and intercept values.

The generalization to convex functions $f:X\rightarrow \mathbb R$ defined on convex sets $X\subset\mathbb R^n$ is straightforward: $f^*:X^*\rightarrow \mathbb R$ has domain

$X^*=\{x^*:\sup_{x\in X}(\langle x^*,x\rangle-f(x))<\infty\}$

and action

$f^*(x^*) = \sup_{x\in X}\bigl(\langle x^*,x\rangle-f(x)\bigr),\quad x^*\in X^*$,

where $\langle x,y\rangle=\sum_jx_jy_j$ is the canonical inner product (scalar product) of $\mathbb R^n$.

## Examples

### Example N.1

Let $f(x)=cx^2$ be defined on the whole of $\mathbb R$, where $c>0$ is a fixed constant. For fixed $x^*$, the function $x^*x-f(x)=x^*x-cx^2$ of $x$ has first derivative $x^*-2cx$ and second derivative $-2c$; there is one stationary point at $x=x^*/(2c)$, which is always a maximum. So $I^*=\mathbb R$ and $f^*(x^*)=c^*{x^*}^2$, where $c^*=\frac{1}{4c}$. Clearly, $f^{**}(x)=\frac{1}{4c^*}x^2=cx^2$, namely $f^{**}=f$.
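The closed forms in this example can be checked numerically. The sketch below assumes the example value $c=1.5$ and a grid on $[-10,10]$; it approximates $f^*(3)$ by a grid maximum, and then applies the same grid maximum to the exact $f^*$ to recover $f^{**}(2)=f(2)$. The names `sup_over_grid` and `f_star_exact` are illustrative.

```python
c = 1.5  # the constant c > 0 (example value)


def f(x):
    return c * x * x


def f_star_exact(p):
    return p * p / (4 * c)  # c* = 1/(4c)


def sup_over_grid(g, s, grid):
    """Grid approximation of sup_t (s*t - g(t))."""
    return max(s * t - g(t) for t in grid)


grid = [i / 500 for i in range(-5000, 5001)]  # [-10, 10], spacing 0.002

# f*(3): the maximizer x = 3/(2c) = 1 lies on the grid.
print(sup_over_grid(f, 3.0, grid), f_star_exact(3.0))   # both 1.5
# f**(2), computed from the exact f*: gives back f(2) = 6.
print(sup_over_grid(f_star_exact, 2.0, grid), f(2.0))   # both 6.0
```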

### Example N.2

Let $f(x)=x^2$ for $x\in I=[2,3]$. For fixed $x^*$, $x^*x-f(x)$ is continuous on the compact interval $I$, hence it always attains a finite maximum on it; it follows that $I^*=\mathbb R$. The stationary point $x=x^*/2$ lies in $[2,3]$ if and only if $4\leqslant x^*\leqslant 6$; otherwise the maximum is attained at $x=2$ (for $x^*<4$) or at $x=3$ (for $x^*>6$). We find $f^*(x^*)=\begin{cases}2x^*-4,&x^*<4,\\ \frac{{x^*}^2}{4},&4\leqslant x^*\leqslant 6,\\3x^*-9,&x^*>6.\end{cases}$
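A quick way to sanity-check the three cases is to compare the piecewise formula against a brute-force maximum over a fine grid of the compact interval $[2,3]$. In this sketch the function names and the grid spacing $10^{-4}$ are illustrative choices.

```python
def f_star_piecewise(p):
    """Closed form of f* for f(x) = x^2 on I = [2, 3]."""
    if p < 4:
        return 2 * p - 4
    if p <= 6:
        return p * p / 4
    return 3 * p - 9


# Sample the compact interval [2, 3] with spacing 1e-4 (endpoints included).
xs = [2 + i / 10000 for i in range(10001)]


def f_star_grid(p):
    return max(p * x - x * x for x in xs)


for p in (0.0, 4.0, 5.0, 6.0, 8.0):
    print(p, f_star_grid(p), f_star_piecewise(p))
```

Note that the pieces agree at $x^*=4$ and $x^*=6$, so $f^*$ is continuous, as a convex function on $\mathbb R$ must be.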

### Example N.3

The function $f(x)=cx$ is convex on $\mathbb R$ (strict convexity is not required for the Legendre transformation to be well defined). Clearly $x^*x-f(x)=(x^*-c)x$ is not bounded above as a function of $x$ unless $x^*-c=0$. Hence $f^*$ is defined on $I^*=\{c\}$ and $f^*(c)=0$. We may check involutivity: $xx^*-f^*(x^*)$ is trivially bounded as a "function" of $x^*\in\{c\}$, hence $I^{**}=\mathbb R$; then for every $x$ we have $\sup_{x^*\in\{c\}}\bigl(xx^*-f^*(x^*)\bigr)=xc$ and thus $f^{**}(x)=cx=f(x)$.
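The unboundedness that collapses $I^*$ to the single point $\{c\}$ is easy to see numerically: restrict $x$ to $[-R,R]$ and let $R$ grow. This is only an illustration under assumed values ($c=2$ and a hypothetical helper `truncated_sup`); for $x^*\neq c$ the truncated supremum equals $R\,|x^*-c|$ and diverges, while for $x^*=c$ it stays at $0$.

```python
def truncated_sup(p, c, R, n=2001):
    """sup of (p - c)*x over n evenly spaced points of [-R, R], endpoints included.

    The true supremum over all of R is finite only when p == c, so for p != c
    this value grows without bound as R increases.
    """
    xs = [-R + 2 * R * i / (n - 1) for i in range(n)]
    return max((p - c) * x for x in xs)


c = 2.0
for R in (10.0, 100.0, 1000.0):
    # Column 2: p = c (bounded); column 3: p = c + 1 (grows like R).
    print(R, truncated_sup(c, c, R), truncated_sup(c + 1.0, c, R))
```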

### Example N.4 (many variables)

Let $f(x)=\langle x,Ax\rangle+c$ be defined on $X=\mathbb R^n$, where $A$ is a real, symmetric, positive-definite matrix. Then $f$ is convex. The function $\langle p,x\rangle-f(x)=\langle p,x \rangle-\langle x,Ax\rangle-c$ has gradient $p-2Ax$ and Hessian $-2A$, which is negative definite; hence the stationary point $x=A^{-1}p/2$ is a maximum. We have $X^*=\mathbb R^n$ and $f^*(p)=\frac14\langle p,A^{-1}p\rangle-c$.
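For $n=2$ the computation can be checked directly. The sketch below assumes example values for a symmetric positive-definite $A$, the constant $c$ and the point $p$; it evaluates $\langle p,x\rangle-f(x)$ at the stationary point $A^{-1}p/2$, compares it with the closed form, and confirms that random perturbations never increase the objective. The helper names are illustrative.

```python
import random

# Example data (assumed): a 2x2 symmetric positive-definite A and a constant c.
A = [[2.0, 0.5],
     [0.5, 1.0]]
c = 3.0


def inner(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [inner(row, v) for row in M]


def inverse_2x2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]


A_inv = inverse_2x2(A)


def objective(p, x):
    """<p, x> - f(x) with f(x) = <x, A x> + c."""
    return inner(p, x) - inner(x, matvec(A, x)) - c


def f_star(p):
    """Closed form f*(p) = <p, A^{-1} p>/4 - c."""
    return 0.25 * inner(p, matvec(A_inv, p)) - c


p = [1.0, -2.0]
x_max = [0.5 * t for t in matvec(A_inv, p)]  # stationary point A^{-1} p / 2
print(objective(p, x_max), f_star(p))

# Random perturbations of the stationary point never do better (A is positive definite).
rng = random.Random(0)
worst_gain = max(
    objective(p, [x_max[0] + rng.uniform(-1, 1), x_max[1] + rng.uniform(-1, 1)])
    - objective(p, x_max)
    for _ in range(1000)
)
print(worst_gain)  # should be <= 0 up to rounding
```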

## An equivalent definition in the differentiable case

Equivalently, two strictly convex, differentiable functions $f$ and $g$ defined on the whole line are said to be Legendre transforms of each other if their first derivatives are inverse functions of each other:

$Df = \left( Dg \right)^{-1},$

in which case one writes equivalently $f^*=g$ and $g^*=f$. To see this, first differentiate $f^\star(p)=xp-f(x)$, where the maximizing $x$ depends on $p$ through the condition ${df \over dx}=p$:

${df^\star(p) \over dp} = {d \over dp}(xp-f(x)) = x + p {dx \over dp} - {df \over dx} {dx \over dp} = x.$

Combining this with the maximization condition gives the following pair of reciprocal equations:

$p = {df \over dx}(x),$
$x = {df^\star \over dp}(p).$

From these we see that $Df$ and $Df^\star$ are inverses, as promised. Since the derivatives determine $f$ and $f^\star$ only up to additive constants, the constants are fixed by the additional requirement that

$f(x) + f^\star(p) = x\,p.$

In some cases (e.g. thermodynamic potentials), a non-standard requirement is used instead:

$f(x) - f^\star(p) = x\,p.$

The standard constraint will be considered in this article unless otherwise noted. The Legendre transformation is its own inverse, and is related to integration by parts.
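A concrete pair satisfying these reciprocal relations is $f(x)=e^x$ with $f^\star(p)=p\ln p-p$ for $p>0$: here $Df=\exp$ and $Df^\star=\ln$ are inverse functions. The short check below (the choice of example and of sample points is ours) verifies both $Df^\star(Df(x))=x$ and the standard constraint $f(x)+f^\star(p)=xp$ at $p=f'(x)$.

```python
import math


# f(x) = e^x and its Legendre transform f*(p) = p ln p - p (for p > 0).
def f(x):
    return math.exp(x)


def f_star(p):
    return p * math.log(p) - p


df = math.exp        # Df(x) = e^x
df_star = math.log   # Df*(p) = ln p, the inverse function of e^x

for x in (-1.0, 0.0, 0.7, 2.0):
    p = df(x)
    # Df* undoes Df, and the standard constraint f(x) + f*(p) = x p holds.
    print(x, df_star(p), f(x) + f_star(p) - x * p)
```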

## Behaviour of differentials under Legendre transforms

Let $f$ be a function of two independent variables $x$ and $y$, with the differential $df = {\partial f \over \partial x}dx + {\partial f \over \partial y}dy = u\,dx + v\,dy$. Assume that it is convex in $x$ for every $y$, so that one may perform the Legendre transform in $x$, with $u$ the variable conjugate to $x$. To change the differentials $dx$ and $dy$ to $du$ and $dy$ (i.e. to build another function whose differential is expressed in terms of $du$ and $dy$), we consider the function $g(u, y) = f - ux$ and calculate:

$dg = df - udx - xdu = udx + vdy - udx - xdu = -xdu + vdy$
$x = -{\partial g \over \partial u}$
$v = {\partial g \over \partial y}$

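The function $g(u, y)$ is the result of the Legendre transformation of $f(x,y)$ in which only the independent variable $x$ has been replaced by $u$ (in the non-standard sign convention, so that $g$ is the negative of the standard transform in $x$). This is widely used in thermodynamics.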
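As a hypothetical worked example, take $f(x,y)=yx^2/2$, which is convex in $x$ for every $y>0$. Then $u=yx$, $v=x^2/2$, and eliminating $x=u/y$ gives $g(u,y)=f-ux=-u^2/(2y)$. The sketch below checks the relations $x=-\partial g/\partial u$ and $v=\partial g/\partial y$ with central finite differences; the function and the step size are illustrative choices.

```python
# Hypothetical example: f(x, y) = y * x**2 / 2, convex in x for every y > 0.
#   u = df/dx = y*x,  v = df/dy = x**2 / 2,
#   g(u, y) = f - u*x with x = u/y, which simplifies to -u**2 / (2*y).
def g(u, y):
    return -u * u / (2 * y)


def partial(fn, args, i, h=1e-6):
    """Central finite-difference estimate of the i-th partial derivative of fn at args."""
    plus = list(args)
    plus[i] += h
    minus = list(args)
    minus[i] -= h
    return (fn(*plus) - fn(*minus)) / (2 * h)


x, y = 1.3, 2.0
u = y * x          # variable conjugate to x
v = x * x / 2      # df/dy at (x, y)

print(-partial(g, (u, y), 0), x)   # x = -dg/du
print(partial(g, (u, y), 1), v)    # v =  dg/dy
```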

## Applications

### Hamilton-Lagrange mechanics

A Legendre transform is used in classical mechanics to derive the Hamiltonian formulation from the Lagrangian formulation, and conversely. A typical Lagrangian has the form $L(v,q)=\frac12\langle v,Mv\rangle-V(q)$, where $(v,q)$ are coordinates on $\mathbb R^n\times\mathbb R^n$, $M$ is a symmetric positive-definite real matrix, and $\langle x,y\rangle=\sum_jx_jy_j$. For every fixed $q$, $L(v,q)$ is a convex function of $v$, while $-V(q)$ plays the role of a constant. Hence the Legendre transform of $L(v,q)$ as a function of $v$ is the Hamiltonian function $H(p,q)=\frac 12\langle p,M^{-1}p\rangle+V(q)$.

In a more general setting, $(v,q)$ are local coordinates on the tangent bundle $T\mathcal M$ of a manifold $\mathcal M$. For each $q$, $L(v,q)$ is a convex function on the tangent space $T_q\mathcal M$. The Legendre transform gives the Hamiltonian $H(p,q)$ as a function of the coordinates $(p,q)$ of the cotangent bundle $T^*\mathcal M$; the pairing $\langle p,v\rangle$ used to define the Legendre transform is the natural duality pairing between the cotangent space $T^*_q\mathcal M$ and the tangent space $T_q\mathcal M$.
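In one dimension the passage from $L$ to $H$ can be reproduced numerically. The sketch assumes a mass $m=2$ and the potential $V(q)=q^2$ (both hypothetical), approximates $H(p,q)=\sup_v\bigl(pv-L(v,q)\bigr)$ on a velocity grid, and compares it with $p^2/(2m)+V(q)$, the one-dimensional case of the formula above.

```python
# Assumed 1-D example: mass m = 2 and potential V(q) = q**2, so that
# L(v, q) = m v^2 / 2 - V(q) and the expected Hamiltonian is p^2/(2m) + V(q).
m = 2.0


def V(q):
    return q * q


def lagrangian(v, q):
    return 0.5 * m * v * v - V(q)


vs = [i / 1000 for i in range(-10000, 10001)]  # velocity grid on [-10, 10]


def hamiltonian_numeric(p, q):
    """H(p, q) = sup_v (p v - L(v, q)), approximated on the velocity grid."""
    return max(p * v - lagrangian(v, q) for v in vs)


def hamiltonian_closed_form(p, q):
    return p * p / (2 * m) + V(q)


print(hamiltonian_numeric(3.0, 0.5), hamiltonian_closed_form(3.0, 0.5))  # both 2.5
```

The maximizing velocity is $v=p/m$, which is the familiar relation $p=mv$.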

### Thermodynamics

The strategy behind the use of Legendre transforms is to pass from a function of some independent variable to a new function whose dependence is on a new variable, namely the partial derivative of the original function with respect to that independent variable. The new function is the difference between the original function and the product of the old and new variables. For example, while the internal energy is an explicit function of the extensive variables entropy, volume and chemical composition

$U = U(S,V,\{N_i\})\,$

with the differential

$dU = TdS - PdV + \sum \mu _i dN _i$

the Helmholtz free energy is obtained in the following way using the Legendre transform:

$A = U - TS$
$dA = -SdT - PdV + \sum \mu _i dN _i$

It is seen that the independent variable $S$ (entropy) has been replaced with its thermodynamic conjugate $T$ (temperature).

Likewise, the enthalpy, the (non-standard) Legendre transform of $U$ with respect to $V$,

$H = U + PV \, = H(S,P,\{N_i\})\,$
$P=\, -\left( \frac{\partial U}{\partial V}\right)_S\,$

becomes a function of the entropy and the intensive quantity $P$ (pressure) as natural variables, and is useful when the (external) pressure is constant. The free energies (Helmholtz and Gibbs) are obtained through further Legendre transforms: subtracting $TS$ from $U$ and $H$ respectively shifts the dependence from the entropy $S$ to its conjugate intensive variable, the temperature $T$, and these potentials are useful when the temperature is held constant.
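The mechanics of $A=U-TS$ can be illustrated with a deliberately artificial model. The sketch assumes the hypothetical internal energy $U(S)=e^S$ at fixed $V$ and $\{N_i\}$, which is not a physical equation of state; then $T=\partial U/\partial S=e^S$, so $S=\ln T$ and $A(T)=T-T\ln T$. A finite-difference check confirms $dA/dT=-S$, matching the $-S\,dT$ term in $dA$.

```python
import math


# Hypothetical toy model (not a physical equation of state): at fixed V and {N_i},
# take U(S) = exp(S), so that T = dU/dS = exp(S) and S = ln T.
def U(S):
    return math.exp(S)


def entropy_of_temperature(T):
    return math.log(T)   # inverts T = dU/dS


def helmholtz(T):
    """A(T) = U - T S, with S eliminated in favour of T."""
    S = entropy_of_temperature(T)
    return U(S) - T * S


def dA_dT(T, h=1e-6):
    return (helmholtz(T + h) - helmholtz(T - h)) / (2 * h)


for T in (0.5, 1.0, 3.0):
    # The natural variable is now T, and dA/dT should equal -S.
    print(T, dA_dT(T), -entropy_of_temperature(T))
```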

### An example – variable capacitor

As another example from physics, consider a parallel-plate capacitor in which the plates can move relative to one another. Such a capacitor allows the electric energy stored in it to be transferred into external mechanical work done by the forces acting on the plates. One may think of the electric charge as analogous to the "charge" of a gas in a cylinder, with the resulting mechanical force exerted on a piston.

Suppose we want to compute the force on the plates as a function of $x$, the distance that separates them. To find the force, we compute the potential energy and then use the definition of force as the negative gradient of the potential energy function.

The energy stored in a capacitor of capacitance $C(x)$ and charge $Q$ is

$U (Q, \mathbf{x} ) = \tfrac{1}{2} QV = \tfrac{1}{2} \frac{Q^2}{C(\mathbf{x})},$

where we have abstracted away the dependence on the area of the plates, the dielectric constant of the material between the plates, and the separation $x$ into the capacitance $C(x)$.

The force $F$ between the plates due to the electric field is

$\mathbf{F}(\mathbf{x}) = -\frac{dU}{d\mathbf{x}},$

If the capacitor is not connected to any circuit, the charges on the plates remain constant as they move, and the force is the negative gradient of the electrostatic energy:

$\mathbf{F}(\mathbf{x}) = \tfrac{1}{2} \frac{dC}{d\mathbf{x}} \frac{Q^2}{C^2}.$

However, suppose the voltage $V$ between the plates is maintained constant by connection to a battery, which is a reservoir for charge at constant potential difference. To find the force, we first compute the non-standard Legendre transform

$U^* = U - QV = \tfrac{1}{2}QV - QV = -\tfrac{1}{2} QV\,.$

The force now becomes the negative gradient of the Legendre transform

$\mathbf{F}(\mathbf{x}) = -\frac{dU^*}{d\mathbf{x}}.$

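The two energy functions happen to be negatives of each other only because the relation $Q=CV$ is linear; the difference is that $Q$ is no longer constant as the plates move. Both routes give the same force: at constant $V$, $-dU^*/d\mathbf{x}=\tfrac12 V^2\,dC/d\mathbf{x}$, which equals the constant-charge result after substituting $Q=CV$.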
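The agreement of the two routes can be checked numerically for an ideal parallel-plate model $C(x)=\varepsilon_0 A/x$. The plate area, separation and voltage below are hypothetical example values, fringing fields are ignored, and derivatives are taken by central differences. A negative force means the plates attract.

```python
# Assumed ideal parallel-plate model: C(x) = eps0 * area / x (fringing fields ignored).
EPS0 = 8.854e-12   # vacuum permittivity in F/m (approximate)
AREA = 0.01        # plate area in m^2 (example value)


def capacitance(x):
    return EPS0 * AREA / x


def derivative(fn, x, h):
    return (fn(x + h) - fn(x - h)) / (2 * h)


x0 = 1e-3                  # plate separation in m (example value)
V = 100.0                  # battery voltage in volts (example value)
Q = capacitance(x0) * V    # charge at x0 when connected to the battery

# Isolated capacitor: Q is fixed, U(x) = Q^2 / (2 C(x)), F = -dU/dx.
force_const_Q = -derivative(lambda x: Q * Q / (2 * capacitance(x)), x0, 1e-9)

# Battery-connected capacitor: V is fixed, U*(x) = -C(x) V^2 / 2, F = -dU*/dx.
force_const_V = -derivative(lambda x: -0.5 * capacitance(x) * V * V, x0, 1e-9)

# Closed form F = (1/2) V^2 dC/dx = -(1/2) V^2 eps0 * area / x^2 (negative: attractive).
force_exact = -0.5 * V * V * EPS0 * AREA / x0 ** 2
print(force_const_Q, force_const_V, force_exact)
```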

### Probability theory

In large deviations theory, the rate function is defined as the Legendre transformation of the logarithm of the moment generating function of a random variable. An important application of the rate function is in the calculation of tail probabilities of sums of i.i.d. random variables.
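For a Bernoulli random variable with success probability $q$, the logarithm of the moment generating function is $\Lambda(t)=\ln(1-q+qe^t)$, and its Legendre transform is known in closed form as the relative entropy $I(x)=x\ln\frac{x}{q}+(1-x)\ln\frac{1-x}{1-q}$ for $0<x<1$. The sketch below, with the assumed value $q=0.5$ and a grid for $t$ on $[-20,20]$, compares a grid supremum with that closed form.

```python
import math

q = 0.5   # success probability of the Bernoulli variable (example value)


def log_mgf(t):
    """Lambda(t) = ln E[exp(t X)] for X ~ Bernoulli(q)."""
    return math.log(1 - q + q * math.exp(t))


ts = [i / 1000 for i in range(-20000, 20001)]  # grid for t on [-20, 20]


def rate_numeric(x):
    """I(x) = sup_t (t x - Lambda(t)), approximated on the grid."""
    return max(t * x - log_mgf(t) for t in ts)


def rate_closed_form(x):
    """Closed form for Bernoulli(q), valid for 0 < x < 1 (relative entropy)."""
    return x * math.log(x / q) + (1 - x) * math.log((1 - x) / (1 - q))


for x in (0.5, 0.7, 0.9):
    print(x, rate_numeric(x), rate_closed_form(x))
```

The rate function vanishes at the mean $x=q$ and grows away from it, which is what makes tail probabilities of sample means decay exponentially.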

## Geometric interpretation

For a strictly convex function, the Legendre transformation can be interpreted as a mapping between the graph of the function and the family of tangents of the graph. (For a function of one variable, the tangents are well defined at all but at most countably many points, since a convex function is differentiable at all but at most countably many points.)

The equation of a line with slope $m$ and y-intercept $b$ is given by

$y = mx + b.\,$

For this line to be tangent to the graph of a function $f$ at the point $(x_0, f(x_0))$, we require

$f\left(x_0\right) = m x_0 + b$

and

$m = \dot{f}\left(x_0\right)$

Since $\dot f$ is strictly monotone as the derivative of a strictly convex function, the second equation can be solved for $x_0$; eliminating $x_0$ from the first equation then gives the y-intercept $b$ of the tangent as a function of its slope $m$:

$b = f\left(\dot{f}^{-1}\left(m\right)\right) - m \cdot \dot{f}^{-1}\left(m\right) = -f^\star(m).$

Here $f^\star$ denotes the Legendre transform of $f$.

The family of tangents of the graph of f parameterized by m is therefore given by

$y = mx - f^\star(m)$

or, written implicitly, by the solutions of the equation

$F(x,y,m) = y + f^\star(m) - mx = 0.$

The graph of the original function can be reconstructed from this family of lines as the envelope of this family by demanding

${\partial F(x,y,m)\over\partial m} = \dot{f}^\star(m) - x = 0.$

Eliminating m from these two equations gives

$y = x \cdot \dot{f}^{\star-1}(x) - f^\star\left(\dot{f}^{\star-1}(x)\right).$

Identifying $y$ with $f(x)$ and recognizing the right side of the preceding equation as the Legendre transform of $f^\star$, we find

$f(x) = f^{\star\star}(x).$
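The envelope construction can be tried on $f(x)=\cosh x$, whose transform follows from $\dot f(x)=\sinh x=m$ as $f^\star(m)=m\,\operatorname{asinh} m-\sqrt{1+m^2}$. The sketch below (the example function and the slope grid are our choices) takes the upper envelope of the tangent lines $y=mx-f^\star(m)$ over sampled slopes and recovers $\cosh x$.

```python
import math


# f(x) = cosh(x) is strictly convex; its Legendre transform is
# f*(m) = m*asinh(m) - sqrt(1 + m^2) (from f'(x) = sinh x = m).
def f_star(m):
    return m * math.asinh(m) - math.sqrt(1 + m * m)


slopes = [i / 1000 for i in range(-10000, 10001)]  # slopes m on [-10, 10]


def envelope(x):
    """Upper envelope of the tangent lines y = m x - f*(m) over the sampled slopes."""
    return max(m * x - f_star(m) for m in slopes)


for x in (-2.0, 0.0, 1.0):
    print(x, envelope(x), math.cosh(x))
```

Since every tangent line lies below the convex graph, the envelope approaches $f$ from below as the slope grid is refined.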

## Legendre transformation in more than one dimension

For a differentiable real-valued function $f$ on an open subset $U$ of $\mathbb R^n$ whose gradient mapping $Df$ is injective (as it is when $f$ is strictly convex), the Legendre conjugate of the pair $(U, f)$ is defined to be the pair $(V, g)$, where $V$ is the image of $U$ under $Df$, and $g$ is the function on $V$ given by the formula

$g(y) = \left\langle y, x \right\rangle - f\left(x\right), \, x = \left(Df\right)^{-1}(y)$

where

$\left\langle u,v\right\rangle = \sum_{k=1}^{n}u_{k} \cdot v_{k}$

is the scalar product on $\mathbb R^n$. The multidimensional transform can be interpreted as an encoding of the convex hull of the function's epigraph in terms of its supporting hyperplanes.[1]

Alternatively, if $X$ is a real vector space and $Y$ is its dual vector space, then for each point $x$ of $X$ and $y$ of $Y$ there is a natural identification of the cotangent spaces $T^*_xX$ with $Y$ and $T^*_yY$ with $X$. If $f$ is a real differentiable function over $X$, then $\nabla f$ is a section of the cotangent bundle $T^*X$ and as such defines a map from $X$ to $Y$. Similarly, if $g$ is a real differentiable function over $Y$, $\nabla g$ defines a map from $Y$ to $X$. If both maps happen to be inverses of each other, we say we have a Legendre transform.

When the function is not differentiable, the Legendre transform can still be extended, and is then known as the Legendre-Fenchel transformation. In this more general setting, some properties are lost: for example, the transform is no longer its own inverse unless extra assumptions are made (namely that the function is convex and lower semicontinuous).
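The loss of involutivity is visible on a non-convex function. The sketch below assumes the double-well $f(x)=(x^2-1)^2$ sampled on $[-2,2]$ and a slope grid on $[-10,10]$; applying the discrete Legendre-Fenchel transform twice yields an approximation of the convex hull of $f$, which is $0$ on $[-1,1]$ (where $f$ itself is positive except at $\pm1$) and coincides with $f$ for $|x|\geq 1$.

```python
# Non-convex double-well f(x) = (x^2 - 1)^2, restricted to the grid [-2, 2].
def f(x):
    return (x * x - 1) ** 2


xs = [i / 50 for i in range(-100, 101)]     # x grid on [-2, 2], spacing 0.02
ps = [i / 100 for i in range(-1000, 1001)]  # slope grid on [-10, 10], spacing 0.01

# Discrete Legendre-Fenchel transform: f*(p) = max over xs of (p x - f(x)).
f_star = [max(p * x - f(x) for x in xs) for p in ps]


def f_star_star(x):
    """Biconjugate f**(x) = max over ps of (p x - f*(p)); approximates the convex hull of f."""
    return max(p * x - fs for p, fs in zip(ps, f_star))


for x in (0.0, 0.5, 1.5):
    print(x, f(x), f_star_star(x))
```

Here $f^{**}(0)\approx 0$ while $f(0)=1$, so $f^{**}\neq f$.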

## Further properties

In the following, the Legendre transform of a function $f$ is denoted $f^\star$.

### Scaling properties

The Legendre transformation has the following scaling properties: for $a>0$,

$f(x) = a \cdot g(x) \Rightarrow f^\star(p) = a \cdot g^\star\left(\frac{p}{a}\right)$
$f(x) = g(a \cdot x) \Rightarrow f^\star(p) = g^\star\left(\frac{p}{a}\right).$

It follows that if a function is homogeneous of degree $r$ (with $r>1$), then its image under the Legendre transformation is a homogeneous function of degree $s$, where $1/r + 1/s = 1$. Thus, the only monomial whose degree is invariant under the Legendre transform is the quadratic.
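The degree relation can be illustrated with the family $f(x)=|x|^r/r$, whose transform is $f^\star(p)=|p|^s/s$ with the conjugate exponent $s=r/(r-1)$. The grid check below uses assumed sample exponents and the point $p=2$; only $r=2$ maps to itself.

```python
# Check, for the assumed example f(x) = |x|^r / r, that f*(p) = |p|^s / s
# with 1/r + 1/s = 1, i.e. a degree-r function maps to a degree-s function.
xs = [i / 1000 for i in range(-10000, 10001)]  # grid on [-10, 10]


def conjugate_on_grid(f, p):
    return max(p * x - f(x) for x in xs)


def check(r, p):
    s = r / (r - 1)            # conjugate exponent, requires r > 1
    approx = conjugate_on_grid(lambda x: abs(x) ** r / r, p)
    exact = abs(p) ** s / s
    return approx, exact


for r in (1.5, 2.0, 3.0):
    print(r, check(r, 2.0))
```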

### Behavior under translation

$f(x) = g(x) + b \Rightarrow f^\star(p) = g^\star(p) - b$
$f(x) = g(x + y) \Rightarrow f^\star(p) = g^\star(p) - p \cdot y$
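Both translation rules can be spot-checked on the assumed example $g(x)=x^2/2$, for which $g^\star(p)=p^2/2$. The shift values $b$, $y$ and the evaluation point $p$ below are arbitrary illustrative choices.

```python
# Check the translation rules on the assumed example g(x) = x^2 / 2, with g*(p) = p^2 / 2.
xs = [i / 1000 for i in range(-20000, 20001)]  # grid on [-20, 20]


def conjugate_on_grid(f, p):
    return max(p * x - f(x) for x in xs)


def g(x):
    return 0.5 * x * x


def g_star(p):
    return 0.5 * p * p


b, y, p = 0.7, 1.5, 2.0   # example shift values and evaluation point

# f(x) = g(x) + b  =>  f*(p) = g*(p) - b
print(conjugate_on_grid(lambda x: g(x) + b, p), g_star(p) - b)
# f(x) = g(x + y)  =>  f*(p) = g*(p) - p*y
print(conjugate_on_grid(lambda x: g(x + y), p), g_star(p) - p * y)
```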

### Behavior under inversion

$f(x) = g^{-1}(x) \Rightarrow f^\star(p) = - p \cdot g^\star\left(\frac{1}{p}\right)$

### Behavior under linear transformations

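Let $A$ be a linear transformation from $\mathbb R^n$ to $\mathbb R^m$, and for a convex function $f$ on $\mathbb R^n$ let $Af$ denote the function on $\mathbb R^m$ given by $(Af)(y)=\inf\{f(x) : Ax=y\}$. Then one has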

$\left(A f\right)^\star = f^\star A^\star$

where $A^\star$ is the adjoint operator of $A$, defined by

$\left \langle Ax, y^\star \right \rangle = \left \langle x, A^\star y^\star \right \rangle.$

A closed convex function $f$ is symmetric with respect to a given set $G$ of orthogonal linear transformations,

$f\left(A x\right) = f(x), \; \forall x, \; \forall A \in G$

if and only if $f^\star$ is symmetric with respect to $G$.

### Infimal convolution

The infimal convolution of two functions $f$ and $g$ is defined as

$\left(f \star_\inf g\right)(x) = \inf \left \{ f(x-y) + g(y) \, | \, y \in \mathbb{R}^n \right \}.$

Let $f_1, \ldots, f_m$ be proper convex functions on $\mathbb R^n$. Then

$\left( f_1 \star_\inf \cdots \star_\inf f_m \right)^\star = f_1^\star + \cdots + f_m^\star.$
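For two quadratics $f_1(x)=ax^2/2$ and $f_2(x)=bx^2/2$ on $\mathbb R$, the identity says that the transform of their infimal convolution is $p^2/(2a)+p^2/(2b)$. The sketch below, with assumed values $a=1$, $b=3$ and a shared grid on $[-4,4]$, tabulates the infimal convolution by brute-force minimization and then transforms it on the same grid; grid effects limit agreement to roughly $10^{-4}$.

```python
# Assumed example: f1(x) = a x^2 / 2 and f2(x) = b x^2 / 2 on the real line,
# whose conjugates are p^2/(2a) and p^2/(2b).
a, b = 1.0, 3.0
grid = [i / 100 for i in range(-400, 401)]  # shared grid on [-4, 4], spacing 0.01


def f1(x):
    return 0.5 * a * x * x


def f2(x):
    return 0.5 * b * x * x


# Infimal convolution h(x) = inf_y (f1(x - y) + f2(y)), tabulated on the grid.
h = {x: min(f1(x - y) + f2(y) for y in grid) for x in grid}


def h_star(p):
    """Legendre transform of the tabulated h, again maximized over the grid."""
    return max(p * x - hx for x, hx in h.items())


def sum_of_conjugates(p):
    return p * p / (2 * a) + p * p / (2 * b)


for p in (1.0, 2.0):
    print(p, h_star(p), sum_of_conjugates(p))
```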