Chain rule

In calculus, the chain rule is a formula to compute the derivative of a composite function. That is, if $f$ and $g$ are differentiable functions, then the chain rule expresses the derivative of their composite $f \circ g$ — the function which maps $x$ to $f(g(x))$ — in terms of the derivatives of $f$ and $g$ and the product of functions as follows:

(f\circ g)'=(f'\circ g)\cdot g'.

Alternatively, by letting $h = f \circ g$ (equiv., $h (x) = f (g (x))$ for all $x$ ), one can also write the chain rule in Lagrange's notation, as follows:

h'(x)=f'(g(x))g'(x).

The chain rule may also be rewritten in Leibniz's notation in the following way. If a variable $z$ depends on the variable $y$, which itself depends on the variable $x$ (i.e., $y$ and $z$ are dependent variables), then $z$, via the intermediate variable $y$, depends on $x$ as well. In this case, the chain rule states that:

{\frac {dz}{dx}}={\frac {dz}{dy}}\cdot {\frac {dy}{dx}}.

More precisely, to indicate the point each derivative is evaluated at, $\left.{\frac {dz}{dx}}\right|_{x}=\left.{\frac {dz}{dy}}\right|_{y(x)}\cdot \left.{\frac {dy}{dx}}\right|_{x}$ .

The versions of the chain rule in the Lagrange and the Leibniz notation are equivalent, in the sense that if $z=f(y)\!$ and $y=g(x)\!$ , so that $z=f(g(x))=(f\circ g)(x)$ , then

\left.{\frac {dz}{dx}}\right|_{x}=(f\circ g)'(x)

and

\left.{\frac {dz}{dy}}\right|_{y(x)}\cdot \left.{\frac {dy}{dx}}\right|_{x}=f'(y(x))g'(x)=f'(g(x))g'(x).

^[1]

Intuitively, the chain rule states that knowing the instantaneous rate of change of z relative to y and that of y relative to x allows one to calculate the instantaneous rate of change of z relative to x. As put by George F. Simmons: "if a car travels twice as fast as a bicycle and the bicycle is four times as fast as a walking man, then the car travels 2 × 4 = 8 times as fast as the man."^[2]

In integration, the counterpart to the chain rule is the substitution rule.

History

The chain rule seems to have first been used by Gottfried Wilhelm Leibniz. He used it to calculate the derivative of ${\sqrt {a+bz+cz^{2}}}$ as the composite of the square root function and the function $a+bz+cz^{2}$. He first mentioned it in a 1676 memoir (with a sign error in the calculation). The common notation of the chain rule is due to Leibniz.^[3] Guillaume de l'Hôpital used the chain rule implicitly in his Analyse des infiniment petits. The chain rule does not appear in any of Leonhard Euler's analysis books, even though they were written over a hundred years after Leibniz's discovery.

One dimension

First example

Suppose that a skydiver jumps from an aircraft. Assume that $t$ seconds after his jump, his height above sea level in meters is given by $g(t) = 4000 - 4.9t^{2}$. One model for the atmospheric pressure at a height $h$ is $f(h) = 101325e^{-0.0001h}$. These two equations can be differentiated and combined in various ways to produce the following data:

$g'(t) = -9.8 t$ is the velocity of the skydiver at time $t$ .
$f'(h) = -10.1325e^{-0.0001h}$ is the rate of change in atmospheric pressure with respect to height at the height $h$ and is proportional to the buoyant force on the skydiver at $h$ meters above sea level. (The true buoyant force depends on the volume of the skydiver.)
$(f \circ g)(t)$ is the atmospheric pressure the skydiver experiences $t$ seconds after his jump.
$(f \circ g)'(t)$ is the rate of change in atmospheric pressure with respect to time at $t$ seconds after the skydiver's jump, and is proportional to the buoyant force on the skydiver at $t$ seconds after his jump.

Here, the chain rule gives a method for computing $(f \circ g)'(t)$ in terms of $f'$ and $g'$ . While it is always possible to directly apply the definition of the derivative to compute the derivative of a composite function, this is usually very difficult. The utility of the chain rule is that it turns a complicated derivative into several easy derivatives.

The chain rule states that, under appropriate conditions,

(f\circ g)'(t)=f'(g(t))\cdot g'(t).

In this example, this equals

(f\circ g)'(t)={\big (}{\mathord {-}}10.1325e^{-0.0001(4000-4.9t^{2})}{\big )}\cdot {\big (}{\mathord {-}}9.8t{\big )}.

In the statement of the chain rule, $f$ and $g$ play slightly different roles because $f'$ is evaluated at $g(t)\!$ , whereas $g'$ is evaluated at $t$ . This is necessary to make the units work out correctly.

For example, suppose that we want to compute the rate of change in atmospheric pressure ten seconds after the skydiver jumps. This is $(f \circ g)'(10)$ and has units of pascals per second. The factor $g'(10)$ in the chain rule is the velocity of the skydiver ten seconds after his jump, and it is expressed in meters per second. $f'(g(10))\!$ is the change in pressure with respect to height at the height $g (10)$ and is expressed in pascals per meter. The product of $f'(g(10))\!$ and $g'(10)\!$ therefore has the correct units of pascals per second.

Here, notice that it is not possible to evaluate $f'$ anywhere else. For instance, the 10 in the problem represents ten seconds, while the expression $f'(10)$ would represent the change in pressure at a height of ten meters, which is not what we wanted. Similarly, while $g'(10) = -98$ has a unit of meters per second, the expression $f'(g'(10))$ would represent the change in pressure at a height of −98 meters, which is again not what we wanted. However, $g(10) = 4000 - 4.9\cdot 10^{2} = 3510$ meters above sea level is the height of the skydiver ten seconds after his jump, and this has the correct units for an input to $f$.
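As a numerical check of this example, the following Python sketch evaluates $(f \circ g)'(10)$ with the chain rule and compares it with a central-difference estimate of the derivative of $f(g(t))$. The function names and the step size are illustrative choices, not part of the original example.

```python
import math


def g(t):
    """Height above sea level (meters) t seconds after the jump."""
    return 4000 - 4.9 * t ** 2


def g_prime(t):
    """Velocity of the skydiver (meters per second)."""
    return -9.8 * t


def f(h):
    """Atmospheric pressure (pascals) at height h meters."""
    return 101325 * math.exp(-0.0001 * h)


def f_prime(h):
    """Rate of change of pressure with height (pascals per meter)."""
    return -10.1325 * math.exp(-0.0001 * h)


def pressure_rate(t):
    """(f o g)'(t) by the chain rule: f' is evaluated at g(t), g' at t."""
    return f_prime(g(t)) * g_prime(t)


def central_difference(func, x, step=1e-5):
    """Symmetric finite-difference estimate of func'(x)."""
    return (func(x + step) - func(x - step)) / (2 * step)


chain_value = pressure_rate(10.0)
numeric_value = central_difference(lambda s: f(g(s)), 10.0)
print(g(10.0), chain_value, numeric_value)
```

The two derivative values should agree to several significant digits, and the result is positive because the pressure rises as the skydiver descends.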

Statement

The simplest form of the chain rule is for real-valued functions of one real variable. It states that if $g$ is a function that is differentiable at a point $c$ (i.e. the derivative $g'(c)$ exists) and $f$ is a function that is differentiable at $g (c)$ , then the composite function $f \circ g$ is differentiable at $c$ , and the derivative is^[4]

(f\circ g)'(c)=f'(g(c))\cdot g'(c).

The rule is sometimes abbreviated as

(f\circ g)'=(f'\circ g)\cdot g'.

If $y = f (u)$ and $u = g (x)$ , then this abbreviated form is written in Leibniz notation as:

{\frac {dy}{dx}}={\frac {dy}{du}}\cdot {\frac {du}{dx}}.

^[1]

The points where the derivatives are evaluated may also be stated explicitly:

\left.{\frac {dy}{dx}}\right|_{x=c}=\left.{\frac {dy}{du}}\right|_{u=g(c)}\cdot \left.{\frac {du}{dx}}\right|_{x=c}.

Carrying the same reasoning further, given $n$ functions $f_{1},\ldots ,f_{n}$ with the composite function $f_{1}\circ (f_{2}\circ \cdots (f_{n-1}\circ f_{n}))$, if each function $f_{i}$ is differentiable at its immediate input, then the composite function is also differentiable by repeated application of the chain rule, and its derivative is (in Leibniz's notation):

{\frac {df_{1}}{dx}}={\frac {df_{1}}{df_{2}}}{\frac {df_{2}}{df_{3}}}\cdots {\frac {df_{n}}{dx}}.

^[5]

Further examples

Absence of formulas

It may be possible to apply the chain rule even when there are no formulas for the functions which are being differentiated. This can happen when the derivatives are measured directly. Suppose that a car is driving up a tall mountain. The car's speedometer measures its speed directly. If the grade is known, then the rate of ascent can be calculated using trigonometry. Suppose that the car is ascending at 2.5 km/h. Standard models for the Earth's atmosphere imply that the temperature drops about 6.5 °C per kilometer ascended (called the lapse rate). To find the temperature drop per hour, we can apply the chain rule. Let the function $g(t)$ be the altitude of the car at time $t$, and let the function $f(h)$ be the temperature $h$ kilometers above sea level. $f$ and $g$ are not known exactly: for example, the altitude where the car starts is not known and the temperature on the mountain is not known. However, their derivatives are known: $f'$ is −6.5 °C/km, and $g'$ is 2.5 km/h. The chain rule states that the derivative of the composite function is the product of the derivative of $f$ and the derivative of $g$. This is $-6.5\ {}^{\circ}\mathrm{C/km} \cdot 2.5\ \mathrm{km/h} = -16.25\ {}^{\circ}\mathrm{C/h}$.

One of the reasons why this computation is possible is because $f'$ is a constant function. A more accurate description of how the temperature near the car varies over time would require an accurate model of how the temperature varies at different altitudes. This model may not have a constant derivative. To compute the temperature change in such a model, it would be necessary to know $g$ and not just $g'$ , because without knowing $g$ it is not possible to know where to evaluate $f'$ .

Composites of more than two functions

The chain rule can be applied to composites of more than two functions. To take the derivative of a composite of more than two functions, notice that the composite of $f$ , $g$ , and $h$ (in that order) is the composite of $f$ with $g \circ h$ . The chain rule states that to compute the derivative of $f \circ g \circ h$ , it is sufficient to compute the derivative of $f$ and the derivative of $g \circ h$ . The derivative of $f$ can be calculated directly, and the derivative of $g \circ h$ can be calculated by applying the chain rule again.

For concreteness, consider the function

y=e^{\sin(x^{2})}.

This can be decomposed as the composite of three functions:

{\begin{aligned}y&=f(u)=e^{u},\\[6pt]u&=g(v)=\sin v=\sin(x^{2}),\\[6pt]v&=h(x)=x^{2}.\end{aligned}}

Their derivatives are:

{\begin{aligned}{\frac {dy}{du}}&=f'(u)=e^{u}=e^{\sin(x^{2})},\\[6pt]{\frac {du}{dv}}&=g'(v)=\cos v=\cos(x^{2}),\\[6pt]{\frac {dv}{dx}}&=h'(x)=2x.\end{aligned}}

The chain rule states that the derivative of their composite at the point $x = a$ is:

{\begin{aligned}(f\circ g\circ h)'(a)&=f'((g\circ h)(a))\cdot (g\circ h)'(a)\\[10pt]&=f'((g\circ h)(a))\cdot g'(h(a))\cdot h'(a)=(f'\circ g\circ h)(a)\cdot (g'\circ h)(a)\cdot h'(a).\end{aligned}}

In Leibniz notation, this is:

{\frac {dy}{dx}}=\left.{\frac {dy}{du}}\right|_{u=g(h(a))}\cdot \left.{\frac {du}{dv}}\right|_{v=h(a)}\cdot \left.{\frac {dv}{dx}}\right|_{x=a},

or for short,

{\frac {dy}{dx}}={\frac {dy}{du}}\cdot {\frac {du}{dv}}\cdot {\frac {dv}{dx}}.

The derivative function is therefore:

{\frac {dy}{dx}}=e^{\sin(x^{2})}\cdot \cos(x^{2})\cdot 2x.

Another way of computing this derivative is to view the composite function $f \circ g \circ h$ as the composite of $f \circ g$ and h. Applying the chain rule in this manner would yield:

(f\circ g\circ h)'(a)=(f\circ g)'(h(a))\cdot h'(a)=f'(g(h(a)))\cdot g'(h(a))\cdot h'(a).

This is the same as what was computed above. This should be expected because $(f \circ g) \circ h = f \circ (g \circ h)$ .
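The derivative of $e^{\sin(x^{2})}$ obtained above can be checked numerically. The sketch below multiplies the three factors from the chain rule and compares the product with a central-difference estimate; the sample point `x0` and the helper `central_difference` are assumptions made for illustration.

```python
import math


def h(x):
    return x ** 2


def g(v):
    return math.sin(v)


def f(u):
    return math.exp(u)


def dy_dx(x):
    """Chain rule: f'(g(h(x))) * g'(h(x)) * h'(x)."""
    v = h(x)
    u = g(v)
    return math.exp(u) * math.cos(v) * (2 * x)


def central_difference(func, x, step=1e-6):
    """Symmetric finite-difference estimate of func'(x)."""
    return (func(x + step) - func(x - step)) / (2 * step)


x0 = 0.7
exact = dy_dx(x0)
estimate = central_difference(lambda x: f(g(h(x))), x0)
print(exact, estimate)
```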

Sometimes, it is necessary to differentiate an arbitrarily long composition of the form $f_{1}\circ f_{2}\circ \cdots \circ f_{n-1}\circ f_{n}\!$ . In this case, define

f_{a\,.\,.\,b}=f_{a}\circ f_{a+1}\circ \cdots \circ f_{b-1}\circ f_{b}

where $f_{a\,.\,.\,a}=f_{a}$ and $f_{a\,.\,.\,b}(x)=x$ when $b<a$ . Then the chain rule takes the form

Df_{1\,.\,.\,n}=(Df_{1}\circ f_{2\,.\,.\,n})(Df_{2}\circ f_{3\,.\,.\,n})\cdots (Df_{n-1}\circ f_{n\,.\,.\,n})Df_{n}=\prod _{k=1}^{n}\left[Df_{k}\circ f_{(k+1)\,.\,.\,n}\right]

or, in the Lagrange notation,

f_{1\,.\,.\,n}'(x)=f_{1}'\left(f_{2\,.\,.\,n}(x)\right)\;f_{2}'\left(f_{3\,.\,.\,n}(x)\right)\cdots f_{n-1}'\left(f_{n\,.\,.\,n}(x)\right)\;f_{n}'(x)=\prod _{k=1}^{n}f_{k}'\left(f_{(k+1)\,.\,.\,n}(x)\right)
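The product formula for a long composition can be sketched as a single loop that walks from the innermost function outward, multiplying each $f_{k}'$ evaluated at the value of the inner composite $f_{(k+1)\,.\,.\,n}(x)$. The function `compose_derivative` and its list-of-pairs interface are hypothetical names chosen for this illustration.

```python
import math


def compose_derivative(stages, x):
    """Value and derivative of f_1 o f_2 o ... o f_n at x.

    stages is a list of (f_k, f_k_prime) pairs ordered f_1, ..., f_n.
    An empty list is treated as the identity function.
    """
    value = x
    derivative = 1.0
    # Start with the innermost function f_n and move outward.
    for func, func_prime in reversed(stages):
        derivative *= func_prime(value)  # f_k' at f_{(k+1)..n}(x)
        value = func(value)              # now holds f_{k..n}(x)
    return value, derivative


stages = [
    (math.exp, math.exp),                      # f_1(u) = e^u
    (math.sin, math.cos),                      # f_2(v) = sin v
    (lambda x: x ** 2, lambda x: 2 * x),       # f_3(x) = x^2
]
value, derivative = compose_derivative(stages, 0.7)
print(value, derivative)
```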

Quotient rule

The chain rule can be used to derive some well-known differentiation rules. For example, the quotient rule is a consequence of the chain rule and the product rule. To see this, write the function $f (x)/ g (x)$ as the product $f (x) \cdot 1/ g (x)$ . First apply the product rule:

{\begin{aligned}{\frac {d}{dx}}\left({\frac {f(x)}{g(x)}}\right)&={\frac {d}{dx}}\left(f(x)\cdot {\frac {1}{g(x)}}\right)\\&=f'(x)\cdot {\frac {1}{g(x)}}+f(x)\cdot {\frac {d}{dx}}\left({\frac {1}{g(x)}}\right).\end{aligned}}

To compute the derivative of $1/ g (x)$ , notice that it is the composite of $g$ with the reciprocal function, that is, the function that sends $x$ to $1/ x$ . The derivative of the reciprocal function is $-1/x^{2}\!$ . By applying the chain rule, the last expression becomes:

f'(x)\cdot {\frac {1}{g(x)}}+f(x)\cdot \left(-{\frac {1}{g(x)^{2}}}\cdot g'(x)\right)={\frac {f'(x)g(x)-f(x)g'(x)}{g(x)^{2}}},

which is the usual formula for the quotient rule.
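The derivation can be mirrored step by step in code. In this sketch, one function applies the product rule together with the chain rule on $1/g(x)$, and a second evaluates the closed quotient-rule formula; both function names and the sample functions are assumptions for illustration.

```python
import math


def quotient_derivative_via_chain(f, f_prime, g, g_prime, x):
    """Derivative of f/g from the product rule plus the chain rule on 1/g."""
    # Chain rule: d/dx (1/g(x)) = -1/g(x)^2 * g'(x)
    reciprocal_prime = -1.0 / g(x) ** 2 * g_prime(x)
    # Product rule on f(x) * (1/g(x))
    return f_prime(x) / g(x) + f(x) * reciprocal_prime


def quotient_rule(f, f_prime, g, g_prime, x):
    """The usual quotient-rule formula."""
    return (f_prime(x) * g(x) - f(x) * g_prime(x)) / g(x) ** 2


def denominator(x):
    return x ** 2 + 1


def denominator_prime(x):
    return 2 * x


via_chain = quotient_derivative_via_chain(math.sin, math.cos, denominator, denominator_prime, 1.3)
via_formula = quotient_rule(math.sin, math.cos, denominator, denominator_prime, 1.3)
print(via_chain, via_formula)
```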

Derivatives of inverse functions

Suppose that $y = g (x)$ has an inverse function. Call its inverse function $f$ so that we have $x = f (y)$ . There is a formula for the derivative of $f$ in terms of the derivative of $g$ . To see this, note that $f$ and $g$ satisfy the formula

f(g(x))=x.

And because the functions $f(g(x))\!$ and $x$ are equal, their derivatives must be equal. The derivative of $x$ is the constant function with value 1, and the derivative of $f(g(x))\!$ is determined by the chain rule. Therefore, we have that:

f'(g(x))g'(x)=1.

To express $f'$ as a function of an independent variable $y$ , we substitute $f(y)\!$ for $x$ wherever it appears. Then we can solve for $f'$ .

{\begin{aligned}f'(g(f(y)))g'(f(y))&=1\\[5pt]f'(y)g'(f(y))&=1\\[5pt]f'(y)&={\frac {1}{g'(f(y))}}.\end{aligned}}

For example, consider the function $g(x) = e^{x}$. It has an inverse $f(y) = \ln y$. Because $g'(x) = e^{x}$, the above formula says that

{\frac {d}{dy}}\ln y={\frac {1}{e^{\ln y}}}={\frac {1}{y}}.

This formula is true whenever $g$ is differentiable and its inverse $f$ is also differentiable. This formula can fail when one of these conditions is not true. For example, consider $g(x) = x^{3}$. Its inverse is $f(y) = y^{1/3}$, which is not differentiable at zero. If we attempt to use the above formula to compute the derivative of $f$ at zero, then we must evaluate $1/g'(f(0))$. Since $f(0) = 0$ and $g'(0) = 0$, we must evaluate 1/0, which is undefined. Therefore, the formula fails in this case. This is not surprising because $f$ is not differentiable at zero.
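Both cases can be tried directly. The sketch below implements $f'(y) = 1/g'(f(y))$ as a hypothetical helper `inverse_derivative`, applies it to the exponential and logarithm pair, and shows that the cube-root case at zero leads to a division by zero.

```python
import math


def inverse_derivative(g_prime, f, y):
    """f'(y) = 1 / g'(f(y)), where f is the inverse of g.

    Raises ZeroDivisionError when g'(f(y)) is zero, which is the case in
    which the formula does not apply.
    """
    return 1.0 / g_prime(f(y))


# g(x) = e^x with inverse f(y) = ln y: the formula gives 1/y.
log_slope = inverse_derivative(math.exp, math.log, 2.5)


def cube_prime(x):
    """Derivative of g(x) = x^3."""
    return 3 * x ** 2


def cube_root(y):
    """Real cube root, the inverse of g(x) = x^3."""
    return math.copysign(abs(y) ** (1.0 / 3.0), y)


try:
    inverse_derivative(cube_prime, cube_root, 0.0)
    formula_failed = False
except ZeroDivisionError:
    formula_failed = True

print(log_slope, formula_failed)
```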

Higher derivatives

Faà di Bruno's formula generalizes the chain rule to higher derivatives. Assuming that $y = f (u)$ and $u = g (x)$ , then the first few derivatives are:

{\begin{aligned}{\frac {dy}{dx}}&={\frac {dy}{du}}{\frac {du}{dx}}\\[4pt]{\frac {d^{2}y}{dx^{2}}}&={\frac {d^{2}y}{du^{2}}}\left({\frac {du}{dx}}\right)^{2}+{\frac {dy}{du}}{\frac {d^{2}u}{dx^{2}}}\\[4pt]{\frac {d^{3}y}{dx^{3}}}&={\frac {d^{3}y}{du^{3}}}\left({\frac {du}{dx}}\right)^{3}+3\,{\frac {d^{2}y}{du^{2}}}{\frac {du}{dx}}{\frac {d^{2}u}{dx^{2}}}+{\frac {dy}{du}}{\frac {d^{3}u}{dx^{3}}}\\[4pt]{\frac {d^{4}y}{dx^{4}}}&={\frac {d^{4}y}{du^{4}}}\left({\frac {du}{dx}}\right)^{4}+6\,{\frac {d^{3}y}{du^{3}}}\left({\frac {du}{dx}}\right)^{2}{\frac {d^{2}u}{dx^{2}}}+{\frac {d^{2}y}{du^{2}}}\left(4\,{\frac {du}{dx}}{\frac {d^{3}u}{dx^{3}}}+3\,\left({\frac {d^{2}u}{dx^{2}}}\right)^{2}\right)+{\frac {dy}{du}}{\frac {d^{4}u}{dx^{4}}}.\end{aligned}}
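To see the first three formulas at work, the sketch below evaluates them for $y = \sin u$ and $u = x^{2}$ and compares the results with derivatives of $\sin(x^{2})$ computed by hand. The helper `composite_derivatives` and the sample point are assumptions for illustration.

```python
import math


def composite_derivatives(f_derivs, g_derivs):
    """First three derivatives of f(g(x)) from the formulas above.

    f_derivs = (f', f'', f''') evaluated at u = g(x);
    g_derivs = (g', g'', g''') evaluated at x.
    """
    f1, f2, f3 = f_derivs
    g1, g2, g3 = g_derivs
    d1 = f1 * g1
    d2 = f2 * g1 ** 2 + f1 * g2
    d3 = f3 * g1 ** 3 + 3 * f2 * g1 * g2 + f1 * g3
    return d1, d2, d3


x = 0.8
u = x ** 2
f_derivs = (math.cos(u), -math.sin(u), -math.cos(u))  # derivatives of sin u
g_derivs = (2 * x, 2.0, 0.0)                          # derivatives of x^2
d1, d2, d3 = composite_derivatives(f_derivs, g_derivs)

# Derivatives of sin(x^2) worked out directly, for comparison.
expected = (
    2 * x * math.cos(u),
    2 * math.cos(u) - 4 * x ** 2 * math.sin(u),
    -12 * x * math.sin(u) - 8 * x ** 3 * math.cos(u),
)
print((d1, d2, d3), expected)
```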

Proofs

First proof

One proof of the chain rule begins with the definition of the derivative:

(f\circ g)'(a)=\lim _{x\to a}{\frac {f(g(x))-f(g(a))}{x-a}}.

Assume for the moment that $g(x)\!$ does not equal $g(a)\!$ for any $x$ near $a$ . Then the previous expression is equal to the product of two factors:

\lim _{x\to a}{\frac {f(g(x))-f(g(a))}{g(x)-g(a)}}\cdot {\frac {g(x)-g(a)}{x-a}}.

If $g$ oscillates near $a$, then it might happen that no matter how close one gets to $a$, there is always an even closer $x$ such that $g(x)$ equals $g(a)$. For example, this happens for $g(x) = x^{2}\sin(1/x)$ (with $g(0) = 0$) near the point $a = 0$. Whenever this happens, the above expression is undefined because it involves division by zero. To work around this, introduce a function $Q$ as follows:

Q(y)={\begin{cases}{\frac {f(y)-f(g(a))}{y-g(a)}},&y\neq g(a),\\f'(g(a)),&y=g(a).\end{cases}}

We will show that the difference quotient for $f \circ g$ is always equal to:

Q(g(x))\cdot {\frac {g(x)-g(a)}{x-a}}.

Whenever $g (x)$ is not equal to $g (a)$ , this is clear because the factors of $g (x) - g (a)$ cancel. When $g (x)$ equals $g (a)$ , then the difference quotient for $f \circ g$ is zero because $f (g (x))$ equals $f (g (a))$ , and the above product is zero because it equals $f'(g (a))$ times zero. So the above product is always equal to the difference quotient, and to show that the derivative of $f \circ g$ at $a$ exists and to determine its value, we need only show that the limit as $x$ goes to $a$ of the above product exists and determine its value.

To do this, recall that the limit of a product exists if the limits of its factors exist. When this happens, the limit of the product of these two factors will equal the product of the limits of the factors. The two factors are $Q (g (x))$ and $(g (x) - g (a)) / (x - a)$ . The latter is the difference quotient for $g$ at $a$ , and because $g$ is differentiable at $a$ by assumption, its limit as $x$ tends to $a$ exists and equals $g'(a)$ .

As for $Q (g (x))$ , notice that $Q$ is defined wherever $f$ is. Furthermore, $f$ is differentiable at $g (a)$ by assumption, so $Q$ is continuous at $g (a)$ , by definition of the derivative. The function $g$ is continuous at $a$ because it is differentiable at $a$ , and therefore $Q \circ g$ is continuous at $a$ . So its limit as $x$ goes to $a$ exists and equals $Q (g (a))$ , which is $f'(g (a))$ .

This shows that the limits of both factors exist and that they equal $f'(g(a))$ and $g'(a)$, respectively. Therefore, the derivative of $f \circ g$ at $a$ exists and equals $f'(g(a))g'(a)$.^[5]

Second proof

Another way of proving the chain rule is to measure the error in the linear approximation determined by the derivative. This proof has the advantage that it generalizes to several variables. It relies on the following equivalent definition of differentiability at a point: A function g is differentiable at a if there exists a real number g′(a) and a function ε(h) that tends to zero as h tends to zero, such that

g(a+h)-g(a)=g'(a)h+\varepsilon (h)h.

Here the left-hand side represents the true difference between the value of g at a and at $a + h$ , whereas the right-hand side represents the approximation determined by the derivative plus an error term.

In the situation of the chain rule, such a function ε exists because g is assumed to be differentiable at a. Again by assumption, a similar function also exists for f at g(a). Calling this function η, we have

f(g(a)+k)-f(g(a))=f'(g(a))k+\eta (k)k.

The above definition imposes no constraints on η(0), even though it is assumed that η(k) tends to zero as k tends to zero. If we set $\eta(0) = 0$, then η is continuous at 0.

Proving the theorem requires studying the difference $f (g (a + h)) - f (g (a))$ as h tends to zero. The first step is to substitute for $g (a + h)$ using the definition of differentiability of g at a:

f(g(a+h))-f(g(a))=f(g(a)+g'(a)h+\varepsilon (h)h)-f(g(a)).

The next step is to use the definition of differentiability of f at g(a). This requires a term of the form $f(g(a) + k)$ for some k. In the above equation, the correct k varies with h. Set $k_{h} = g'(a)h + \varepsilon(h)h$ and the right-hand side becomes $f(g(a) + k_{h}) - f(g(a))$. Applying the definition of the derivative gives:

f(g(a)+k_{h})-f(g(a))=f'(g(a))k_{h}+\eta (k_{h})k_{h}.

To study the behavior of this expression as h tends to zero, expand k_h. After regrouping the terms, the right-hand side becomes:

f'(g(a))g'(a)h+[f'(g(a))\varepsilon (h)+\eta (k_{h})g'(a)+\eta (k_{h})\varepsilon (h)]h.

Because ε(h) and η(k_h) tend to zero as h tends to zero, the first two bracketed terms tend to zero as h tends to zero. Applying the same theorem on products of limits as in the first proof, the third bracketed term also tends to zero. Because the above expression is equal to the difference $f(g(a + h)) - f(g(a))$, by the definition of the derivative $f \circ g$ is differentiable at a and its derivative is $f'(g(a))g'(a)$.

The role of Q in the first proof is played by η in this proof. They are related by the equation:

Q(y)=f'(g(a))+\eta (y-g(a)).

The need to define Q at g(a) is analogous to the need to define η at zero.
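The error terms in this proof can be computed for a concrete pair of functions. The sketch below assumes $g(x) = x^{2}$, $f(y) = \sin y$ and $a = 1$ (choices made only for illustration), evaluates ε, η and the bracketed error term, and checks that the difference quotient of $f \circ g$ equals $f'(g(a))g'(a)$ plus that error term.

```python
import math

a = 1.0


def g(x):
    return x ** 2


def f(y):
    return math.sin(y)


g_prime_a = 2 * a              # g'(a)
f_prime_ga = math.cos(g(a))    # f'(g(a))


def epsilon(h):
    """Error function for g at a: g(a+h) - g(a) = g'(a) h + epsilon(h) h."""
    return (g(a + h) - g(a)) / h - g_prime_a


def eta(k):
    """Error function for f at g(a), with eta(0) set to 0."""
    if k == 0:
        return 0.0
    return (f(g(a) + k) - f(g(a))) / k - f_prime_ga


def error_term(h):
    """The bracketed expression multiplying h in the proof."""
    k_h = g_prime_a * h + epsilon(h) * h
    return (f_prime_ga * epsilon(h)
            + eta(k_h) * g_prime_a
            + eta(k_h) * epsilon(h))


def difference_quotient(h):
    return (f(g(a + h)) - f(g(a))) / h


for h in (1e-1, 1e-2, 1e-3):
    print(h, difference_quotient(h), f_prime_ga * g_prime_a + error_term(h))
```

As h shrinks, the error term shrinks with it, so the difference quotient approaches $f'(g(a))g'(a)$.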

Third proof

Constantin Carathéodory's alternative definition of the differentiability of a function can be used to give an elegant proof of the chain rule.^[6]

Under this definition, a function $f$ is differentiable at a point $a$ if and only if there is a function $q$ , continuous at $a$ and such that $f (x) - f (a) = q (x)(x - a)$ . There is at most one such function, and if $f$ is differentiable at $a$ then $f'(a) = q (a)$ .

Given the assumptions of the chain rule and the fact that differentiable functions and compositions of continuous functions are continuous, we have that there exist functions $q$ , continuous at $g (a)$ and $r$ , continuous at $a$ and such that,

f(g(x))-f(g(a))=q(g(x))(g(x)-g(a))

and

g(x)-g(a)=r(x)(x-a).

Therefore,

f(g(x))-f(g(a))=q(g(x))r(x)(x-a),

but the function given by $h (x) = q (g (x)) r (x)$ is continuous at $a$ , and we get, for this $a$

(f\circ g)'(a)=q(g(a))r(a)=f'(g(a))g'(a).

A similar approach works for continuously differentiable (vector-)functions of many variables. This method of factoring also allows a unified approach to stronger forms of differentiability, when the derivative is required to be Lipschitz continuous, Hölder continuous, etc. Differentiation itself can be viewed as the polynomial remainder theorem (the little Bézout theorem, or factor theorem), generalized to an appropriate class of functions.

Proof via infinitesimals

If $y=f(x)$ and $x=g(t)$, then choosing an infinitesimal $\Delta t\neq 0$ we compute the corresponding $\Delta x=g(t+\Delta t)-g(t)$ and then the corresponding $\Delta y=f(x+\Delta x)-f(x)$, so that

{\frac {\Delta y}{\Delta t}}={\frac {\Delta y}{\Delta x}}{\frac {\Delta x}{\Delta t}}

and applying the standard part we obtain

{\frac {dy}{dt}}={\frac {dy}{dx}}{\frac {dx}{dt}}

which is the chain rule.

Multivariable case

The generalization of the chain rule to multi-variable functions is rather technical. However, it is simpler to write in the case of functions of the form

f(g_{1}(x),\dots ,g_{k}(x)).

As this case occurs often in the study of functions of a single variable, it is worth describing it separately.

Case of $f(g_{1}(x),\dots ,g_{k}(x))$

To write the chain rule for a function of the form $f(g_{1}(x),\dots ,g_{k}(x))$, one needs the partial derivatives of $f$ with respect to its $k$ arguments. The usual notations for partial derivatives involve names for the arguments of the function. As these arguments are not named in the above formula, it is simpler and clearer to denote by

D_{i}f

the derivative of $f$ with respect to its $i$ th argument, and by

D_{i}f(z)

the value of this derivative at $z$ .

With this notation, the chain rule is

{\frac {d}{dx}}f(g_{1}(x),\dots ,g_{k}(x))=\sum _{i=1}^{k}\left({\frac {d}{dx}}{g_{i}}(x)\right)D_{i}f(g_{1}(x),\dots ,g_{k}(x)).

Example: arithmetic operations

If the function $f$ is addition, that is, if

f(u,v)=u+v,

then $D_{1}f={\frac {\partial f}{\partial u}}=1$ and $D_{2}f={\frac {\partial f}{\partial v}}=1$ . Thus, the chain rule gives

{\frac {d}{dx}}(g(x)+h(x))=\left({\frac {d}{dx}}g(x)\right)D_{1}f+\left({\frac {d}{dx}}h(x)\right)D_{2}f={\frac {d}{dx}}g(x)+{\frac {d}{dx}}h(x).

For multiplication

f(u,v)=uv,

the partials are $D_{1}f=v$ and $D_{2}f=u.$ Thus,

{\frac {d}{dx}}(g(x)h(x))=h(x){\frac {d}{dx}}g(x)+g(x){\frac {d}{dx}}h(x).

The case of exponentiation

f(u,v)=u^{v}

is slightly more complicated, as

D_{1}f=vu^{v-1},

and, as $u^{v}=e^{v\ln u},$

D_{2}f=u^{v}\ln u.

It follows that

{\frac {d}{dx}}(g(x)^{h(x)})=h(x)g(x)^{h(x)-1}{\frac {d}{dx}}g(x)+g(x)^{h(x)}\ln g(x){\frac {d}{dx}}h(x).
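The sum formula and the arithmetic examples can be sketched in a few lines. The helper `chain_rule_sum` below is a hypothetical name; it takes the partial derivatives $D_{i}f$, the inner functions $g_{i}$ and their derivatives, and evaluates the sum at a point. The product and power cases reuse the partials listed above.

```python
import math


def chain_rule_sum(partials, inner, inner_primes, x):
    """d/dx f(g_1(x), ..., g_k(x)) = sum_i g_i'(x) * D_i f(g_1(x), ..., g_k(x))."""
    args = [g(x) for g in inner]
    return sum(gp(x) * D(*args) for D, gp in zip(partials, inner_primes))


def square(x):
    return x ** 2


def square_prime(x):
    return 2 * x


# Multiplication f(u, v) = u v, applied to sin(x) * x^2.
product_partials = [lambda u, v: v, lambda u, v: u]
product_slope = chain_rule_sum(
    product_partials, [math.sin, square], [math.cos, square_prime], 1.1
)

# Exponentiation f(u, v) = u^v, applied to x^x (both inner functions are x).
power_partials = [
    lambda u, v: v * u ** (v - 1),
    lambda u, v: u ** v * math.log(u),
]
power_slope = chain_rule_sum(
    power_partials,
    [lambda x: x, lambda x: x],
    [lambda x: 1.0, lambda x: 1.0],
    2.0,
)
print(product_slope, power_slope)
```

For $x^{x}$ the formula reduces to $x^{x}(1+\ln x)$, which is what the second call evaluates at $x = 2$.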

General rule

The simplest way to write the chain rule in the general case is to use the total derivative, which is a linear transformation that captures all directional derivatives in a single formula. Consider differentiable functions $f:\mathbf{R}^{m}\to \mathbf{R}^{k}$ and $g:\mathbf{R}^{n}\to \mathbf{R}^{m}$, and a point $\mathbf{a}$ in $\mathbf{R}^{n}$. Let $D_{\mathbf{a}}g$ denote the total derivative of $g$ at $\mathbf{a}$ and $D_{g(\mathbf{a})}f$ denote the total derivative of $f$ at $g(\mathbf{a})$. These two derivatives are linear transformations $\mathbf{R}^{n}\to \mathbf{R}^{m}$ and $\mathbf{R}^{m}\to \mathbf{R}^{k}$, respectively, so they can be composed. The chain rule for total derivatives is that their composite is the total derivative of $f \circ g$ at $\mathbf{a}$:

D_{\mathbf {a} }(f\circ g)=D_{g(\mathbf {a} )}f\circ D_{\mathbf {a} }g,

or for short,

D(f\circ g)=Df\circ Dg.

The higher-dimensional chain rule can be proved using a technique similar to the second proof given above.^[7]

Because the total derivative is a linear transformation, the functions appearing in the formula can be rewritten as matrices. The matrix corresponding to a total derivative is called a Jacobian matrix, and the composite of two derivatives corresponds to the product of their Jacobian matrices. From this perspective the chain rule therefore says:

J_{f\circ g}(\mathbf {a} )=J_{f}(g(\mathbf {a} ))J_{g}(\mathbf {a} ),

or for short,

J_{f\circ g}=(J_{f}\circ g)J_{g}.

That is, the Jacobian of a composite function is the product of the Jacobians of the composed functions (evaluated at the appropriate points).
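A small example makes the matrix form concrete. The sketch below assumes $g(x,y) = (xy,\ x+y)$ and $f(u,v) = (u^{2},\ uv)$, which are illustrative choices, and compares the product $J_{f}(g(\mathbf{a}))J_{g}(\mathbf{a})$ with the Jacobian of the composite $(x^{2}y^{2},\ x^{2}y+xy^{2})$ written out by hand. It uses plain nested lists rather than a matrix library.

```python
def g(x, y):
    return (x * y, x + y)


def jacobian_g(x, y):
    return [[y, x],
            [1.0, 1.0]]


def f(u, v):
    return (u ** 2, u * v)


def jacobian_f(u, v):
    return [[2 * u, 0.0],
            [v, u]]


def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def jacobian_composite(x, y):
    """J_{f o g}(a) = J_f(g(a)) J_g(a)."""
    u, v = g(x, y)
    return matmul(jacobian_f(u, v), jacobian_g(x, y))


def jacobian_direct(x, y):
    """Jacobian of (x^2 y^2, x^2 y + x y^2), differentiated by hand."""
    return [[2 * x * y ** 2, 2 * x ** 2 * y],
            [2 * x * y + y ** 2, x ** 2 + 2 * x * y]]


print(jacobian_composite(1.5, -2.0))
print(jacobian_direct(1.5, -2.0))
```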

The higher-dimensional chain rule is a generalization of the one-dimensional chain rule. If $k$, $m$, and $n$ are 1, so that $f:\mathbf{R}\to \mathbf{R}$ and $g:\mathbf{R}\to \mathbf{R}$, then the Jacobian matrices of $f$ and $g$ are $1\times 1$. Specifically, they are:

{\begin{aligned}J_{g}(a)&={\begin{pmatrix}g'(a)\end{pmatrix}},\\J_{f}(g(a))&={\begin{pmatrix}f'(g(a))\end{pmatrix}}.\end{aligned}}

The Jacobian of $f \circ g$ is the product of these $1 \times 1$ matrices, so it is $f'(g(a))\cdot g'(a)$, as expected from the one-dimensional chain rule. In the language of linear transformations, $D_{a}g$ is the function which scales a vector by a factor of $g'(a)$ and $D_{g(a)}f$ is the function which scales a vector by a factor of $f'(g(a))$. The chain rule says that the composite of these two linear transformations is the linear transformation $D_{a}(f \circ g)$, and therefore it is the function that scales a vector by $f'(g(a))\cdot g'(a)$.

Another way of writing the chain rule is used when $f$ and $g$ are expressed in terms of their components as $\mathbf{y} = f(\mathbf{u}) = (f_{1}(\mathbf{u}),\dots ,f_{k}(\mathbf{u}))$ and $\mathbf{u} = g(\mathbf{x}) = (g_{1}(\mathbf{x}),\dots ,g_{m}(\mathbf{x}))$. In this case, the above rule for Jacobian matrices is usually written as:

{\frac {\partial (y_{1},\ldots ,y_{k})}{\partial (x_{1},\ldots ,x_{n})}}={\frac {\partial (y_{1},\ldots ,y_{k})}{\partial (u_{1},\ldots ,u_{m})}}{\frac {\partial (u_{1},\ldots ,u_{m})}{\partial (x_{1},\ldots ,x_{n})}}.

The chain rule for total derivatives implies a chain rule for partial derivatives. Recall that when the total derivative exists, the partial derivative in the ith coordinate direction is found by multiplying the Jacobian matrix by the ith basis vector. By doing this to the formula above, we find:

{\frac {\partial (y_{1},\ldots ,y_{k})}{\partial x_{i}}}={\frac {\partial (y_{1},\ldots ,y_{k})}{\partial (u_{1},\ldots ,u_{m})}}{\frac {\partial (u_{1},\ldots ,u_{m})}{\partial x_{i}}}.

Since the entries of the Jacobian matrix are partial derivatives, we may simplify the above formula to get:

{\frac {\partial (y_{1},\ldots ,y_{k})}{\partial x_{i}}}=\sum _{\ell =1}^{m}{\frac {\partial (y_{1},\ldots ,y_{k})}{\partial u_{\ell }}}{\frac {\partial u_{\ell }}{\partial x_{i}}}.

More conceptually, this rule expresses the fact that a change in the $x_{i}$ direction may change all of $g_{1}$ through $g_{m}$, and any of these changes may affect $f$.

In the special case where $k = 1$ , so that f is a real-valued function, then this formula simplifies even further:

{\frac {\partial y}{\partial x_{i}}}=\sum _{\ell =1}^{m}{\frac {\partial y}{\partial u_{\ell }}}{\frac {\partial u_{\ell }}{\partial x_{i}}}.

This can be rewritten as a dot product. Recalling that $\mathbf{u} = (g_{1},\dots ,g_{m})$, the partial derivative $\partial \mathbf{u}/\partial x_{i}$ is also a vector, and the chain rule says that:

{\frac {\partial y}{\partial x_{i}}}=\nabla y\cdot {\frac {\partial \mathbf {u} }{\partial x_{i}}}.

Example

Given $u(x,y) = x^{2} + 2y$ where $x(r,t) = r\sin(t)$ and $y(r,t) = \sin^{2}(t)$, determine the value of $\partial u/\partial r$ and $\partial u/\partial t$ using the chain rule.

{\frac {\partial u}{\partial r}}={\frac {\partial u}{\partial x}}{\frac {\partial x}{\partial r}}+{\frac {\partial u}{\partial y}}{\frac {\partial y}{\partial r}}=(2x)(\sin(t))+(2)(0)=2r\sin ^{2}(t),

and

{\begin{aligned}{\frac {\partial u}{\partial t}}&={\frac {\partial u}{\partial x}}{\frac {\partial x}{\partial t}}+{\frac {\partial u}{\partial y}}{\frac {\partial y}{\partial t}}\\&=(2x)(r\cos(t))+(2)(2\sin(t)\cos(t))\\&=(2r\sin(t))(r\cos(t))+4\sin(t)\cos(t)\\&=2(r^{2}+2)\sin(t)\cos(t)\\&=(r^{2}+2)\sin(2t).\end{aligned}}
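The two results can be confirmed numerically. This sketch evaluates the chain-rule sums term by term at an arbitrarily chosen point $(r, t)$ and compares them with the simplified closed forms; the function names are illustrative.

```python
import math


def du_dr(r, t):
    """du/dr = (du/dx)(dx/dr) + (du/dy)(dy/dr)."""
    x = r * math.sin(t)
    return (2 * x) * math.sin(t) + 2 * 0.0


def du_dt(r, t):
    """du/dt = (du/dx)(dx/dt) + (du/dy)(dy/dt)."""
    x = r * math.sin(t)
    return (2 * x) * (r * math.cos(t)) + 2 * (2 * math.sin(t) * math.cos(t))


r0, t0 = 1.2, 0.5
closed_dr = 2 * r0 * math.sin(t0) ** 2
closed_dt = (r0 ** 2 + 2) * math.sin(2 * t0)
print(du_dr(r0, t0), closed_dr)
print(du_dt(r0, t0), closed_dt)
```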

Higher derivatives of multivariable functions

Faà di Bruno's formula for higher-order derivatives of single-variable functions generalizes to the multivariable case. If $y = f (u)$ is a function of $u = g (x)$ as above, then the second derivative of $f \circ g$ is:

{\frac {\partial ^{2}y}{\partial x_{i}\partial x_{j}}}=\sum _{k}\left({\frac {\partial y}{\partial u_{k}}}{\frac {\partial ^{2}u_{k}}{\partial x_{i}\partial x_{j}}}\right)+\sum _{k,\ell }\left({\frac {\partial ^{2}y}{\partial u_{k}\partial u_{\ell }}}{\frac {\partial u_{k}}{\partial x_{i}}}{\frac {\partial u_{\ell }}{\partial x_{j}}}\right).

Further generalizations

All extensions of calculus have a chain rule. In most of these, the formula remains the same, though the meaning of that formula may be vastly different.

One generalization is to manifolds. In this situation, the chain rule represents the fact that the derivative of $f \circ g$ is the composite of the derivative of f and the derivative of g. This theorem is an immediate consequence of the higher dimensional chain rule given above, and it has exactly the same formula.

The chain rule is also valid for Fréchet derivatives in Banach spaces. The same formula holds as before.^[8] This case and the previous one admit a simultaneous generalization to Banach manifolds.

In differential algebra, the derivative is interpreted as a morphism of modules of Kähler differentials. A ring homomorphism of commutative rings $f:R\to S$ determines a morphism of Kähler differentials $Df:\Omega_{R}\to \Omega_{S}$ which sends an element $dr$ to $d(f(r))$, the exterior differential of $f(r)$. The formula $D(f \circ g) = Df \circ Dg$ holds in this context as well.

The common feature of these examples is that they are expressions of the idea that the derivative is part of a functor. A functor is an operation on spaces and functions between them. It associates to each space a new space and to each function between two spaces a new function between the corresponding new spaces. In each of the above cases, the functor sends each space to its tangent bundle and it sends each function to its derivative. For example, in the manifold case, the derivative sends a $C^{r}$-manifold to a $C^{r-1}$-manifold (its tangent bundle) and a $C^{r}$-function to its total derivative. There is one requirement for this to be a functor, namely that the derivative of a composite must be the composite of the derivatives. This is exactly the formula $D(f \circ g) = Df \circ Dg$.

There are also chain rules in stochastic calculus. One of these, Itō's lemma, expresses the composite of an Itō process (or more generally a semimartingale) dX_t with a twice-differentiable function f. In Itō's lemma, the derivative of the composite function depends not only on dX_t and the derivative of f but also on the second derivative of f. The dependence on the second derivative is a consequence of the non-zero quadratic variation of the stochastic process, which broadly speaking means that the process can move up and down in a very rough way. This variant of the chain rule is not an example of a functor because the two functions being composed are of different types.

References

[1] "Chain Rule in Leibniz Notation". oregonstate.edu. Retrieved 2019-07-28.
[2] George F. Simmons, Calculus with Analytic Geometry (1985), p. 93.
[3] Rodríguez, Omar Hernández; López Fernández, Jorge M. (2010). "A Semiotic Reflection on the Didactics of the Chain Rule". The Mathematics Enthusiast. 7 (2): 321–332. Retrieved 2019-08-04.
[4] Apostol, Tom (1974). Mathematical analysis (2nd ed.). Addison Wesley. Theorem 5.5.
[5] "Chain Rule for Derivative". Math Vault. 2016-06-05. Retrieved 2019-07-28.
[6] Kuhn, Stephen (1991). "The Derivative á la Carathéodory". The American Mathematical Monthly. 98 (1): 40–44. JSTOR 2324035.
[7] Spivak, Michael (1965). Calculus on Manifolds. Boston: Addison-Wesley. pp. 19–20. ISBN 0-8053-9021-9.
[8] Cheney, Ward (2001). "The Chain Rule and Mean Value Theorems". Analysis for Applied Mathematics. New York: Springer. pp. 121–125. ISBN 0-387-95279-9.

External links

"Leibniz rule", Encyclopedia of Mathematics, EMS Press, 2001 [1994]
Weisstein, Eric W. "Chain Rule". MathWorld.
