Fermat's theorem (stationary points)

In mathematics, Fermat's theorem (not to be confused with Fermat's last theorem) is a method to find local maxima and minima of differentiable functions on open sets by showing that every local extremum of the function is a stationary point (the function's derivative is zero at that point). Fermat's theorem is a theorem in real analysis, named after Pierre de Fermat.

By using Fermat's theorem, the potential extrema of a function $\displaystyle f$, with derivative $\displaystyle f'$, are found by solving an equation in $\displaystyle f'$. Fermat's theorem gives only a necessary condition for extreme function values, and some stationary points are inflection points (not a maximum or minimum). The function's second derivative, if it exists, can determine if any stationary point is a maximum, minimum, or inflection point.
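The procedure just described can be sketched symbolically. The following is a minimal illustration (the example function $f(x)=x^3-3x$ is assumed for demonstration, not taken from the article): solve $f'(x)=0$ for the stationary points, then apply the second derivative test to each.

```python
import sympy as sp

x = sp.symbols('x')
f = x**3 - 3*x                      # assumed example function

fprime = sp.diff(f, x)              # f'(x) = 3x^2 - 3
stationary = sp.solve(fprime, x)    # potential extrema: x = -1 and x = 1

# Second derivative test: f''(x) = 6x classifies each stationary point.
fsecond = sp.diff(f, x, 2)
for p in stationary:
    s = fsecond.subs(x, p)
    kind = 'minimum' if s > 0 else 'maximum' if s < 0 else 'inconclusive'
    print(p, kind)
```

Here the test is conclusive at both points; when the second derivative also vanishes, higher-order derivatives must be examined, as discussed later in the article.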

Fermat's theorem

Let $f\colon (a,b) \rightarrow \mathbb{R}$ be a function and suppose that $\displaystyle x_0 \in (a,b)$ is a local extremum of $\displaystyle f$. If $\displaystyle f$ is differentiable at $\displaystyle x_0$ then $\displaystyle f'(x_0) = 0$.

Another way to understand the theorem is via the contrapositive statement:

• If $\displaystyle f$ is differentiable at $\displaystyle x_0 \in (a,b)$, and
• $\displaystyle f'(x_0) \neq 0$,
• then $x_0$ is not a local extremum of f.

Exactly the same statement is true in higher dimensions, with the proof requiring only slight generalization.

Application to optimization

As a corollary, global extrema of a function f on a domain A occur only at boundaries, non-differentiable points, and stationary points. If $x_0$ is a global extremum of f, then one of the following is true:

• boundary: $x_0$ is in the boundary of A
• non-differentiable: f is not differentiable at $x_0$
• stationary point: $x_0$ is a stationary point of f
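The three kinds of candidates can be enumerated directly. A small sketch, using the assumed example $f(x)=|x|$ on the closed interval $[-1, 2]$ (chosen because it exhibits all three categories):

```python
# Global extrema of f(x) = |x| on [-1, 2] must occur among these candidates.
f = abs

candidates = {
    'boundary': [-1.0, 2.0],        # endpoints of the domain
    'non-differentiable': [0.0],    # |x| has a corner at 0
    'stationary': [],               # f'(x) = ±1 elsewhere, never 0
}

points = [p for group in candidates.values() for p in group]
global_min = min(points, key=f)     # the corner at 0
global_max = max(points, key=f)     # the boundary point 2
```

Note that the global minimum here occurs at a non-differentiable point and the global maximum at a boundary point, so neither is a stationary point.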

Intuition

Intuitively, a differentiable function is approximated by its derivative – a differentiable function behaves infinitesimally like a linear function $a+bx,$ or more precisely, $f(x_0) + f'(x_0)\cdot (x-x_0).$ Thus, from the perspective that "if f is differentiable and has non-vanishing derivative at $x_0,$ then it does not attain an extremum at $x_0,$" the intuition is that if the derivative at $x_0$ is positive, the function is increasing near $x_0,$ while if the derivative is negative, the function is decreasing near $x_0.$ In both cases, it cannot attain a maximum or minimum, because its value is changing. It can only attain a maximum or minimum if it "stops" – if the derivative vanishes (or if it is not differentiable, or if one runs into the boundary and cannot continue). However, making "behaves like a linear function" precise requires careful analytic proof.

More precisely, the intuition can be stated as: if the derivative is positive, there is some point to the right of $x_0$ where f is greater, and some point to the left of $x_0$ where f is less, and thus f attains neither a maximum nor a minimum at $x_0.$ Similarly, if the derivative is negative, there is a point to the right where f is smaller, and a point to the left where f is greater. Stated this way, the proof is just translating this into equations and verifying "how much greater or less".

The intuition is based on the behavior of polynomial functions. Assume that the function f has a maximum at $\displaystyle x_0$, the reasoning being similar for a function minimum. If $\displaystyle x_0 \in (a,b)$ is a local maximum then, roughly, there is a (possibly small) neighborhood of $\displaystyle x_0$ such that the function "is increasing before" and "decreasing after"[note 1] $\displaystyle x_0$. As the derivative is positive for an increasing function and negative for a decreasing function, $\displaystyle f'$ is positive before and negative after $\displaystyle x_0$. $\displaystyle f'$ doesn't skip values (by Darboux's theorem), so it has to be zero at some point between the positive and negative values. The only point in the neighbourhood where it is possible to have $\displaystyle f'(x) = 0$ is $\displaystyle x_0$.

The theorem (and its proof below) is more general than the intuition in that it doesn't require the function to be differentiable over a neighbourhood around $\displaystyle x_0$. It is sufficient for the function to be differentiable only at the extreme point.

Proof

Proof 1: Non-vanishing derivatives implies not extremum

Suppose that f is differentiable at $x_0 \in (a,b),$ with derivative K, and assume without loss of generality that $K > 0,$ so the tangent line at $x_0$ has positive slope (is increasing). Then there is a neighborhood of $x_0$ on which the secant lines through $x_0$ all have positive slope, and thus to the right of $x_0,$ f is greater, and to the left of $x_0,$ f is smaller.

The schematic of the proof is:

• an infinitesimal statement about derivative (tangent line) at $x_0$ implies
• a local statement about difference quotients (secant lines) near $x_0,$ which implies
• a local statement about the value of f near $x_0.$

Formally, by the definition of derivative, $f'(x_0) = K$ means that

$\lim_{\epsilon \to 0} \frac{f(x_0+\epsilon)-f(x_0)}{\epsilon} = K.$

In particular, for $|\epsilon|$ sufficiently small (less than some $\epsilon_0$), the difference quotient must exceed $K/2,$ by the definition of limit. Thus for every $x_0+\epsilon$ in the interval $(x_0-\epsilon_0,x_0+\epsilon_0)$ with $\epsilon \neq 0$ one has:

$\frac{f(x_0+\epsilon)-f(x_0)}{\epsilon} > K/2;$

one has replaced the equality in the limit (an infinitesimal statement) with an inequality on a neighborhood (a local statement). Thus, rearranging the inequality, if $\epsilon > 0,$ then:

$f(x_0+\epsilon) > f(x_0) + (K/2)\epsilon > f(x_0),\,$

so on the interval to the right, f is greater than $f(x_0),$ and if $\epsilon < 0,$ then:

$f(x_0+\epsilon) < f(x_0) + (K/2)\epsilon < f(x_0),\,$

so on the interval to the left, f is less than $f(x_0).$

Thus $x_0$ is not a local or global maximum or minimum of f.
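The chain of inequalities above can be checked numerically. A small sketch, with the assumed example $f(x)=x^2$ at $x_0=1,$ where $K=f'(1)=2$: every secant slope through $x_0$ equals $2+\epsilon,$ which exceeds $K/2=1$ whenever $|\epsilon|<1.$

```python
def f(x):
    return x * x              # assumed example; K = f'(1) = 2

x0, K = 1.0, 2.0
# On (x0 - 1, x0 + 1) every secant slope through x0 exceeds K/2.
for eps in [0.5, 0.1, -0.1, -0.5]:
    slope = (f(x0 + eps) - f(x0)) / eps   # equals 2 + eps here
    assert slope > K / 2                  # the local statement of the proof
    if eps > 0:
        assert f(x0 + eps) > f(x0)        # f is greater to the right
    else:
        assert f(x0 + eps) < f(x0)        # f is smaller to the left
```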

Proof 2: Extremum implies derivative vanishes

Alternatively, one can start by assuming that $\displaystyle x_0$ is a local maximum, and then prove that the derivative is 0.

Suppose that $\displaystyle x_0$ is a local maximum (a similar proof applies if $\displaystyle x_0$ is a local minimum). Then there exists $\delta > 0$ such that $(x_0 - \delta,x_0 + \delta) \subset (a,b)$ and $f(x_0) \ge f(x)$ for all $x$ with $\displaystyle |x - x_0| < \delta$. Hence for any $h \in (0,\delta)$ we have

$\frac{f(x_0+h) - f(x_0)}{h} \le 0.$

Since the limit of this ratio as $\displaystyle h$ approaches 0 from above exists and equals $\displaystyle f'(x_0),$ we conclude that $f'(x_0) \le 0$. On the other hand, for $h \in (-\delta,0)$ we notice that

$\frac{f(x_0+h) - f(x_0)}{h} \ge 0$

but again the limit as $\displaystyle h$ approaches 0 from below exists and equals $\displaystyle f'(x_0),$ so we also have $f'(x_0) \ge 0$.

Hence we conclude that $\displaystyle f'(x_0) = 0.$
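The two one-sided difference quotients can be observed directly. A sketch with the assumed example $f(x)=1-x^2,$ which has a local maximum at $x_0=0$:

```python
def f(x):
    return 1.0 - x * x    # assumed example with a local maximum at x0 = 0

x0 = 0.0
for h in [0.5, 0.1, 0.01]:
    right = (f(x0 + h) - f(x0)) / h       # equals -h: non-positive
    left = (f(x0 - h) - f(x0)) / (-h)     # equals  h: non-negative
    assert right <= 0 and left >= 0

# The right-hand quotients force f'(x0) <= 0 and the left-hand
# quotients force f'(x0) >= 0, so f'(x0) = 0 in the limit.
```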

Higher dimensions

Exactly the same statement holds; however, the proof is slightly more complicated. The complication is that in 1 dimension, one can either move left or right from a point, while in higher dimensions, one can move in many directions. Thus, if the derivative does not vanish, one must argue that there is some direction in which the function increases – and thus in the opposite direction the function decreases. This is the only change to the proof or the analysis.

Applications

Fermat's theorem is central to the calculus method of determining maxima and minima: in one dimension, one can find extrema by simply computing the stationary points (by computing the zeros of the derivative), the non-differentiable points, and the boundary points, and then investigating this set to determine the extrema.

One can do this either by evaluating the function at each point and taking the maximum, or by analyzing the derivatives further, using the first derivative test, the second derivative test, or the higher-order derivative test.
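The "evaluate each candidate" approach can be sketched as follows, using the assumed example $f(x)=x^3-3x$ on the closed interval $[-2, 3]$ (f is differentiable everywhere, so the candidates are the stationary points $x=\pm 1$ and the two boundary points):

```python
def f(x):
    return x**3 - 3*x     # assumed example; f'(x) = 3x^2 - 3 = 0 at x = ±1

# Candidate set on [-2, 3]: stationary points plus boundary points.
candidates = [-2.0, -1.0, 1.0, 3.0]

values = {p: f(p) for p in candidates}
global_max = max(values, key=values.get)   # x = 3, where f(3) = 18
# The minimum value -2 is attained at two candidates, x = -2 and x = 1.
min_value = min(values.values())
```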

In dimensions above one, the first derivative test no longer applies, but the second derivative test and higher-order derivative test generalize.

Cautions

A subtle misconception that is often held in the context of Fermat's theorem is to assume that it makes a stronger statement about local behavior than it does. Notably, Fermat's theorem does not say that functions (monotonically) "increase up to" or "decrease down from" a local maximum. This is very similar to the misconception that a limit means "monotonically getting closer to a point".

For "well-behaved functions" (which here mean continuously differentiable), some intuitions hold, but in general functions may be ill-behaved, as illustrated below.

The moral is that derivatives determine infinitesimal behavior, and that continuous derivatives determine local behavior.

Continuously differentiable functions

If f is continuously differentiable ($C^1$) on a neighborhood of $x_0,$ then $f'(x_0) > 0$ does mean that f is increasing on a neighborhood of $x_0,$ as follows.

If $f'(x_0) = K > 0$ and $f \in C^1,$ then by continuity of the derivative, there is a neighborhood $(x_0-\epsilon_0,x_0+\epsilon_0)$ of $x_0$ on which $f'(x) > K/2.$ Then f is increasing on this interval, by the mean value theorem: the slope of any secant line is at least $K/2,$ as it equals the slope of some tangent line.

However, in the general statement of Fermat's theorem, where one is only given that the derivative at $x_0$ is positive, one can only conclude that secant lines through $x_0$ will have positive slope, for secant lines between $x_0$ and near enough points.

Conversely, if the derivative of f at a point is zero ($x_0$ is a stationary point), one cannot in general conclude anything about the local behavior of f – it may increase to one side and decrease to the other (as in $x^3$), increase to both sides (as in $x^4$), decrease to both sides (as in $-x^4$), or behave in more complicated ways, such as oscillating (as in $x^2\sin(1/x)$, as discussed below).
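The first three cases can be checked numerically. A sketch comparing the three assumed examples near their common stationary point at 0:

```python
# All three functions have f'(0) = 0, yet behave differently near 0.
cases = {
    'x^3':  lambda x: x**3,    # increases through 0: not an extremum
    'x^4':  lambda x: x**4,    # local minimum at 0
    '-x^4': lambda x: -x**4,   # local maximum at 0
}

h = 1e-3
# For each function, record whether f is larger than f(0) just to the
# left and just to the right of the stationary point.
signs = {name: (f(-h) > f(0.0), f(h) > f(0.0)) for name, f in cases.items()}
# x^3:  (False, True)  -> smaller on the left, larger on the right
# x^4:  (True, True)   -> larger on both sides: minimum
# -x^4: (False, False) -> smaller on both sides: maximum
```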

One can analyze the infinitesimal behavior via the second derivative test and higher-order derivative test, if the function is differentiable enough. If the first non-vanishing derivative at $x_0$ is $f^{(k)}(x_0) \neq 0$ and $f^{(k)}$ is continuous (so $f\in C^k$), then one can also conclude local behavior: f can be treated as locally close to a polynomial of degree k, since it behaves approximately as $f^{(k)}(x_0) (x-x_0)^k.$ But if the kth derivative is not continuous, one cannot draw such conclusions, and the function may behave rather differently.

Pathological functions

Consider the function $\sin(1/x)$ – it oscillates increasingly rapidly between $-1$ and $1$ as x approaches 0. Consider then $f(x)=(1+\sin(1/x))x^2$ – this oscillates increasingly rapidly between 0 and $2x^2$ as x approaches 0. If one extends this function by $f(0) := 0,$ then the function is continuous and everywhere differentiable (it is differentiable at 0 with derivative 0), but has rather unexpected behavior near 0: in any neighborhood of 0 it attains 0 infinitely many times, but also equals $2x^2$ (a positive number) infinitely often.

Continuing in this vein, $f(x)=(2+\sin(1/x))x^2$ oscillates between $x^2$ and $3x^2,$ and $x=0$ is a local and global minimum, but on no neighborhood of 0 is it decreasing down to or increasing up from 0 – it oscillates wildly near 0.

This pathology can be understood because, while the function is everywhere differentiable, it is not continuously differentiable: the limit of $f'(x)$ as $x \to 0$ does not exist, so the derivative is not continuous at 0. This reflects the oscillation between increasing and decreasing values as it approaches 0.
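This oscillation of the derivative can be verified directly. For $x \neq 0,$ the product and chain rules give $f'(x) = 2x(2+\sin(1/x)) - \cos(1/x),$ and evaluating near 0 shows that $f'$ takes both signs arbitrarily close to the minimum:

```python
import math

def fprime(x):
    # Derivative of f(x) = (2 + sin(1/x)) * x^2 for x != 0,
    # by the product and chain rules.
    return 2*x*(2 + math.sin(1/x)) - math.cos(1/x)

# Arbitrarily close to 0, the derivative takes both signs, so the limit
# of f'(x) as x -> 0 does not exist: f is differentiable everywhere but
# not continuously differentiable at 0.
for n in range(100, 105):
    x_n = 1 / (2 * math.pi * n)         # here cos(1/x) = 1, so f'(x_n) ≈ -1
    y_n = 1 / ((2 * n + 1) * math.pi)   # here cos(1/x) = -1, so f'(y_n) ≈ +1
    assert fprime(x_n) < 0 < fprime(y_n)
```

Thus, even though $x=0$ is a strict local minimum, there are points arbitrarily close to it where f is strictly decreasing, matching the "oscillates wildly" description above.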

1. ^ This intuition is only correct for continuously differentiable ($C^1$) functions, while in general it is not literally correct – a function need not be increasing up to a local maximum: it may instead be oscillating, so neither increasing nor decreasing, but simply the local maximum is greater than any values in a small neighborhood to the left or right of it. See the section on pathological functions for details.