Jensen's inequality can be proved in several ways. Three proofs are given here, one for each of the three statements above (the finite form, the inequality in measure-theoretic terminology, and the general inequality in probabilistic notation). The first proves the finite form of the inequality and then extends it by a density argument; it makes clear how the inequality arises. The second is the most common proof of Jensen's inequality and uses some basic ideas of nonsmooth analysis. The third generalizes the second and proves the general statement for vector-valued random variables. This last proof is the most compact, although it requires a more advanced mathematical background.
Proof 1 (using the finite form)
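If $\lambda_1$ and $\lambda_2$ are two arbitrary positive real numbers such that $\lambda_1 + \lambda_2 = 1$, then convexity of $\varphi$ implies

$$\varphi(\lambda_1 x_1 + \lambda_2 x_2) \le \lambda_1 \varphi(x_1) + \lambda_2 \varphi(x_2)$$

for any $x_1, x_2$. This can be easily generalized: if $\lambda_1, \ldots, \lambda_n$ are $n$ positive real numbers such that $\lambda_1 + \cdots + \lambda_n = 1$, then

$$\varphi(\lambda_1 x_1 + \cdots + \lambda_n x_n) \le \lambda_1 \varphi(x_1) + \cdots + \lambda_n \varphi(x_n)$$

for any $x_1, \ldots, x_n$. This finite form of Jensen's inequality can be proved by induction: by the convexity hypothesis, the statement is true for $n = 2$. Suppose it is also true for some $n$; one needs to prove it for $n + 1$. At least one of the $\lambda_i$ is strictly less than 1, say $\lambda_1$; therefore, by the convexity inequality:

$$\varphi\left(\sum_{i=1}^{n+1} \lambda_i x_i\right) = \varphi\left(\lambda_1 x_1 + (1 - \lambda_1) \sum_{i=2}^{n+1} \frac{\lambda_i}{1 - \lambda_1} x_i\right) \le \lambda_1 \varphi(x_1) + (1 - \lambda_1)\, \varphi\left(\sum_{i=2}^{n+1} \frac{\lambda_i}{1 - \lambda_1} x_i\right).$$

Since $\sum_{i=2}^{n+1} \lambda_i / (1 - \lambda_1) = 1$, one can apply the induction hypothesis to the last term in the previous formula to obtain the result, namely the finite form of Jensen's inequality.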
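The finite form can be checked numerically. The following is a minimal sketch, not part of the proof: it assumes the convex function is the exponential, and the helper names `jensen_gap` and `random_weights` are hypothetical, introduced only for illustration.

```python
import math
import random

def jensen_gap(phi, weights, points):
    """Return sum(l_i * phi(x_i)) - phi(sum(l_i * x_i)); nonnegative for convex phi."""
    mean_point = sum(l * x for l, x in zip(weights, points))
    mean_value = sum(l * phi(x) for l, x in zip(weights, points))
    return mean_value - phi(mean_point)

def random_weights(n, rng):
    """Draw n positive weights and normalise them so that they sum to 1."""
    raw = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]

rng = random.Random(0)  # fixed seed so that the run is reproducible
gaps = []
for n in range(2, 8):
    weights = random_weights(n, rng)
    points = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    gaps.append(jensen_gap(math.exp, weights, points))

# Each gap should be nonnegative, up to floating-point rounding.
print(all(gap >= -1e-12 for gap in gaps))
```

A numerical check of this kind only illustrates the statement for the sampled weights and points; the induction argument above is what establishes it in general.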
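In order to obtain the general inequality from this finite form, one needs to use a density argument. The finite form can be rewritten as:

$$\varphi\left(\int x \, d\mu_n(x)\right) \le \int \varphi(x) \, d\mu_n(x),$$

where $\mu_n$ is a measure given by an arbitrary convex combination of Dirac deltas:

$$\mu_n = \sum_{i=1}^{n} \lambda_i \delta_{x_i}.$$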
Since convex functions are continuous, and since convex combinations of Dirac deltas are weakly dense in the set of probability measures (as could be easily verified), the general statement is obtained simply by a limiting procedure.
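The limiting procedure can be illustrated with a concrete sequence of measures. The sketch below is an assumption-laden example rather than the argument itself: it takes the uniform distribution on [0, 1] as the target measure, approximates it by equal-weight Dirac masses at cell midpoints, and uses the convex function x² (all of these choices, and the helper `dirac_average`, are hypothetical).

```python
def dirac_average(f, atoms, weights):
    """Integrate f against mu_n = sum_i weights[i] * delta_{atoms[i]}."""
    return sum(w * f(x) for w, x in zip(weights, atoms))

def phi(x):
    return x * x  # convex test function

results = []
for n in (10, 100, 1000):
    atoms = [(i + 0.5) / n for i in range(n)]  # midpoints of n equal cells of [0, 1]
    weights = [1.0 / n] * n                    # equal masses summing to 1
    lhs = phi(dirac_average(lambda x: x, atoms, weights))  # phi(integral of x d mu_n)
    rhs = dirac_average(phi, atoms, weights)               # integral of phi d mu_n
    results.append((n, lhs, rhs))

for n, lhs, rhs in results:
    print(n, lhs, rhs)
```

For every n the left-hand value stays below the right-hand one, and as n grows the two sides approach 1/4 and 1/3, the corresponding values for the uniform distribution.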
Proof 2 (measure theoretic notation)
Let $g$ be a real-valued $\mu$-integrable function on a measure space $\Omega$, and let $\varphi$ be a convex function on the real numbers. Define the right-hand derivative of $\varphi$ at $x$ as

$$\varphi'(x) := \lim_{t \to 0^+} \frac{\varphi(x + t) - \varphi(x)}{t}.$$

Since $\varphi$ is convex, the quotient on the right-hand side is decreasing as $t$ approaches 0 from the right, and is bounded below by any term of the form

$$\frac{\varphi(x + t) - \varphi(x)}{t}$$

where $t < 0$; therefore, the limit always exists.
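Now define the following:

$$x_0 := \int_\Omega g \, d\mu, \qquad a := \varphi'(x_0), \qquad b := \varphi(x_0) - x_0 a.$$

Then for all $x$, $ax + b \le \varphi(x)$. To see this, take $x > x_0$ and define $t = x - x_0 > 0$. Then,

$$\varphi'(x_0) \le \frac{\varphi(x_0 + t) - \varphi(x_0)}{t}.$$

Therefore,

$$\varphi'(x_0)(x - x_0) + \varphi(x_0) \le \varphi(x),$$

as desired. The case $x < x_0$ is proven similarly, and clearly $a x_0 + b = \varphi(x_0)$.

$\varphi(x_0)$ can then be rewritten as

$$a x_0 + b = a \left(\int_\Omega g \, d\mu\right) + b.$$

But since $\mu(\Omega) = 1$, for every real number $k$ we have

$$\int_\Omega k \, d\mu = k.$$

In particular,

$$a \left(\int_\Omega g \, d\mu\right) + b = \int_\Omega (a g + b) \, d\mu \le \int_\Omega \varphi \circ g \, d\mu.$$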
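The supporting-line construction of this proof can be traced on a small example. The following sketch assumes a hypothetical three-point probability space, takes the absolute value as the convex function (so that the right-hand derivative is known in closed form), and checks each step numerically; none of these choices comes from the proof itself.

```python
def phi(x):
    return abs(x)  # convex, not differentiable at 0

def right_derivative_abs(x):
    """Right-hand derivative of |x|: +1 for x >= 0, -1 for x < 0."""
    return 1.0 if x >= 0 else -1.0

# Hypothetical finite probability space: outcome i has mass probs[i],
# and g takes the value g_values[i] there.
probs = [0.2, 0.5, 0.3]
g_values = [-3.0, 1.0, 2.0]

x0 = sum(p * g for p, g in zip(probs, g_values))  # integral of g d mu
a = right_derivative_abs(x0)
b = phi(x0) - x0 * a

# The line a*x + b stays below phi on a grid of test points.
grid = [k / 10.0 for k in range(-50, 51)]
line_below = all(a * x + b <= phi(x) + 1e-12 for x in grid)

integral_of_line = sum(p * (a * g + b) for p, g in zip(probs, g_values))  # equals phi(x0)
integral_of_phi_g = sum(p * phi(g) for p, g in zip(probs, g_values))

print(line_below, phi(x0), integral_of_line, integral_of_phi_g)
```

Integrating the line against the measure reproduces the value of the convex function at the mean, and that value is bounded by the integral of the composed function, which is the chain of relations used in the proof.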
Proof 3 (general inequality in probabilistic notation)
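Let $X$ be a random variable that takes values in a real topological vector space $T$. Since $\varphi : T \to \mathbb{R}$ is convex, for any $x, y \in T$, the quantity

$$\frac{\varphi(x + \theta y) - \varphi(x)}{\theta}$$

is decreasing as $\theta$ approaches $0^+$. In particular, the subdifferential of $\varphi$ evaluated at $x$ in the direction $y$ is well defined by:

$$(D\varphi)(x) \cdot y := \lim_{\theta \downarrow 0} \frac{\varphi(x + \theta y) - \varphi(x)}{\theta} = \inf_{\theta > 0} \frac{\varphi(x + \theta y) - \varphi(x)}{\theta}.$$

It is easily seen that the subdifferential is linear in $y$, and since the infimum on the right-hand side of the previous formula is smaller than the value of the same term for $\theta = 1$, one gets:

$$\varphi(x) \le \varphi(x + y) - (D\varphi)(x) \cdot y.$$

In particular, for an arbitrary sub-σ-algebra $\mathfrak{G}$ we can evaluate the last inequality when $x = \operatorname{E}[X \mid \mathfrak{G}]$ and $y = X - \operatorname{E}[X \mid \mathfrak{G}]$ to obtain:

$$\varphi(\operatorname{E}[X \mid \mathfrak{G}]) \le \varphi(X) - (D\varphi)(\operatorname{E}[X \mid \mathfrak{G}]) \cdot (X - \operatorname{E}[X \mid \mathfrak{G}]).$$

Now, if we take the expectation conditioned on $\mathfrak{G}$ of both sides of the previous expression, we get the result since:

$$\operatorname{E}\big[(D\varphi)(\operatorname{E}[X \mid \mathfrak{G}]) \cdot (X - \operatorname{E}[X \mid \mathfrak{G}]) \,\big|\, \mathfrak{G}\big] = (D\varphi)(\operatorname{E}[X \mid \mathfrak{G}]) \cdot \operatorname{E}\big[X - \operatorname{E}[X \mid \mathfrak{G}] \,\big|\, \mathfrak{G}\big] = 0,$$

by the linearity of the subdifferential in the $y$ variable, and well-known properties of the conditional expectation.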
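The conditional form of the inequality can be checked on a finite example. The sketch below assumes six equally likely outcomes, a random variable with values in R², the squared Euclidean norm as the convex function, and a sub-σ-algebra generated by a partition of the outcomes; the data and helper names are hypothetical and serve only as an illustration.

```python
def phi(v):
    """Squared Euclidean norm, a convex function on R^2."""
    return v[0] ** 2 + v[1] ** 2

# Hypothetical data: six equally likely outcomes, X takes values in R^2.
X = [(1.0, 0.0), (3.0, 2.0), (-1.0, 4.0), (0.0, 0.0), (2.0, -2.0), (5.0, 1.0)]
# The sub-sigma-algebra G is generated by this partition of the outcomes.
partition = [[0, 1], [2, 3, 4], [5]]

def conditional_mean_vector(block):
    """E[X | G] on one block: the average of X over the block (uniform masses)."""
    n = len(block)
    return (sum(X[i][0] for i in block) / n, sum(X[i][1] for i in block) / n)

def conditional_mean_phi(block):
    """E[phi(X) | G] on one block."""
    return sum(phi(X[i]) for i in block) / len(block)

checks = []
for block in partition:
    lhs = phi(conditional_mean_vector(block))  # phi(E[X | G]) on this block
    rhs = conditional_mean_phi(block)          # E[phi(X) | G] on this block
    checks.append((lhs, rhs))

print(all(lhs <= rhs + 1e-12 for lhs, rhs in checks))
```

On the one-point block the two sides coincide, since conditioning leaves the random variable unchanged there; on the larger blocks the inequality is strict for this data.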