# Talk:Jensen's inequality

WikiProject Mathematics (Rated B-class, Mid-importance)
Field: Analysis

## University logo

Jensen's inequality serves as the logo of the mathematics department of Copenhagen University.

## In the language of measure theory

The statement in the language of measure theory is true iff $\mu$ is a positive measure, that is, iff it is in fact a probability measure. So I do not see the point of keeping the two different statements (language of measure theory and probability theory). They are exactly the same, in two different notations! And there are several other coherent notations used in measure/probability theory that we could then use here (but of course, the Jensen article is not the right place to discuss general notation). Therefore, I propose to delete the language of measure theory section and just leave the theorem stated with the $\mathbb E$ notation. I will delete it in a few days if I do not receive comments. gala.martin (what?) 18:39, 30 April 2006 (UTC)

## Use of g

In the measure-theoretic notation, the use of g is IMHO misleading. We should simply replace it by x. Indeed, the inequality written with x is no less general than the one with g(x), since the generality of the measure mu allows one to recover any function with no effort. Using a function instead of the identity in this case is not generalizing; it is confusing. I'll try to be clearer. If you want to write a theorem about a random variable, you say: let X be a random variable with properties A and B; then X has property C. This is no less general than: let X be a random variable such that g(X) has properties A and B; then g(X) has property C. I think that's exactly what we are writing. Am I right? --gala.martin (what?) 09:36, 29 August 2006 (UTC)
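
The reduction described above can be written out as a change of variables. This is only a sketch, assuming $g$ is $\mu$-integrable on a space $\Omega$; the symbols $\Omega$ and $\nu$ (the image measure of $\mu$ under $g$) are introduced here for illustration and are not taken from the article.

$\nu(B) := \mu\left(g^{-1}(B)\right), \qquad \int_\Omega g\, d\mu = \int_{\mathbb R} x\, d\nu(x), \qquad \int_\Omega \varphi\circ g\, d\mu = \int_{\mathbb R} \varphi(x)\, d\nu(x).$

Since $\nu$ is again a probability measure, the inequality for $g$ under $\mu$ is the inequality for the identity under $\nu$.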

## Different proofs

The graphical proof can be made clearer with a concrete example, say $\phi(x)=e^x$, and a particular distribution, say a discrete uniform random variable. The abstract proof number 2, which uses measure-theoretic notation, can also be illustrated graphically, and it then ties in perfectly with the intuitive graphical argument.

The generalization step of the first proof, by induction, does not appear simple, since it uses the delta function and other notions. The third proof appears to use overly complicated notation. Its idea is unclear at the end; a summary or conclusion would help clarify it. It would be good to point out how it differs from the second proof, if at all, beyond the notation.

The second proof is concise yet general. A translation to the probability notation should simply involve rewriting the integral as the expectation, and translating the linearity of integration to linearity of expectation. It would be better if it were put first, with the following changes:

1. use $X$ instead of $g$ for the random variable
2. point out at the end that any subderivative could have been used in place of the right-handed derivative
3. tie it in with the graphical proof and a concrete example.

--Chungc 05:55, 4 December 2006 (UTC)
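
The concrete example suggested above can be checked numerically. The following is a minimal sketch, assuming a discrete uniform random variable on the points 0, 1, 2, 3 (a hypothetical choice, not taken from the article) and $\phi(x)=e^x$; it also checks the supporting-line step on which the second proof relies.

```python
import math

# Hypothetical concrete example: X uniform on the points 0, 1, 2, 3.
support = [0.0, 1.0, 2.0, 3.0]
weights = [1.0 / len(support)] * len(support)


def phi(x):
    """Convex function used in the example."""
    return math.exp(x)


def expectation(values, probs):
    """Expectation of a finitely supported random variable."""
    return sum(v * p for v, p in zip(values, probs))


mean_x = expectation(support, weights)                   # E[X]
lhs = phi(mean_x)                                        # phi(E[X])
rhs = expectation([phi(x) for x in support], weights)    # E[phi(X)]

# Supporting (tangent) line of phi at E[X]; for phi = exp the slope is exp(E[X]).
slope = math.exp(mean_x)


def tangent(x):
    return lhs + slope * (x - mean_x)


# The tangent line lies below phi at every point of the support ...
below = all(tangent(x) <= phi(x) for x in support)
# ... and its expectation equals phi(E[X]) by linearity of expectation.
tangent_mean = expectation([tangent(x) for x in support], weights)

print(lhs, rhs, below, tangent_mean)
```

Taking expectations of `tangent(X) <= phi(X)` is exactly the step that turns the graphical picture into the inequality `lhs <= rhs`.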

The following equality in the third proof fails for $\phi(x)=e^x$: at $x=0$ with, say, $y=1$, the limit is 1, while the infimum is 0, as one can see by letting $\theta \to - \infty$:

$(D\varphi)(x)\cdot y:=\lim_{\theta \downarrow 0} \frac{\varphi(x+\theta\,y)-\varphi(x)}{\theta}=\inf_{\theta \neq 0} \frac{\varphi(x+\theta\,y)-\varphi(x)}{\theta}.$

One should really use a subderivative here. On a related note, does anyone know a way to prove the existence of a subderivative on an arbitrary vector space without using Zorn's lemma? --Pavel.zorin (talk) 10:14, 10 November 2009 (UTC)
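
The counterexample can be checked numerically. This is a minimal sketch, assuming $x=0$, $y=1$ and $\phi(x)=e^x$ as in the comment above; the sample values of $\theta$ are arbitrary illustrative choices.

```python
import math


def quotient(theta, x=0.0, y=1.0):
    """Difference quotient (phi(x + theta*y) - phi(x)) / theta for phi = exp."""
    return (math.exp(x + theta * y) - math.exp(x)) / theta


# One-sided limit as theta decreases to 0: approaches phi'(0) = 1.
limit_estimate = quotient(1e-6)

# For large negative theta the quotient stays positive but tends to 0,
# so the infimum over all theta != 0 is 0, not 1.
negative_thetas = [-10.0, -100.0, -1000.0]
negative_quotients = [quotient(t) for t in negative_thetas]

# Restricted to theta > 0 the quotient is nondecreasing in theta (convexity),
# so the infimum over theta > 0 agrees with the one-sided limit.
positive_thetas = [1e-6, 1e-3, 1e-1, 1.0, 10.0]
positive_quotients = [quotient(t) for t in positive_thetas]

print(limit_estimate, negative_quotients, positive_quotients)
```

The last list illustrates why restricting the infimum to $\theta > 0$ repairs the equality.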

## Image:Jensen_graph.png

This bot has detected that this page contains an image, Image:Jensen_graph.png, in a raster format. A replacement is available as a scalable vector graphic (SVG) at File:Jensen graph.svg. If the replacement image is suitable, please edit the article to use the vector version. Scalable vector graphics should be used in preference to raster for images that can easily be represented in a vector graphic format. If this bot is in error, you may leave a bug report at its talk page. Thanks, SVnaGBot1 (talk) 15:09, 3 July 2009 (UTC)

Also, I noticed that the bottom graph says Y(E(X)) when it should actually say $\varphi(E(X))$. It would also be useful to add the X=Y line to the image to more easily see that the Y values are larger than their corresponding X values. Any idea how to correct the image? Toby Dylan Hocking, 4 Feb 2010.

I agree with your suggestions. The file is an SVG, so you can just open and change it in an ordinary text editor. For a GUI, see Inkscape, which seems to be what the SVG was made in. --C. lorenz (talk) 11:42, 7 February 2010 (UTC)

## Conditional expectation in Proof 3

Isn't it important to notice that the conditional expectation preserves order? I mean:

$X \geq Y \Rightarrow \mathbb{E}\{ X |\mathfrak{G}\} \geq \mathbb{E}\{ Y |\mathfrak{G}\}.$

The fact is not that obvious in my opinion. André Caldas (talk) 01:09, 4 August 2010 (UTC)
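
A short argument for this order-preservation property can be sketched as follows, assuming $X$ and $Y$ are integrable and $X \geq Y$; the names $Z$ and $A$ are introduced here for illustration. Put $Z := \mathbb{E}\{ X - Y |\mathfrak{G}\}$ and $A := \{Z < 0\} \in \mathfrak{G}$. By the defining property of conditional expectation,

$\mathbb{E}[Z\,\mathbf{1}_A] = \mathbb{E}[(X-Y)\,\mathbf{1}_A] \geq 0,$

while $Z\,\mathbf{1}_A \leq 0$ with strict inequality on $A$. Hence $\mathbb{P}(A)=0$, so $Z \geq 0$ almost surely, and by linearity $\mathbb{E}\{ X |\mathfrak{G}\} \geq \mathbb{E}\{ Y |\mathfrak{G}\}$ almost surely.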

## Reference Missing for Special Result

There is a special form of Jensen's inequality given for probability density functions f ('Form involving a probability density function'):

$\varphi\left(\int_{-\infty}^\infty g(x)f(x)\, dx\right) \le \int_{-\infty}^\infty \varphi(g(x)) f(x)\, dx.$

However, there is no proof or reference for this formula, and it does not seem easy to derive from the standard form. Can someone please add a reference (or a short proof, if possible)? Thank you, --134.60.10.241 (talk) 10:50, 15 August 2011 (UTC)

Is it sufficient to replace the random variable X by g(X) in the standard probabilistic form? Hupili (talk) 14:03, 12 March 2012 (UTC)
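
The substitution suggested above can be written out as a sketch, assuming $X$ is a random variable with density $f$ and that $g(X)$ and $\varphi(g(X))$ are integrable (these assumptions are not stated in the question). Computing expectations of functions of $X$ against the density gives

$\mathbb{E}[g(X)] = \int_{-\infty}^\infty g(x)f(x)\, dx, \qquad \mathbb{E}[\varphi(g(X))] = \int_{-\infty}^\infty \varphi(g(x)) f(x)\, dx,$

so the standard form $\varphi(\mathbb{E}[Y]) \le \mathbb{E}[\varphi(Y)]$ applied to $Y = g(X)$ is exactly the displayed inequality.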

## "Subdifferential" in proof 3

The use of the "subdifferential" in Proof 3 is problematic. First of all, to make the two definitions (limit vs. infimum) agree, the infimum must be restricted to $\theta > 0$. Second, $(D\varphi)(x)$ is not linear in $y$: consider for example $\varphi = f$, where $f(x) = |x|$ for $x \geq 0$ and $f(x) = |x|/2$ for $x \leq 0$. Then $(D\varphi)(0)(y) = f(y)$, which is not linear in $y$. Why not simply take any subderivative and link to the corresponding article for existence? Xvlcw (talk) 09:38, 17 January 2013 (UTC)
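
The failure of linearity can be checked directly. This is a minimal sketch of the example above; the helper `directional_derivative` and the step size `theta=1e-8` are illustrative choices, not part of the article.

```python
def f(x):
    """f(x) = |x| for x >= 0 and |x|/2 for x <= 0; convex, with a kink at 0."""
    return abs(x) if x >= 0 else abs(x) / 2


def directional_derivative(func, x, y, theta=1e-8):
    """One-sided difference quotient approximating the limit as theta -> 0+."""
    return (func(x + theta * y) - func(x)) / theta


d_plus = directional_derivative(f, 0.0, 1.0)    # direction y = +1
d_minus = directional_derivative(f, 0.0, -1.0)  # direction y = -1

# A linear map L would satisfy L(-1) = -L(1); here both values are positive.
print(d_plus, d_minus, d_minus == -d_plus)
```

Because $f$ is positively homogeneous, the quotient at 0 equals $f(y)$ for every $\theta > 0$, so the step size does not matter in this particular example.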

## removed comment in Finite form section

A previous version included this statement in parentheses in the Finite Form section: "the function log(x) is concave (note that we can use Jensen's to prove convexity or concavity, if it holds for two real numbers whose functions are taken)". This statement does not make sense. Jensen's inequality doesn't say that the function is concave if and only if the inequality holds. The easiest way to prove that log(x) is concave is to observe that its second derivative is negative, as described on the Wikipedia page on concave functions. Once you know it is concave, you can then apply Jensen's inequality. John Lawrence (talk) 16:27, 18 April 2013 (UTC)
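
The two steps described above can be written out as a brief sketch; the weights $\lambda_i$ and points $x_i$ are notation introduced here, with $x_i > 0$, $\lambda_i \geq 0$ and $\sum_i \lambda_i = 1$. First, concavity follows from the second derivative:

$\frac{d^2}{dx^2}\log x = -\frac{1}{x^2} < 0 \qquad (x > 0).$

Only then does the finite form of Jensen's inequality, in its concave version, apply:

$\log\left(\sum_{i=1}^n \lambda_i x_i\right) \geq \sum_{i=1}^n \lambda_i \log x_i.$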