# Talk:Expected value

WikiProject Statistics (Rated B-class, Top-importance)
WikiProject Mathematics (Rated B-class, Top-importance; Field: Probability and statistics)
One of the 500 most frequently viewed mathematics articles.

Archived discussion to mid 2009

## What is it with you math people and your inability to express yourselves??

Your assertion that the expected value is the most probable value is false. Your implication that the expected value of the sum of two fair dice rolls is equal to 6 is also false for either definition, and can be checked by direct calculation. So, if we are to judge things based on the facts, as you suggest, no such changes should be made. — Preceding unsigned comment added by 193.170.138.132 (talk) 13:26, 18 March 2014 (UTC)
I do hope you understand your molecular biology better than your maths. Typical life scientist spouting nonsense.80.47.102.164 (talk) 09:23, 31 March 2013 (UTC)
No, no! In no way! A typical life scientist is much more clever and nice. This is either not a life scientist, or a quite atypical life scientist. Boris Tsirelson (talk) 12:19, 31 March 2013 (UTC)
I completely agree. I've worked as a consultant to casinos, training their management teams to understand expected value and standard deviation as they apply to casino gaming, so I would hope that I would at least be able to casually read and understand the introduction. As it reads now I feel like an idiot. The standard deviation article, on the other hand, is much easier to read as a layman in advanced mathematics. As an example, please compare the "application" sections of this article and the standard deviation article. Here's an excerpt from http://en.wikipedia.org/wiki/Wikipedia:What_Wikipedia_is_not :
Scientific journals and research papers. A Wikipedia article should not be presented on the assumption that the reader is well versed in the topic's field. Introductory language in the lead and initial sections of the article should be written in plain terms and concepts that can be understood by any literate reader of Wikipedia without any knowledge in the given field before advancing to more detailed explanations of the topic. While wikilinks should be provided for advanced terms and concepts in that field, articles should be written on the assumption that the reader will not or cannot follow these links, instead attempting to infer their meaning from the text.

Absolutely agree. This is a disgracefully written page. — Preceding unsigned comment added by 90.209.72.109 (talk) 19:57, 14 May 2012 (UTC)

Math people expressed themselves on Wikipedia talk:WikiProject Mathematics, see the Frequently Asked Questions on the top of that page. Boris Tsirelson (talk) 05:35, 17 May 2012 (UTC)
I can hack through quantum mechanics but this “math” is beyond reason. Whoever wrote this should be ashamed, and whoever keeps it like this (and knows the subject well) is even worse. This is the worst article I have seen on this site. I have several science degrees and this is almost too far gone for even me. 66.54.125.35 (talk) 16:38, 25 June 2012 (UTC)

"...most probable value that we can expect..."—excuse me, but this is plainly false, as explained in the last sentence of the second paragraph in the article. So the opening paragraph shall definitely not read like this. Personally I find the article well-written and understandable. bungalo (talk) 20:29, 27 June 2012 (UTC)

Indeed, some of you have a PhD in molecular biology, some have several science degrees, some can hack through quantum mechanics(!). For such a person it should be an exercise of several hours (well, at most several days) to take an appropriate book (or two), learn what the expected value is, return here, and make this page satisfactory! Instead, you complain for years. During this time some of you who had no appropriate education at the time of the initial complaint could have gotten such education. But no; none of you returns. What could it mean? Either you are all hopelessly uneducated (contrary to your claims), or, upon getting educated, you start to understand that the article is what it should be. Any other explanation of the phenomenon? Boris Tsirelson (talk) 11:32, 19 March 2014 (UTC)

## Proposition to add origin of the theory

I propose to add the following in a separate chapter:

Blaise Pascal was challenged by a friend, Antoine Gombaud (self-styled “Chevalier de Méré” and writer), with a gambling problem. The problem was that of two players who want to finish a game early and, given the current circumstances of the game, want to divide the stakes fairly, based on the chance each has of winning the game from that point. How should they find this “fair amount”? In 1654, Pascal corresponded with Pierre de Fermat on the subject of gambling, and it is in the discussion of this problem that the foundations of the mathematical theory of probabilities were laid and the notion of expected value introduced.

--->PLEASE let me know (right here or on my talk page) if this would not be OK; otherwise I plan to add it in a few days<---

Phdb (talk) 14:00, 24 February 2009 (UTC)

Yes, please add. In fact, for quite some time the concept of expectation was more fundamental than the concept of probability in the theory. Fermat and Pascal never mentioned the word "probability" in their correspondence, for example. The probability concept then eventually emerged from the concept of expectation and replaced it as the fundamental concept of the theory. iNic (talk) 01:03, 26 February 2009 (UTC)
Laplace used the term “hope” or “mathematical hope” to denote the concept of expected value (see [1], ch. 6).
I wonder what name was used by Pascal (if any), and where did “expectation” come from?  … stpasha »  19:37, 24 September 2009 (UTC)

## Strange

${\displaystyle \operatorname {E} ({\rm {Roll\ With\ 6\ Sided\ Die}})={\frac {1+2+3+4+5+6}{6}}=3.5}$

I am not very familiar with statistics, and possibly this is why I can't see any sense in counting the numbers written on the sides of the die. What if the sides of the die were assigned symbols with no defined alphanumeric order? What would be the "expected value"? --85.207.59.18 (talk) 12:14, 11 September 2009 (UTC)

In that case you can't really speak of a "value" of a certain side. If there's no value to a side, it's impossible to speak of an expectation value of throwing the die. The concept would be meaningless. Gabbe (talk) 15:52, 22 January 2010 (UTC)
Of course, you could assign values to the sides. The sides of a coin, for example, do not have numerical values. But if you assigned the value "-1" to heads and "1" to tails you would get the expected value
${\displaystyle \operatorname {E} ({\rm {Flipping\ a\ coin}})={\frac {-1+1}{2}}=0}$
Similarly, if you instead gave heads the value "0" and tails the value "1" you would get
${\displaystyle \operatorname {E} ({\rm {Flipping\ a\ coin}})={\frac {0+1}{2}}={\frac {1}{2}}}$
and so forth. Gabbe (talk) 20:34, 22 January 2010 (UTC)
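The exchange above can be sketched in Python: the expected value is a property of the numeric values assigned to the outcomes, not of the symbols themselves (the helper `expected_value` below is a hypothetical illustration, not from the article):

```python
# Minimal sketch: expected value as a probability-weighted sum of the
# values assigned to outcomes. Relabelling the outcomes changes the answer.
def expected_value(values, probs):
    """Probability-weighted mean of the assigned values."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(v * p for v, p in zip(values, probs))

# A fair six-sided die with the usual labels 1..6:
die = expected_value([1, 2, 3, 4, 5, 6], [1 / 6] * 6)            # ≈ 3.5

# The same die with arbitrary values assigned to its six faces:
relabelled = expected_value([10, 10, 10, 0, 0, 0], [1 / 6] * 6)  # ≈ 5.0

# A fair coin with heads = -1 and tails = +1, as in the comment above:
coin = expected_value([-1, 1], [0.5, 0.5])                       # 0.0
```

With no values assigned at all, the function has nothing to average, which matches the point that the concept is then meaningless.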

At present (Feb 2010) the article is rated as only "Start", yet supposedly has "Top" priority and is on the frequent-viewing lists. Some initial comments on where things fall short are:

• The lead fails to say why "expected value" is important, either generally or regarding its importance as underlying statistical inference.
• There is a lack of references, both at a general-reader level and for the more sophisticated stuff.
• There is some duplication, which is not necessarily a problem, but it would be if there were cross-referencing of this.
• There is a poor ordering of material, in terms of sophistication, with elementary level stuff interspersed.

I guess others will have other thoughts. Particularly for this topic, it should be a high priority to retain a good exposition that is accessible at an elementary level, for which there is good start already in the article. Melcombe (talk) 13:17, 25 February 2010 (UTC)

"The expected value is in general not a typical value that the random variable can take on. It is often helpful to interpret the expected value of a random variable as the long-run average value of the variable over many independent repetitions of an experiment."

So the expected value is the mean for repeated experiments (why not just say so?), and yet you explicitly tell me that it is "in general not a typical value that the random variable can take on". The normal distribution begs to disagree. Regardless of theoretical justifications in multimodal cases, this is simply bizarre. More jargon != smarter theoreticians. Doug (talk) 18:33, 21 October 2010 (UTC)

What is the problem? The expected value is in general not a typical value. In the special case of the normal distribution it really is; who says otherwise? In the asymmetric unimodal case it is different from the mode. For a discrete distribution it is (in general) not a possible value at all. Boris Tsirelson (talk) 19:22, 21 October 2010 (UTC)
(why not just say so?) — because this is the statement of the law of large numbers: that when the expected value exists, the long-run average converges almost surely to the expected value. If you define the expected value as the long-run average, then this theorem becomes circular. Also, for some random variables it is not possible to imagine that they can be repeated many times over (say, the indicator that a person dies tomorrow). Expected value is a mathematical construct which exists regardless of the possibility of repeating the experiment.  // stpasha »  02:21, 22 October 2010 (UTC)
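The long-run-average reading discussed above can be illustrated with a short simulation (a sketch only; the seed and sample size are arbitrary choices):

```python
import random

# Simulate many independent rolls of a fair die. The sample mean
# approaches the expected value 3.5, yet no single roll ever equals 3.5,
# which is the sense in which 3.5 is "not a typical value".
random.seed(0)
n = 200_000
rolls = [random.randint(1, 6) for _ in range(n)]
long_run_average = sum(rolls) / n   # ≈ 3.5
```

Note the logical order: the expected value is defined first (as a weighted mean), and the law of large numbers then guarantees this convergence; defining it as the long-run average would be circular.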
From the article: «formally, the expected value is a weighted average of all possible values.». A formal definition should refer to a particular definition of weight: probability. As it happens, the Wikipedia article Weighted arithmetic mean refers to a "weighted average" as a "weighted mean". "Mean" is both more precise than the ambiguous "average", and less confusing. The Wikipedia article on the Law of large numbers links to average, which again links to mean, median and mode. Our current article talks about average, but then stresses that it does not refer to a typical, nor even an actual, value—so as to eliminate the other definitions of "average" than "mean". It would be much simpler to just say "mean".
Either just say "mean", or use "mean" when referring to probability distributions, and "expected value" when referring to random variables. That's not standard, though. — Preceding unsigned comment added by SvartMan (talkcontribs) 00:04, 3 March 2014 (UTC)

## Proposition for alternative proof of ${\displaystyle \mathbb {E} (X)=\int _{0}^{\infty }\mathbb {P} (X>x)dx}$

I tried to add the proof below which I believe to be correct (except for a minor typo which is now changed). This was undone because "it does not work for certain heavy-tailed distribution such as Pareto (α < 1)". Can someone elaborate?

Alternative proof: Using integration by parts

${\displaystyle \mathbb {E} (X)=\int _{0}^{\infty }(-x)(-f_{X}(x))\;dx=\left[-x(1-F(x))\right]_{0}^{\infty }+\int _{0}^{\infty }(1-F(x))\;dx}$

and the bracket vanishes because ${\displaystyle 1-F(x)=o(1/x)}$ as ${\displaystyle x\to \infty }$. —Preceding unsigned comment added by 160.39.51.111 (talk) 02:50, 13 May 2011 (UTC)

Actually the end of section 1.4 seems in agreement, so I am reinstating my changes —Preceding unsigned comment added by 160.39.51.111 (talk) 02:54, 13 May 2011 (UTC)
I have removed it again. The "proof" is invalid as it explicitly relies on the assumption ${\displaystyle 1-F(x)=o(1/x)}$ as ${\displaystyle x\to \infty }$, which does not hold for all cdfs (e.g. Pareto, as said above). You might try reversing the argument and doing an integration by parts, starting with the "result", which might then be shown to be equivalent to the formula involving the density. PS, please sign your posts on talk pages. JA(000)Davidson (talk) 09:40, 13 May 2011 (UTC)
Let's try to sort this out: I claim that whenever a nonnegative X has an expectation, then ${\displaystyle 1-F(x)=o(1/x)}$ (the Pareto distribution with α < 1 doesn't even have an expectation, so this is not a valid counter-example).
Proof: Assuming X has density function f, we have for any ${\displaystyle c>0}$
${\displaystyle \mathbb {E} (X)=\int _{0}^{\infty }xf(x)dx\geq \int _{0}^{c}xf(x)dx+c\int _{c}^{\infty }f(x)dx}$
Recognizing ${\displaystyle {\bar {F}}(c)=\int _{c}^{\infty }f(x)dx}$ and rearranging terms:
${\displaystyle 0\leq c{\bar {F}}(c)\leq \mathbb {E} (X)-\int _{0}^{c}xf(x)dx\to 0{\text{ as }}c\to \infty }$
as claimed.
Are we all in agreement, or am I missing something again? Phaedo1732 (talk) 19:05, 13 May 2011 (UTC)
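The identity in question can also be checked numerically (an illustrative sketch; X ~ Exponential(1) is an arbitrary choice, with P(X > x) = e^(-x) and E(X) = 1 exactly):

```python
import math

# Crude left Riemann sum of the survival function P(X > x) = exp(-x)
# over [0, 50]; the tail beyond 50 is negligible at double precision.
dx = 0.001
estimate = sum(math.exp(-i * dx) * dx for i in range(50_000))
# estimate ≈ 1.0 = E(X), matching E(X) = ∫₀^∞ P(X > x) dx
```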
Regardless of the validity of the proof, is an alternative proof a strong addition to the page? CRETOG8(t/c) 19:12, 13 May 2011 (UTC)
I think so, because the current proof is more like a trick than a generic method, whereas the alternative proof could be generalized (as shown in Section 1.4). I also think the point of an encyclopedia is to give more information rather than less. Phaedo1732 (talk) 00:49, 14 May 2011 (UTC)
See item 6 of WP:NOTTEXTBOOK, and WP:MSM#Proofs. This doesn't seem to be a place that needs a proof at all. What is needed is a proper citation for the result, and a proper statement of the result and its generalisation to other lower bounds. (I.e., the result could be used as an alternative definition of "expected value", but are the definitions entirely equivalent?) JA(000)Davidson (talk) 08:28, 16 May 2011 (UTC)
Clearly the previous editor of that section thought a proof should be given. If anyone comes up with a good citation, I am all for it. Phaedo1732 (talk) 15:31, 16 May 2011 (UTC)

## Simple generalization of the cumulative function integral

Currently the article has the integral

${\displaystyle \operatorname {E} (X)=\int _{0}^{\infty }P(X\geq x)\;dx}$

for non-negative random variables X. However, the non-negativeness restriction is easily removed, resulting in

${\displaystyle \operatorname {E} (X)=-\!\int _{-\infty }^{0}P(X\leq x)\;dx+\int _{0}^{\infty }P(X\geq x)\;dx.}$

Should we give the more general form, too? -- Coffee2theorems (talk) 22:33, 25 November 2011 (UTC)

But do not forget the minus sign before the first integral. Boris Tsirelson (talk) 15:47, 26 November 2011 (UTC)
Oops. Fixed. Anyhow, do you think it would be a useful addition? -- Coffee2theorems (talk) 19:29, 4 December 2011 (UTC)
Yes, why not. I always present it in my courses.
And by the way, did you see in "general definition" these formulas:
• ${\displaystyle \operatorname {E} (g(X))=\int _{a}^{\infty }g(x)\,\mathrm {d} \operatorname {P} (X\leq x)=g(a)+\int _{a}^{\infty }g'(x)\operatorname {P} (X>x)\,\mathrm {d} x}$ if ${\displaystyle \operatorname {P} (g(X)\geq g(a))=1}$,
• ${\displaystyle \operatorname {E} (g(X))=\int _{-\infty }^{a}g(x)\,\mathrm {d} \operatorname {P} (X\leq x)=g(a)-\int _{-\infty }^{a}g'(x)\operatorname {P} (X\leq x)\,\mathrm {d} x}$ if ${\displaystyle \operatorname {P} (g(X)\leq g(a))=1}$.
I doubt it is true under just this condition. Boris Tsirelson (talk) 07:26, 5 December 2011 (UTC)
Moreover, the last formula is ridiculous:
${\displaystyle \operatorname {E} (|X|)=\int _{0}^{\infty }\lbrace 1-F(t)\rbrace \,\operatorname {d} t,}$
if Pr[X ≥ 0] = 1, where F is the cumulative distribution function of X.
Who needs the absolute value of X assuming that X is non-negative? Boris Tsirelson (talk) 07:30, 5 December 2011 (UTC)
And a reference for this: Papoulis, Athanasios, and S. Unnikrishna Pillai, "Chapter 5-3 Mean and Variance," Probability, Random Variables, and Stochastic Processes, Tata McGraw-Hill Education, 2002. This book derives this form from a frequency interpretation, which should make some people happy. Its form is slightly different, as your derivation counts the mass at zero twice (only an issue for discrete and mixed distributions):
${\displaystyle \operatorname {E} (X)=\int _{0}^{\infty }1-F(x)\;dx-\int _{-\infty }^{0}F(x)\;dx}$
Best regards; Mouse7mouse9
Sure, this formula of yours is correct in all cases (including discrete and mixed distributions).
And please sign your messages (on talk pages) with four tildes: ~~~~. Boris Tsirelson (talk) 06:28, 17 June 2015 (UTC)
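The two-tail formula above can be checked numerically as well (a sketch; the Normal(2, 1) distribution, the truncation at ±20, and the step size are arbitrary choices, and the exact answer is 2):

```python
import math

# Check E(X) = ∫₀^∞ (1 - F(x)) dx - ∫_{-∞}^0 F(x) dx for X ~ Normal(2, 1).
def F(x):
    # Normal(mean=2, sd=1) cumulative distribution function via erf.
    return 0.5 * (1.0 + math.erf((x - 2.0) / math.sqrt(2.0)))

dx = 0.001
upper = sum((1.0 - F(i * dx)) * dx for i in range(20_000))   # over [0, 20)
lower = sum(F(-20.0 + i * dx) * dx for i in range(20_000))   # over [-20, 0)
estimate = upper - lower   # ≈ 2.0 = E(X)
```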

## "Expected value of a function" seems to be misplaced

Does the text starting with "The expected value of an arbitrary function of ..." really belong to the definition of the expectation, or would it be better to move it to Properties, between 3.6 and 3.7, and give it a new section (with which title?)? I am not entirely sure, but I think one can derive the expected value of a function of a random variable without the need for an explicit definition. After all, the function of a random variable is a random variable again; given that random variables are (measurable) functions themselves, it should be possible to construct $E(g(X))$ just from the general definition of $E$. Any thoughts? Grumpfel (talk) 21:54, 29 November 2011 (UTC)

I agree. Boris Tsirelson (talk) 06:33, 30 January 2012 (UTC)
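The point that g(X) is itself a random variable, so its expectation already follows from the general definition, can be sketched numerically (an illustration; the choice g(x) = x² and the fair die are arbitrary):

```python
import random

# The "law of the unconscious statistician": E(g(X)) computed from the
# distribution of X agrees with treating Y = g(X) as a random variable
# in its own right and averaging draws of Y directly.
g = lambda x: x * x

# Exact value from the distribution of a fair die: sum of g(x) * (1/6).
lotus = sum(g(x) / 6 for x in range(1, 7))   # 91/6 ≈ 15.1667

# Monte Carlo average of Y = g(X) over independent draws of X.
random.seed(4)
n = 200_000
mc = sum(g(random.randint(1, 6)) for _ in range(n)) / n   # ≈ lotus
```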

## Expectation of the number of positive events

If there is a probability p that a certain event will happen, and there are N such events, then the expectation of the number of events is ${\displaystyle pN}$, even when the events are dependent. I think this is a useful application of the sum-of-expectations formula. --Erel Segal (talk) 14:39, 6 May 2012 (UTC)
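A short simulation of this point (a sketch; the N events here are made perfectly dependent, all driven by one draw, an extreme case chosen for emphasis):

```python
import random

# Linearity of expectation: the expected number of events that occur is
# p*N even when the events are dependent. Here either all N events occur
# or none do, yet the average count still comes out to p*N.
random.seed(1)
p, N, trials = 0.5, 10, 100_000
total = 0
for _ in range(trials):
    all_happen = random.random() < p   # one draw drives all N events
    total += N if all_happen else 0
average_count = total / trials   # ≈ p * N = 5.0
```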

## notation

In the section on iterated expectations and the law of total expectation, the lower-case x is used to refer to particular values of the random variable denoted by capital X, so that for example

${\displaystyle \sum _{x=2}^{3}x\Pr(X=x)=2\cdot \Pr(X=2)+3\cdot \Pr(X=3).\,}$

Then I found notation that looks like this:

${\displaystyle \operatorname {E} _{X}(x)\,}$

Now what in the world would that be equal to in the case where x = 3?? It would be

${\displaystyle \operatorname {E} _{X}(3),\,}$

but what is that??? This notation makes no sense, and I got rid of it. Michael Hardy (talk) 21:33, 13 August 2012 (UTC)

## Incorrect example?

Example 2 in the definition section doesn't take into account the $1 wager. — Preceding unsigned comment added by Gregchaz (talkcontribs) 22:23, 17 November 2012 (UTC)
Isn't it factored into the $35 payout? —C.Fred (talk) 22:25, 17 November 2012 (UTC)

## Formulas for special cases - Non-negative discrete

In the example at the bottom, I think the sum should be from i=1, and equal (1/p)-1. For instance, if p=1, you get heads every time, so since they so carefully explained that this means that X=0, the sum should work out to 0; hence (1/p)-1 rather than (1/p).

Well, imagine that p = 1 but YOU don't know it. Then you'll toss the coin, and of course it gives heads on the first try. Another way of explaining it: suppose p = 0.9999, and you know it. But then you are not absolutely sure that you will get heads, and you have to toss, with a very high probability of success on the first try. Bdmy (talk) 07:27, 1 June 2013 (UTC)
The OP is correct -- Bdmy has missed the fact that getting heads on the first try with certainty or near certainty is, in the notation of the article's example, X=0, not X=1. There are several ways to see this: (1) the formula derived above the example says that the sum goes from one to infinity, not zero to infinity. Or, consider (2): if p=1/2, the possible sequences are H (X=0 with probability 1/2); TH (X=1 with probability 1/4); TTH (X=2 with probability 1/8); etc. So the expected value is 0 times 1/2 plus 1 times 1/4 plus 2 times 1/8 plus ... = 0 + 1/4 + 2/8 + 3/16 + 4/32 + ... = 1 = (1/p) - 1. Or, consider (3): the OP's example with p=1 is correct -- you will certainly get a success on the first try, so X=0 with certainty, so E(X) = 0 = (1/p) - 1. I'll correct it in the article. Duoduoduo (talk) 18:37, 1 June 2013 (UTC)
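The corrected value can be confirmed by simulation (a sketch following the article example's convention that X counts the tails before the first head):

```python
import random

# With success probability p, the expected number of failures before the
# first success is (1/p) - 1, not 1/p. For p = 1/2 the exact value is 1.
random.seed(2)
p, trials = 0.5, 100_000
total_failures = 0
for _ in range(trials):
    x = 0
    while random.random() >= p:   # each iteration is one failure
        x += 1
    total_failures += x
average = total_failures / trials   # ≈ (1/p) - 1 = 1.0
```

In particular, setting p = 1 makes the while loop never run, so X = 0 with certainty and the expectation is (1/p) - 1 = 0, as the OP argued.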

## Reason for deleting sub-section "Terminology"

I'm deleting the subsection "Terminology" of the section "Definition" for the following reasons. The section reads

Terminology
When one speaks of the "expected price", "expected height", etc. one often means the expected value of a random variable that is a price, a height, etc. However, the "value" in expected value is more general than price or winnings. For example a game played to try to obtain the cost of a life saving operation would assign a high value where the winnings are above the required amount, but the value may be very low or zero for lesser amounts.
When one speaks of the "expected number of attempts needed to get one successful attempt", one might conservatively approximate it as the reciprocal of the probability of success for such an attempt. Cf. expected value of the geometric distribution.

The first sentence is a pointless tautology. The remainder of the first paragraph doesn't make a bit of sense, but maybe it is attempting to make the obvious point that sometimes "value" means a dollar value and sometimes not. This is obvious and not useful, even if it were well expressed. The second paragraph doesn't have anything to do with either "Definition" or "Terminology", and in any event it is wrong (the "approximate" value it gives is actually exact). Duoduoduo (talk) 14:51, 2 June 2013 (UTC)

The article seems pretty clearly to have satisfied the criteria at least for C-class quality; I'd say it looks more like B-class at this point. I'm re-rating it to C-class, and I'd love to hear thoughts on the article's current quality. -Bryanrutherford0 (talk) 03:31, 18 July 2013 (UTC)

I agree with the B-class rating, and am changing it accordingly. At least for math, there is a B+ rating that could be applied if there were more references. Brirush (talk) 03:18, 10 November 2014 (UTC)

## Multivariate formula

The following formula has been added for the expected value of a multivariate random variable:

${\displaystyle \operatorname {E} [X]=\int _{-\infty }^{\infty }\cdots \int _{-\infty }^{\infty }X(x_{1},\cdots ,x_{n})~f(x_{1},\cdots ,x_{n})~dx_{1}\cdots dx_{n}.}$

First, I don't understand what calculation is called for by the formula. Why do we have a multiple integral? It seems to me that since the left side of the equation is an n-dimensional vector, the right side should also be an n-dimensional vector in which each element i has a single integral of ${\displaystyle x_{i}f(x_{1},\dots ,x_{n})dx_{i}.}$

Second, I don't understand the subsequent sentence

Note that, for the univariate cases, the random variables X are taken as the identity functions over different sets of reals.

What different sets of reals? And in what way is X in the general case not based on the identity function -- is ${\displaystyle X(x_{1},\cdots ,x_{n})}$ intended to mean something other than simply the vector ${\displaystyle (x_{1},\cdots ,x_{n})}$ ? Duoduoduo (talk) 17:56, 12 September 2013 (UTC)

I agree, that is a mess. First, I guess that the formula
${\displaystyle \operatorname {E} [X]=\int _{-\infty }^{\infty }\cdots \int _{-\infty }^{\infty }(x_{1},\cdots ,x_{n})~f(x_{1},\cdots ,x_{n})~dx_{1}\cdots dx_{n}.}$
was really meant. Second, I guess, the author of this text is one of these numerous people that believe that, dealing with an n-dim random vector, we should take the probability space equal to Rn, the probability measure equal to the distribution of the random vector, and yes, ${\displaystyle X(x_{1},\cdots ,x_{n})=(x_{1},\cdots ,x_{n})}$. (Probably because they have no other idea.) I am afraid that they can support this by some (more or less reliable) sources. Boris Tsirelson (talk) 18:17, 12 September 2013 (UTC)
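The intended formula reduces, componentwise, to ordinary one-dimensional expectations; a Monte Carlo sketch (the particular distributions below are arbitrary choices for illustration):

```python
import random

# The expectation of a random vector is the vector of componentwise
# expectations. Here X1 ~ Uniform(0, 1) and X2 = X1 + Uniform(0, 1),
# so E[(X1, X2)] = (0.5, 1.0) even though X1 and X2 are dependent.
random.seed(3)
n = 200_000
s1 = s2 = 0.0
for _ in range(n):
    x1 = random.random()
    x2 = x1 + random.random()
    s1 += x1
    s2 += x2
mean_vector = (s1 / n, s2 / n)   # ≈ (0.5, 1.0)
```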
Should it be reverted? Duoduoduo (talk) 19:40, 12 September 2013 (UTC)
Maybe. Or maybe partially deleted and partially reformulated? Boris Tsirelson (talk) 21:07, 12 September 2013 (UTC)
I'll leave it up to you -- you're more familiar with this material than I am. Duoduoduo (talk) 22:56, 12 September 2013 (UTC)
I wrote this formula. (I have a Ph.D. in Applied Mathematics, though I will admit I am wrong if someone proves it.) I make the link between the general form and the univariate form. This formula is what I meant. This link can help legitimize the statement: http://mathworld.wolfram.com/ExpectationValue.html . Note that if this line is not there, it is hard to make the link between the general form and the univariate form.
The formula in Wolfram is OK; but why is yours different?
Before deciding whether or not your formula should be here we should decide whether or not it is correct.
In Wolfram one considers expectation of a function f of n random variables that have a joint density P. In contrast, you write "multivariate random variable ${\displaystyle X(x_{1},\cdots ,x_{n})}$ admits a probability density function ${\displaystyle f(x_{1},\cdots ,x_{n})}$". What could it mean? A (scalar) function of n random variables is a one-dimensional random variable, and its density (if exists) is a function of one variable. The random vector ${\displaystyle (x_{1},\cdots ,x_{n})}$ is a multivariate random variable and its density (if exists) is a function of n variables. What could you mean by X? Boris Tsirelson (talk) 13:59, 2 October 2013 (UTC)
You say "In Wolfram one considers expectation of a function f of n random variables that have a joint density P". I do not agree. Indeed, they don't say that the ${\displaystyle x_{1},\cdots ,x_{n}}$ are RANDOM variables. I would say that their definition is more prudent. More specifically, in this article I consider, in the definition of the multivariate case, that the vector ${\displaystyle x_{1},\cdots ,x_{n}}$ belongs to the sample space, whereas ${\displaystyle X(x_{1},\cdots ,x_{n})}$ is a random variable which is actually a function of this variable in the sample space.
Notice that (for simplicity) in the case of univariate functions, the variables of your sample space are equal to the observed random variable. I.e., roll one die and see 5; then the random variable returns 5. This is simple.
Let us now consider a multivariate random variable: choose one longitude, one latitude, and a date (these are the variables of the sample space). Let us now measure something, e.g. atmospheric pressure, or temperature, or simply the sum of longitude and latitude (even if it does not make much sense); these are multivariate random variables. You just observe numbers and build your statistic as you would in the section "General definition".

(Unindent) Ah, yes, this is what I was fearing of, see above where I wrote: "the author of this text is one of these numerous people that believe that, dealing with an n-dim random vector, we should take the probability space equal to Rn, the probability measure equal to the distribution of the random vector, and yes, ${\displaystyle X(x_{1},\cdots ,x_{n})=(x_{1},\cdots ,x_{n})}$. (Probably because they have no other idea.)"

The problem is that (a) this is not the mainstream definition of a random variable (in your words it is rather the prudent definition, though I do not understand what the prudence is; as for me, the standard definition is more prudent); and (b) this "your" definition does not appear in Wikipedia (as far as I know). Really, I am not quite protesting against it. But for now the reader will be puzzled unless he/she reads your comment here on the talk page. In order to do it correctly you should first introduce "your" approach in other articles (first of all, "Random variable") and only then use it here, with the needed explanation. And of course, for succeeding with this project you need reliable sources. Boris Tsirelson (talk) 16:40, 2 October 2013 (UTC)

And please do not forget to sign your messages with four tildes: ~~~~. :-) Boris Tsirelson (talk) 16:44, 2 October 2013 (UTC)

1° Just to make things clear, I am not talking about random vector.
2° For me the definition of a random variable is the same as in the section "Measure-theoretic definition" of the article "Random Variable" and is what is actually used in the section "General definition" of the article Expected value.
3° What I am trying here is to fill the gap between the "univariate cases" and the "General definition". The "univariate cases" are simplified cases of the "General definition". It was not easy to see at first, so I am trying to fill the gap. My contribution is simply to consider the "General definition" with ${\displaystyle \Omega =\mathbb {R} ^{n}}$ and then say that, if ${\displaystyle n=1}$, then for simplicity one often considers ${\displaystyle X(x_{1})=x_{1}}$, as done in the univariate cases.
212.63.234.4 (talk) 11:38, 3 October 2013 (UTC)
But your formula is not parallel to the univariate case (as it is presented for now):
"If the probability distribution of X admits a probability density function f(x), then the expected value can be computed as
${\displaystyle \operatorname {E} [X]=\int _{-\infty }^{\infty }xf(x)\,dx.}$"
You see, nothing special is assumed about the probability space; it is left arbitrary (as usual), and does not matter. What matters is the distribution. Not at all "${\displaystyle X(x_{1})=x_{1}}$". If you want to make it parallel, you should first add your-style formulation to the univariate case: "It is always possible to use the change-of-variable formula in order to pass from an arbitrary probability space to the special case where (you know what) without changing the distribution (and therefore the expectation as well)", something like that. Also your terminology... what you call a multivariate random variable is what I would call a univariate random variable defined on the n-dimensional probability space (you know which). What about sources for your terminology? Boris Tsirelson (talk) 12:46, 3 October 2013 (UTC)
1° (Just to be aware of what we are talking about) How would you formally define a "univariate random variable"? Note that this term is not in the article "Random variable".
2° Don't you agree that there is a gap that needs to be filled between the univariate definitions and the general definition? I totally agree if someone helps me to improve my possibly inadequate modification.
3° As far as terminology is concerned, here is a reference for bivariate (multivariate is a similar extension): http://books.google.be/books?id=lvF19OwEFekC&lpg=PA29&ots=UNfSi10t3l&dq=%22Univariate%20continuous%20random%20variable%22&pg=PA29#v=onepage&q=%22Univariate%20continuous%20random%20variable%22&f=false
212.63.234.4 (talk) 14:25, 3 October 2013 (UTC)
Ironically, the book you point to confirms my view and not yours! There I read (page 29): "bivariate continuous random variable is a variable that takes a continuum of values on the plane according to the rule determined by a joint density function defined over the plane. The rule is that the probability that a bivariate random variable falls into any region on the plane is equal..."
Exactly so! (a) Nothing special is assumed about the probability space; moreover, the probability space is not mentioned. Only the distribution matters. (b) it is exactly what I called a random vector (since a point of the plane is usually identified with a pair of real numbers, as well as a 2-dim vector). I do not insist on the word "vector"; but note: bivariate means values are two-dimensional (values! not the points of the probability space, but rather their images under the measurable map from the probability space to the plane). Accordingly, expectation of a bivariate random variable is a vector (well, a point of the plane), not a number! And the formula "${\displaystyle X(x_{1})=x_{1}}$" is neither written nor meant.
How would I define formally a "Univariate random variable"? As a measurable map from the given probability space to the real line, of course. Boris Tsirelson (talk) 18:34, 3 October 2013 (UTC)
Surely it would be nice, to improve the article. But please, on the basis of reliable sources, not reinterpreted and mixed with your original research. Boris Tsirelson (talk) 18:50, 3 October 2013 (UTC)
Actually, I do agree that my contribution is not good enough. If you see any way to make it right, do not hesitate to transform it. Otherwise, just remove it. Thank you for your patience and involvement in this discussion.212.63.234.4 (talk) 11:32, 7 October 2013 (UTC)
OK, I've moved it to a place of more appropriate context, and adapted it a little to that place. Happy editing. Boris Tsirelson (talk) 14:43, 7 October 2013 (UTC)

## Intuitively?

What is intuitively supposed to mean in this instance? Hackwrench (talk) 01:15, 24 October 2015 (UTC)

There's only one use of "intuitively" in the article, so you must refer to the first sentence:
In probability theory, the expected value of a random variable is intuitively the long-run average value of repetitions of the experiment it represents.
It's "intuitively", because this isn't how it is defined for a variety of reasons. Wikipedia articles usually start with a definition of the thing they're about, so some word is needed to convey that what looks like a definition and is in place of a definition actually isn't a definition. The actual definition is currently at the end of the second paragraph:
The expected value of a random variable is the integral of the random variable with respect to its probability measure.
I actually opened the intro with this once upon a time, but some people weren't happy with it, even though the less technical definitions immediately followed and the next paragraph explained the interpretation as a long-run average. Go figure.
It's also "intuitively", because this tends to be how one thinks about expected value in statistics. "It's just like the mean, except for an infinite number of things, using the obvious limit at infinity." It's also couched in empirical statistical terms like "long-run", "repetitions", "experiments". The concrete (and real-world relatable) tends to be more intuitive than the abstract. -- Coffee2theorems (talk) 19:07, 11 December 2015 (UTC)
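The "long-run average" reading discussed above can be illustrated numerically. A minimal simulation sketch (the function name is mine, not from the article): averaging many fair-die rolls, whose expected value is (1+2+...+6)/6 = 3.5.

```python
import random

def long_run_average(trials, seed=0):
    """Average of `trials` fair-die rolls; by the law of large numbers
    this sample mean settles near the expected value 3.5."""
    rng = random.Random(seed)
    return sum(rng.randint(1, 6) for _ in range(trials)) / trials
```

With a few hundred thousand rolls, the running average stays within a few hundredths of 3.5, which is the sense in which the lead sentence calls the expected value a long-run average.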

## "Unconscious" or "subconscious" statistician?

Maybe somebody can clarify this rather trivial point. Many American sources refer to the Law of The Unconscious Statistician (or LoTUS). However the reputable British probabilists G. Grimmett and D. Welsh refer to it as the Law of the subconscious statistician (see Grimmett, Welsh: Probability. An Introduction, 2nd Edition, Oxford University Press, 2014.)

To me, as an adjective, the word "subconscious" seems more appropriate, even though the acronym LoTSS does not sound as appealing as LoTUS. (The odds that any unconscious statistician would compute any expectation are very low. Moreover, if you googled the term "unconscious person" you would be sent to pages explaining how to administer first aid.) Does anybody know if there is a general agreement on this terminology, or this is one of those discrepancies between British English and American English?

---

In response:

On LOTUS: The term appears to have been coined by Sheldon Ross (it has been around many years now - I believe I first heard it about 30 years ago), and contains a deliberate pun; the substitution of "subconscious" for "unconscious" obliterates the joke. Glenbarnett (talk) 11:37, 28 September 2017 (UTC)
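For readers landing here, LOTUS itself is the identity E[g(X)] = Σ g(x)·P(X = x): the expectation of g(X) is computed from the distribution of X, without first deriving the distribution of g(X). A small sketch of the discrete case (the helper name is mine):

```python
from fractions import Fraction

# Distribution of a fair die roll X.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def lotus(g, pmf):
    """E[g(X)] = sum over x of g(x) * P(X = x), per LOTUS."""
    return sum(g(x) * p for x, p in pmf.items())

# E[X^2] for a fair die: (1 + 4 + 9 + 16 + 25 + 36)/6 = 91/6.
expected_square = lotus(lambda x: x * x, pmf)
```

The "unconscious" part of the name is the pun: one applies the formula without consciously verifying that g(X) is itself a random variable with its own distribution.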

## Article needs a re-write (toned down from: WTF is this \$hit)

It's.... there is some stuff in the talk page about it being crap and I won't flaunt unprovable credentials around but it's fucking wank. Scrolling one-third down (I grant you the TOC is large) we have finally covered summations, infinite summations and CRVs, great! WHERE'S THE MEASURE THEORY side, these are the same concept! I searched for "probability measure"; it only occurs once, in the second paragraph, which looks more copied and pasted than anything, with the last sentence being a note saying what I just said: the finite, countably infinite and uncountable cases are special cases. Anyway the general definition is only given once; right after that someone just sticks +/- infinity as the integral limits and thinks they have it covered. WAY DOWN AGAIN we have "it's useful in quantum mechanics". Like 5/6ths down we hit "expectation of a matrix", which is a definition, then down to "special" cases.

• I have written better things when not on ritalin and immediately after a cocaine-in-solution enema.

Sometimes things are beyond repair. I think the entire thing needs to be archived and re-written. Also, if it is "B" class, I'd hate to see a "C" class.

Wow! We are used to complaints that our mathematical articles are too advanced, that is, rather inaccessible (just look at the first item above, and "Frequently Asked Questions" on the top of Wikipedia talk:WikiProject Mathematics: "Are Wikipedia's mathematics articles targeted at professional mathematicians?", "Why is it so difficult to learn mathematics from Wikipedia articles?" etc etc). But it is the first time that I see a (quite passionate) complaint that a mathematical article is not enough advanced, that is, too accessible! Regretfully, this complaint is not signed; someone may suspect that it is written by a mathematician, and the ostentatiously vulgar language is intended to mask this. Boris Tsirelson (talk) 18:43, 17 March 2016 (UTC)
Why would one mask being a mathematician? I am just extremely disappointed with the disjoint mess the article is. I was just wondering something and there's just so much crap to wade through, and there's not even very much on what should be the proper definition! I get making it accessible, I've been dealing with expectation since A-levels, but this article would help neither me back then nor now. I would re-write it myself but let's be honest, Jesus' mathematician cousin would get his attempt reverted and then some tit who never ventures into mathematics (let alone that part of Wikipedia) but with (larger number) of edits under her belt would be like "here are some rules, reverting!" and nothing would change. I hope that you go back to the article and think "Whoa, that ADHD thing was perfect, it looks like it was written by a crack-using ADHD sufferer in a room with a device that made different sounds at random intervals" 90.199.52.141 (talk) 19:28, 17 March 2016 (UTC) (signed ;-) )
You sound rather immature. I'll attempt to take you seriously anyway.
Beginning with a too-general formulation obstructs readers in three ways. First, they may not understand the general formulation. Remember that very few people (in an absolute sense) ever hear of the Lebesgue integral, and fewer still understand it. For them, it is no help to say that all expected values can be put in a unified framework because that unified framework is incomprehensible to them. (If it is obvious to you, then good; but remember that you are exceptional.) Second, most readers are interested in applying expected values to some situation they have at hand. That situation is usually discrete (gambling, opinion surveys, some medical experiments) or univariate real (most real-world measurements). Proceeding from the general formulation to a formulation that can be applied in practice requires a little bit of effort. While it is a good exercise to derive specific formulas from the general one, Wikipedia is not a teaching tool. It is a reference, and a good reference includes formulas for important special cases. Third, people's understanding proceeds from special to general. Even young children can grasp discrete probability theory; nobody without significant experience grasps the Lebesgue integral. Even a reader whose goal is generality for its own sake cannot reach maximum generality without grasping special cases. A reader who does not yet understand the general theory benefits from seeing special cases, like the discrete and univariate real cases, worked out in detail. Once they are both understood, the general case is more approachable.
For these reasons, I think the approach taken in the article is, in broad outline, excellent. That does not make the article perfect, but I would have a difficult time improving on the outline of its initial sections. Ozob (talk) 01:03, 18 March 2016 (UTC)
I do not understand the motivation for defining expected value as one type of average value alone. That would suggest that there is no expected value for the Cauchy distribution, whereas the censored mean of the middle 24% of the distribution is asymptotically more efficient than the median as a measure of location. That means, logically, that one either expands the concept of expected value, or one defines some other measure of location that is sometimes the expected value, and sometimes not. Let us take the example of the beta distribution: when it has a single peak (is not U-shaped), the median is a better measure of "tendency" than the mean. Now if we do not mean "tendency" when we are describing "expectation", then we wind up with semantic gibberish. I do not see a way out of this quagmire and ask for help on this. CarlWesolowski (talk) 17:37, 27 June 2016 (UTC)
I am not quite understanding you. Yes, definitely, there is no expectation value for the Cauchy distribution. In such a heavy-tail case, yes, median is much better than expectation... if you want to determine the location parameter. But if you want to predict the sample mean in a sample of 1000 values (not censored!), then the median is cheating: no, that mean will not be close to it (and the expectation says the "sad" truth: you really cannot predict the sample mean). Boris Tsirelson (talk) 21:05, 27 June 2016 (UTC)

Thank you for responding. Part of my problem was completely misunderstanding what statisticians do to English. Now I do a bit better. Expectation is not always a valid measure of location for a random variable. As a pure mathematical fact, if we are talking about the Cauchy distribution as a continuous function, and not as a random variable, then it has a "mean", but obviously as a Cauchy-distributed random variable it has no such thing. Proof: take the integral of x times the Cauchy density from the median minus k to the median plus k; that always exists and is both a censored mean and the median identically. Now, let k be as large as desired. It makes no difference to the Cauchy distribution as a continuous function what one calls the peak. It only makes a difference when a random variable is Cauchy distributed. A fine point to be sure. But when you talk in shorthand and say a Cauchy distribution has no expected value, you leave out words like "Cauchy-distributed random variable", which for me was a hurdle. CarlWesolowski (talk) 04:14, 9 September 2016 (UTC)

I see. Yes, expectation (=mean) may be thought of probabilistically or analytically. For me, the probabilistic approach is the first, since it is the source of motivation. But even analytically, if we define expectation as the integral of x times the density, then we see an improper integral, and it diverges (for Cauchy distribution, I mean). However, it has Cauchy principal value, and this is what you mean (as far as I understand). Still, I would not like such terminology as "expectation of a random variable" and "expectation of its distribution" defined to be nonequivalent. Boris Tsirelson (talk) 04:59, 9 September 2016 (UTC)
I think that the probabilistic and analytic interpretations are the same. It is only when one replaces the integral used in defining expectation with the Cauchy principal value that one runs into confusion. This problem is more acute if one uses Riemann integration instead of Lebesgue integration; then all integrals defined on R must be defined as improper integrals and the Cauchy principal value arises accidentally. Ozob (talk) 03:17, 10 September 2016 (UTC)
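The point about the Cauchy distribution above can be seen in simulation: the sample median is a stable location estimate, while the sample mean never settles, because E[X] is undefined. A hedged sketch (the function name is mine):

```python
import math
import random
import statistics

def cauchy_sample(n, seed=1):
    """Standard Cauchy draws via the inverse CDF: tan(pi*(U - 1/2)), U ~ Uniform(0,1)."""
    rng = random.Random(seed)
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

sample = cauchy_sample(100_000)
med = statistics.median(sample)   # reliably close to the location parameter 0
mean = statistics.fmean(sample)   # dominated by a few huge draws; does not stabilize
```

Re-running with different seeds leaves the median near 0 but moves the mean erratically, which is the "you really cannot predict the sample mean" point made earlier in this thread.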

## Wrong formula in "General definition"

It was

${\displaystyle \operatorname {E} [g(X)]=\int _{-\infty }^{\infty }g(x)\,\mathrm {d} \mathrm {P} (X\leq x)=}$${\displaystyle {\begin{cases}g(a)+\int _{a}^{\infty }g'(x)\mathrm {P} (X>x)\,\mathrm {d} x&\mathrm {if} \ \mathrm {P} (g(X)\geq g(a))=1\\g(b)-\int _{-\infty }^{b}g'(x)\mathrm {P} (X\leq x)\,\mathrm {d} x&\mathrm {if} \ \mathrm {P} (g(X)\leq g(b))=1\end{cases}}}$

and then (after an edit by User:Rememberpearl)

${\displaystyle \operatorname {E} [g(X)]=\int _{-\infty }^{\infty }g(x)\,\mathrm {d} \mathrm {P} (X\leq x)=}$${\displaystyle {\begin{cases}g(a)\mathrm {P} (X\geq a)+\int _{a}^{\infty }g'(x)\mathrm {P} (X>x)\,\mathrm {d} x&\mathrm {if} \ \mathrm {P} (g(X)\geq g(a))=1\\g(b)\mathrm {P} (X\leq b)-\int _{-\infty }^{b}g'(x)\mathrm {P} (X\leq x)\,\mathrm {d} x&\mathrm {if} \ \mathrm {P} (g(X)\leq g(b))=1\end{cases}}}$

but both versions are evidently wrong. Indeed, the values of the function g for x<a matter in the left-hand side, but do not matter in the right-hand side. Probably, more assumptions on g are needed. Boris Tsirelson (talk) 17:15, 26 September 2016 (UTC) Moreover, I wrote it already in 2011, see #Simple_generalization_of_the_cumulative_function_integral above. Boris Tsirelson (talk) 17:21, 26 September 2016 (UTC)

Now correct, thanks to User:Hgesummer. Boris Tsirelson (talk) 06:03, 14 October 2016 (UTC)
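As a sanity check of the corrected identity in its simplest special case (g(x) = x, a = 0, X ≥ 0), it reduces to the tail formula E[X] = ∫₀^∞ P(X > x) dx. A numerical sketch (names are mine) using Exp(1), whose survival function is e^{-x} and whose mean is 1:

```python
import math

def tail_integral(survival, upper=50.0, steps=200_000):
    """Midpoint-rule approximation of E[X] = integral of P(X > x) over [0, upper]."""
    h = upper / steps
    return h * sum(survival((i + 0.5) * h) for i in range(steps))

# For X ~ Exp(1): P(X > x) = exp(-x), and the integral recovers E[X] = 1.
approx_mean = tail_integral(lambda x: math.exp(-x))
```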

## Basic properties section is bloated and no longer so basic

The section on basic properties was once actually useful to point beginning students to who were coming to grips with basic issues like linearity of expectation.

Now the "Basic properties" section is brim full of stuff that is utterly useless to the beginner - it's not even in a useful order. This is not a mathematics text. It's not an article on "all the things I managed to prove when I learned about expectation". It's an article that should be as accessible as possible to a fairly general audience.

The section is now frankly worse than useless as a reference. The entire section should simply be rolled back to before the addition of all the extra crud. Take it back a year and keep it clean. If you must add 50 properties that 99% of people reading the article will never use even once, put all that crud in a new section called "Further properties" or something.

This used to be a half-decent article. Now I'm embarrassed to link to it. Glenbarnett (talk) 11:30, 28 September 2017 (UTC)

What is the problem? You feel you know which properties are basic and which are "further". Just start the "further" section and divide the properties. Does anyone object? Boris Tsirelson (talk) 12:16, 28 September 2017 (UTC)
Hi Glenbarnett,
1. Exactly which basic properties are not "useful" for beginners, in your opinion? Basic properties are basic in the sense that they derive from the corresponding properties of the Lebesgue integral. It might be worth adding a sentence explaining this, to set the expectations right.
2. I, personally, don't like the "Non-multiplicativity" section: IMHO, it's not very informative, and its content is redundant. Should we get rid of it?
3. Regarding "all the things I managed to prove", I'm sure you've noticed that not every property has been proved but only those whose proof shows something important about the field. Methodology, that is.
4. The article is already "as accessible as possible to a fairly general audience". No one wants to make things harder than they ought to be.
Cheers. StrokeOfMidnight (talk) 23:45, 28 September 2017 (UTC)

## Proving that X=0 (a.s.) when E|X|=0

@Tsirel: I changed the proof because the one based on M's inequality has a logic gap: the first sentence ("For non-negative random ...") is false. We prove that P(X>a)=0, for every a>0, but don't transition to P(X≠0)=0, which is what needs to be proven. By the way, it is not at all harder to prove this fact directly, so why even bother with M's inequality. StrokeOfMidnight (talk) 15:52, 2 October 2017 (UTC)

Well, I do not insist; do it if you insist, but do not be astonished if someone else objects. For a mathematician maybe the direct proof is a bit more illuminating, but for others (the majority!) it is unnecessarily longer (I feel so). (And more generally, your writings tend to smell of advanced math, which irritates the majority; we are not on a professional math wiki like EoM, and expectation is of interest for many non-mathematicians.) About the problem of a>0: you surely see how to correct this error readily; this is not a reason to avoid Markov's inequality. Boris Tsirelson (talk) 20:28, 2 October 2017 (UTC)
First, your points are well taken. I think, the simplest way not "to irritate the majority" is to make proofs "collapsible". I will look into that, but if I'm too busy, someone else can do that too. Second, the only reason I don't want to use Markov's inequality is that it doesn't make this particular proof shorter. Surely, in different circumstances, this inequality would be indispensable. StrokeOfMidnight (talk) 21:07, 2 October 2017 (UTC)
Update. So, I've made two proofs (incl. the contentious one) hidden by default. Will this, in your opinion, address the majority crowd? If so, what else should be hidden? StrokeOfMidnight (talk) 21:39, 2 October 2017 (UTC)
Somewhat better. However, proofs are generally unwelcome here (and by the way, on EoM as well). If in doubt, ask WT:MATH. A proof is included only if there is a special reason to make exception for this proof. If you want to write a less encyclopedic, more textbook-ish text, you may try Wikiversity (WV). Yes, it is much less visited than WP. However, it is possible to provide a link from a WP article to a relevant WV article (if the WP community does not object, of course); this option is rarely used, but here is a recent example: the WP article "Representation theory of the Lorentz group" contains (in the end of the lead) a link to WV article "Representation theory of the Lorentz group". Boris Tsirelson (talk) 05:49, 3 October 2017 (UTC)
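For reference, the step debated above (passing from P(|X| ≥ a) = 0 for every a > 0 to X = 0 almost surely) is closed by countable subadditivity; a standard sketch:

```latex
% Markov's inequality gives, for every a > 0,
\mathrm{P}(|X| \ge a) \le \frac{\operatorname{E}|X|}{a} = 0,
% and then, taking a = 1/n and using countable subadditivity,
\mathrm{P}(X \neq 0)
  = \mathrm{P}\left(\bigcup_{n=1}^{\infty} \{|X| \ge 1/n\}\right)
  \le \sum_{n=1}^{\infty} \mathrm{P}(|X| \ge 1/n) = 0.
```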

## Infinite expectation

In many phrases of this article it is implied that existence of the expectation does not automatically mean that it is finite. Well, one often says "this expectation is plus infinity" (or "minus infinity"). Nevertheless, for some authors "existence of expectation" excludes the infinite cases. Anyway, the article does not explain in detail what is meant by infinite expectation, and is not consistent in this aspect. In particular, Section "Finite case": what about a random variable that is equal to plus infinity with probability 1/2 and to minus infinity with probability 1/2? Section "Countably infinite case": "If the series does not converge absolutely, we say that the expected value of X does not exist" — really? Section "Basic properties": "${\displaystyle \operatorname {E} |X|}$ exists and is finite" — but if the infinite case is included in "exists", then the word "exists" here is superfluous. And so on. Boris Tsirelson (talk) 15:28, 8 October 2017 (UTC)

You hit the nail on the head. I haven't been using it consistently, and, yes, I am aware of this. Now that you have pointed this out, I will fix it. The convention I follow is this: if ${\displaystyle X}$ is measurable, then ${\displaystyle \operatorname {E} [X]}$ (by def.) exists unless ${\displaystyle \operatorname {E} [X_{+}]=\operatorname {E} [X_{-}]=+\infty }$. I think this is the best approach. Otherwise, functions can be legally infinite but expectations can't be. When a function is infinite, no one says the function doesn't exist. Note that conditional expectations are functions, too. So, once we've legitimized infinity in one place, we may want to legitimize it everywhere.
On the verbal side, one can unambiguously say "e. v. exists (and is possibly infinite)" instead of just "e. v. exists". And "e. v. is finite" is an obvious shortcut for ${\displaystyle \operatorname {E} [X_{+}]<\infty }$ and ${\displaystyle \operatorname {E} [X_{-}]<\infty }$.
I may not have much time left this weekend to handle this, but about a week from now I should. StrokeOfMidnight (talk) 16:34, 8 October 2017 (UTC)
Nice. I support the same convention "exists unless ${\displaystyle \operatorname {E} [X_{+}]=\operatorname {E} [X_{-}]=+\infty }$". I just want the article to be consistent. Measurability should not be mentioned more than once, since a non-measurable function is never called a random variable. And maybe it should be noted that some sources use another convention: "exists unless ${\displaystyle \operatorname {E} [X_{+}]=+\infty }$ or ${\displaystyle \operatorname {E} [X_{-}]=+\infty }$". Boris Tsirelson (talk) 17:39, 8 October 2017 (UTC)
By the way, see Talk:Student's t-distribution#Undefined expectation versus infinite expectation versus non-existent expecations. Boris Tsirelson (talk) 14:16, 10 October 2017 (UTC)
The best thing to do, I think, is to say "finite", "infinite", and "neither finite nor infinite". If we really want to say "exists" or "doesn't exist", we should give an immediate clarification in parentheses, as in: "if X has a Cauchy distribution, then E[X] does not exist (finite or infinite)"; or "if a moment is not finite, then neither are the higher ones". That way, we don't need special conventions for the meaning of "exists", and even if we use this word inconsistently, we will be forgiven because we always explain ourselves immediately and unambiguously. StrokeOfMidnight (talk) 20:30, 11 October 2017 (UTC)

Okay, this is how I handle one of the problems you pointed out. First, the definition must be simple, so no fiddling with ${\displaystyle \operatorname {E} [X_{+}]}$ and ${\displaystyle \operatorname {E} [X_{-}]}$. On the other hand, it must be consistent with the main definition. The solution is to restrict the notion of "Countably infinite case" to the situations where the series converges absolutely. This simple definition does not extend to the cases when there is no absolute convergence even if the number of outcomes is countably infinite. StrokeOfMidnight (talk) 21:36, 13 October 2017 (UTC)
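To illustrate the restriction just described: in the countably infinite case the definition E[X] = Σ xᵢ pᵢ is applied only when the series converges absolutely, so truncation and reordering are harmless. A hedged sketch (names are mine) for the geometric distribution, where E[X] = Σ_{k≥1} k(1−p)^{k−1} p = 1/p:

```python
from fractions import Fraction

def geometric_expectation(p, tol=Fraction(1, 10**12)):
    """Partial sums of E[X] = sum_{k>=1} k*(1-p)^(k-1)*p.  The series
    converges absolutely, so truncating once terms fall below `tol` is safe."""
    q = 1 - p
    total, k, term = Fraction(0), 1, p
    while term > tol:
        total += k * term
        k += 1
        term = q ** (k - 1) * p
    return total

# For p = 1/2 the partial sums approach 1/p = 2.
approx = geometric_expectation(Fraction(1, 2))
```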

Regarding "exist"/"does not exist" terminology, the article should be consistent now. StrokeOfMidnight (talk) 23:03, 13 October 2017 (UTC)

Yes, now it is consistent, at the expense of excluding the infinity. The reader may think that the negation of "finite expectation" is "infinite expectation", which is an oversimplification. Boris Tsirelson (talk) 09:51, 14 October 2017 (UTC)
This last one can be fixed by explaining what "non-finite" means. E.g., "${\displaystyle \operatorname {E} [X]}$ is non-finite, i.e. ${\displaystyle \max(\operatorname {E} [X_{+}],\operatorname {E} [X_{-}])=\infty }$" which would have to be done every time. StrokeOfMidnight (talk) 11:36, 14 October 2017 (UTC)
Yes... but one may ask: so, what exactly is called a non-finite expectation? Clearly, not a real number; but is it ${\displaystyle +\infty ,}$ or ${\displaystyle -\infty ,}$ or ${\displaystyle \pm \infty ,}$ or something else?
"${\displaystyle \operatorname {E} [X]}$ is neither finite nor infinite, i.e. ${\displaystyle \operatorname {E} [X_{+}]=\operatorname {E} [X_{-}]=\infty }$"; so, ${\displaystyle \operatorname {E} [X]=?}$
Also, "If the probability distribution of ${\displaystyle X}$ admits a probability density function ${\displaystyle f(x)}$, then the expected value can be expressed through the following Lebesgue integral: ${\displaystyle \operatorname {E} [X]=\int _{-\infty }^{\infty }xf(x)\,dx}$"; does it mean that this Lebesgue integral exists necessarily? and what happens otherwise? Boris Tsirelson (talk) 17:49, 14 October 2017 (UTC)
On the first one, do you like "undefined" better? On the second one, I think it says it all: both sides are finite, infinite, or undefined simultaneously. StrokeOfMidnight (talk) 21:16, 14 October 2017 (UTC)
First: yes, "undefined" and "does not exist" are equally OK with me.
Second: maybe; but should the reader guess? Generally, "a=b" often means "both defined, and equal" or even "both finite, and equal". Would you like, for example, the equality ${\displaystyle \int _{-\infty }^{+\infty }x\,dx=0^{0}}$ ? Boris Tsirelson (talk) 17:44, 15 October 2017 (UTC)
On the first point, both "undefined" and "doesn't exist" are fine with me. Or, we could just say point blank ${\displaystyle \operatorname {E} [X]=\infty -\infty }$. That would answer your question "${\displaystyle \operatorname {E} [X]=}$?" On the second one, I think that the editor who added this formula got it exactly right, but what wording do you suggest? Am I missing some hidden subtlety here? Anyhow, I'll deal with all this (and more) next weekend. StrokeOfMidnight (talk) 21:03, 15 October 2017 (UTC)
On the second: I bother about mathematical maturity of the reader, not the writer. Wording: "The two sides may be both finite, both ${\displaystyle +\infty ,}$ both ${\displaystyle -\infty ,}$ or both undefined." On the first: the set ℝ ∪ {–∞, +∞} is the well-known extended real number line, but the set ℝ ∪ {–∞, +∞, ∞-∞ } may be criticized as WP:OR. Boris Tsirelson (talk) 05:30, 16 October 2017 (UTC)
If that's the case, then we could DEFINE ${\displaystyle \operatorname {E} [X]=\int _{-\infty }^{+\infty }xf(x)\,dx}$ in the section called "absolutely continuous case". Then, in "General case", we would remark that the general definition is consistent with those given earlier for special cases. We may also want to prove the formula, to show that the sub-definition is indeed consistent.
There are two advantages in doing it this way: first, this keeps the definition simple; second, we've already used this approach in finite and countably infinite cases, so why not be methodologically consistent? StrokeOfMidnight (talk) 20:48, 16 October 2017 (UTC)
OK with me. Boris Tsirelson (talk) 06:46, 17 October 2017 (UTC)

## Absolutely continuous case

The recently added remarks are strange. I never saw such a Lebesgue integral taken over ${\displaystyle [-\infty ,+\infty ],}$ but always over ${\displaystyle (-\infty ,+\infty ).}$ For a finite interval, yes, it is appropriate to say that the endpoints are a negligible set. But infinities are never (well, maybe seldom) included; and a function (such as the density) is never (well, maybe seldom) defined on ${\displaystyle [-\infty ,+\infty ];}$ the symbol ${\displaystyle f(+\infty )}$ usually means the limit, not the value, of the function at infinity. And by the way, a density need not have a limit at infinity. If it has one, the limit is zero, of course. But a density may have narrow peaks accumulating at infinity. Boris Tsirelson (talk) 07:36, 21 October 2017 (UTC)

Regarding the third sentence you wrote: finite or infinite (in the geometric sense), Lebesgue integral doesn't care. From the Lebesgue integral's point of view, everything is finite in our case, as ${\displaystyle \operatorname {P} (\Omega )=1}$.
Re the fourth sentence (first part): we don't choose whether infinities are included. It's up to the random variable to decide. I wish we could change the reality on a whim. :)
Re the fourth sentence (second part): you are right; density function may or may not be explicitly defined for any ${\displaystyle a\in [-\infty ,+\infty ]}$, and it's of no consequence, as ${\displaystyle \operatorname {P} (X=a)=0}$ (for absolutely continuous random variables), meaning that ${\displaystyle a}$ does not contribute to ${\displaystyle \operatorname {E} [X]}$.
Of course there are meaningful cases when the density function has no limit at infinity.
So, the definition in the article goes like this: first, we define ${\displaystyle \operatorname {E} [X]}$ using the notation ${\displaystyle \textstyle \int _{-\infty }^{+\infty }}$ without saying what that notation means. Then we clarify that, in fact, the outcome of the integration doesn't change whether we integrate over ${\displaystyle [-\infty ,+\infty ],}$ ${\displaystyle (-\infty ,+\infty ],}$ ${\displaystyle [-\infty ,+\infty ),}$ or ${\displaystyle (-\infty ,+\infty ),}$ so we may as well integrate over ${\displaystyle (-\infty ,+\infty )}$. For those unfamiliar with Lebesgue integration, we quickly point out that they can often view it as a Riemann integral. By the way, the notation ${\displaystyle \textstyle \int _{-\infty }^{+\infty }}$ is novice-friendly, which is what we try to achieve. The novice will instantly assume that the integral is over ${\displaystyle (-\infty ,+\infty )}$, which is fine. StrokeOfMidnight (talk) 11:47, 21 October 2017 (UTC)
UPDATE. I just thought of something. For the absolutely continuous case, we could ASSUME that the random variable takes its values in ${\displaystyle (-\infty ,+\infty ).}$ This will satisfy the practitioners and avoid playing too much with infinity. StrokeOfMidnight (talk) 12:24, 21 October 2017 (UTC)
Do we ever consider ${\displaystyle f(X),}$ the composition of the random variable and its density? Probably, not. The density is the Radon-Nikodym derivative of the distribution of X w.r.t. Lebesgue measure, right? I never saw Lebesque measure on ${\displaystyle [-\infty ,+\infty ],}$ but always on ${\displaystyle (-\infty ,+\infty ).}$ (No problem to define it, of course; but who needs it?) And if so, then the density need not be defined at infinity, even if X takes this value (on a negligible set). And if it does so with positive probability, then we have no idea of its "density", right? Boris Tsirelson (talk) 15:05, 21 October 2017 (UTC)
And anyway, the Radon-Nikodym derivative is not quite a function, but rather an equivalence class, right? Thus, it need not be defined on the negligible set (of the two infinities), even if they belong to our space... Boris Tsirelson (talk) 15:27, 21 October 2017 (UTC)
I added the assumption that ${\displaystyle X}$ only takes on finite values. StrokeOfMidnight (talk) 15:39, 21 October 2017 (UTC)
Yes, but it is not clear what is required: finiteness almost surely, or finiteness surely? It is unusual to say "surely" when it is possible to say "almost surely". Boris Tsirelson (talk) 05:47, 22 October 2017 (UTC)
About linearity: yes, now it is as general as possible, but... it is no longer linearity! Linearity should be linearity of a mapping from one linear space to another. Here it is ${\displaystyle L_{1}(\Omega )\to \mathbb {R} .}$ Your more general property would be better called additivity.
And generally, you are at risk of "original research" accusation; you write what you like, in the form you like, while you should write what is written in textbooks. And again, proofs are generally unwelcome. Boris Tsirelson (talk) 05:47, 22 October 2017 (UTC)
Re linearity: ${\displaystyle L_{1}\to \mathbb {R} }$ is too restrictive. There are very important cases that must be accounted for when ${\displaystyle \operatorname {E} [X]=\infty .}$ I guess, we could have two versions of this. First, we could state the linearity for ${\displaystyle L_{1},}$ then say "more generally" and give the more general version which, incidentally, comes from a textbook. (By the way, random variables are regular functions, so we aren't dealing with ${\displaystyle L_{1}}$ here, but I know what you mean).
On the original research theme, oh please. This subject (def. and basic properties of E) has been beaten to death, so you couldn't do original research here even if you wanted to. And, yes, about 95% of the stuff I write comes from textbooks. Challenge is: some textbooks define Lebesgue integrals differently, some have occasional gaps in proofs, etc. So, writing about something methodically is not a slam-dunk exercise, no matter where the information comes from.
Regarding proofs, yes, some of them are unwelcome (like the proof of Fermat's Last Theorem), and some are not. There is no consensus in the discussion you linked to a week or two ago. Plus, nobody knows who is going to read this particular article, and all the proofs are hidden by default, so it's beyond me who would be bothered by them.
Bottom line: if someone else can cover this better, clearer, more consistently and methodically, DO IT. WP is collaborative and thrives on that. StrokeOfMidnight (talk) 12:25, 22 October 2017 (UTC)
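Since linearity keeps coming up (and since the two-dice claim from the top of this page is a standard counterexample to "expected value = most probable value"), here is a minimal sketch of the finite case that anyone can run; `Fraction` keeps the arithmetic exact:

```python
from fractions import Fraction
from itertools import product

# One fair die: each face 1..6 with probability 1/6
die = {k: Fraction(1, 6) for k in range(1, 7)}

def expectation(dist):
    """E[X] = sum of x * P(X = x) over all outcomes of a discrete distribution."""
    return sum(x * p for x, p in dist.items())

# Distribution of the sum X + Y of two independent fair dice
sum_dist = {}
for (x, px), (y, py) in product(die.items(), die.items()):
    sum_dist[x + y] = sum_dist.get(x + y, Fraction(0)) + px * py

# Linearity: E[X + Y] = E[X] + E[Y] = 7/2 + 7/2 = 7 (not 6)
assert expectation(sum_dist) == expectation(die) + expectation(die) == 7
```

This only illustrates the ${\displaystyle L_{1}}$ case, of course; the debate above is about how far beyond that the "additivity" formulation can be stretched.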

You've made an edit that confuses me. So, you want to allow ${\displaystyle X}$ to be infinite (although it seemed like you were initially against it)? No problem. That's exactly what I did two days ago, and you disagreed. With the latest edit in mind, what is ${\displaystyle \textstyle \int _{-\infty }^{+\infty }}$? Since ${\displaystyle X}$ may be infinite, you are now integrating over ${\displaystyle [-\infty ,+\infty ]}$. I used to have a remark saying just that: it doesn't matter whether you include the endpoints. I can, of course, put that remark back, but that would unnecessarily complicate the definition. I thought we agreed that we should ASSUME that all our random variables are finite. And then you changed your mind.

Can you explain what exactly you are trying to do, so we don't make conflicting edits? StrokeOfMidnight (talk) 20:23, 22 October 2017 (UTC)

Yes, sure I can explain, with pleasure. But I am confused, too, since I believe I did already! Once again: we integrate over the distribution of the random variable. A negligible set does not influence the distribution at all. It is still concentrated on ${\displaystyle (-\infty ,+\infty ).}$ On this space we take its density (that is, Radon-Nikodym derivative w.r.t. Lebesgue measure). And then we use this density in order to calculate the expectation. No need to consider ${\displaystyle [-\infty ,+\infty ]}$ (this is needed only when the infinity is of positive probability). Absolute continuity means absolute continuity w.r.t. Lebesgue measure; Lebesgue measure is concentrated on ${\displaystyle (-\infty ,+\infty );}$ thus absolute continuity requires "almost surely finite"; but it does not require "surely finite". Do not hesitate to ask more questions whenever needed. Boris Tsirelson (talk) 21:12, 22 October 2017 (UTC)
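To make the density route above concrete: once the distribution has a density ${\displaystyle f}$ on ${\displaystyle (-\infty ,+\infty ),}$ the expectation is just ${\displaystyle \textstyle \int xf(x)\,dx}$ over the real line, and the endpoints never enter. A numerical sketch (the Exponential(1) density is purely an illustrative choice, not anything from the article):

```python
import math

def f(x):
    # Density of Exponential(1): e^{-x} on [0, +inf), zero elsewhere
    return math.exp(-x) if x >= 0 else 0.0

def expectation(density, a=0.0, b=50.0, n=100_000):
    # Midpoint Riemann sum approximating E[X] = integral of x f(x) dx
    # over the real line. Truncating to [0, 50] is harmless here: the
    # density vanishes for x < 0, and the tail mass beyond 50 is ~e^{-50}.
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += x * density(x) * h
    return total

print(expectation(f))  # close to 1.0, the mean of Exponential(1)
```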
Sorry, not true. Absolute continuity does not require that X is finite (a.s.). The fact that X is finite (a.s.) follows from absolute continuity. In other words, once we've assumed that the cumulative distribution function F is absolutely continuous, it follows that X is finite (a.s.). Therefore, assuming up front that X is finite (a.s.) is at best redundant, and actually breaks the logic chain by confusing assumption with conclusion. By the way, I used to have a remark to that effect.
I understand that you're quite a beginner. (And I don't say this in a bad way at all; in fact, you probably see when there are readability issues better than I do). But wouldn't you be better off getting a decent probability and/or real analysis book and reading it? StrokeOfMidnight (talk) 18:34, 23 October 2017 (UTC)
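For the record, the implication under discussion fits on one line. If the cumulative distribution function ${\displaystyle F}$ is absolutely continuous with density ${\displaystyle f,}$ then

```latex
\Pr(X \in \mathbb{R})
  = \lim_{x \to +\infty} F(x) \;-\; \lim_{x \to -\infty} F(x)
  = \int_{-\infty}^{+\infty} f(t)\,dt
  = 1,
\qquad \text{hence} \qquad \Pr(|X| = \infty) = 0,
```

so finiteness almost surely is indeed a conclusion of the absolute-continuity assumption, not an extra hypothesis.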
Not quite a beginner; see Boris Tsirelson. (And if you suspect that he is not me, find a link to my user page here on "his" professional page.)   :-)   I understand that you are not inclined "to google" your collocutor... Boris Tsirelson (talk) 18:58, 23 October 2017 (UTC)
Back to business. I am not a native English speaker; maybe this is why I do not see a difference between "A requires B", "A implies B", and "B follows from A". But till now my math texts were usually understood by others. Well, in an article, feel free to correct my English as needed. But on a talk page, I think, we may be less exacting and try harder to understand each other. Boris Tsirelson (talk) 19:06, 23 October 2017 (UTC)
Thanks for explaining this. Sorry if I came across as being overly aggressive. That wasn't my intention. This is supposed to be a constructive discussion.
Anyhow, what I'm going to do is re-add this remark and remove the current wording about X being finite (a.s.) from the first sentence. The result will look like this version, with "probability distribution" replaced with "cumulative distribution function". StrokeOfMidnight (talk) 19:51, 23 October 2017 (UTC)
Your words: This subject has been beaten to death, so you couldn't do original research here even if you wanted to. And, yes, about 95% of the stuff I write comes form textbooks. My question: How many of these textbooks mention (Lebesgue) integration over the extended real line ${\displaystyle [-\infty ,+\infty ]}$ (a) at all, (b) in relation to the formula ${\displaystyle \textstyle \operatorname {E} [X]=\int _{-\infty }^{+\infty }xf(x)\,dx?}$ Boris Tsirelson (talk) 20:46, 23 October 2017 (UTC)
Actually, these are unfair questions. To answer them, I would have to recall how each textbook I ever read sets up its notations. With that said, I do have an answer for you. (Two actually). First, some textbooks write ${\displaystyle \textstyle \int xf(x)\,dx}$ with no integration limits. Second, every time a textbook writes ${\displaystyle \textstyle \int _{\Omega }X\,dP,}$ it means ${\displaystyle \textstyle \int _{[-\infty ,+\infty ]}x\,dF,}$ as these two are one and the same.
My question now is: for every Lebesgue integral, one must specify the exact set being integrated over, but ${\displaystyle \textstyle \int _{-\infty }^{+\infty }}$ doesn't do that. So, how, please tell, are you going to explain to an astute reader what the integration domain, in this particular case, is? StrokeOfMidnight (talk) 18:12, 24 October 2017 (UTC)
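For reference, here is the chain of equalities the two of us seem to be circling (the middle step is the change-of-variables/pushforward theorem; the last step assumes ${\displaystyle F}$ is absolutely continuous with density ${\displaystyle f}$):

```latex
\operatorname{E}[X]
  = \int_{\Omega} X \, dP
  = \int_{\mathbb{R}} x \, dF(x)
  = \int_{-\infty}^{+\infty} x f(x)\,dx .
```

When ${\displaystyle X}$ is finite almost surely, taking the middle integral over ${\displaystyle \mathbb {R} }$ or over ${\displaystyle [-\infty ,+\infty ]}$ gives the same value, since the two endpoints then carry zero mass.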
On your first paragraph: you did not let me write "requires" instead of "implies"; and now you write "it means" instead of "it is equal to"? The equality between these integrals is a special case of the measure change theorem, not at all a definition of the former integral. Terminology aside, I still think that your equality is of course true, but seldom (if at all) written in textbooks. I understand that you do not recall how each textbook you ever read sets up its notations. But, unless/until you find some examples of using the extended real line as the domain of integration in probability textbooks, I think this is quite unusual.
On your second paragraph: true, ${\displaystyle \textstyle \int _{-\infty }^{+\infty }}$ doesn't do that, unless one adds a clarification that this means (as usual) integration over the real line ${\displaystyle (-\infty ,+\infty ).}$ Is this bad for the astute reader? Boris Tsirelson (talk) 18:50, 24 October 2017 (UTC)
True, this approach does not serve "random variables" that fail to be finite almost surely. But I doubt they should be called "random variables" at all, and the more so in the context of expectation. I do bother about infinite expectation, but only under finiteness almost surely. Boris Tsirelson (talk) 19:04, 24 October 2017 (UTC)
You may ask: if so, why "finite a.s.", why not just "finite surely"? Here is why. In the Lebesgue theory one often speaks about (measurable) functions, but one usually treats them up to equivalence (that is, equality almost everywhere); one is simply too lazy to speak about equivalence classes, but these are implicitly meant. Equivalent functions are interchangeable in most cases. Similarly, in probability theory, we define random variables as (measurable) functions, but usually we rather mean their equivalence classes, and avoid notions sensitive to a change on a set of probability zero. There is an exception: the theory of random processes often deals with uncountable families of random variables; there one must be very careful, since a family of equivalence classes is quite different from an equivalence class of families. But in this article we never deal with uncountable families of random variables, and have no reason to ever say "surely" instead of the habitual "almost surely". A (real-valued) random variable has a value, and its value is a real number, almost surely. No one bothers about what happens on a set of probability zero (since that set never contributes to probabilities, and therefore not to distributions, expectations, etc.) Boris Tsirelson (talk) 19:14, 24 October 2017 (UTC)
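The "insensitive to a change on a null set" point can be spelled out in one computation: if ${\displaystyle X=Y}$ almost surely, split each integral over ${\displaystyle \{X=Y\}}$ and its complement, and the complement contributes nothing:

```latex
\operatorname{E}[X]
  = \int_{\{X = Y\}} X \, dP
    + \underbrace{\int_{\{X \neq Y\}} X \, dP}_{=\,0,\ \text{since}\ \Pr(X \neq Y) = 0}
  = \int_{\{X = Y\}} Y \, dP
  = \operatorname{E}[Y].
```

So equivalent random variables always have the same expectation, which is why the "a.s." qualifier costs nothing here.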
Much ado about nothing. I think I understand what you are saying, more or less. I'm going to make a change, and let's see if we agree. StrokeOfMidnight (talk) 19:22, 24 October 2017 (UTC)
OK with me. Boris Tsirelson (talk) 19:43, 24 October 2017 (UTC)

## Circular logic

I've accidentally introduced some circular logic into the proofs, so I'm going to make a few changes to fix that. StrokeOfMidnight (talk) 00:34, 22 October 2017 (UTC)