Welcome to the mathematics section
of the Wikipedia reference desk.

Select a section:

WP:RD/MA

Want a faster answer?

Main page: Help searching Wikipedia

How can I get my question answered?

Select the section of the desk that best fits the general topic of your question (see the navigation column to the right).
Post your question to only one section, providing a short header that gives the topic of your question.
Type '~~~~' (that is, four tilde characters) at the end – this signs and dates your contribution so we know who wrote what and when.
Don't post personal contact information – it will be removed. Any answers will be provided here.
Please be as specific as possible, and include all relevant context – the usefulness of answers may depend on the context.
Note:
- We don't answer (and may remove) questions that require medical diagnosis or legal advice.
- We don't answer requests for opinions, predictions or debate.
- We don't do your homework for you, though we'll help you past the stuck point.
- We don't conduct original research or provide a free source of ideas, but we'll help you find information you need.

Ready? Ask a new question!

How do I answer a question?

Main page: Wikipedia:Reference desk/Guidelines

The best answers address the question directly, and back up facts with wikilinks and links to sources. Do not edit others' comments and do not give any medical or legal advice.

June 20

Approximation

Hi. Just did a show that involving an approximation and I'm not convinced that what I've done is rigorous. Could someone please check it for me?

"Let ${\frac {1}{y}}={\frac {2+3x^{2}}{3(1+x^{2})^{1.5}}}+{\frac {1}{3}}$ . Show that, for large positive x, $y\approx 3-{\frac {9}{x}}$ ."

My method goes like this. for large positive x $1+x^{2}\approx x^{2}$ and so $(1+x^{2})^{1.5}\approx x^{3}$ . Also for large positive x, $2+3x^{2}\approx 3x^{2}$ so ${\frac {2+3x^{2}}{3(1+x^{2})^{1.5}}}\approx {\frac {3x^{2}}{3x^{3}}}={\frac {1}{x}}$ so ${\frac {1}{y}}\approx {\frac {1}{x}}+{\frac {1}{3}}$ . Rearranging this gives $y\approx {\frac {3x}{3+x}}={\frac {3x+9}{3+x}}-{\frac {9}{3+x}}=3-{\frac {9}{3+x}}$ . Finally, for large positive x, $3+x\approx x$ so $y\approx 3-{\frac {9}{x}}.$

Is this a solid argument? Thanks 92.2.16.39 (talk) 10:09, 20 June 2009 (UTC)[reply]

Yes, I think that's spot on.. Rkr1991 (talk) 14:15, 20 June 2009 (UTC)[reply]

Yes, that looks fine to me. The statement "for large positive x,

y\approx 3-{\frac {9}{x}}

" isn't a rigorous statement, so there is no way your "proof" could be rigorous. What you've got is good enough, in my opinion. If the question had been to show that

y=3-{\frac {9}{x}}+O({\frac {1}{x^{2}}})

or something (or whatever the appropriate notation is, I can never remember the differences between big-O, little-O, etc. without looking them up), then you could get a proper rigorous proof (probably along the same lines as the argument you gave). --Tango (talk) 14:26, 20 June 2009 (UTC)[reply]

Hm, either two curves are asymptotic or they're not, meseems. —Tamfang (talk) 04:54, 25 June 2009 (UTC)[reply]

You may prefer to simplify first and approximate afterwards.

y={\frac {3(1+x^{2})^{1.5}}{2+3x^{2}+{(1+x^{2})^{1.5}}}}=3-9x^{-1}\cdot {\frac {2x^{-2}+3}{6x^{-3}+9x^{-1}+3{(1+x^{-2})^{1.5}}}}\approx 3-9x^{-1}

.

Bo Jacoby (talk) 14:37, 20 June 2009 (UTC).[reply]

Car dynamics

Please tell me about the algorithm of finding International Roughness Index(of a road surface)from a longitudinal profile data. —Preceding unsigned comment added by 113.199.158.140 (talk) 12:36, 20 June 2009 (UTC)[reply]

Googleing 'International Roughness Index' leads to the fortran program at http://www.umtri.umich.edu/content/IRIMain.f . Bo Jacoby (talk) 13:51, 20 June 2009 (UTC).[reply]

I wanted to have the algorithm and not the program. —Preceding unsigned comment added by 113.199.170.182 (talk) 02:47, 21 June 2009 (UTC)[reply]

What's the difference? A programming language is just a language for writing algorithms. --Tango (talk) 03:09, 21 June 2009 (UTC)[reply]

The program contains the following comment: 'For more information about the IRI and how this program works, see Sayers, M. W., "On the Calcluation of International Roughness Index from Longitudinal Road Profile." Transportation Research Record 1501, (1995) p. 1-12. See http://spyder.umtri.umich.edu/erd_soft/erd_file.html on the web for a description of ERD files'. So now it is time for the OP to do some homework. Bo Jacoby (talk) 10:20, 21 June 2009 (UTC).[reply]

Necessary and sufficient conditions

I'm a little hazy on N&S conditions so wanted to check my thoughts with people who can tell me if I'm right or wrong. Let $f(x)=x^{2}-2bx+c$ . I have to find N&S conditions on b and c a) for f(x) to have two distinct real roots and b) for f(x) to have two distinct positive real roots.

For a) I get that $c-b^{2}<0$ just from using the fact that if the turning point is below the x axis, you've got two distinct roots.

For b) I again get $c-b^{2}<0$ , for the same reason as before. I then get c>0, which ensures that the roots are of the same sign and b>0, which says the minimum point occurs in the first quadrant (at least I think that's its name; I mean bottom right). So the y-intercept is positive, both the roots are of the same sign and the turning point is in the first quadrant, so between them, they say that the roots are distinct and positive.

Are they necessary and sufficient or not? Thanks 92.4.255.16 (talk) 17:52, 20 June 2009 (UTC)[reply]

"A is necessary for B" means "B implies A". "A is sufficient for B" means "A implies B". Therefore to prove that something is necessary and sufficient you need to prove it in both directions. Sometimes you can do both directions at once, but if you're not sure it is best to do them separately. You need to come up with a sequence of implications from A to B and then a sequence from B to A. While considering the graphs is a good idea in order to get an intuitive answer, your rigorous proof should probably be purely algebraic. --Tango (talk) 18:01, 20 June 2009 (UTC)[reply]

The quadratic formula gives you the roots of f as

x={\frac {2b\pm {\sqrt {4b^{2}-4c}}}{2}}=b\pm {\sqrt {b^{2}-c}}

. The answer to both questions follow naturally from there. For question a you are correct; the expression in the radical must be positive for the radical to be real. For question b, it is sufficient and necessary for the smaller root to be real and positive. I'll let you work the algebra to see where that gets you. --COVIZAPIBETEFOKY (talk) 18:24, 20 June 2009 (UTC)[reply]

~~Your solution to (b) is almost right. The x-coordinate of the vertex is ${\frac {-b}{2a}}$ (note the minus sign). So to be in the fourth quadrant, you need $b<0$~~ . -- Meni Rosenfeld (talk) 19:37, 20 June 2009 (UTC)[reply]

Where has the 'a' come from? Are you possibly confusing the f(x) I gave above with the general form of a quadratic equation? 92.7.54.6 (talk) 19:41, 20 June 2009 (UTC)[reply]

Yes, COVIZAPIBETEFOKY was correct, and Meni is correct to say that you meant the fourth quadrant, and your conclusions were also correct, but see Tango's suggestion for a rigorous proof. Dbfirs 00:12, 21 June 2009 (UTC)[reply]

Of course, silly me. I was thinking about the general case, and didn't notice the differences in your case, most importantly that your b is already negated. So indeed it should be

b>0

. -- Meni Rosenfeld (talk) 08:50, 21 June 2009 (UTC)[reply]

Maximizing the volume in a certain class of sets

Has anybody already seen an optimization problem like this one:

Find the maximum value of the 3-dimensional volume of the subset of the unit cube:

{\hat {A}}:=\{(x,y,z)\in [0,1]\times [0,1]\times [0,1]\,:(x,y)\in A,(y,z)\notin A\}

among all measurable subsets $\textstyle A$ of the unit square $[0,1]\times [0,1]$ .

Incidentally, I know the answer (it is hidden here in case somebody prefers not to see it), but I'd like to settle this and other similar but more difficult problems into a general theory/method, if there is any available. --pma (talk) 21:17, 20 June 2009 (UTC)[reply]

Cute problem. I don't recall having seen it before. Maybe I'll try to work out where it came from. I wonder if thinking of it in the language of probability could shed any light. Michael Hardy (talk) 17:48, 21 June 2009 (UTC)[reply]

In the most general sense, this seems like calculus of variations. But that answer is probably useless. 67.122.209.126 (talk) 08:53, 22 June 2009 (UTC)[reply]

June 21

Finding the median of a continuous random variable

I am told that I need to solve $\int _{m}^{\infty }f(x)\,dx=0.5$ , and in the textbook example they get a polynomial with only m⁴, m² and m⁰ terms in it, which lets them solve a quadratic in m², take a square root and compare the results to the range which you are given at the start to find the right one, but I keep getting equations like $\int _{m}^{3}{\frac {2}{9}}x\left(3-x\right)\,dx=0.5$ , where the p.d.f f(x) of the continuous random variable X was the above for $0\leq x\leq 3$ , which eventually gets me to ${\frac {2}{27}}m^{3}-{\frac {1}{3}}m^{2}+0.5=0$ , but I don't know how to solve this. In another example, I had the p.d.f. f(y) of the continuous random variable Y being $12y^{2}\left(1-y\right)$ for $0\leq y\leq 1$ , which got me to $\int _{m}^{1}12y^{2}\left(1-y\right)\,dy=0.5\Rightarrow 3m^{4}-4m^{3}+0.5=0$ , but, again, I don't know what to do with this. It Is Me Here ^{t / c} 08:58, 21 June 2009 (UTC)[reply]

Well,

\int _{0}^{3}{\frac {2}{9}}x\left(3-x\right)\,dx=1

, and the symmetry of

x\left(3-x\right)

suggests that

\int _{0}^{3/2}{\frac {2}{9}}x\left(3-x\right)\,dx=\int _{3/2}^{3}{\frac {2}{9}}x\left(3-x\right)\,dx=0.5

so try m = 3/2 in your first problem. Your second problem is more difficult, as

y^{2}\left(1-y\right)

doesn't have an obvious symmetry to exploit. Gandalf61 (talk) 12:21, 21 June 2009 (UTC)[reply]

What's worse, Mathematica seems unable to find a representation of the solution simpler than direct substitution in the quartic formula. Is there any chance they want you to find a numerical solution (which is 0.614272...)?

By the way, the first problem can also be solved by the rational root theorem. -- Meni Rosenfeld (talk) 13:21, 21 June 2009 (UTC)[reply]

See root-finding algorithm for methods to solve equations numerically. Or draw the conclusion that the median is not that interesting anyway and use the mean value instead! Your examples are beta distributions. By the way, 0.614272 is an approximate solution to

3m^{4}-4m^{3}+0.5=0

. The J program p. 1 0 0 _8 6 does the trick. Bo Jacoby (talk) 13:42, 21 June 2009 (UTC).[reply]

OK, the actual question states: "The continuous random variable Y has p.d.f. f(y) defined by f(y) = 12y²(1 - y) for 0 ≤ y ≤ 1; f(y) = 0 otherwise. Show that, to 2 decimal places, the median value of Y is 0.61." It Is Me Here ^{t / c} 15:09, 21 June 2009 (UTC)[reply]

That's easy then. You know that

F(y)

(the cumulative distribution function) is continuous and increasing, so you only need to show that

F(0.605)<0.5

and

F(0.615)>0.5

. The intermediate value theorem will then say that there is a root

0.605<y<0.615

, which is thus equal to 0.61 to two decimal places. -- Meni Rosenfeld (talk) 18:47, 21 June 2009 (UTC)[reply]

Induction

Not a massive fan of induction and so I would like to check my solution to a problem with you guys.

"Let $S_{n}(x)=e^{x^{3}}{\frac {d^{n}}{dx^{n}}}(e^{-x^{3}})$ . Prove by induction on n that $S_{n}(x)$ is a polynomial."

Assume that $S_{n}(x)$ is a polynomial for n=k.

$e^{x^{3}}{\frac {d^{k}}{dx^{k}}}(e^{-x^{3}})=f(x)$

So ${\frac {d^{k}}{dx^{k}}}(e^{-x^{3}})=e^{-x^{3}}f(x)$

Differentiate the above wrt x.

${\frac {d^{k+1}}{dx^{k+1}}}(e^{-x^{3}})=e^{-x^{3}}f'(x)-3x^{2}e^{-x^{3}}f(x)$

Factorise

${\frac {d^{k+1}}{dx^{k+1}}}(e^{-x^{3}})=e^{-x^{3}}(f'(x)-3x^{2}f(x))$

$f'(x)-3x^{2}f(x)$ is still a polynomial for any f(x), so let us call it g(x).

So we have

${\frac {d^{k+1}}{dx^{k+1}}}(e^{-x^{3}})=e^{-x^{3}}g(x)$

So if the result is true for k, it is also true for k+1. Now let n=0 (the question doesn't actually state which set of numbers 'n' belongs to but I assume it's the non-negative integers), which gives us the case where $S_{0}(x)=1$ . So by induction, $S_{n}(x)$ is a polynomial for all integers n, n≥0.

Is that airtight? Thanks. asyndeton talk 14:15, 21 June 2009 (UTC)[reply]

Yes. —JAO • T • C 14:42, 21 June 2009 (UTC)[reply]

It looks fine to me, although I would have started with n=1. While defining the zeroth derivative to be the identity makes sense, I would normally think of n be positive in the context of n-th derivatives. --Tango (talk) 14:56, 21 June 2009 (UTC)[reply]

well, I think yours is mainly a psycological limit. But if you work on it a bit, you too will be able to start with n=0 ;-) --pma (talk) 19:11, 21 June 2009 (UTC)[reply]

It's obvious that is does work with n=0, I just don't normally think of the identify being a derivative. It's a matter of definition, rather than any mathematical, of course. --Tango (talk) 00:05, 22 June 2009 (UTC)[reply]

The definition A⁰ = 1 apply very generally. See exponentiation. And so

\scriptstyle {\frac {d^{0}}{dx^{0}}}=1

is what you should normally think. What happens if you don't differentiate? The same thing as if you multiply by one. Nothing happens! Bo Jacoby (talk) 05:08, 22 June 2009 (UTC).[reply]

When you say

\scriptstyle {\frac {d^{0}}{dx^{0}}}=1

, do you mean this is true in general no matter what it is you're differentiating or is it just true for what I gave above? asyndeton talk 07:44, 22 June 2009 (UTC)[reply]

Yes, in general, for a linear operator A, the useful notation is A⁰=I (the identity operator, that we think to as the multiplication by 1 and write indeed 1) and A^k=A...A k-times for all positive integers k. --pma (talk) 10:06, 22 June 2009 (UTC)[reply]

In case there is any confusion, note that it is

\scriptstyle {\frac {d^{0}}{dx^{0}}}=1

, not

\scriptstyle {\frac {d^{0}}{dx^{0}}}f(x)=1

. It's the identity operator, not a constant operator. --Tango (talk) 15:09, 22 June 2009 (UTC) [reply]

I don't see the difference between the two. asyndeton talk 16:54, 22 June 2009 (UTC)[reply]

The second one is

\scriptstyle {\frac {d^{0}}{dx^{0}}}f(x)=1\cdot f(x)=f(x)

. Bo Jacoby (talk) 17:05, 22 June 2009 (UTC). [reply]

The first is an operator, the second is an operator acting on a function. When we say an operator is "1" we mean it is the identity - it maps everything to itself. That's very different to saying the result of the operator acting on a function is always equal to the number "1". --Tango (talk) 17:08, 22 June 2009 (UTC)[reply]

N-deck hearts

Do the rules of double-deck cancellation hearts generalize well to hearts games with N decks and 4N-2 to 4N+2 players? Neon Merlin 23:46, 21 June 2009 (UTC)[reply]

Well my first thought is that there are two possible ways in which the game could be generalised, either we stick with the rule that if a second identical card is played then the two cards cancel, which leads to a game where if odd numbers of an identical card are played the last player to play that card (assuming it is the strongest one, otherwise it makes no difference) takes the trick whereas if even numbers are played the cards are ignored, this seems slightly artificial to me and as if it would lead to a confusing game.

Alternatively the other way in which the game could be generalized is to only apply the cancellation rule when all N of an identical card are played in one trick, otherwise if say K (<N) identical cards were played in one trick then the last player to play the identical card would count as having played it and the other players who played it would be ignored. This to some extent removes the excitement that double-deck cancelation hearts has, since especially when N is large the chance of a cancellation occuring are very slim.

Both possibilities have their pros and cons, when N is small (say 3 or 4) I think both variations would make interesting playing. —Preceding unsigned comment added by 86.129.82.211 (talk) 00:52, 22 June 2009 (UTC)[reply]

June 22

Linear topological space

What is the exact definition of a linear topological space (the formal definition). I am unable to find it on the internet. Thanks. —Preceding unsigned comment added by 58.161.138.117 (talk) 04:42, 22 June 2009 (UTC)[reply]

The same thing as a topological vector space. Bo Jacoby (talk) 04:53, 22 June 2009 (UTC).[reply]

The gambler's fallacy

I am an EMT and far too often, I hear my co-workers say, "They didn't get a single [emergency] call during the last shift, we're going to get slammed now". Clearly, each 9-1-1 call we receive is completely independent of the next, making this statement an example of the Gambler's fallacy; however, at what point does the law of large numbers take over? We typically get 15 calls a day. Say, for example, suddenly we get 0 calls over 5 days. Those five days have no bearing on what the 6th day will bring (or the 7th, 8th, etc.). However, is it fair to say that it is probable that we will get a number of calls that "make up" for that five day stretch over the course of the rest of the year? Even this seems to imply a causation effect, but really it's just looking at statistical data and assigning probabilities to it, right? -- MacAddct1984 ^{(talk  contribs)} 05:09, 22 June 2009 (UTC)[reply]

It's important to note that the law of large numbers only provides stability of the mean, not the total. Assume you normally get 6 calls a day. If you get zero calls the first five days, your per-day average is 0. But you should still expect to get on average 5 calls a day for the rest of the year, bringing you to an average of (360*6)/365, which is 5.92. Over an even longer period, the expected average would be even closer to the normal 6, but the expected total will still be 30 calls short—it's just divided by a larger number of days. (This is all assuming that the probabilities really are independent.) —JAO • T • C 05:46, 22 June 2009 (UTC)[reply]

Ah, that makes more sense to me, thanks. -- MacAddct1984 ^{(talk  contribs)} 13:53, 22 June 2009 (UTC)[reply]

Why do you think the 911 calls are independent of each other? Say it's snowing heavily. Everyone stays indoors keeping warm, and you don't get any calls. Then the snow stops, those same people go outside to shovel their driveways or drive someplace, a lot of them have heart attacks or road accidents, and the 911 board lights up. That's hypothetical of course, but I don't have too much trouble with the idea that a flurry of calls is correlated with an earlier lull in them. 67.122.209.126 (talk) 06:55, 22 June 2009 (UTC)[reply]

Assuming independence, the number of 911 calls on a shift is likely to have a poisson distribution. (This is called the law of small numbers, by the way). Your description, that you 'typically get 15 calls a day. Say, for example, suddenly we get 0 calls over 5 days' is very unlikely for a poisson distribution, but unlikely events are remembered very well. You need better historical data. Don't you have a record of the total number of calls over a large number of shifts? Bo Jacoby (talk) 07:21, 22 June 2009 (UTC).[reply]

67.122, that seems true for the most part^{[original research]}, but even in that case the calls can still be considered independent of each other, no? The fact that Joe calls 911 on one side of town after a snowfall still has no bearing on whether Jane, another resident, will.

Bo, we do keep track of call statistics. We've typically seen a 5-10% increase in call volume each year. Assuming the sample size is large enough, is it possible/logical to make predictions based off of that data? -- MacAddct1984 ^{(talk  contribs)} 13:53, 22 June 2009 (UTC)[reply]

No, in the case described different people calling are not independent. The fact that Joe has just called makes it more likely that now is a time of higher-than-usual call density, which makes it more likely that Jane will call. Algebraist 13:57, 22 June 2009 (UTC)[reply]

Perhaps you are getting a tad confused about the meaning of 'independent'. It does not necessarily describe a cause-effect relationship. Obviously, Joe's calling 911 does not cause Jane to call 911 around the same time (at least, not usually), but as Algebraist explained, the knowledge that Joe called does affect the probability that Jane will call about the same time. And that is what it means for the two events to be 'dependent', as opposed to independent. --COVIZAPIBETEFOKY (talk) 14:25, 22 June 2009 (UTC)[reply]

Where the probability is a gauged value from the frame of reference of an observer (us), rather than a descriptor of Jane's actual propensity to make the call. —Anonymous Dissident^Talk 14:54, 22 June 2009 (UTC)[reply]

Have you checked against the TV listings? I wouldn't put it past people to wait till a soap is over to ring up about their heart attack. Dmcq (talk) 12:43, 22 June 2009 (UTC)[reply]

Let the number of calls on day number i be X_i. Compute the number of days S₀ = ΣX_i⁰, the number of calls S₁ = ΣX_i¹, and the square sum S₂ = ΣX_i². Compute the mean value μ = S₁/S₀ and the variance σ² = (S₂/S₀)−(S₁/S₀)². Now, if the number of calls are really independent of one another and of time, the distribution is poisson and μ = σ². If this is not the case, but for example μ < σ², then the distribution may be the negative binomial distribution. (See the article on cumulant for explanation). If the historical data fit reasonably well into a mathematical model, then it is reasonable to make predictions based on that model. Bo Jacoby (talk) 14:30, 22 June 2009 (UTC).[reply]

I think the key thing to remember here is that how many calls you got over the last 5 days isn't a random variable, it's a known value. It's meaningless to talk about the probability distribution for it because it doesn't have one - there is a 100% chance that you saw what you saw and a 0% chance that you didn't. If you expect 15 calls a day then at the start of the year you would expect 365*15 calls that year, but once you've experienced a few days you can improve that estimate by replacing the expected number of calls on days past with the actual number. --Tango (talk) 14:49, 22 June 2009 (UTC)[reply]

Not all calls will be independent. A house starts on fire, and 3 different neighbors could very well call 911 over the same event. Or are such incidents not counted? 65.121.141.34 (talk) 18:49, 22 June 2009 (UTC)[reply]

We're after that step, the calls have already been filtered out by the time we get them. Very rarely will we get a call for, say, a stabbing, and then 30 minutes later we'll get another call because they found the other party involved who's also injured. -- MacAddct1984 ^{(talk  contribs)} 13:28, 23 June 2009 (UTC)[reply]

question on stats

Assume I have a giant jar of jelly beans (say 500 individual beans) that is half red and half green. Without being able to see what I am doing, I randomly select one. I replace it and select another. I note if I have drawn 2 red, a red and a green, or two greens and replace the bean I just drew. Now if I get 50 pairs, Logically I should have 25 red/green, 12.5 red/red and 12.5 green/green (ok so 12 of one and 13 of the other). But I know that this exact outcome by itself is somewhat unlikely. How do I determine the probability of the various combinations, (22rg, 10rr, 18gg), (28rg, 11 rr, 11gg) etc?

Our article on the urn problem is a good starting point. You may also find the article on the binomial distribution helpful. Ray^Talk 21:54, 22 June 2009 (UTC)[reply]

The probability of the combination (22rg, 10rr, 18gg), is $\scriptstyle {\frac {50!\,0.5^{22}\,0.25^{10}\,0.25^{18}}{22!\,10!\,18!}}=0.00385358$ . Bo Jacoby (talk) 08:33, 23 June 2009 (UTC).[reply]

Door codes

In order to get into the building I live in, I need to type in a four-digit code, and today, I noticed something that leads to a problem that I don't know how to solve.

I had assumed that if I typed in an incorrect digit that I would have to wait five seconds or thereabouts so that the pad-thing would reset itself, and then try again. (Trust me, this leads to the problem). So, in this situation, in order to type in every possible combination, I would need 40,000 keystrokes, since there are 10,000 codes, from 0000 to 9999, and each code has four digits. Correct me if I'm wrong, but I believe that this is the minimum number of keystokes to type in every combination.

What I noticed today is that I don't need to wait for the pad-thing to reset itself; I can simply start over without waiting. That is, if the code is 1234, and I type 51234, the door will unlock. As far as I can tell, this means that, as far as the pad-thing is concerned, typing 51234 is essentially the same as entering two codes, to wit, 5123 and 1234. The problem that I cannot solve is: What is the minimum number of keystrokes needed to enter all 10,000 codes if they can be chained up like this. And, if you would continue to humor my curiosity, could you describe what (one of) the minimal patterns would be?

Thank you for your attention to my question. 84.97.254.29 (talk) 23:48, 22 June 2009 (UTC)[reply]

You obviously need at least 10,003 keystrokes, and in fact this is sufficient. See De Bruijn sequence. Algebraist 23:57, 22 June 2009 (UTC)[reply]

That was fast. Thanks. 84.97.254.29 (talk) 00:05, 23 June 2009 (UTC)[reply]

I disagree -- but I may not have understood the question correctly.

My understanding of these machines is that they only store (or pay attention to) the last four digits entered. When you were successful by entering 51234, you did NOT test 5123; rather, the 5 was simply discarded.

(Further, if you know that you've made a keying error before you push enter, you need not wait any time at all. Just continue entering digits; as long as the last four entered are correct, you're in!)

The devices are far less complex than you may be giving them credit for. --DaHorsesMouth (talk) 22:39, 23 June 2009 (UTC)[reply]

There isn't always an "enter" key - when I set or unset the alarm system in my house I just have to press the correct 4 digits. AndrewWTaylor (talk) 22:55, 23 June 2009 (UTC)[reply]

You're wrong, DaHorsesMouth. OP talks about system with no Enter key, so if he enters 51234, he tests 5123 (and the test fails), then enters 4 which causes dropping the first digit, appending 4 at the end and testing a new 1234 code. Consider it as a 4-position shift-register, filled with keystrokes, which tests itself against a pre-set value each time a new digit is entered. CiaPan (talk) 05:36, 24 June 2009 (UTC)[reply]

June 23

Limits

Hi. I haven't done much work on limits and so need a bit of help with this problem.

Given that $\lim _{t\to \infty }e^{-mt}t^{n}=0$ where m is a positive integer and n is a non-negative integer, show that $\lim _{x\to 0}x^{m}(lnx)^{n}=0$ .

Do you just make the substitution $t=-lnx$ ? Thanks 92.7.54.6 (talk) 16:04, 23 June 2009 (UTC)[reply]

Yes, that's one way to look at it, and the method the problem seems to be setting you up for.Ray^Talk 17:25, 23 June 2009 (UTC)[reply]

So do you just make the substitution and then divide by (-1)^n?. On a related note I had to determine

\lim _{x\to 0}f(x)

where

f(x)={\frac {x}{e^{x}-1}}

. Am I correct to say this is 1? After that I had to determine

\lim _{x\to 0}f'(x)

, which I got as -1 (using the approximation

e^{x}\approx x+1

) but then when I graphed it, this was wrong. Any ideas?

You can do that one with L'Hôpital's rule (I'm not a fan of the rule, but I think it is the best method for that limit) and the answer is, indeed, 1. --Tango (talk) 17:50, 23 June 2009 (UTC)[reply]

I've never learnt any rules for limits. I'm working solely with algebraic skills and approximations for small x. 92.7.54.6 (talk) 17:54, 23 June 2009 (UTC)[reply]

For the first question - yes, you should divide by

(-1)^{n}

, which is just a nonzero constant and thus does not change the zero limit.

For the second - the approximation

e^{x}\approx 1+x

is just not accurate enough for calculating

\lim _{x\to 0}f'(x)

here. Using

e^{x}\approx 1+x+{\frac {x^{2}}{2}}

will give the correct result,

-1/2

.

As you may have noticed, knowing when is an approximation good enough is not always trivial. That's why rules like L'Hôpital's are useful. I think proving this limit without it is quite a challenge. -- Meni Rosenfeld (talk) 18:04, 23 June 2009 (UTC)[reply]

Do you only know it's not good enough because you used L'Hôpital's rule first? So was it only luck that I got the limit of f(x) right then? If you used the terms in x^3 or higher powers in the approximation of e^x, would you not get a different answer because you've become more accurate? 92.7.54.6 (talk) 18:08, 23 June 2009 (UTC)[reply]

A simpler questions perhaps; I hadn't heard of L'Hôpital's rule before this thread and looking through the page isn't helping me see how to apply it here. Could someone show me how it is used for the limits of f(x) and f'(x) please? Thanks. 92.7.54.6 (talk) 18:51, 23 June 2009 (UTC)[reply]

I know it's not good enough because it's "just enough" for

\lim _{x\to 0}f(x)

(if you try using

e^{x}\approx 1

you'll obviously fail), and thus you'll need more for

\lim _{x\to 0}f'(x)

(since taking a derivative amplifies errors). Another way is to add a hypothetical term and see if it changes the result (e.g., if you take

e^{x}\approx 1+x+\alpha x^{2}

you'll see that the result depends on

\alpha

). In a more formal proof you would probably use a remainder term with a Big O notation. But if you don't want this kind of sophisticated thinking then yes, you can only know an approximation is not good enough if you see it gives a different result from an exact method (be that using L'Hôpital's rule or otherwise).

So in a sense, yes, using it for the first limit without giving a justification was "lucky".

Using more terms than necessary will not give a different result, because we are taking the limit at

x\to 0

. These higher order terms would only make a contribution which is proportional to x and thus vanishes. If you try it for simpler cases, like

\lim _{x\to 0}{\frac {e^{x}-1-x}{x^{2}}}

you'll see what I mean.

As for using L'Hôpital's rule - for the first case, express f as a ratio

f(x)={\frac {g(x)}{h(x)}}

where

g(x)=x

and

h(x)=e^{x}-1

. Then verify that these functions vanish at the limit:

\lim _{x\to 0}g(x)=\lim _{x\to 0}x=0

,

\lim _{x\to 0}h(x)=\lim _{x\to 0}e^{x}-1=0

. Then calculate

g'(x)=1

and

h'(x)=e^{x}

, and finally

\lim _{x\to 0}f(x)=\lim _{x\to 0}{\frac {g(x)}{h(x)}}=\lim _{x\to 0}{\frac {g'(x)}{h'(x)}}=\lim _{x\to 0}{\frac {1}{e^{x}}}=1

.

For the second case you will first need to find an expression for

f'(x)

. -- Meni Rosenfeld (talk) 20:51, 23 June 2009 (UTC)[reply]

Enumerating Rationals

Hi, I am working on qualifying exam problems, so I may start asking a lot of analysis type questions, mostly real analysis. For now, I need to construct a bijection from the natural numbers onto the rationals... in other words, I need to construct a sequence of all rationals without repeating. That's easy. But, I need more. Say g(k) is this function. I need an enumeration such that

$\sum _{k=1}^{\infty }g(k)^{k}$

diverges. And, I also need to know if there exists any such enumeration such that it converges.

My first thought was to just try the obvious enumeration and see if I get anywhere: 0, 1, 1/2, 1/3, 2/3, 1/4, 3/4, 1/5, 2/5, 3/5, 4/5... . But, then I could not show that series converged or diverged. I tried root test but you get a limsup of 1 which is inconclusive. Ratio test, limit must exist... or you can use the one in Baby Rudin but it does not work either. I thought about switching it up a bit to make it larger... just for each denominator, write the fractions in decreasing order instead of increasing. But, same thing. Any ideas? Thanks. StatisticsMan (talk) 19:50, 23 June 2009 (UTC)[reply]

Why does the sequence need to converge? DJ Clayworth (talk) 19:52, 23 June 2009 (UTC)[reply]

If you assume that the sequence is a bijection, there's no way it converges. The terms do not go to 0, as after any finite number of steps, there are always infinitely many that are greater than any arbitrary constant. Ray^Talk 20:04, 23 June 2009 (UTC)[reply]

Do you want all rationals or just those in [0,1]? If the former, then what Ray said. But if the latter, you should be able to construct

g(k)

so that

g(k)^{k}<=1/k^{2}\,\!

(since

\lim _{k\to \infty }(1/k^{2})^{1/k}=1

). -- Meni Rosenfeld (talk) 21:01, 23 June 2009 (UTC)[reply]

That is, let

h(k)

be any enumeration of these rationals. Define

g(1)=1

g(k)=

The first (according to h) number which is at most

(1/k^{2})^{1/k}

and has not yet been picked.

Since you have arbitrarily small numbers, this is well-defined. Furthermore, for every

x\in \mathbb {Q} \cap [0,1)

, there is some minimal

K

such that

(1/K^{2})^{1/K}\geq x

. If

x=h(n)

, then

x=g(k)

for some

K\leq k\leq K+n

. -- Meni Rosenfeld (talk) 21:12, 23 June 2009 (UTC)[reply]

For one that diverges, use

g(2m-1)=

The first number which was not yet picked

g(2m)=

The first number which is greater than

(1/2)^{1/(2m)}\,\!

and was not yet picked

-- Meni Rosenfeld (talk) 21:25, 23 June 2009 (UTC)[reply]

Given your study objective, I think it is best to spend quite a long time thinking about this sort of problem before seeking help. There doesn't seem to be any technical machinery involved; it's just a matter of sharpening your ability of attacking such problems with elementary methods. 208.70.31.186 (talk) 23:13, 23 June 2009 (UTC)[reply]

Yea, sorry, the point is the rationals in [0, 1]. I think I get what you're saying Meni. Thanks for the help. As far as the last comment, I disagree. I have only limited time. I do not think I would have ever figured this out. But, now that I have seen a probable solution, and now that I will go through and make sure it makes sense to me and write it up in detail, then I will understand it and have a bit of intuition I didn't have before. If I can do this with lots of problems, then I built lots of little bits of intuition, as well as much technique and theory. As far as spending hours on a problem, I just spent the last two semesters doing that on most homework assignments for real analysis. Once we, often as a group, figured out a problem, in the writing up process is when I actually learned things. Thinking about a problem for hours and not getting anywhere is not helpful to me. I am trying to know how to do as many problems as possible before the test. That will be very helpful. StatisticsMan (talk) 02:26, 24 June 2009 (UTC)[reply]

Your dilemma is understandable. It is often the case that exams, although they test one's ability, are often not very useful. In particular, I think that one should not sacrifice a good understanding for good grades (if I have a perfect understanding and a D grade, I would be happier than if I had a poor understanding and an A grade), but I am not the one in your position! I think that you should attempt a problem for around 30 minutes, and if you cannot solve it, seek help. Often, reading something (such as a solution) does not aid in one's understanding. For instance, if I were memorizing some words from the dictionary, unless I write them down, I would probably forget the words in due time; even if I had looked at the word for at least 30 minutes. Therefore, if you at least attempt a problem (and then ask for the solution), not only will it help your memory (you will not go blank in the exam), but it will also, if your attempt is successful, provide you with another approach to a problem that we may not be able to give you. I also think that successfully solving many problems before an exam boosts your confidence as well as helps your intuition more than a mere solution we may be able to provide you. However, I am probably not the one to tell you what to do, so you should probably continue what you feel as best preparation for the exam. --PS T 10:33, 24 June 2009 (UTC)[reply]

Yea, I agree with you actually. I think it is good for me to try problems for a while. I think it bad for me to get stuck on one problem for hours. So there has to be a balance. We have a study group set up at our school and we basically break up the exams and do our part and then present solutions to each other. I learn a bit but not a huge amount from someone else presenting, unless it's a subject I understand pretty well already. But, in taking their description and rewriting up the whole solution, making sure I understand every detail, then I think I learn a lot. Plus, I am doing some problems on my own. And, I will probably reread the solutions at some point as well to refresh myself on various details. Perhaps, later, I will try to redo some of the problems without looking at the solutions again. StatisticsMan (talk) 00:02, 25 June 2009 (UTC)[reply]

Numerically, the sum for the standard enumeration (the one you gave in the original post) seems to converge to 1.83019344534825... . To prove that it converges I would try to find an upper bound on

d(k)

, the denominator of

g(k)

. Since

g(k)<1-1/d(k)

(for

k>2

), this should allow you to bound the sum by a convergent sum. -- Meni Rosenfeld (talk) 11:50, 24 June 2009 (UTC)[reply]

Inequality

How would you go about showing that $e^{t}(1-t)$ ≤1? I don't know where to start. Thanks Scanning time (talk) 20:24, 23 June 2009 (UTC)[reply]

Never mind - just found the single turning point at t=0 and then found the value of the second derivative at this point, which is negative so it must be a maximum. Scanning time (talk) 20:28, 23 June 2009 (UTC)[reply]

Right. It may be of interest for you to recall that for all real x the sequence (1+x/n)ⁿ converges to e^x and it is increasing as soon 1+x/n ≥ 0, that is, as soon as n ≥ -x. This gives you a lot of inequalities, for instance, yours 1-t ≤ e^-t (that of course also holds for all t>1) --pma (talk) 09:58, 24 June 2009 (UTC)[reply]

You can just start the inequality

e^{x}\geq 1+x

, substitute

x=-t

to get

e^{-t}\geq 1-t

, then divide by

e^{-t}

(which is always positive) to get

1\geq e^{t}(1-t)

. --76.91.63.71 (talk) 18:14, 24 June 2009 (UTC)[reply]

Empty mean

I'm reading a programming book in which the author gave an example method which takes an array of doubles and returns their mean. If the array is empty, the function returns 0, which the author explains with the comment "average of 0 is 0". This annoyed me quite a bit, since the author is obviously confusing the mean of no terms with the mean of a single zero term. IMO the function should return NaN (or throw an exception), since an empty mean is 0/0.

Does anyone have any further input on the matter? More importantly, is anyone aware of an online resource discussing this (my search in Wikipedia and Google didn't come up with anything)? Thanks. -- Meni Rosenfeld (talk) 22:59, 23 June 2009 (UTC)[reply]

The standard definition certainly breaks down and gives 0/0. There might be a standard convention for how to define it (similar to the conventions for various other empty things), but I don't know what it is and can't think of anything obvious. I think it should probably be left undefined. Consider the average of n 5's. That's obviously 5 for all n>0, so should be 5 for n=0. Similarly the average of n 3's should be 3 for n=0, but no threes and no fives are the same thing. --Tango (talk) 23:07, 23 June 2009 (UTC)[reply]

Yes, saying that the mean of an empty list is 0 certainly sounds wrong. That the sum of an empty list is 0 is standard, but the length is also 0, and 0/0 = NaN. 208.70.31.186 (talk) 23:21, 23 June 2009 (UTC)[reply]

The mean is not just a linear combination of the values; it's an affine combination, i.e. a linear combination in which the sum of the coefficients is 1. As John Baez likes to say, an affine space is a space that has forgotten its origin. It has no "zero". The point is that if you decide, e.g., that the zero point is located HERE, and measure the locations of the points you're averaging relative to that, and I decide the zero point is located somewhere else, and likewise compute the average, then we get the same result! That's not true of sums, and it's not true of linear combinations in which the sum of the coefficients is anything other than 1. If the "empty mean" were "0", then there would have to be some special point called "0". Michael Hardy (talk) 23:30, 23 June 2009 (UTC)[reply]
...Just for concreteness: Say we're measuring heights above sea level, and you get 0, 1, and 5 (after measuring the heights at high tide). Your average is (0 + 1 + 5)/3 = 2 feet above sea level. I measure the heights at low tide when the water is 4 feet lower, so I get 4, 5, and 9. My average is (4 + 5 + 9)/3 = 6 feet above sea level. You got 2 feet. I got 6 feet, when the water is 4 feet lower. So we BOTH got the same average height. But if it were correct to say the average of 0 numbers is 0, then should that 0 be your sea level, or mine? There's no non-arbitrary answer. So it doesn't make sense to say the "empty average" is 0. Michael Hardy (talk) 23:36, 23 June 2009 (UTC)[reply]

I completely agree that the mean value of zero observations is generally undefined. Consider however that you want to estimate a probability by a relative frequency. The probability of success is (the limiting value of) the relative frequency of success for a large number of trials. P = lim i/n. Even if the number of trials is not large it is tempting (and very common) to estimate the unknown probability by the relative frequency: P ~ i/n. This is the mean value of a multiset of i ones (for success) and n−i zeroes (for failure). The case n=i=0 gives P ~ 0/0 which is undefined. However, we know better than that: we know that 0 ≤ P ≤ 1, and that is all we know. So the likelihood function f is f(P)=1 for 0 ≤ P ≤ 1 and f(P)=0 elsewhere. This function has mean value 1/2 and standard deviation about 0.29. So the mean value of zero (0,1)-observations is 1/2. Generally the mean value of the likelihood function, (which is a beta distribution), is (i+1)/(n+2), rather than i/n. The estimate P ~ i/n is a tempting (and very common) error. Note that for large number of observations it doesn't matter: lim (i+1)/(n+2) = lim i/n. Bo Jacoby (talk) 07:01, 24 June 2009 (UTC).[reply]

I think this is equivalent to saying that if we want to use a sample to estimate the population mean, then with no data this will be just the mean of our prior. While certainly worth noting, it is inconsequential with regards to calculating the sample mean. -- Meni Rosenfeld (talk) 08:55, 24 June 2009 (UTC)[reply]

Yes Meni. It often goes without saying that the purpose of computing a sample mean is to estimate the population mean. But the mean of the posterior is not always equal to the mean of the sample. And the mean of the prior probability can never be computed by the undefined mean of the empty sample. Bo Jacoby (talk) 09:22, 24 June 2009 (UTC).[reply]

Ok, thanks for the answers. Looks like we all agree, with different ways to look at it. Perhaps someone will be interested in adding a note about this to one of our articles (personally I've stopped editing articles)? -- Meni Rosenfeld (talk) 08:55, 24 June 2009 (UTC)[reply]

I got the impression that wikipedians have a peculiar taste for trivial objects and trivial cases in mathematics. I remember interesting threads about

\scriptstyle \inf \emptyset

and

\scriptstyle \sup \emptyset

; empty sums and products; empty limits and co-limits in category theory as well as terminal and initial objects, and their ubiquity; the empty function and empty permutation; 0⁰, 0!,

\scriptstyle {0 \choose 0}

; 0-th order iterations of several constructions; derivatives and antiderivatives of order 0; orientations of a zero-dimensional manifold; 0 as a limit ordinal,... and what I'm forgetting? Certainly the empty average deserves a place among them. Moreover, these topics seem to be recurrent here, which is maybe a good enough reason for writing a short article, e.g. Mathematical trivialogy. --pma (talk) 11:12, 24 June 2009 (UTC)[reply]

Are you sure it's only Wikipedians? I am of the opinion that

Properly acknowledging the trivial\empty\limit cases can go a long way towards simplifying notation, definitions and proofs, and
If someone doesn't understand these cases, they don't truly understand the general case either.

This being the case, it is healthy for any mathematician to ponder these matters, and I bet most do. -- Meni Rosenfeld (talk) 11:41, 24 June 2009 (UTC)[reply]

Indeed. I have even seen respect for trivial cases described (in the context of history of mathematics) as one of the characteristic features of the modern trend towards generalization and abstraction. Algebraist 11:49, 24 June 2009 (UTC)[reply]

Hey... I have the feeling that you may have taken my preceding post as hyronic. It was definitely not. I am really thinking to this article as a possibly nice and useful reference, for reasons that you have already expressed (foundational, hystorical, didactic). What maybe causes the misunderstanding is the use of the word "trivial", that sounds reductive, and does not make justice to these fundamental concepts in mathematics (as usual, the most basic and simple ideas came the last). There is one further reason I think. The aesthetic taste, humor included, for them, is one of those things that make a difference between mathematicians and the rest of the world, and one of the most difficult to communicate --I guess, because the elegance of ideas in mathematics relies mostly on how they work, which is of course something that a working mathematician feels better (how to explain to a non-mathematician the comfort of constructing n-spheres starting form the 0-sphere, or the ordinals from the empty set? &c)--pma (talk) 14:23, 24 June 2009 (UTC)[reply]

I understood that you were being serious - the thing is that your usage of words like "Wikipedians", "peculiar", "recurrent here" seemed to suggest that you think dealing with trivial cases is neither common nor particularly desirable. -- Meni Rosenfeld (talk) 19:21, 24 June 2009 (UTC)[reply]

Maybe I should have said we wikipedians --well, I did not for shyness. Yes, I do not see it as a common habit, deserving particular attention to trivial cases in the discussion of definitions and theorems. Though it is desirable for the sake of complete understanding, as we said, and I myself partecipate with pleasure to these debeats. That said, there is also a kind of peculiar humoristic aspect in looking at zero-cases, that I suppose you feel too. --pma (talk) 08:16, 25 June 2009 (UTC) [reply]

Actually, my point is that trivial cases should not receive "special treatment". The definition should be phrased from the get-go to seamlessly include the trivial case. Only when the application of the general definition to the trivial case is not clear does it deserve "particular attention".

My other point is that virtually all mathematicians prefer such seamless definitions and understand how they apply to trivial cases. Does "I do not see it as a common habit" mean you disagree with this? -- Meni Rosenfeld (talk) 11:59, 25 June 2009 (UTC)[reply]

Well, talking about didactics, the fact that many people here are asking for explanations about e.g. the topics I quoted above suggests to me that their teachers did not have time to clarify these points enough (what's understandable). Of course, a theorem or a definition is generally stated so as to cover all cases. But to see the colour it takes in all contexts, trivial ones included, often wants a certain effort. --pma (talk) 00:09, 26 June 2009 (UTC)[reply]

Taking the mean of an empty set could be an arbitrary convention, and it is possible that there are circumstances where this would be useful. On the other hand I think it is more likely to arise as a result of a fundamental difficulty which many (perhaps most) people have in connection many ideas related to zero and empty sets. I have had considerable difficulty in trying to convey to non-mathematical people that a value not existing is not the same as the value existing and having the value zero (for example, how many daughters has the king of Germany?). Likewise conveying the distinction between an empty set and no set. A final example is people who are convinced that 12÷0 must be 0. It seems that to many people it is very difficult to conceive of anything involving 0 which produces anything other than 0. On the other hand it once took me a considerable time to persuade someone (of fairly average intelligence in most respects) that 6×0 was 0. He said "if you start with 6 and then do nothing you must still have 6". I eventually persuaded him that it was possible to meaningfully have an operation involving 0 which did not mean "do nothing", but it was hard work. Another example is people who feel 0!=1 is completely unnatural. Zero is a surprisingly confusing and difficult concept for many people. JamesBWatson (talk) 11:47, 24 June 2009 (UTC)[reply]

(indentation confusion: this is to JamesBWatson) In some senses, your question about Germany has to have the answer 0, because the set {girls having a parent that is king of Germany} has no elements. I'm not sure why (aside from having an easier/less-ambiguous formulation in terms of set theory) that question is answerable when "How old is the king of Germany?" isn't. --Tardis (talk) 14:40, 24 June 2009 (UTC)[reply]

Because you've twisted the question. The question was not 'how many girls are there whose parent is a king of Germany' but rather 'how many daughters does the king of Germany have'. The former makes sense if there is no king of Germany (or indeed if there are many), but the latter (by its use of a definite article) presupposes that there is exactly one king of Germany. Algebraist 14:49, 24 June 2009 (UTC)[reply]

Sure, the mean is not really 0 if you have no values, but you have to start from some number that's not NaN or else the recursion algorithm to compute the mean won't work. Zero will do just like any other number. – b_jonas 05:08, 25 June 2009 (UTC)[reply]

What recursion algorithm? If you mean

\mu \left(X\cup \{x\}\right)=(|X|\mu (X)+x)/(|X|+1)

, then that's neither efficient nor intuitive, and I see no reason to consider it at all. -- Meni Rosenfeld (talk) 11:59, 25 June 2009 (UTC)[reply]

A difficulty leaning mathematics is that the meaning of the word "many" changes from signifying three or more to signifying zero or more. Asked "how many?" a mathematician may answer "zero" while a nonmathematician answers "no, not many, not even one". To the nonmathematician "empty set" and "number zero" is nonsense because empty is not a set and zero is not a number, as you don't count to zero. Redefining common words must be carefully explained. Bo Jacoby (talk) 06:48, 25 June 2009 (UTC).[reply]

I have never met anyone who would not consider zero to be a number, or would answer a 'how many' question in that strange fashion. Is this some bizarre regional difference? Algebraist 12:07, 25 June 2009 (UTC)[reply]

Well sir, you wrote that you "have never met anyone", not that you did meet zero. So you yourself prefer to express zero by "never anyone". If I said that I knew many more examples, wouldn't you understand "many more" to mean more than zero more? Bo Jacoby (talk) 23:13, 25 June 2009 (UTC).[reply]

Yes, but that's not relevant to either of the claims I questioned above. Algebraist 10:00, 26 June 2009 (UTC)[reply]

I have a better answer than all of the above. People expect the mean to be a kind of average. Like, "what weight girl (average) is at that Linux users group meeting"... Well the word average in that question is ambiguous, but whether the person would find the mean, the median, or the mode the information of greatest interest, for any of the three THEY WOULD IMMEDIATELY understand the meaning of the answer "empty set", and this is the correct result returned. I'm therefore quite sure that this is the function's best response, as anyone asking the question would understand this answer to mean there are't any girls. By contrast 0 implies zero-G conditions (without any implication of nobody being there) while NaN would seem to say "the numbers on the scale don't go up that far"... :) 193.253.141.64 (talk) 21:41, 25 June 2009 (UTC) [reply]

and I have another argument for the empty set being the correct result: when people ask for the AVERAGE, and they mean the mode (most common value), it is perfectly correct to say "3 and 4" (if those two appear the most times, and the two appear an equal number of times). So if you can ASK for the average, and GET the set (3, 4), it means you are ALREADY intuitively prepared to receive a set rather than just an integer. Therefore ASKING for the average, and GETTING the set () would obviously and clearly tell you that there is nothing to choose among. Like, I ask for the mode and get (3,4) so I know there are more 3's and 4's than anything else. I ask for the mode and get (3, 7, 15, 21, 35), and I know those are some values in the data that appear at least once. I ask for the mode and get () then I know there isn't ANY value in the data that appears even once. Really, it is just a historical coincidence that the median is not returned in the same way for data with an even number of values, as it would make sense to ask for the "median" household income in this neighborhood of 10 houses and get (60000, 90000) -- and this answer is more informative than the "official" answer of a median of 75000, as the latter intuitively has the implication of there being a "middle" family at just that level -- however, this is true only for an odd number of data items. As for arithmetic average (ie the mean, what the question is about), if there is ONE data point, you just get the set back -- the average is the whole set. To me it makes perfect sense to extend that to "zero or one data poits" -- you just get the set back, ie either a single value, or an empty set. That's because in the case of 0 or 1 data points, there is nothing to AVERAGE. You dont need to add up items and divide, as the very word ADD makes no sense whatsoever without at least 2 operands. You cant add up the number 7. If I ask you to add up the number 7 and tell me the result, you would feel that this is an ungrammatical question. So you are already cheating on the definition of arithmetic mean, as it does not consist of adding the items together and dividing by the number of items -- it consists of this only when there are at least two. For one item, you just get it back. At least to my lispy, perly, pythony mind, the above explanation implies totally strongly that for a function for calculating the mean which you pass (), you should get back (). Same goes for the median and the mode. --82.234.207.120 (talk) 10:57, 26 June 2009 (UTC)[reply]

There are numerous flaws in the above responses.

the answer "empty set"... correct.... No, the empty set is the correct answer to "who are the girls in that group", not to "what is their average weight".
0 implies zero-G. I have no idea how we suddenly got to the gravitational field in outer space.
NaN would seem to say "the numbers on the scale don't go up that far". No, that's not what NaN is. What you're describing is PositiveInfinity.
Re sets as values for averages: This does not work for the mean, which is a precisely computed single real number. As for medians and modes - this only works if you agree that they returns sets consistently. It doesn't make sense to return a number sometimes and a set at other times. For mode in particular, if you want it to return a set, I would define it "the set of all numbers having the maximum frequency". Thus the empty mode would be $\mathbb {R}$ rather than $\emptyset$ , since all numbers have the maximum frequency, 0. For the median, returning a set is not useful at all, since it is harder to represent and to use in calculations.
ADD makes no sense whatsoever without at least 2 operands. That may be true for high-school arithmetic, but in real mathematics there is no problem at all calculating the sum of 1 or 0 terms.

In short, I strongly disagree with your suggestion of using an empty set as the value of an empty mean. -- Meni Rosenfeld (talk) 12:20, 26 June 2009 (UTC)[reply]

You seem to miss the basic point that we are discussing a function whose return type is a floating-point number (a double). Since the empty set is not a floating-point number, it cannot be returned by the function. The IEEE 754 NaN ("not a number") is designed for exactly this kind of circumstances, and it should be the result of the function, just like the OP said. (0/0 gives NaN for the same reason.) NaN most definitely does not mean "the numbers on the scale don't go up that far", that's what the infinities are for. — Emil J. 12:02, 26 June 2009 (UTC)[reply]

That too, however, personally I'm more interested in an abstract mathematical discussion. -- Meni Rosenfeld (talk) 15:18, 26 June 2009 (UTC)[reply]

Let me address the zero-G comment first, then we could go on to the other parts of my argument if there is no dispute. The zero-G comment is to show that returning zero is absolutely wrong and should be out of the question in all cases - it should not even be considered for a second. I realize this isn't the physics ref desk, so let me illustrate with an even more forceful example. Go ahead and answer the following question:

“

In reflecting over all the times you have shot somebody in the face, what is the average number of warning shots you fired first?

”

What is your answer? I don't think any of us here would answer 0 to that question, and that is because 0 is not an appropriate answer. So as mathematicians, how would YOU answer it? I know when *I* reflect over all the times I have shot somebody in the face, I come up with the empty set! The average distance in centimeters the tip of the barrel was from their face? Empty set. Average number of beers I had prior to shooting them? Empty set. Average number of times I shot each person? Empty set. Why... What are YOUR answers to these questions?

Can we agree on this much: zero is totally inappropriate for any of the above questions, and more generally, for the average of an empty set... --82.234.207.120 (talk) 13:52, 26 June 2009 (UTC)[reply]

My response to that question would be to refuse to answer, saying that a wrong question cannot have a right answer. Since "average" is undefined for empty sets, asking about the average of shots assumes that there were any shots, which is wrong, thus making the question void.

Returning to the empty mean, the correct response is either no answer at all (programmatically, an exception) or, if we agree to define 0/0 as an entity called "NaN", it would be it. So 0 is incorrect. But - if we are forced to choose a real number as an answer, 0 would probably be the least wrong. IIRC, defining 0/0=0 causes less contradictions than other arbitrary choices. -- Meni Rosenfeld (talk) 15:18, 26 June 2009 (UTC)[reply]

I think you're making a mistake in thinking that the mean of an empty set has no sensible value. Usually when you compute a mean what you're aiming for is a best approximation to some quantity of interest based on noisy measurements. The mean of an empty set is an approximation based on no data. There is an ideal answer in that case, namely the actual value of the quantity, but in the absence of data you can't guess that, so you return "I don't know". That's what a NaN in IEEE floating-point arithmetic represents: an unknown value. It doesn't really mean "not a number"—there might well be a correct floating-point answer, but the calculation failed to produce it for some reason (consider

{\sqrt {-1}}^{2}

for example). The correct return value here is not "no possible answer" but "could be anything". It might be appropriate in certain cases to return zero in the absence of any data. Probably, though, if you wanted a bias toward zero in the absence of data you would want the bias to disappear gradually as you added more data points, not immediately when the first data point arrived, so you wouldn't really be computing the mean. -- BenRG (talk) 15:25, 26 June 2009 (UTC)[reply]

This brings us back to my exchange with Bo. I'll emphasize again the difference between the sample mean - which is precisely calculated by some formula, and the population mean - which has some unknown value we would like to estimate using, among other things, the sample mean. If the data is

(1,3)

, then the answer to "what is the population mean?" is "could be anything, but based on the available information it is likely to be around 2", while the answer to "what is the sample mean?" is "it is 2". If there is no data, the answer to "what is the population mean?" is either "could be anything, and I don't have enough information to even begin to know it" or "could be anything, but based on my prior assumptions it is likely to be around 0 [or something else]". But there is no answer to the question "what is the sample mean?". -- Meni Rosenfeld (talk) 15:49, 26 June 2009 (UTC)[reply]

June 24

Normal distribution curves from two values.

Is it possible to create a normal distribution curve if you only know, say, two of the standard deviation values are?

For example, let's say I know two values, X and Y, and I also know that between these two values 95% of the other values exist; is it possible to figure out what the rest of the standard deviations would be for the graph, and other information? I would thing it would be, but perhaps I'm wrong.

On a side note, does anyone know if there's a way of getting Excel to generate Normal distribution curves? It's been like 5 years since I've had to even think about drawing them and the wikipedia article is confusing me. (I know that's more of a computer sciences question, but I figure Mathematicians probably know the answer to that question as Excel seems very math/statistics oriented)

Thanks in advanced for your help! --Honeymane_{Heghlu meH QaQ jajvam} 03:27, 24 June 2009 (UTC)[reply]

Can't help you with the Excel, but the answer for your question is no. The problem is that a normal distribution is characterized by two parameters (the mean and the variance), and your information only supplies one equation characterizing the relationship between the two (specifically, that 95% of the distribution lies between X and Y). Thus, there are infinitely many possible normal distributions meeting your criteria. You need an additional piece of information. Ray^Talk 05:08, 24 June 2009 (UTC)[reply]

So it's not possible to derive the mean from the part of the standard deviation? --Honeymane_{Heghlu meH QaQ jajvam} 05:35, 24 June 2009 (UTC)[reply]

Well, in your example, the mean will be around midway between X and Y, you just won't be able to figure out exactly where it is between those values. That assumes your "95%" means "exactly 95%" and not "at least 95%". 208.70.31.186 (talk) 06:39, 24 June 2009 (UTC)[reply]

No, that's wrong. There's no reason to think the mean would be half-way between the two values. For example, one of them could be the 99th percentile of the distribution, and the other the 4th percentile. Or one could be the 99.99th percentile (a substantially larger number) and the other the 4.99th percentile---that still adds up to 95% between the two. That's why the answer is not unique. Michael Hardy (talk) 20:49, 24 June 2009 (UTC)[reply]

The anon said around midway but you can't be exact, which is perfectly accurate. The mean is going to be reasonably close to the midpoint. --Tango (talk) 21:04, 24 June 2009 (UTC)[reply]

No, it's not accurate at all. If the upper percentile is close to the 100th percentile (e.g., what if it's the 99.999999999th percentile) then the it will be a very very large number, whereas the lower one will be close to the 5th percentile, not so large at all. The mean will be far closer to the lower endpoint than to the upper one. Michael Hardy (talk) 05:57, 25 June 2009 (UTC)[reply]

There's no reason to believe that unless we know that the distribution has reasonably high variance. If all you know about a normal distribution is that exactly 95% of its mass is between a and b, then the mean could be anywhere in (a,b). Algebraist 21:07, 24 June 2009 (UTC)[reply]

To the OP: Your way of using words is so non-standard as to render your first sentence incomprehensible. Standard deviation is a precisely defined term, and what it means is not at all what you seem to mean. A normal distribution cannot have two different standard deviations; it has only one. Michael Hardy (talk) 20:51, 24 June 2009 (UTC)[reply]

Just for concreteness, for the standard normal distribution, 95% of the probability is between 2.3263 and −1.7507, but also 95% is between 3.719 and −1.6458. In the second case, you certainly don't have the mean halfway between the two, since the mean is 0. Michael Hardy (talk) 20:54, 24 June 2009 (UTC)[reply]

Perhaps I can rephrase then. Is it possible to figure out what the other 'values' in a normal distribution curve are, if you only have access to a limited amount of information? Take this diagram for example, if I know that -2σ is equal to 5, and 2σ is equal to 5, can I figure out what the mean (μ) and Standard Deviation would be?

To use the example from the SD article, if I told you that 95% of adult men are between 64 and 76 inches in height, is it possible to figure out that the mean value would be 70 inches with a mean of 3 inches? Given Ray's answer, I'm guessing the answer is no.

I realize I may not be using standard math language, it's never been my strong suit.--Honeymane_{Heghlu meH QaQ jajvam} 02:00, 25 June 2009 (UTC)[reply]

A normal distribution is determined by two values, the mean and the standard deviation. That means you cannot uniquely determine a normal distribution with less than two pieces of information. That 95% of the population are in a certain range is one piece of information, so it is not enough. The other example you give is different information, and is contradictory - -2σ and 2σ cannot both equal 5. That diagram is rather poor, anyway, the labels on the x-axis should be "μ+2σ", etc., not just "2σ". If you know that, say, μ+2σ=5 and μ-2σ=-5 then you can just solve the simultaneous equations to get the mean (μ) and standard deviation (σ). Note that there are two equations there, that is two pieces of information so it enough to uniquely determine the normal distribution. (The two pieces of information need to be independent of each other, if one implies the other then you only really have one piece of information.) --Tango (talk) 02:28, 25 June 2009 (UTC)[reply]

To graph a normal distribution in Excel:

Put -4 in A2, -3.9 in A3 and autofill the column to 4 in A82.
Put =EXP(-A2*A2/2)/SQRT(2*PI()) in B2 and autofill the column to B82
Highlight region from the A2 to B82 and click on the chart wizard

--RDBury (talk) 04:46, 25 June 2009 (UTC)[reply]

Bezier curve with width

I want to draw a bezier curve with an arbitrary thickness. How is this done?

The original curve has one control point (making it quadratic, I guess, although I barely know what that means). My idea is to draw two curves either side of it and fill between them. It's easy enough (if I'm doing it right) to find the offset start and end points for the two side-curves, but how can I find suitable control points for them? Offsetting the original control point by the desired width produces curves which are pinched in the middle.

Here's screenshot of this happening. (You have to ignore that the surface is tilted, but I think you can see what's going on.)

I thought of measuring the distance between the start and end points of each of the side-curves, comparing that with the same distance on the original curve, and using the result to scale the offset of the control points. That's just a guess, though, and I think it wouldn't work. 213.122.47.84 (talk) 09:56, 24 June 2009 (UTC)[reply]

The offset curve of a (nontrivial) Bézier curve is not a Bézier curve itself, so you cannot do it in such a way at all. If by "drawing" you mean that you are trying to rasterize the curve, then a simple solution is to use whatever method you used to plot the original curve, except that you draw a (filled) circle instead of each pixel. — Emil J. 12:12, 24 June 2009 (UTC)[reply]

I was afraid so. Yes, I mean rasterize (I nearly posted this to the computing desk), and yes that's simple, but highly inefficient. Thanks for the link. I wonder what to do. Use cubic beziers, maybe. 81.131.10.72 (talk) 12:20, 24 June 2009 (UTC)[reply]

Try doing a google with 'bezier width algorithm' and you'll get a number of ways. Most systems will allow you to just specify the width and they'll do something. If you're doing this yourself you need to decide what you mean by a bezier curve with width and whether you want an ideal or real solution. For instance one definition might be what you get if you move a disc along the line - but then you get round ends. Another might be that you move a segment along the line always at right angle to the line. Dmcq (talk) 12:18, 24 June 2009 (UTC)[reply]

I need a solution similar to this: http://alienryderflex.com/polyspline/ ...which does something magical-seeming (to me) with quadratics to provide the answer for where a spline intersects a scanline. By this means I can fill between curves efficiently (not considering any pixel more than once) and perfectly (not approximating any part of the curve with a line). I would settle for looking right over being exactly correct, though. 81.131.10.72 (talk) 12:34, 24 June 2009 (UTC)[reply]

I think the usual procedure with drawing a Bezier curve is to keep subdividing it until each segment is effectively a line. In other words, if the control points are within 1 pixel of the line between the endpoints then you can assume that segment is a line and apply a line generating algorithm. A line with thickness is just a rectangle so if you want your algorithm to draw a thick Bezier curve then draw rectangles instead of lines in the last step.--RDBury (talk) 04:23, 25 June 2009 (UTC)[reply]

I'll consider* that. I had an idea of my own: if I had an algorithm for taking a set of points and producing a smooth string of splines that pass through them, I could find a handful of offset curve points by Dmcq's "segment at a right angle" method and then turn them into a series of splines that approximate the offset curves. (I have edited out what I put here about de Boor's algorithm because I think it's irrelevant.) *i.e. disregard and run back to it later. 81.131.56.235 (talk) 11:11, 25 June 2009 (UTC)[reply]

Find an array of numbers, max_i and max_j being given

Hi. I need to find a nXm array (in fact n≤m≤8) of positive numbers x_ij, with max_jx_ij = a_i and max_ix_ij = b_j for all i and j. The positive numbers a_i and b_j are given and max_ia_i = max_jb_j=m. Apparently there are always such arrays, yes? Have you a recepit for it? Thanxx --84.221.68.243 (talk) 16:06, 24 June 2009 (UTC)[reply]

How about

x_{ij}:=a_{i}+b_{j}-m

--84.221.209.203 (talk) 17:54, 24 June 2009 (UTC)[reply]

Isn't

x_{ij}=\min(a_{i},b_{j})

the correct answer (or at least one of correct answers)? The answer may be not unique. Let's take e.g. m=n=2 and all a=b=2. Then

{\begin{bmatrix}2&p\\q&2\end{bmatrix}}

will be a solution for any positive p and q not exceeding 2, as well as

{\begin{bmatrix}p&2\\2&q\end{bmatrix}}

. --CiaPan (talk) 20:16, 24 June 2009 (UTC)[reply]

poker odds

What are the odds that in Texas Holdem poker with 10 players that 2 or more players effectivly have the same initial hand?

S=spade C=club H=heart D=Diamond.

Example,

10D 10H is effectivly the same as 10S 10C

8S 9S is effectivly 8H 9H, but not 8C 9 D 65.121.141.34 (talk) 16:49, 24 June 2009 (UTC)[reply]

A rough approximation is easy: compute the approximate odds that two hands are effectively the same, then use a birthday paradox calculation to find the likelihood of that occuring in the 45 possible pairings of 10 players. Getting everything exactly right sounds messy and if you're mostly interested in the actual number that comes out the problem, it may be easiest to run a computer simulation with a few million random deals and count how many times those ties occur. 67.122.209.126 (talk) 19:39, 24 June 2009 (UTC)[reply]

[ec] I think the probability for two particular players is

{\frac {97}{20825}}\approx 0.466\%

. The calculation is complicated by the fact that the events are not independent for multiple players, but the final result may be close to 20%. -- Meni Rosenfeld (talk) 19:43, 24 June 2009 (UTC)[reply]

June 25

two questions

I have two questions

The internal dimention of an open concrete tank are 1.89 m long,0.77 m wide and 1.23 m high. If the copncrete side walls are 0.1 m thick find in cubic metres ,the volume of concrete used.Give answer to 3 sigficant figures. Answer on book is .704 cubic m . but i have solved it and find the answer 0.79989 by multiplying surface area with thickness.
Two trains A and B , are scheduled to arrive at a station at certain time .The time is in seconds after the scheduled time for each of the 40 days was recorded , mean and standard deviation are as

A: mean 0.528 S.D 0.155

B: mean 0.498 S.D 0.167 , My question is which train is more consistent in arriving late .why ?

and which train is more punctual on the whole ?why?. —Preceding unsigned comment added by True path finder (talk • contribs)

1. Some of the volume in your method of calculation is added twice, specifically the edges where the sides meet, so you have to calculate total volume minus the volume of the empty space in the "tank", which is more like a tube since it doesn't have a top or bottom. Doing so, I ended up with 0.70356 m^3. --Wirbelwindヴィルヴェルヴィント (talk) 16:24, 25 June 2009 (UTC)[reply]

Thank U ,I have found it

Chain Rule for Matrices

{\frac {d\mathbf {F} (\mathbf {G} (x))}{dx}}=\mathbf {F} '(\mathbf {G} (x))\mathbf {G} '(x)

or

{\frac {d\mathbf {F} (\mathbf {G} (x))}{dx}}=\mathbf {G} '(x)\mathbf {F} '(\mathbf {G} (x))

76.67.79.85 (talk) 18:11, 25 June 2009 (UTC)[reply]

Neither of the rules are correct (assuming that you are using the standard vector-matrix notation). Note that a derivative of a "matrix function of a matrix" is not a matrix, so you will have to carefully define what you mean by:
- $\mathbf {F} '(\mathbf {G} (x))$ , the derivative of matrix F with respect to matrix G. (lets call this H)
- multiplying H (on the left or right) by the matrix $\mathbf {G} '(x)$ .

before we can write down the chain rule succinctly. Abecedare (talk) 19:45, 25 June 2009 (UTC)[reply]

OP here, F is a function from a matrix to another matrix (in my case, it is the matrix logarithm, but I want a general answer). By multiplying two matrix-valued functions, I mean that the two matrices should be multiplied according to the normal rules of the matrix product. 70.24.38.23 (talk) 21:20, 25 June 2009 (UTC)[reply]

Here is how you can proceed (with some abuse of notation I am not distinguishing between a matrix and a matrix valued function):

Let

x\in \mathbb {R} ;\;\mathbf {G} \in \mathbb {R} ^{M\times N};\;\mathbf {F} \in \mathbb {R} ^{P\times Q}

Then define:

$\mathbf {D} :={\frac {d\mathbf {F} (\mathbf {G} (x))}{dx}}\in \mathbb {R} ^{P\times Q}$ with $\mathbf {D} _{pq}={\frac {d\mathbf {F} _{pq}(\mathbf {G} (x))}{dx}}$ , and
$\mathbf {H} :={\frac {d\mathbf {F} (\mathbf {G} )}{d\mathbf {G} }}\in \mathbb {R} ^{M\times N\times P\times Q}$ with $\mathbf {H} _{mnpq}:={\frac {\partial \mathbf {F} _{pq}(\mathbf {G} )}{\partial \mathbf {G} _{mn}}}$

Now using the above definitions and standard chain rule for functions of several variables:

\mathbf {D} _{pq}=\sum _{m,n}\mathbf {H} _{mnpq}{\frac {d\mathbf {G} _{mn}}{dx}}

You can write the above sum as a matrix-vector product by defining those appropriately; but note that H is not a matrix, and that the matrix D itself is not obtained through a simple matrix product. Hope that helps. Abecedare (talk) 22:08, 25 June 2009 (UTC)[reply]

Thank you, I get it now. 70.24.38.23 (talk) 22:43, 25 June 2009 (UTC)[reply]

Integral is 0 on any interval

Say we have an integrable function, f. If the integral of f is 0 over any interval, then must f be 0 almost everywhere? It seems pretty obvious that it is true, but I'm not quite getting it. And, what if f is some weird function that is 1 half the time and -1 half the time, with the points mixed up in such a way that there are the same amount of each type in every interval? Obviously that's not possible because the integral of the positive part would be infinity and then f would not be integrable. But, perhaps some wierd thing like that could happen, perhaps it's decreasing toward 0 for the positive part and increasing toward 0 for the negative part, both at the same rate. Any help would be much appreciated. Thanks. StatisticsMan (talk) 19:27, 25 June 2009 (UTC)[reply]

Your objection that f would not be integrable is not a problem, since we can just define f to be your weird thing in [0,1] and constantly zero everywhere else. So all we need to make your idea work is a measurable subset of [0,1] that contains exactly half of every interval. Algebraist 19:50, 25 June 2009 (UTC)[reply]

Does the Regularity theorem for Lebesgue measure prevent that weird thing (assuming we are talking about Lebesgue integrability)? 208.70.31.206 (talk) 20:54, 25 June 2009 (UTC)[reply]

It does. Almost all points x of a measurable set S are points of density 1, meaning that | [x-ε, x+ε]\S | = o(ε) as ε→0. --pma (talk) 14:37, 26 June 2009 (UTC)[reply]

You mean μ([x − ε, x + ε] \ S) = o(ε), right? Except that this is Lebesgue's density theorem. I don't see how the regularity theorem has anything to do with it. — Emil J. 14:52, 26 June 2009 (UTC)[reply]

Yes, it's the density thm, that's what I linked. In fact I only read the word "Lebesgue" on the preceding post and concuded. I tend to assume correctedness.--pma (talk) 15:27, 26 June 2009 (UTC)[reply]

Ahh, I didn't bother to click on your link, and didn't realize it redirects to the density theorem. — Emil J. 16:03, 26 June 2009 (UTC)[reply]

The typical argument along these lines starts with the fact that ~~characteristic~~ simple functions over intervals are dense in the space of integrable functions, measured with the $L^{1}$ metric. Ray^Talk 20:41, 25 June 2009 (UTC)[reply]

On further reflection, I realize that was unhelpful and wrong (since corrected the wrong part). Sorry. I shall try to make amends. If the function is not zero a.e., there must exist some positive value, call it M, such that $m(\{x:f(x)\geq M\})\geq \eta >0$ . Since the function is integrable, for every $\epsilon$ there must exist a $\delta$ such that $\int _{A}|f(x)|dx\leq \epsilon$ whenever $m(A)\leq \delta$ . Pick $\epsilon \leq {\frac {M\eta }{2}}$ , say. Let us approximate the set where $S(M)=\{x:f(x)\geq M\}$ by a containing open set $O_{\delta }$ which contains all but $\delta$ of the mass of the set S(M). Open sets on the real line are just collections of open intervals, thus $S(M)=(\cup I_{i})\setminus A$ , where $m(A)\leq \delta$ . Thus

M\eta \leq \int _{S(M)}f(x)dx=\sum _{i}\int _{I_{i}}f(x)dx-\int _{A}f(x)dx=-\int _{A}f(x)dx\leq \epsilon \leq {\frac {M\eta }{2}}

which is a contradiction. I'm not really satisfied with this argument, and I think there's a better measure-theoretic way of going about it, but ... Ray^Talk 21:39, 25 June 2009 (UTC)[reply]

You can prove it in plenty of ways; here is another one. Recall that for an integrable function f the integral mean of f on each interval [x-ε, x+ε] defines a certain function f_ε(x) that converges to f in L¹ as ε tends to 0. (This is very easy; btw, a bit less elementary fact is that you also have a.e. convergence, which is the content of the Lebesgue differentiation theorem: anyway that's not necessary here). On the other hand, the condition on f states that f_ε is identically zero, so you conclude that f itself is 0 a.e., as limit of 0= f_ε in the L¹ norm.
Another proof: take a uniformly bounded sequence (φ_k) of step functions (that is, linear combinations of characteristic functions of intervals) converging a.e. to sgn(f). The integral of φ_kf is zero by the assumption; moreover by the Lebesgue dominated convergence theorem the integral converges to || f ||₁, so f=0 a.e. We can rephrase the preceding argument saying that by assumption f is in the annihilator of the subspace of L^∞ of all step functions; therefore f=0 as element of L¹ because step functions are weakly* dense in L^∞ (this easily follows form the fact that they are also dense in L¹....&c. Notice that Ray's first hint is correct and useful.).
It is interesting that the same conclusion holds if you ask the condition only on the intervals of length 1. Indeed, this implies, by the sigma-additivity of integral, that the integral is 0 on all half-lines. By subtraction then the integral on intervals of any length also vanishes, and you are lead to the preceding case.
Still true, but less elementary, the following generalization to integrable functions on Rⁿ : if the integral of f over any translated U+x of a given bounded set of positive measure U vanishes, then again f=0 a.e. Indeed, the assumption is equivalent to say that f * g = 0, where g is the characteristic function of -U. Applying the Fourier transform you have that the pointwise product of f^ and g^ is 0 a.e. But g^ is a non-zero analytic function by the Paley-Wiener theorem, so it is a.e. different from 0, hence f^ is 0 a.e., and since the Fourier transform is injective, f is 0 a.e. --pma (talk) 23:15, 25 June 2009 (UTC)[reply]

36 standard deviations away from the mean on a normal distribution

What would be an example of 36 standard deviations away from the mean in nomally distributed data? I'm thinking like an iq of 650 or being a normal human who happens to be 15 feet tall, but neither of these is something I can imagine... Any better (real) examples, not imaginary? Also how about in terms of dice, like throwing sixes in a row with a fair dice? How many sixes in a row would happen about as frequently as something 36 standard deviations away from the mean? Thanks! 193.253.141.81 (talk) 21:16, 25 June 2009 (UTC)[reply]

It doesn't happen, if the data is actually normally distributed. 36 standard deviations is well outside the "happens once in N times the lifetime of the universe," where N is a large number. Ray^Talk 22:05, 25 June 2009 (UTC)[reply]

So it's like throwing how many sixes in a row with a fair die?193.253.141.64 (talk) 22:48, 25 June 2009 (UTC)[reply]

Ok, let's work it out. The number of sixes in N throws follows a binomial distribution. The expected number of sixes is, of course, N/6. The variance of a binomial is np(1-p), in this case that is N*1/6*5/6=5N/36. The standard deviation is the square root of that, which is

{\frac {\sqrt {5N}}{6}}

. So we want N such that N-N/6=

6{\sqrt {5N}}

. That is N=259.2, so we'll round up to 260 throws. The chance of that happening is about 1 in 10²⁰² (for comparison, the age of the universe is about 4x10¹⁷ seconds). I think I've done that right, can someone check me? --Tango (talk) 00:35, 26 June 2009 (UTC)[reply]

Let me give a different approach, letting the variable be

{\bar {x}}_{n}

, the sample mean of rolling a die n times in a row, rather than N, the number of sixes, as Tango did (I don't feel like I have the background to check Tango's work). It shouldn't be terribly surprising that we get significantly different answers, since we used different variables.

\mu _{{\bar {x}}_{n}}=3.5

is the mean, and

\sigma _{{\bar {x}}_{n}}={\frac {\sigma }{\sqrt {n}}}={\frac {\sqrt {\frac {35}{12}}}{\sqrt {n}}}={\sqrt {\frac {35}{12n}}}

is the standard deviation, since the standard deviation for a single die is

{\sqrt {\frac {35}{12}}}

by direct calculation. Since for sufficiently large n (and we're talking about at least, say, 5 here; not a very stringent requirement), the distribution is approximately normally distributed, we need only look for an n which satisfies:

{\frac {{\bar {x}}-\mu _{{\bar {x}}_{n}}}{\sigma _{{\bar {x}}_{n}}}}\approx 36

Where we want

{\bar {x}}

to be 6. We can solve this:

{\frac {6-3.5}{\sqrt {\frac {35}{12n}}}}\approx 36

{\frac {(2.5){\sqrt {12n}}}{\sqrt {35}}}\approx 36

{\frac {5{\sqrt {3n}}}{\sqrt {35}}}\approx 36

{\sqrt {n}}\approx {\frac {36{\sqrt {35}}}{5{\sqrt {3}}}}

n\approx {\frac {36^{2}35}{5^{2}3}}

n\approx {\frac {3024}{5}}=604.8

So the smallest n such that n rolls of 6 would be more than 36 standard deviations from the mean is 605.

Since this distribution is approximately normal, we can give a rough approximation of the probability of a normal distribution exceeding 35 standard deviations by calculating the corresponding probability in rolling 605 dice, and we get the unsurprisingly tiny number

6^{-605}\approx 10^{-471}

. A better approximation can be achieved by calculating the corresponding probability for a larger number of rolls. --COVIZAPIBETEFOKY (talk) 01:21, 26 June 2009 (UTC)[reply]

I'm not convinced by your method. It seems that you would get a different answer if we talked about rolling lots of 1's in a row, rather than 6's, which doesn't fit with the spirit of the question being asked. Also, I'm not sure the normal approximation is valid for any number of rolls, since you are examining what happens at the maximum and a normal distribution doesn't have a maximum so clearly the exact distribution and the approximate distribution differ significantly at that point. --Tango (talk) 10:53, 26 June 2009 (UTC)[reply]

I agree on both points, although ironically, in your first point, you picked an example that would actually get exactly the same answer (if you do it right), by symmetry around the mean. But if you're talking about lots of 2's, 3's, 4's or 5's in a row, then you're right; my analysis won't work. In part, that's because

{\bar {x}}=5

does not imply a long streak of 5's, as it does for 6. --COVIZAPIBETEFOKY (talk) 12:09, 26 June 2009 (UTC)[reply]

So who's right? Is it 260 or 605 throws of sixes in a row with a fair die?193.253.141.64 (talk) 06:54, 26 June 2009 (UTC)[reply]

We've answered different questions, so we could both be right. --Tango (talk) 10:53, 26 June 2009 (UTC)[reply]

It seems to me that neither of you has answered the question the OP was asking. The probability of a normal variable being greater than

\mu +36\sigma

is

4\cdot 10^{-284}

. This is also the probability that, if you throw a die 364 times, they will all turn out 6.

I'll also point out that as far as I know, the normal approximation of the binomial does not work for values so far from the mean. -- Meni Rosenfeld (talk) 11:38, 26 June 2009 (UTC)[reply]

But the OP didn't ask about a normal variable, they asked about rolling dice. As you say, the normal approximation doesn't apply, so you have to do it exactly, which is what I attempted (potentially incorrectly). There is also a matter of what variable you are measuring. I did the number of sixes in N rolls (for fixed N), you could also do the number of sixes before you get a different number. I would imagine you would get different answers (since I have no reason to assume they would give the same answer). My statistics training is rather minimal, so I can't think what distribution the latter would have... --Tango (talk) 11:54, 26 June 2009 (UTC)[reply]

But the OP did ask about a normal variable (see title), specifically about the probability that it will be 36 SD from the mean. He wanted to express this probability in dice.

The number of throws until you get something other than 6 is distributed geometrically, which is a special case of the negative binomial distribution. -- Meni Rosenfeld (talk) 12:32, 26 June 2009 (UTC)[reply]

Implementing the Q-function#Bounds:

{\frac {x}{1+x^{2}}}\cdot {\frac {1}{\sqrt {2\pi }}}e^{-x^{2}/2}<Q(x)<{\frac {1}{x}}\cdot {\frac {1}{\sqrt {2\pi }}}e^{-x^{2}/2}

in the J (programming language) gives

  x =. 36              NB. x is 36
  a =. % %: o. 2       NB. a is the reciprocal of the square root of 2 pi
  b =. ^ - -: *: x     NB. b is the exponential of minus half square of 36
  6 ^. % b * a % x     NB. base 1/6 logarithm of upper bound for Q(36)
364.169

So it's like throwing 364 sixes in a row with a fair die. Bo Jacoby (talk) 11:59, 26 June 2009 (UTC).[reply]

June 26

Small World Experiment

Many people have heard something along the lines that, "You are 6 handshakes away from the president." See http://en.wikipedia.org/wiki/Small_world_phenomenon

However, I have found nothing relating specifically to wikipedia. With all its internal links, my question is if there is any research or answer on the statistical "closeness" between any two pages on Wikipedia. For example, the number for the "closeness" between small world experiment and Deborah Sampson is 2 as one can navigate to Deborah Sampson by clicking on the link "Sharon, Massachusetts" on the small world experiment page and then on "Deborah Sampson"

174.18.97.68 (talk) 20:46, 26 June 2009 (UTC)[reply]

Studies into this have certainly been done. I'll try and find one for you. --Tango (talk) 21:23, 26 June 2009 (UTC)[reply]

Wikipedia:Six degrees of Wikipedia (WP:WHAAOE!). See also, [1]. --Tango (talk) 21:28, 26 June 2009 (UTC)[reply]

June 27