Wikipedia:Reference desk/Mathematics: Difference between revisions

Revision as of 12:06, 10 March 2017

Welcome to the mathematics section
of the Wikipedia reference desk.
Want a faster answer?

Main page: Help searching Wikipedia


How can I get my question answered?

  • Select the section of the desk that best fits the general topic of your question (see the navigation column to the right).
  • Post your question to only one section, providing a short header that gives the topic of your question.
  • Type '~~~~' (that is, four tilde characters) at the end – this signs and dates your contribution so we know who wrote what and when.
  • Don't post personal contact information – it will be removed. Any answers will be provided here.
  • Please be as specific as possible, and include all relevant context – the usefulness of answers may depend on the context.
  • Note:
    • We don't answer (and may remove) questions that require medical diagnosis or legal advice.
    • We don't answer requests for opinions, predictions or debate.
    • We don't do your homework for you, though we'll help you past the stuck point.
    • We don't conduct original research or provide a free source of ideas, but we'll help you find information you need.



How do I answer a question?

Main page: Wikipedia:Reference desk/Guidelines

  • The best answers address the question directly, and back up facts with wikilinks and links to sources. Do not edit others' comments and do not give any medical or legal advice.


March 3

Excepting February alone ...

Overheard in a shop on Tuesday afternoon:

Customer - Today is the last day of February?

Shopkeeper - No - yes, next in four years' time!

What do people know about the lengths of the months, which years are leap years and the distance between them? This question was asked last year, but the discussion was derailed when one responder told another to get off my lawn. 86.128.236.125 (talk) 23:07, 3 March 2017 (UTC)[reply]

See perpetual calendar. You can tell which day will be which date for as long as we use our current calendar, because the calendar repeats on a strict 400-year cycle containing only 14 possible year calendars. --Jayron32 00:42, 4 March 2017 (UTC)[reply]
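The 400-year repetition and the count of 14 distinct year calendars can be checked with Python's standard calendar module (an illustrative sketch; the year range is chosen arbitrarily as one full Gregorian cycle):

```python
import calendar

# Sketch: under the Gregorian rules the day-of-week pattern repeats every
# 400 years, and any year's calendar is one of only 14 possibilities
# (7 possible starting weekdays x leap or non-leap).
def year_key(y):
    # (weekday of 1 January, leap status) pins down the whole year calendar
    return (calendar.weekday(y, 1, 1), calendar.isleap(y))

assert year_key(2017) == year_key(2017 + 400)        # 400-year repeat

distinct = {year_key(y) for y in range(2000, 2400)}  # one full cycle
print(len(distinct))  # 14
```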


March 4

Find the flaw

This appears to prove it's impossible to draw a perfect isosceles right triangle:

The ratios of the sides are 1-1-sqrt(2). Sqrt(2) is irrational. However, in the real world, all lengths are an integer multiple of the Planck length, proving that the sides must all be rational. Any flaw?? Georgia guy (talk) 00:19, 4 March 2017 (UTC)[reply]

False precision. In the real world, no measurement is infinitely reducible. You can draw any arbitrarily perfect isosceles triangle given the precision of the measuring device you use to verify it. That is, no real measuring device can tell the difference between a perfect and imperfect isosceles right triangle beyond a certain level of precision, which is good enough for the real world. --Jayron32 00:40, 4 March 2017 (UTC)[reply]
That's not what false precision is, as the article states. My bathroom scale, which lists my weight to the tenth of a pound but is only accurate to a pound, is an example of that. Then there's the joke: "Hey mister, how old is that dinosaur skeleton?", "Well, it was 200 million years old when I started working at the museum, so now it would be 200 million and 7 years old". StuRat (talk) 00:46, 4 March 2017 (UTC)[reply]
This discussion has been closed. Please do not modify it.


Every time you speak, you verify your intelligence. --Jayron32 02:09, 4 March 2017 (UTC)[reply]
The same is true of you. As the first sentence in your link states: "False precision ... occurs when numerical data are presented in a manner that implies better precision than is justified; since precision is a limit to accuracy, this often leads to overconfidence in the accuracy as well." This applies to my examples, but not to the OP, as no numerical data have been presented, and there is no overconfidence in accuracy resulting from this. StuRat (talk) 04:48, 4 March 2017 (UTC)[reply]
If you've got a ruler with markings as small as a Planck length, I'd love to see it. --Jayron32 05:03, 4 March 2017 (UTC)[reply]
You still don't get it, so I will explain further. Had somebody reported the length of the edges of a drawn triangle in units beyond what they could measure, then that would be an example of false precision. But that has absolutely nothing to do with this question, as nobody made such a claim. StuRat (talk) 05:09, 4 March 2017 (UTC)[reply]
So, you mean like trying to draw a triangle and reporting its size in units of Planck length, right? Like the OP asked? --Jayron32 13:22, 4 March 2017 (UTC)[reply]
No, they didn't ask about that at all. That would be false precision, as Planck length is LESS than the actual accuracy of a drawing, but they were complaining that Planck length is MORE than zero. StuRat (talk) 14:43, 4 March 2017 (UTC)[reply]
What is this level of precision; specifically its base 10 logarithm?? Has anyone proposed technology that will strengthen it?? Georgia guy (talk) 00:44, 4 March 2017 (UTC)[reply]
No flaw, it is impossible to draw one exactly. A bit silly to worry about it, though. StuRat (talk) 00:46, 4 March 2017 (UTC)[reply]
Why do you believe "in the real world, all lengths are an integer multiple of the Planck length"? (Not that freeing yourself from that common misunderstanding will allow for construction of infinitely precise physical objects.) -- ToE 05:40, 4 March 2017 (UTC)[reply]
That is more of a 'fat finger' type limit and applies to the sides of the triangle as well as the hypotenuse. One can't meaningfully point more accurately than that, and one can't add up some zillions of fat finger widths to get some precise length! Dmcq (talk) 12:06, 4 March 2017 (UTC)[reply]
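As a rough illustration of the scales involved (assumed values; this is not a physical claim that lengths are actually quantized), rounding a 10 cm triangle's hypotenuse to the nearest whole number of Planck lengths changes it by an amount far below any conceivable measurement:

```python
import math

# Illustration with assumed values: even if lengths came in whole
# multiples of the Planck length, the rounding error on the hypotenuse
# would be unmeasurably small.
planck = 1.616e-35             # Planck length in metres (approximate)
side = 0.1                     # a 10 cm leg, chosen arbitrarily
hyp = side * math.sqrt(2)      # irrational "ideal" hypotenuse
n = round(hyp / planck)        # nearest whole number of Planck lengths
error = abs(hyp - n * planck)  # discrepancy of the quantized hypotenuse
# In exact arithmetic this is at most planck/2 (~8e-36 m); in floating
# point it is still far below 1e-15 m, i.e. below any drawing precision.
print(error)
```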

It's also a problem in pure mathematics. Count Iblis (talk) 19:46, 4 March 2017 (UTC)[reply]

No it's not. Wildberger is a known crank. Deacon Vorbis (talk) 18:41, 5 March 2017 (UTC)[reply]
and Gregory Chaitin and Emile Borel, so he is in good company. Count Iblis (talk) 01:20, 6 March 2017 (UTC)[reply]
Nothing in the linked article supports even vaguely a statement like "[something to do with the square root of two] presents a problem in pure mathematics." And it is deeply implausible that Borel would have agreed with any such statement. --JBL (talk) 01:34, 6 March 2017 (UTC)[reply]
In fact, Chaitin rejects the sort of limited thinking that you appear to be advocating quite explicitly: "In spite of the fact that most individual real numbers will forever escape us, the notion of an arbitrary real has beautiful mathematical properties and is a concept that helps us to organize and understand the real world. Individual concepts in a theory do not need to have concrete meaning on their own; it is enough if the theory as a whole can be compared with the results of experiments." --JBL (talk) 01:57, 6 March 2017 (UTC)[reply]

theory of decomposition spaces

Can somebody tell me what this is and if we have an article or field about it on hand? It comes from this article: Georg Aumann. Thanks. scope_creep (talk) 12:56, 4 March 2017 (UTC)[reply]

Try Manifold decomposition. --RDBury (talk) 21:36, 5 March 2017 (UTC)[reply]

Psi Weights and ARIMA models

In this class lesson page, the second example asks, "Suppose that an AR(1) model is x_t = 40 + 0.6x_{t–1} + w_t." How does one find this formula from an AR model? If I have an ARIMA(1,0,1), how do I create a formula like the one on the page, in the form "x_t = ..."? Furthermore, how does one arrive at w_t and the variance of w_t?

My goal is to construct the prediction intervals, but I am new to ARIMA models, so I might have missed some of the fundamentals in my learning. I appreciate any guidance in this regard. Schyler (exquirere bonum ipsum) 22:05, 4 March 2017 (UTC)[reply]

The general form of the equation might be just assumed. The specific numerical values of the parameters are estimated from a data set. We have pretty good articles: Autoregressive model and Autoregressive integrated moving average. If reading them leaves you with more specific questions, I'll try to answer them. Loraof (talk) 00:21, 5 March 2017 (UTC)[reply]
Also Order of integration. Loraof (talk) 00:44, 5 March 2017 (UTC)[reply]

Okay, I think identifying the lag operator is the fundamental piece I'm missing. How do I find the lag operator of a time series? Schyler (exquirere bonum ipsum) 01:09, 5 March 2017 (UTC)[reply]

I assume you mean how do you find the maximum lag (the lag operator L or sometimes B is just a function that moves your focus back one period: so Lxt = xt–1, and L2xt = xt–2). For the choice of a maximum lag, see Autoregressive model#Choosing the maximum lag and the wikilink therein. Loraof (talk) 01:19, 5 March 2017 (UTC)[reply]
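The backward shift described above can be sketched in a few lines of Python (a hypothetical helper for illustration, not from any statistics package):

```python
# The lag operator L just shifts a series back one period:
# L x_t = x_{t-1}, and L^k x_t = x_{t-k}.
def lag(x, k=1):
    # entries with t - k < 0 are undefined; None marks them here
    return [None] * k + x[:-k] if k > 0 else x[:]

x = [10, 20, 30, 40]
print(lag(x))     # [None, 10, 20, 30]
print(lag(x, 2))  # [None, None, 10, 20]
```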
Okay, I don't think that's it, then. I'm asking when I don't know what I don't know. So,

(1 − φ1L − ⋯ − φpL^p)(1 − L)^d xt = δ + (1 + θ1L + ⋯ + θqL^q)wt

defines ARIMA(p, d, q). My ARIMA is of order (1,0,1), so that means my model is formulated by

(1 − φ1L)xt = δ + (1 + θ1L)wt

Correct? But then how do I identify phi, L, delta, and theta? I appreciate your consideration of this problem I am having. Schyler (exquirere bonum ipsum) 01:50, 5 March 2017 (UTC)[reply]

The symbol L is not something to be identified; it is the name of a function: Lxt means xt–1. Thus your last equation can be written equivalently (noting that d = 0, so the (1 − L)^d factor drops out) as

xt = δ + φ1xt–1 + wt + θ1wt–1

You need to estimate φ1 and θ1. For the AR parameter by itself you would go by the lengthy section Autoregressive model#Calculation of the AR parameters. For the MA parameter we have the very short section Moving-average model#Fitting the model. For your case of a combined AR and MA model (called an ARMA model) all we have is Autoregressive integrated moving average#Software implementations. Maybe the manual for one of the software packages would tell you how it is estimated, or maybe you would be satisfied to just tell the package to give you the results. All I can remember about it is that ordinary least squares can be used for AR models but not for MA models, which typically use maximum likelihood. Loraof (talk) 03:33, 5 March 2017 (UTC)[reply]
Also Autoregressive moving average model#Implementations in statistics packages. Loraof (talk) 04:23, 5 March 2017 (UTC)[reply]
Here, I'll take a leap of faith. Using MLE, I can identify the coefficients of an ARMA (1,1) model. If the coefficients of my model are ar=0.35 and ma=0.7, then I can graph said model of a time series with the equation . Yes? Also, psi weights are given by . Therefore, the first two psi weights of my equation are . Is my faith rewarded? Finally, I'm still unclear as to how to get the variance of wt. I really appreciate the detailed guidance here. It's a difficult topic for me. Schyler (exquirere bonum ipsum) 19:23, 5 March 2017 (UTC)[reply]
With an ARMA(1, 1) model and with AR coefficient 0.35 and MA coefficient 0.7, the equation would be xt = 0.35xt–1 + wt + 0.7wt–1 (don't forget the current error term). (Note that and are two different alternative notations for the same thing, so I don't know what you mean by ) Here your AR order (longest lag) is 1, so there is no such thing. I don't know how an estimate of the variance of the error term is obtained. Loraof (talk) 20:43, 5 March 2017 (UTC)[reply]
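For concreteness, the ARMA(1,1) recursion with the assumed coefficients phi = 0.35 and theta = 0.7 can be simulated and checked against the theoretical variance Var(x) = sigma^2(1 + 2*phi*theta + theta^2)/(1 − phi^2) (an illustrative sketch, not a fitting procedure):

```python
import random

# Sketch (assumed setup): simulate x_t = 0.35*x_{t-1} + w_t + 0.7*w_{t-1}
# and compare the sample variance with the theoretical ARMA(1,1) variance
# Var(x) = sigma^2 * (1 + 2*phi*theta + theta^2) / (1 - phi^2).
random.seed(0)
phi, theta, sigma = 0.35, 0.7, 1.0
n = 10_000
w = [random.gauss(0, sigma) for _ in range(n)]
x = [w[0]]                            # start the recursion at x_0 = w_0
for t in range(1, n):
    x.append(phi * x[t - 1] + w[t] + theta * w[t - 1])

var_theory = sigma**2 * (1 + 2*phi*theta + theta**2) / (1 - phi**2)
mean = sum(x) / n
var_sample = sum((xi - mean)**2 for xi in x) / n
print(round(var_theory, 3))           # 2.256; var_sample lands nearby
```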
They say repeatedly that . I understand why that's true, but I'm not sure if that might be helpful. Furthermore, there is a given, that the standard error of .
This thread is approaching a singularity. I am grateful for your help in this. By the way, this is for a paper I am preparing on an educational intervention. Schyler (exquirere bonum ipsum) 01:41, 6 March 2017 (UTC)[reply]
Two quick points: (1) Note that their notation is defined in their notation section at the start as the m-period ahead forecast from time n (and not as x raised to a power, which is what it looks like). (2) They don't say which would be impossible since the variance is positive. Instead, they say that Loraof (talk) 02:08, 6 March 2017 (UTC)[reply]

March 6

Asking questions

How did yesterday's section end up empty? 32ieww (talk) 02:28, 6 March 2017 (UTC)[reply]

Questions are organised by the date they are asked but on some days nobody asks any questions. 80.5.88.48 (talk) 07:12, 6 March 2017 (UTC)[reply]
Can we use mathematics to predict which days are those where nobody dares to ask any questions? 148.182.26.69 (talk) 05:29, 7 March 2017 (UTC)[reply]
No. -- Jack of Oz [pleasantries] 06:21, 7 March 2017 (UTC)[reply]
We can't make absolute predictions about how many questions will be asked on a given day. But we could gather data from the past and use it to construct a statistical model to forecast the number of questions on a future day. For example, looking back at the February archive of this desk, we can see that there were 12 days with no questions, 10 days with 1 question, 2 days with 2 questions, 2 days with 3 questions and 2 days with 4 questions. At the simplest level this might lead us to forecast that on most days there will be either no questions or 1 question. For a more sophisticated model, we might try to match a probability distribution to the distribution of questions. Or we could build a model that takes into account the day of the week (are questions more or less likely at weekends?). Or if we thought the number of questions on one day might depend on the number of questions on the day before then we could test this hypothesis and possibly build a Markov process model ... Gandalf61 (talk) 09:29, 7 March 2017 (UTC)[reply]
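The February counts quoted above happen to average exactly 1 question per day, so a Poisson fit is easy to sketch (illustrative only; a serious model would also test goodness of fit):

```python
import math

# Sketch using the February figures quoted above: 12 days with 0 questions,
# 10 with 1, and 2 each with 2, 3 and 4 questions (28 days in all).
counts = [0]*12 + [1]*10 + [2]*2 + [3]*2 + [4]*2
days = len(counts)                    # 28
lam = sum(counts) / days              # Poisson MLE for the mean; here 1.0

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

expected = {k: days * poisson_pmf(k, lam) for k in range(5)}
print({k: round(v, 2) for k, v in expected.items()})
# {0: 10.3, 1: 10.3, 2: 5.15, 3: 1.72, 4: 0.43} -- close to the observed
# counts, so a Poisson model is at least not obviously wrong here.
```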
Was it empty before you came here? Count Iblis (talk) 08:58, 7 March 2017 (UTC)[reply]
I checked the Mathematics desk back to January 2016 and there is no correlation between the number of questions asked and the day of the week. You can easily work out the average time between successive blank days and, from the number of questions asked over a period, the average number of questions asked each day. 80.5.88.48 (talk) 10:28, 7 March 2017 (UTC)[reply]
One method that won't work is to figure that Q's are randomly distributed to days, since they often come in "runs". That may either be because one person asks several Q's in short order, or because one Q inspires other related Q's. StuRat (talk) 15:28, 7 March 2017 (UTC)[reply]
Both explanations of the runs are plausible, but runs are also consistent with randomness (indeed, the absence of runs would contradict randomness). See the lead of Statistical randomness. Loraof (talk) 03:42, 8 March 2017 (UTC)[reply]
  • By the way, there's a flaw in the archive: when no questions are asked on March 5, the forward link from the March 4 archive goes to the current page rather than to March 6. —Tamfang (talk) 03:48, 8 March 2017 (UTC)[reply]

I went to the Mathematics archive and the link from the final archive page for February did indeed link to the current desk as Tamfang says (4 March is still on the current desk). However, when I went back there the link from the final February page correctly pointed to the March archive (could this be something to do with the fact that the bot has only this morning begun archiving March?) Especially on the Humanities page you will often find that a run of questions is caused by current events. 80.5.88.48 (talk) 08:20, 8 March 2017 (UTC)[reply]

March 7

going through all edges of the Regular polyhedra?

Is there an easy way to figure out the minimum number of edges that a path would have to go through to trace out all of the edges of each of the Regular Polyhedra? The Octahedron has an Eulerian circuit (every vertex has even degree), so that one is 12, but the others all have an odd number of edges at each vertex. Naraht (talk) 15:58, 7 March 2017 (UTC)[reply]

Well, there's a path of length 7 for a tetrahedron, which is one short of being Hamiltonian, so that one must be minimal, but I don't know about the others. Deacon Vorbis (talk) 17:14, 7 March 2017 (UTC)[reply]
Oops, that's an Eulerian path that you're describing, not Hamiltonian. Deacon Vorbis (talk) 17:18, 7 March 2017 (UTC)[reply]
It seems like this is a special case of the route inspection problem. That talks about closed circuits though, and I'm not sure how dropping that restriction would affect the answer (or which case you're looking for). Deacon Vorbis (talk) 18:03, 7 March 2017 (UTC)[reply]
OK, progress so far. For the Tetrahedron, the answer is 7: round the bottom, up to the apex, and then down and back up one of the other sides, and then the last. For the Cube it is between 12 (the number of edges) and 15, as a Hamiltonian cycle on the vertices will cover 8 of the edges, with the ability to go out and back on the other four, but one edge will be at the beginning/end so it will only have to be done once. I think that the same logic as the cube can be used on the Dodecahedron, which means that the number is between 30 (the number of edges) and 39. Not sure whether an icosahedron has two Hamiltonian cycles that don't share an edge. Naraht (talk) 16:32, 8 March 2017 (UTC)[reply]
Let's say you have a path which visits each edge at least once. For each time the path follows the same edge twice, split the edge to make two edges; you now have an Eulerian path on a graph with more edges. A graph with an Eulerian path can have at most two vertices with odd degree. So if you're doubling edges to make the graph Eulerian, and there are 2k vertices of odd degree, you're going to need at least k−1 additional edges. So for the tetrahedron you need 1 additional edge, the cube 3 additional edges, the dodecahedron 9 edges, the icosahedron 5 edges. (I'm skipping the octahedron because it's already done.) So the minimum length of a path is: tetrahedron 6+1=7, cube 12+3=15, dodecahedron 30+9=39, icosahedron 30+5=35. The procedure you give establishes an upper bound for the minimum length which is equal to the values given. --RDBury (talk) 18:56, 9 March 2017 (UTC)[reply]
I'm not sure that this quite works. The "at least k-1 additional edges" means that the values that you've set are minima. That seems very different than showing that there is a 35 edge solution for the icosahedron.Naraht (talk) 19:48, 9 March 2017 (UTC)[reply]
What is missing is to find a matching in the graph of that size. This is not hard to do case-by-case. --JBL (talk) 21:05, 9 March 2017 (UTC)[reply]
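RDBury's lower-bound computation can be written out mechanically from just the vertex count, common degree, and edge count of each solid (a sketch of the counting argument, not a full route-inspection solver):

```python
# Sketch of the lower-bound argument above: a connected graph with 2k
# odd-degree vertices needs at least k-1 doubled edges to admit an open
# Eulerian path, so the shortest edge-covering walk has E + (k-1) edges.
solids = {
    # name: (vertices, degree of every vertex, edges)
    "tetrahedron":  (4, 3, 6),
    "cube":         (8, 3, 12),
    "octahedron":   (6, 4, 12),
    "dodecahedron": (20, 3, 30),
    "icosahedron":  (12, 5, 30),
}

def min_walk_length(vertices, degree, edges):
    odd = vertices if degree % 2 else 0   # number of odd-degree vertices, 2k
    k = odd // 2
    return edges + max(k - 1, 0)          # E edges plus k-1 repeats

results = {name: min_walk_length(*v) for name, v in solids.items()}
print(results)
# {'tetrahedron': 7, 'cube': 15, 'octahedron': 12,
#  'dodecahedron': 39, 'icosahedron': 35}
```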

March 8

Local or national statistics as predictor of future events

A is a geographical region (say, a country) with population p (for the sake of simplicity, let's assume that the population is constant). It consists of a number of subregions Bi, each with population pi.

The number of times a certain event occurs each year in A follows Po(a), where a is an unknown constant.

The number of times a certain event occurs each year in Bi follows Po(bi), where bi is an unknown constant. We can assume that (this is not exactly true, but probably the best available model – maybe a binomial distribution would be just as correct).

Statistical data are available: the number of times the event occurred in A during n years are α0, α1, ..., αn–1, and the number of times the event occurred in Bi during the same years are βi,0, βi,1, ..., βi,n-1.

The best estimate of a clearly is the average (α0 + α1 + ⋯ + αn–1)/n. But what is the best estimate of bi?

Intuitively, local statistics should be better at predicting local events. But if pi << p, the values of βi,j can be very small (perhaps even many of them zero) and subject to relatively large random fluctuations, so at what point might this uncertainty outweigh the differences between subregions? —JAOTC 09:22, 8 March 2017 (UTC)[reply]

The crucial point is your assertion that "We can assume that ", which (as written) does not mean much, but I suspect what you meant is bi = a·pi/p, i.e. the rate of the event is proportional to population, with the same proportionality constant in all subregions (cf. Poisson_distribution#Sums_of_Poisson-distributed_random_variables).
If you know that from theoretical arguments, then indeed using the global estimate is better because of the law of large numbers. You don't even care about the sampling by subregions.
However, if you are even looking at the reporting by subregions, it is likely that this assertion is merely the null hypothesis waiting to be disproved. In that case, there is a famous quote that applies (Judging Books by Their Covers, Richard P. Feynman):

Nobody was permitted to see the Emperor of China, and the question was, What is the length of the Emperor of China's nose? To find out, you go all over the country asking people what they think the length of the Emperor of China's nose is, and you average it. And that would be very "accurate" because you averaged so many people. But it's no way to find anything out; when you have a very wide range of people who contribute without looking carefully at it, you don't improve your knowledge of the situation by averaging.

The situation is a bit more complex here, but if you have 10 estimates from people who saw the emperor's nose and 1000 from people who did not, adding the last 1000 will not "improve" your estimate by any reasonable meaning of the word "improve". TigraanClick here to contact me 12:15, 8 March 2017 (UTC)[reply]
Actually, Feynman uses the wrong term here. If you have a large sample size, then your answer is very precise, but without good data to sample, the answer would not be very accurate. Precision is the closeness of a set of measurements to their average value; a larger sample size should become closer and closer to an ideal distribution, so larger sample sizes are more precise. Accuracy is the closeness of a set of measurements to the true value (not the average value), so since no one of the billion polled Chinese people actually knew the size of the Emperor's nose, the average is not likely to be very accurate (even if it were very precise). Most introductory textbooks in the sciences or statistics will cover the difference between accuracy and precision, but even very smart people confuse the two concepts, as the usually astute Mr. Feynman does above. --Jayron32 18:31, 8 March 2017 (UTC)[reply]
[[1]]. Bo Jacoby (talk) 21:02, 8 March 2017 (UTC).[reply]
To whom, and in what context, is your self-aggrandizing link being provided? --Jayron32 02:52, 9 March 2017 (UTC)[reply]
To mr JAO, in the context of statistical prediction, which is what his question is about. Bo Jacoby (talk) 06:31, 9 March 2017 (UTC).[reply]
Thank you for your answer. You are quite right that the crucial point is my third line. I had difficulty wording it, which in my experience probably means I had difficulty thinking it. For the record, bi = a·pi/p does not in general hold – there are systematic demographic differences in the subregions. But they are also not uncorrelated. The heart of the problem is that we don't really know the nature or strength of that correlation.
What you are saying makes a lot of sense. But there's still something unsettling about the results, if it leads to predicting that the risk of a certain event is 0 in a subregion just because it hasn't happened yet in that particular subregion. Of course, this can (and should) be alleviated by computing confidence intervals, making it clear that the risk is not exactly 0, so possibly this is not really a problem. But where does the argument end? Presumably, if you've lived in a house since it was built, you know how frequent fires have been in that house historically, which is probably 0 times per year. You have seen the Emperor's nose, but is this really a better measurement of the fire risk in your house than the statistics from your local fire brigade?
Also, your idea of looking at bi = a·pi/p as a null hypothesis may be a better way forward than guessing something about the distribution of bi. If, for a certain i, local statistics disprove this hypothesis, then we know for sure that a is irrelevant. —JAOTC 07:36, 9 March 2017 (UTC)[reply]
there are systematic demographic differences in the subregions, [but] they are also not uncorrelated. The heart of the problem is that we don't really know the nature or strength of that correlation.
It all depends on what "correlated" means. If you have only one data point (how many events) per subregion, you cannot do much – at best you can reject the null hypothesis that the rate is the same everywhere, but you cannot correct the observed occurrence in some region to deduce the distribution parameter. (You can compute confidence intervals, but you make assumptions along the way – in effect you are usually silently assuming a prior probability distribution.)
As for your second question, statistics of low-probability events are notoriously difficult to estimate (in some cases, you can handwave an argument about how the central limit theorem is a bad approximation on the tails). The natural hypothesis would be that your house or its inhabitants have nothing particular and thus it has the same statistics as the ensemble-average house. But if (for instance) the witch-doctor gives you a powerful anti-fire charm, your house may well have different statistics (it is more likely to catch fire because you will cook carelessly). By how much, it is not data from other (non-witched) houses that will tell it... TigraanClick here to contact me 10:17, 9 March 2017 (UTC)[reply]
If a variable i has a Poisson distribution with mean value m, then the standard deviation is √m. So estimate im±√m. If you know the value i but not the mean value m, then m has a gamma distribution, m≃(i+1)±√(i+1). So if you observe the value i=0, estimate m≃1±1. Bo Jacoby (talk) 09:47, 9 March 2017 (UTC).[reply]
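Bo Jacoby's estimate can be sketched directly: with a flat prior, observing i events from a Poisson with unknown mean m gives m a Gamma(i + 1, 1) posterior, whose mean is i + 1 and whose standard deviation is √(i + 1) (a minimal illustration of the formula above, assuming the flat prior):

```python
import math

# Sketch: for a Poisson(m) count observed once as i, a flat prior on m
# gives the posterior Gamma(i + 1, rate 1), so estimate m as
# (i + 1) +/- sqrt(i + 1).
def posterior_mean_sd(i):
    shape = i + 1                 # Gamma shape parameter, rate 1
    return shape, math.sqrt(shape)

print(posterior_mean_sd(0))  # (1, 1.0): seeing nothing gives m ~ 1 +/- 1
print(posterior_mean_sd(4))  # (5, ~2.24)
```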

March 10