Wikipedia:Reference desk/Mathematics

The Wikipedia Reference Desk covering the topic of mathematics.

Welcome to the mathematics reference desk.

How can I get my question answered?

  • Provide a short header that gives the general topic of the question.
  • Type ~~~~ (four tildes) at the end – this signs and dates your contribution so we know who wrote what and when.
  • Post your question to only one desk.
  • Don't post personal contact information – it will be removed. We'll answer here within a few days.
  • Note:
    • We don't answer (and may remove) questions that require medical diagnosis or legal advice.
    • We don't answer requests for opinions, predictions or debate.
    • We don't do your homework for you, though we’ll help you past the stuck point.

How do I answer a question?

Main page: Wikipedia:Reference desk/Guidelines

  • The best answers address the question directly, and back up facts with wikilinks and links to sources. Do not edit others' comments and do not give any medical or legal advice.

February 22

February 26

Sample size and confidence levels

I know that logically I should have the same confidence in a statistical test that uses n=50 and gives a 95% confidence level as one that has n=1000, also with a 95% confidence level. But my gut feeling is to trust the one with the bigger sample size more. Is there any basis for this feeling? Bubba73 You talkin' to me? 09:31, 26 February 2015 (UTC)

No, there's no good reason for that, though someone might spot a problem with the prior hypothesis with the larger sample. For the test with n=50 and 95% confidence, if you looked at the figures you'd probably think the difference was blindingly obvious, whereas for n=1000 your intuition would say it was still iffy. So you'd probably have the exact opposite gut feeling if you actually looked at the raw data. Dmcq (talk) 11:33, 26 February 2015 (UTC)
One little thing might matter - an error in one data point out of 50 is more likely to change the conclusion than the error in one data point out of 1000. Bubba73 You talkin' to me? 23:03, 26 February 2015 (UTC)
Not necessarily at all. The 95% confidence interval will be much wider with only 50 samples rather than 1000. Dmcq (talk) 16:32, 28 February 2015 (UTC)
You do know that a statistic from a sample of size n=50 is different from one from a sample of size n=1000 EVEN IF THE CONFIDENCE LEVEL IS EXACTLY THE SAME!!! The statistic from n=1000 has a smaller uncertainty, or error interval, than the one from n=50.
Just because two statistics have the same confidence level DOES NOT MEAN they have the same error interval. Naturally you want the result with the smallest error interval. You would be a fool to choose n=50 over n=1000 unless you do not care about the error interval or each sample point is very expensive to gather. (talk) 00:55, 27 February 2015 (UTC)
I guess that is what I was getting at - the error interval. Is there an article that talks about the error interval? Bubba73 You talkin' to me? 00:14, 28 February 2015 (UTC)
If you want to read articles about the error, see the article below:
Standard error (talk) 05:22, 2 March 2015 (UTC)
I would say that one of the underlying issues is the following. Whenever you are applying a statistical test, you are generally going to be making some assumption about the underlying distribution of the data. For example, you might assume the expected values should reflect a constant plus random noise drawn from a normal distribution. When you calculate a 95% threshold you are essentially saying, given the model I expect, how much confidence do I have that my observations conform to that model. However, in the real world, statistical models often prove to be inexact. You might assume random variations that follow a normal distribution, but the truth is a Laplace distribution or something else. If statistics shows that your data doesn't fit the model, is that because you have discovered a physically important signal, or because your understanding of the background noise wasn't very good? With small numbers of data points, one often has to implicitly assume that the underlying model is reasonable (e.g. normally distributed errors), but when you have lots of data you can often test those assumptions and justify more rigorous conclusions. Dragons flight (talk) 00:42, 28 February 2015 (UTC)
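
A minimal sketch of the error-interval point made in this thread, assuming a normal approximation for the mean and a made-up sample standard deviation of 10 for both samples (the function name and the numbers are illustrative, not from the discussion). For n=50 a t-based interval would be slightly wider than this z-based one.

```python
import math

def ci_half_width(sample_sd, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for a mean.

    Uses the normal approximation: z * standard error, where the
    standard error is sample_sd / sqrt(n).
    """
    return z * sample_sd / math.sqrt(n)

sd = 10.0  # hypothetical sample standard deviation, assumed equal for both samples

for n in (50, 1000):
    print(n, round(ci_half_width(sd, n), 3))

# Same confidence level, but the interval shrinks like 1/sqrt(n).
ratio = ci_half_width(sd, 50) / ci_half_width(sd, 1000)
print(round(ratio, 3))  # sqrt(1000 / 50) = sqrt(20)
```

Both intervals carry the same 95% confidence level; what the larger sample buys is a narrower interval, by a factor of sqrt(20) here.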

Calculation method help

Need help with correct method for calculating this:

I have membership id (which might have multiple members in it) and member id which represents an individual member of an account. I am trying to calculate average deposit / deposit date for memberships as well as for individual members.

example table below:

Membership ID   Member ID   Deposit Date   Deposit Amount
121             1           23-04-2013     500
121             2           07-04-2013     500
131             46          23-04-2013     100
121             1           01-06-2013     900
131             46          01-06-2013     340
541             91          23-04-2013     500
679             51          23-04-2013     500
679             1           23-04-2013     500

— Preceding unsigned comment added by (talk) 14:11, 26 February 2015

I've answered at the same question on the miscellaneous desk. Please don't post the same question on more than one desk. Dbfirs 23:23, 26 February 2015 (UTC)
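
One possible reading of the question, sketched under assumptions: "average deposit" is taken to mean the mean deposit amount per deposit, grouped once by membership and once by individual member. Because member ID 1 appears under both membership 121 and membership 679, the per-member grouping here keys on the (membership, member) pair; that choice and the helper name `average_by` are assumptions, not something the poster specified.

```python
from collections import defaultdict

# Rows copied from the example table:
# (membership_id, member_id, deposit_date, deposit_amount)
deposits = [
    (121, 1, "23-04-2013", 500),
    (121, 2, "07-04-2013", 500),
    (131, 46, "23-04-2013", 100),
    (121, 1, "01-06-2013", 900),
    (131, 46, "01-06-2013", 340),
    (541, 91, "23-04-2013", 500),
    (679, 51, "23-04-2013", 500),
    (679, 1, "23-04-2013", 500),
]

def average_by(rows, key):
    """Mean deposit amount for each group produced by key(row)."""
    totals = defaultdict(lambda: [0, 0])  # group -> [sum of amounts, number of deposits]
    for row in rows:
        entry = totals[key(row)]
        entry[0] += row[3]
        entry[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}

per_membership = average_by(deposits, key=lambda r: r[0])
per_member = average_by(deposits, key=lambda r: (r[0], r[1]))

for membership, mean in sorted(per_membership.items()):
    print(membership, round(mean, 2))
for pair, mean in sorted(per_member.items()):
    print(pair, round(mean, 2))
```

If "average deposit / deposit date" instead means an average per distinct date, the same helper can be reused with a key that includes `r[2]`.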

February 27

Box Plot

How does one draw a box and whisker plot based on a set of numbers ordered from least to greatest and divided into four quartiles? Ohyeahstormtroopers6 (talk) 22:10, 27 February 2015 (UTC)

Our article Box plot seems clear enough.→ (talk) 23:56, 27 February 2015 (UTC)
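
A small sketch of the numbers a basic box plot is drawn from, assuming the simple "median of each half" rule for the quartiles and whiskers that run to the minimum and maximum. Other conventions exist (different quartile rules, whiskers capped at 1.5 times the interquartile range), and the sample data is made up.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for at least two values.

    Q1 and Q3 are the medians of the lower and upper halves; for an odd
    count the middle value is left out of both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + n % 2:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

sample = [2, 4, 4, 5, 7, 8, 9, 11]  # hypothetical data set
print(five_number_summary(sample))
```

The box spans Q1 to Q3 with a line at the median, and in this simple variant the whiskers extend from the box to the minimum and the maximum.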

February 28


When, in a data set, there are two numbers which together make up the median (e.g. 3,6,9,10,12,11), what do you do with those two numbers to determine the median? Ohyeahstormtroopers6 (talk) 01:03, 28 February 2015 (UTC)

As illustrated in the lead of the article Median, for an even number of data points in the data set, the median is the mean of the centre-most pair of data points. —Quondum 03:03, 28 February 2015 (UTC)

Thank You. 2602:306:C541:CC60:6866:CFB1:2D1B:5526 (talk) 05:38, 28 February 2015 (UTC)
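
The rule from the answer above, applied to the numbers in the question as a short check. The data must be sorted first, since the list as posted is not in order.

```python
from statistics import median

data = [3, 6, 9, 10, 12, 11]            # the numbers from the question
ordered = sorted(data)                   # [3, 6, 9, 10, 11, 12]
middle_pair = (ordered[2], ordered[3])   # the centre-most pair: (9, 10)
by_hand = sum(middle_pair) / 2           # mean of the pair

print(by_hand)
print(median(data))  # the standard library applies the same rule
```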

Identify partial differential equation

These are really two and a half questions.

I stumbled on a partial differential equation:  u=u(x,y), v=v(x,y), u_x v_y - u_y v_x =1.

Question one: How can I google or otherwise search the net for this kind of equation?

Question two: does anyone happen to recognize this as a famous and named equation I can search by name?

Question 2 1/2: u(x,y), v(x,y) rings a (probably wrong) bell, making me think of the Cauchy-Riemann equations. I guess this would be a special case of the equation, giving me  u_x^2 + u_y^2 = 1. This looks somehow familiar, but I have neglected math for far too long to see what that would be. (talk) 19:17, 28 February 2015 (UTC)

I don't know the name of the equation, but in classical mechanics, the left hand side is a Poisson bracket for u and v if x and y were canonical coordinates. --Mark viking (talk) 22:28, 28 February 2015 (UTC)
I have to admit that, on other occasions, I made several utterly unsuccessful attempts to get a grasp on Poisson brackets. I guess there is some tiny, ugly heuristic clue that I'm too dumb to pick up myself and everyone else is too tired to mention. (talk) 23:28, 28 February 2015 (UTC)
I'd call this the "Jacobian determinant equation", or something like that, and Google finds several useful results for this search term. It is one equation for two unknown functions u,v, so it is underdetermined.
If u+iv additionally solves Cauchy-Riemann, the only solutions will be affine functions (u+iv)(z) = az+b, where a is a complex number of modulus 1.
The equation  u_x^2 + u_y^2 = 1 is called the eikonal equation. —Kusma (t·c) 12:19, 2 March 2015 (UTC)
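
The step from the original equation to the eikonal form can be written out as follows, assuming the Cauchy-Riemann equations in the usual form u_x = v_y, u_y = -v_x and writing f = u + iv (the name f is introduced here only for illustration).

```latex
\begin{align*}
u_x v_y - u_y v_x
  &= u_x \cdot u_x - u_y \cdot (-u_y)
     && \text{using } v_y = u_x,\ v_x = -u_y \\
  &= u_x^2 + u_y^2 = 1 .
\end{align*}
% With f = u + iv holomorphic, f' = u_x + i v_x, so
\[
|f'|^2 = u_x^2 + v_x^2 = u_x^2 + u_y^2 = 1 .
\]
```

A holomorphic function whose derivative has constant modulus has a constant derivative, which is why only the affine solutions az+b with |a| = 1 remain.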

Regression with a graph (machine learning)

Suppose I have a graph of nodes and connections between nodes. The independent variables are this graph and other numerical values assigned to each node. The dependent variable is a point value for the node. How could I use regression to predict the point value of new nodes, based on their connections in the network? Is regression even the right tool to use here? (talk) 20:55, 28 February 2015 (UTC)

Just trying to understand the problem here. So the nodes on the graph each have a value, and that value is in some way based on the other numerical values assigned to that node and nearby nodes, right? If the relevant "graph distance" is simply the smallest number of nodes to get to each node, then you might use that in a second regression analysis, after first doing a regression analysis without considering nearby nodes.
For example, let's say each node's dependent value is 90% based on the (single, in this example) independent value of that node, and 10% based on the node(s) one step away. I think it might be better to look at as few variables at a time as possible. I think you would be more likely to find convergence that way. StuRat (talk) 21:19, 28 February 2015 (UTC)
Yes, that's a good statement of the problem. (talk) 21:28, 28 February 2015 (UTC)
This does indeed sound like a regression problem. However, "regression" is the name of a type of problem, not a specific tool, so you can't just "use regression"; you need to choose which regression to use.
Usually, either the problem is simple enough that a standard cookie-cutter technique can handle it, or you have a huge search space and you need human intelligence and domain knowledge to constrain it. Your question seems to belong to the second category - so without a description that is less abstract, I don't think we can really help with a solution that will actually give good results for the problem you have at hand. -- Meni Rosenfeld (talk) 10:47, 1 March 2015 (UTC)
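
A toy sketch of one way to turn the graph into ordinary regression inputs, in the spirit of the 90%/10% example above: each node gets two features, its own value and the mean value of its direct neighbours, and a plain least-squares fit recovers the weights. Everything here is an assumption for illustration: the five-node graph, the node values, the synthetic targets, and the choice of a single joint fit with no intercept rather than two separate passes.

```python
def neighbour_means(values, edges):
    """Mean of the values on each node's direct neighbours (undirected edges)."""
    neighbours = {node: set() for node in range(len(values))}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return [
        sum(values[m] for m in neighbours[n]) / len(neighbours[n])
        for n in range(len(values))
    ]

def fit_two_features(x1, x2, y):
    """Least-squares fit of y ~ a*x1 + b*x2 (no intercept).

    Solves the 2x2 normal equations with Cramer's rule.
    """
    s11 = sum(p * p for p in x1)
    s12 = sum(p * q for p, q in zip(x1, x2))
    s22 = sum(q * q for q in x2)
    t1 = sum(p * r for p, r in zip(x1, y))
    t2 = sum(q * r for q, r in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return ((t1 * s22 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)

# Hypothetical graph: every node has at least one neighbour.
values = [1.0, 4.0, 2.0, 8.0, 5.0]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]

nbr = neighbour_means(values, edges)
# Synthetic targets built with the 90% / 10% split from the example.
targets = [0.9 * own + 0.1 * mean for own, mean in zip(values, nbr)]

a, b = fit_two_features(values, nbr, targets)
print(round(a, 6), round(b, 6))
```

On real data the targets would be noisy and the right features (neighbour counts, two-step neighbours, and so on) depend on domain knowledge, which is the caveat raised in the reply above.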

Statistical tests for normality

If the Kolmogorov-Smirnov and Shapiro-Wilk tests give significantly different results, how does one resolve this ambiguity? All the best: Rich Farmbrough 21:50, 28 February 2015 (UTC).

It is not really an ambiguity. As the articles you pointed to state, these are different tests, with Shapiro-Wilk giving greater power for a test of normality and Kolmogorov-Smirnov being a nonparametric test that could be applicable to many different kinds of distributions. So it is entirely possible that the tests give different results. Without more details, it is hard to say anything more specific. --Mark viking (talk) 22:14, 28 February 2015 (UTC)
My "go to" first test is Jarque-Bera, because it's so intuitive - read the instructions carefully, as you need to be aware of the sample size - but it's certainly not the best. It's usually a good place to start, as you just drop the data into Excel and look at the skewness and kurtosis immediately, and those values on their own are a useful place to start thinking about the data. If you're using K/S you may have accidentally re-invented the Lilliefors Test. My personal preference would be to prefer S/W to K/S, but see our articles on the weaknesses in both tests. Where I have found K/S very useful is in fitting general stable distributions: though it takes some care and attention you might use this approach to check the stability parameter. RomanSpa (talk) 23:40, 28 February 2015 (UTC)
Thanks, both, for the replies. I certainly draw some warm fuzzies from the replies, and will look into these tests in a little more depth. All the best: Rich Farmbrough 23:45, 28 February 2015 (UTC).
(My last edit got lost in an edit conflict, so I'll try to repeat it...) Re: fitting stable distributions: it's not a graceful process! Re: other tests: I had a colleague who swore by the Epps-Pulley test, which we don't have an article on; the raw reference is "Epps, T. W., and Pulley, L. B. (1983). A test for normality based on the empirical characteristic function. Biometrika 70, 723–726". I haven't used E/P myself for a long time, but I do remember it was a pain in the ass to code up! E/P leads naturally to the BHEP test, which I don't really know, but have heard mildly positive comment on. I suspect it's also a nuisance to code up, alas! RomanSpa (talk) 23:56, 28 February 2015 (UTC)
There's obviously one other point, which we should really have put first: think about the data in practical terms, and ask yourself whether the underlying experiment is one that's likely to produce a normal distribution. Remember that what you're doing is modelling reality. There is always a model, so think about what the reality is that you're modelling: is a normal distribution a plausible outcome from the imagined/theoretical mechanism of the experiment. RomanSpa (talk) 00:04, 1 March 2015 (UTC)
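
A minimal sketch of the skewness, kurtosis and Jarque-Bera quantities mentioned above, written in plain Python with the population-moment definitions (dividing by n). The function name and the sample data are illustrative assumptions. The statistic is only asymptotically chi-squared with 2 degrees of freedom, so for small samples it should not be compared naively against chi-squared tables, which is the sample-size caveat raised in the thread.

```python
def jarque_bera(data):
    """Return (skewness, kurtosis, JB statistic) using population moments.

    JB = n/6 * (S^2 + (K - 3)^2 / 4), where S is the skewness and K the
    (non-excess) kurtosis; a normal distribution has S = 0 and K = 3.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    statistic = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    return skewness, kurtosis, statistic

sample = [1, 2, 3, 4, 5]  # tiny made-up sample, far too small for a real test
s, k, jb = jarque_bera(sample)
print(s, k, jb)
```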

March 1

March 2

Find the missing perspective

This should be a fairly straightforward problem in 3D representation of 4D objects, but I'm having no luck in wrapping my head around this. At first I thought I could treat each as a cube within a 4-cube (since there should be eight of those), with each cell being adjacent to a corresponding cell of the cube, but I can't get the seven below to line up right, so I now believe that I'm thinking about the problem wrong (i.e., I shouldn't think of it as an unfolded tesseract but as a 2x2x2x2 something being projected in some other way). Any thoughts or advice would be greatly appreciated. I've exhausted all that I can really think of to solve this sort of problem, since it doesn't seem to fold the right way - assuming that's the right way to think about this.

Given the seven 3D projections of a hyper-object, provide the eighth:

(Each █ is a cell, with the left side being one layer and the right being the other layer of the 2x2x2 slice.)

[Diagram not preserved: seven 2x2x2 projections, each drawn as two 2x2 layers side by side.]

Spability951 (talk) 02:39, 2 March 2015 (UTC)

Since there are only 4 dimensions to squish, if they're listing 8 different projections, they must be including projecting onto opposite faces as different projections. The projections onto opposite faces would be reflections of one another, and indeed 1 is a reflection of 5, 2 is a reflection of 7, and 3 is a reflection of 6. So that leaves 4. I don't see a pattern to how they're doing the reflections, so maybe any reflection of 4 will do? (talk) 02:13, 3 March 2015 (UTC)
Y'know, I think you're right. So these must be projections onto the 4 choose 3 = 4 sets of three dimensions. The question is: do they "line up" properly, so that this would actually work? I've thought about it and I'm not really sure yet which pairs belong to XYZ, XYW, YZW or XZW. Spability951 (talk) 17:19, 3 March 2015 (UTC)
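
A sketch of the counting argument in this thread, assuming "projection" means the shadow obtained by dropping one of the four coordinates and that the view from the opposite side is the mirror image along one axis. The occupied cells below are made up and are not the poster's puzzle; which axis is mirrored for each pair is also an assumption.

```python
def project(cells, axis):
    """Shadow of a set of 4D cells in 3D, obtained by dropping one coordinate."""
    return {cell[:axis] + cell[axis + 1:] for cell in cells}

def reflect(cells3d, axis):
    """Mirror a 2x2x2 pattern along one axis (coordinates are 0 or 1)."""
    return {c[:axis] + (1 - c[axis],) + c[axis + 1:] for c in cells3d}

# Hypothetical occupied cells of a 2x2x2x2 object.
cells = {(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 1), (0, 1, 1, 1)}

# Four axes can be dropped, giving the "4 choose 3" projections.
shadows = [project(cells, axis) for axis in range(4)]

# Each shadow plus one mirror image gives eight views in total.
views = []
for shadow in shadows:
    views.append(shadow)
    views.append(reflect(shadow, 0))

for view in views:
    print(sorted(view))
```

Given seven of the eight views, the pairing test is then whether some reflection of one view equals another; the view left without a partner is the one whose mirror image is missing.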

Finding a proper primitive polynomial

I'm interested in the Massey-Omura cryptosystem at the moment and I need a primitive polynomial for GF(2^{256}) because I want to use a 256-bit long key. Is there a list I can look at, or a webpage with an appropriate calculator out there, where I can obtain such a polynomial? — Melab±1 06:54, 2 March 2015 (UTC)

(As I mentioned last time, you could use nimber arithmetic; then you don't need a primitive polynomial.) -- BenRG (talk) 07:44, 2 March 2015 (UTC)
It's unclear to me, though, if the polynomial mask is applied after every multiplication/squaring or not. Also, the concept of nimbers isn't made clear enough in the article for me. — Melab±1 20:04, 2 March 2015 (UTC)
This page has an example primitive polynomial for \mathrm{GF}(2^{256}). -- (talk) 02:35, 3 March 2015 (UTC)
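
To illustrate what "primitive" means here, a brute-force sketch for small fields only: a degree-n polynomial over GF(2) with nonzero constant term is primitive exactly when x has multiplicative order 2^n - 1 modulo it. Polynomials are encoded as bit masks (bit i is the coefficient of x^i); the function names are illustrative. This loop is hopeless for n = 256; a practical check would instead use the prime factors q of 2^n - 1 and test that x^((2^n - 1)/q) is not 1 for each of them.

```python
def order_of_x(poly, degree):
    """Multiplicative order of x modulo poly over GF(2), or None if not found.

    poly is a bit mask with bit `degree` set and a nonzero constant term.
    """
    element = 1
    for k in range(1, 1 << degree):
        element <<= 1                    # multiply by x
        if (element >> degree) & 1:      # degree reached: reduce modulo poly
            element ^= poly
        if element == 1:
            return k
    return None

def is_primitive(poly, degree):
    """True if x generates all 2^degree - 1 nonzero residues modulo poly."""
    return order_of_x(poly, degree) == (1 << degree) - 1

print(is_primitive(0b10011, 4))   # x^4 + x + 1
print(order_of_x(0b11111, 4))     # x^4 + x^3 + x^2 + x + 1: irreducible, not primitive
print(is_primitive(0x11D, 8))     # x^8 + x^4 + x^3 + x^2 + 1
print(is_primitive(0x11B, 8))     # x^8 + x^4 + x^3 + x + 1 (the AES polynomial)
```

The degree-4 pair shows the distinction that matters for the question: both polynomials are irreducible, but only the first makes x a generator of the multiplicative group.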


About this integral: can

\int_a^b \! f(x)g(x)\,dx/\int_a^b \! g(x)\,dx

be equal to

\int_a^b \! f(x)\,dx ?

Of course the two expressions can be equal, but they usually aren't. -- Meni Rosenfeld (talk) 17:24, 2 March 2015 (UTC)
Just to clarify for OP, the expressions are equal if g(x)=k, for any non-zero k, or if f(x)=k, provided the interval has length b-a=1; for other interval lengths they generally differ. OP might like to refresh their memory on the product rule and integration by parts. SemanticMantis (talk) 18:13, 3 March 2015 (UTC)
Consider an extremely simple example:
 f(x) = 1, g(x) = x
\int_a^b \! f(x)\,dx = b-a
\int_a^b \! g(x)\,dx = \int_a^b \! f(x)g(x)\,dx = (b^2-a^2)/2
\int_a^b \! f(x)g(x)\,dx/\int_a^b \! g(x)\,dx = 1
Tamfang (talk) 03:18, 3 March 2015 (UTC)
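
The constant cases can be worked through explicitly; this is a sketch using only the symbols already in the thread (a, b, f, g and a constant k).

```latex
% Case g(x) = k with k \neq 0: the ratio is the mean value of f on [a,b].
\[
\frac{\int_a^b f(x)\,k\,dx}{\int_a^b k\,dx}
  = \frac{k\int_a^b f(x)\,dx}{k\,(b-a)}
  = \frac{1}{b-a}\int_a^b f(x)\,dx .
\]
% Case f(x) = k: the ratio is k, while the plain integral is k(b-a).
\[
\frac{\int_a^b k\,g(x)\,dx}{\int_a^b g(x)\,dx} = k ,
\qquad
\int_a^b k\,dx = k\,(b-a) .
\]
```

In both cases the two expressions agree when b-a = 1 (or when both sides vanish), which is consistent with the example above, where f(x) = 1 gives a ratio of 1 against an integral of b-a.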

March 3


Help me to understand statics. — Preceding unsigned comment added by Hatchet412 (talk • contribs) 18:52, 3 March 2015 (UTC)

Try starting with our article on Statistics. (It does look as though that article, although not as detailed as Mathematical statistics, is a little too long for an introduction, and an Introduction to statistics might be a useful summary article.) Robert McClenon (talk) 19:17, 3 March 2015 (UTC)
For this, and many other math topics, Simple English Wikipedia is often a good place to start. Their article on statistics [1] is a little easier to get started with. SemanticMantis (talk) 20:56, 3 March 2015 (UTC)

Understanding STATICS is easy. Nothing moves.

Statics is the branch of mechanics that is concerned with the analysis of loads (force and torque, or "moment") on physical systems in static equilibrium, that is, in a state where the relative positions of subsystems do not vary over time, or where components and structures are at a constant velocity. (talk) 01:15, 4 March 2015 (UTC)

March 4

1.2% + 250 %

Is 1.2 percent plus 250 percent 2.512 percent or 4.2 percent? — Preceding unsigned comment added by (talk) 03:12, 4 March 2015 (UTC)

That depends on the base:
A) 1.2% (of 100) + 250% (of 100) = 251.2%.
B) 1.2% (of 100) + 250% (of 1.2%) = 4.2%.
Note that neither of my answers matches your first answer. StuRat (talk) 03:22, 4 March 2015 (UTC)
I don't see how to get your choices. Without a context, 1.2% + 250 % = (1.2 + 250)% = 251.2% = 251.2/100 = 2.512. Something else may be implied in a given context, especially if it doesn't literally say "1.2% + 250 %" but just something which could naively be thought to mean that. PrimeHunter (talk) 03:26, 4 March 2015 (UTC)
An example of the second case would be "The US Prime Rate was at 1.2% at the start of the year, but increased by 250% by the end of the year". StuRat (talk) 03:31, 4 March 2015 (UTC)
Right. I haven't seen notation like 1.2% + 250% for that but maybe it's used by some. PrimeHunter (talk) 04:10, 4 March 2015 (UTC)
I've never seen "base" used to mean anything about percentages, except by Stu. Likewise, if that's what OP really means, then the usage by one is confirmed :) SemanticMantis (talk) 17:11, 4 March 2015 (UTC)
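
The two readings discussed in this thread, written out as arithmetic. The function names are illustrative; the numbers are the ones from the question.

```python
def add_percentages(p, q):
    """Reading A: both percentages refer to the same whole, so they simply add."""
    return p + q

def increase_by_percent(p, q):
    """Reading B: q percent is taken of p itself, so p grows by that share."""
    return p * (1 + q / 100)

reading_a = add_percentages(1.2, 250)      # in percent
reading_b = increase_by_percent(1.2, 250)  # in percent

print(round(reading_a, 6), "percent, i.e.", round(reading_a / 100, 6), "as a plain number")
print(round(reading_b, 6), "percent")
```

Reading A gives 251.2%, which is 2.512 as a plain number rather than 2.512 percent; reading B gives 4.2%, as in the prime-rate example.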