Wikipedia:Reference desk archive/Mathematics/2006 September 19

< September 18 Mathematics desk archive September 20 >
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions at one of the pages linked to above.
< August September October >

September 19

Outlier test for a inverse squared distribution

I'm analyzing the number of inbound links to news articles. The number of links to articles follows an inverse power law; the number with zero is impracticably large to count, the vast majority with any inbound links have only one link, far fewer have two, a rare handful have, say, 10 inbound links. The distribution changes from day to day, on a slow news day there will be few or no outstandingly popular articles, on a fast news day there may be many.

What'd I need is a statistical test for outliers on a given day. The distribution of quantities of inbound links is not normally distributed, and the depth of my statistical knowledge ends with normal distributions. I can't just pick an arbitrary threshold value because of the fast news day effect or other background effects like the internet growing larger, etc.

One crude idea I had was to copy the distribution for a given time period and reflect it over the y-axis, simulating the shape of a normal distribution, then calculate the standard deviation and use a standard test based on that. I can hear the mathematicians cringing.

Can you give me a better way?

--Phig newton 01:41, 19 September 2006 (UTC)

Your distribution is discrete. If you believe that your distribution satisfies a power law, then you want the geometric distribution. Find the mean number of links, q. Set p=1/q. The CDF of this distribution is 1-(1-p)^n. Setting this equal to x, your confidence level (typically x = 0.95, 0.99, or similar number near 1) and solving for n gives ln(1-x)/ln(1-p). This is the number of links required to exceed x "percent" of the population of pages for that set of data. -- Fuzzyeric 18:26, 19 September 2006 (UTC)
I don't see how you go from a power law probability distribution (as observed in scale-free networks) to a geometric distribution. Power law distributions are heavy-tailed, quite unlike geometric distributions. --LambiamTalk 12:08, 20 September 2006 (UTC)
Lambiam, you're right. I plead brain damage. The discrete power law distributions are the zeta distribution and its finitary counterpart the Zipf distribution. These are much more irritating to work with.
The best way to estimate the exponent, s, is to log-log plot your data, and find the slope of the line of best fit through your data. I.e., if your data is {(a,b),(c,d), ... }, find the slope of the line of best fit through {(log(a), log(b)), (log(c), log(d)), ...}, skipping instances where either variable is zero, since you can't take its log. The use of the Riemann zeta function and (generalized) harmonic numbers makes this distribution awkward to work with. If you can estimate your s and your confidence level, x, I can expand the CDF and solve for n to use as your outlier criterion. I recommend doing this at my talk page. -- Fuzzyeric 05:45, 23 September 2006 (UTC)
You might want to have a look at this book: Barnett, V. (1984). Outliers in Statistical Data. New York: John Wiley & Sons. Unknown parameter |coauthors= ignored (|author= suggested) (help) Vectro 18:34, 19 September 2006 (UTC)

Basic Statistics, Lottery Odds

My generally math-capable brain takes a vacation when it comes to statistics. I can handle the first question but the second one is giving me pause. I could probably figure it by poking at the web but I'm much more interested in understanding why and how than what the actual answer is. Could anyone please help?

1. Say there is a lottery in which we have fifty balls, numbered 1 through 50. Five of the balls are selected, at random. What is the probability that an arbitrary set of numbers matches the randomly chosen numbers?

2. Say we have have a similar lottery, except that the arbitrarily chosen numbers much match the randomly chosen numbers in sequence, that is, the player must choose which number is selected first, which is selected second, etc. What are the odds then?

Thanks, VermillionBird 04:06, 19 September 2006 (UTC)

Given a set of 50 balls, choosing 5 of them, there are 50!/(5!(50-5)!) combinations and 50!/(50-5)! permutations, respectively. Given a specific set of 5 balls, there is 1 combination and there are 5! possible permutations. To figure the odds, divide the number of ways it can match the desired results by the total number of possibilities. Black Carrot 05:18, 19 September 2006 (UTC)
Permutations! That's the word I forgot. Thanks. VermillionBird 22:34, 19 September 2006 (UTC)
Please refer to Permutations and combinations
1. basic combination ${\displaystyle {50 \choose 5}}$
2. basic permutation ${\displaystyle {\frac {50!}{45!}}}$

210.49.155.134 11:06, 19 September 2006 (UTC)

As to your question' why': There is exactly 1 chance in 50 that the first ball matches. If it doesn't, we failed; but if it did match, there is exactly 1 chance in 49 that the next ball matches; etcetera.JoergenB 11:31, 19 September 2006 (UTC)
Probability and statistics is largely about learning to count. The usual path to learning is to start with small, discrete examples where we can count explicitly; then move to larger discrete examples where we apply formulae we developed on the small examples; then tackle continuous examples.
An important general rule is how to combine probabilities. Let's try your second example, scaled down. Assume we have 10 balls labeled with the numbers 1 through 10, and that we will draw 3 balls in sequence without replacement. That is, after we draw the first ball we do not replace it in the pool of 10, so it cannot be drawn a second time.
How many different 3-ball sequences can we draw? We have 10 choices for the first ball, so there are exactly 10 "sequences" of 1 ball. Now there are 9 balls left in the pool, any one of which may be the second ball in a 2-ball sequence. Finally, we have 8 balls left to choose from for the third ball in a 3-ball sequence.
To sum up, we observe that for each different first ball chosen, we have a set of 2-ball sequences. Furthermore, for each different second ball we have a set of 8 final choices. Therefore the tally is 10×9×8 distinct sequences of 3 balls chosen from 10.
Explicitly, let's try choosing 2 balls from 3. The possible sequences are
 1 2 1 3 2 1 2 3 3 1 3 2
Our reasoning says we have 3 choices for the first ball, and 2 for the second ball, giving 3×2 possibilities, which agrees with the 6 sequences listed.
So the key idea here is the multiplication of possibilities. This occurs so often we have a "descending product" notation, the factorial function n! = n×(n−1)×(n−2)×⋯×2×1. To chop this after the first k numbers we divide by (nk)!. Thus we have the general result that in choosing sequences of k balls from n distinct balls without replacement, the total number of distinct sequences (or "permutations") is
${\displaystyle P(n,k)=n\times (n-1)\times \cdots \times (n-k+1)=n!/(n-k)!\,\!}$
For the 3-of-10 example, we have P(10,3) = 720 distinct sequences. Your second example essentially asks the question, "What are the chances we will pick the right one at random?". The answer for 3-of-10 is clearly 1 out of 720, or approximately 0.00139, so you'd better be very lucky!
Your first example actually requires a more sophisticated computation. In the tiny case 2-of-3, the equivalent question is, "How many ways can we eliminate 1 ball?", which is, of course, 3. But that won't work for 3-of-10, or for your example. (In the 3-of-10 case the answer is that we can draw 120 different collections, ignoring order.) Anyway, you say you can handle this (though I wonder), so I'll stop here. --KSmrqT 12:54, 19 September 2006 (UTC)
No, you're quite right. I knew 2 should be less likely than 1 but I wasn't sure how. I thought I was underestimating the unlikelyhood of 2 when I was actually overestimating the unlikelyhood of 1. Thanks for your explanation. VermillionBird 22:34, 19 September 2006 (UTC)

Can the sine function be represented as an exponential function?

That is, ${\displaystyle \sin x=N^{x}\,}$

Solve for ${\displaystyle N}$.

--ĶĩřβȳŤįɱéØ 08:15, 19 September 2006 (UTC)

No, but you do have
${\displaystyle \sin {x}={\frac {e^{ix}-e^{-ix}}{2i}}}$.
-- Meni Rosenfeld (talk) 08:26, 19 September 2006 (UTC)
${\displaystyle {\frac {1}{2i}}\det({\begin{pmatrix}e^{i}&0\\0&1+e^{-2i}\end{pmatrix}}^{x})}$
can be constructed. -- Fuzzyeric 18:37, 19 September 2006 (UTC)
The "unhelpful" answer evaluates to ${\displaystyle (e^{i}+e^{-i})^{x}/2i=sin(1)^{x}}$, which is the wrong thing. Here is a "better" unhelpful answer:
${\displaystyle {\frac {1}{2i}}\operatorname {tr} \left[{\begin{pmatrix}e^{i}&0\\0&e^{-i}\end{pmatrix}}^{x}\right]}$
Joke 20:10, 19 September 2006 (UTC)
Nope, this evaluates to ${\displaystyle (e^{ix}+e^{-ix})/2i=\cos {x}/i}$. It's going to be difficult to get the minus sign required for sin. -- Meni Rosenfeld (talk) 20:24, 19 September 2006 (UTC)
I can't believe I wrote that exponentiation distributes over addition. <hangs head in shame>. Teach me to make last-minute "simplifications" before hittin the Save button. However, if it somehow did, this would work...
${\displaystyle {\frac {1}{2i}}\det({\begin{pmatrix}e^{i}&0\\0&1+e^{-2i}\end{pmatrix}}^{x})}$
${\displaystyle ={\frac {1}{2i}}\det({\begin{pmatrix}e^{ix}&0\\0&1+e^{-2ix}\end{pmatrix}})}$ (powers of a diagonal matrix)
${\displaystyle ={\frac {1}{2i}}(e^{ix})(1+e^{-2ix})-(0)(0)}$
${\displaystyle ={\frac {1}{2i}}e^{ix}+e^{-ix}}$
but it doesn't, so... (scribbles offline, and will be back in a bit) -- Fuzzyeric 23:37, 19 September 2006 (UTC)

Oh, that was dumb. At least I got ${\displaystyle -i\cos(x)}$. Not that this exercise has much point. –Joke 20:29, 19 September 2006 (UTC)

Finally remembered... Wrong trick. Should have used ${\displaystyle \sin x={\mbox{Im}}(e^{ix})\approx {\mbox{Im}}(\ (0.540+0.841i)^{x}\ )}$ -- Fuzzyeric 02:24, 20 September 2006 (UTC)

Ummm... anyway... there should be an infinite number of positive solutions for N, I think. Any N for which N^x slopes down, since N^0 starts at 1, and sin(x) continutally returns to 1, there's going to be a solution for x as long as it moves downward. And... x will be negative if N > 1. So, how about N >= 0 for a solution? As for x, there's going to be tons of solutions for it, but eventually they will all approach positions where sin(x)=0. - Rainwarrior 04:51, 20 September 2006 (UTC)

Or are you actually hoping to create a sine with an exponential function? At best it would be a terrible approximation. With several terms having coefficients it could be improved... but, ah, I suppose even a broken clock is correct twice a day, similar with this one. - Rainwarrior 05:00, 20 September 2006 (UTC)

Finding coefficients of a series

Suppose I have a function which can be presented in the form:

${\displaystyle f(x)=\sum _{k=0}^{\infty }a_{k}x^{-k}}$

The coefficients ak are unknown and need to be found. I can (using a different method) evaluate the function numerically for any integer x, but this is quite expensive computationally, especially if x is large. I also know that the ak are rational numbers with reasonable denominators, so if I find that a coefficient is numerically close to a rational number, I can assume it is equal to it. Any ideas about an efficient method to find the coefficients? Thanks. -- Meni Rosenfeld (talk) 13:08, 19 September 2006 (UTC)

You can take
${\displaystyle g(x)=f(1/x)=\sum _{k=0}^{\infty }a_{k}x^{k}}$,
and basically you need to find the coefficients of the Taylor series with the restriction of not evaluating function at small x. Maybe numerical differentiation will be helpful. Conscious 14:26, 19 September 2006 (UTC)
The formula you give is actually a Laurent series, which is similar to a Taylor series, but allows for negative powers as well. I have no idea if it makes any sense in your situation, but if you could extent your function to a holomorphic one, defined on the complex plane (or a subset thereof), perhaps you can use the standard procedure for finding the coefficients of Laurent series. —Bromskloss 15:14, 19 September 2006 (UTC)

Thank you for your response. Unfortunately, in practice this method seems to reduce to picking values x0, ..., xn, and solving the equations (for every 0 ≤ in):

${\displaystyle \sum _{k=0}^{n}a_{k}x_{i}^{-k}=f(x_{i})}$

and the numerical differentiation article does little to suggest which choice for the xi will be optimal. Also, it seems to me that there should be a better method. Any additional ideas will be appreciated. -- Meni Rosenfeld (talk) 15:02, 19 September 2006 (UTC)

A regular grid is usually used for differentiation, so this would suggest ${\displaystyle x_{i}={\frac {N}{i+1}}}$ (N should be divisible by 1..n+1 if you can calculate f for integer arguments only). I'm not sure if it's optimal and if the method is efficient (though you get n coefficients from evaluating f in n points). As Fuzzyeric points out below, you can just use Lagrange polynomial (or any other form of interpolation polynomial) instead of solving the system. Conscious 15:45, 19 September 2006 (UTC)
Interpolation methods require that the function be smooth enough. If you know or assume this is the case, then...
If you can do some additional things to f(), then maybe:
• If you can (repeatedly) integrate and compute the residue of f() at zero, then you can use what amounts to the Taylor series expansion for this Laurent series. (Evaluate at zero, get a0. Integrate. Evaluate at zero, multiply by a constant, get a1. Iterate.)
• If you can analytically take the inverse Fourier transform of f() on a circle on the complex plane near enough to zero that no poles are contained in the circle, then you can do the above more or less all at once by application of Cauchy's integral formula.
• If you know that f() has only a semi-infinite or finite domain of definition, then fitting by polynomials, orthogonal on the domain of definition, is a good idea.
If you are unable to enable to augment the implementation of f(), then...
• You can replace x -> 1/x and use your polynomial interpolation method of choice. E.g. Lagrange polynomial.
• You can numerically sample f() on a circle around zero and, as above, apply Cauchy's theorem.
You may be able to bound the growth of the ak if you find that f() doesn't converge inside a circle centered at the origin. (Equivalent to f(1/x) converges inside such a circle.)
You may be able to constrain the denominators of the ak if you find that f() is simple p-adicly for some p. -- Fuzzyeric 15:16, 19 September 2006 (UTC)

Thanks. Regrettably, most of these suggestions won't work for me, as my ability to manipulate f is very limited - I can only evaluate it numerically for positive integers. Perhaps I was too optimistic about how efficient this process can be. I guess the most useful one will be the Lagrange polynomial, which can streamline the solution of my equations above. Any suggestions on how to pick the interpolation points optimally? -- Meni Rosenfeld (talk) 15:42, 19 September 2006 (UTC)

It sounds like you're limited to evaluating f(n). So to convert the Laurent series to a Taylor series, you'll be working with g(1/n) = f(n). You can only evaluate g() at a sequence of points between 0 and 1 (and accumulating at 0). If you actually had freedom to choose points, I'd recommend the Chebyshev roots, but you don't have this freedom. Since you don't really get to move your points around, I have two recommendations:
Rigorous: Polynomial interpolation refers to the Lagrange form of the interpolation polynomial (which I learned as a ratio of determinants). This is a direct method for acquiring the coefficients.
Less than rigorous: I'd recommend selecting evaluation points to be linearly independent in their prime power vectors. (By the Fundamental Theorem of Arithmetic, there's a bijection between integers and the sequence of exponents of the primes in their prime decomposition. For instance 15 = 2^0 3^1 5^1 <-> {0,1,1, 0,0,0,...}). Make sure these vectors are linearly independent. A way to do this is to select integers that are powers of an irrational. Historically, powers of the golden mean are popular. (Probably due to convenient convergence properties for subdividing root-finding intervals.) -- Fuzzyeric 16:36, 19 September 2006 (UTC)

Okay, I guess this is enough information to get me going. Thank you all for your help. -- Meni Rosenfeld (talk) 17:37, 19 September 2006 (UTC)

Just to chime in...I wonder if you could make use of the z-transform of the function somehow? --HappyCamper 03:05, 20 September 2006 (UTC)

This won't be practical for me, no. -- Meni Rosenfeld (talk) 05:33, 20 September 2006 (UTC)

TI 83 Substitute

I am in need of a website that has functions similar to a TI 83 Calculator...or better, a TI 83 Emulator-type site? Pretty much, I will not have my TI 83 until it gets shipped out here, and I need a better calculator than the damn windows one. Thanks ChowderInopa 22:22, 19 September 2006 (UTC)

Not sure about TI-83 functionality, but G-calc is a nice graphing calculator: [1]. StuRat 22:54, 19 September 2006 (UTC)
There is a flash applet you can download of the TI-83 or 82 out there somewhere. — [Mac Davis](talk) (SUPERDESK|Help me improve)00:08, 20 September 2006 (UTC)
There's an emulator called VTI you can download here. You will need a ROM image for a TI83+/84+, but you can download them. I think they're of dubious legality unless you own one, so I won't link to it. 206.124.138.153 00:19, 20 September 2006 (UTC)

Statistics Q

I have the following table:

                       Unemployed (Y=0)  Employed (Y=1)   Tot
Non Graduate (x = 0)      0.045            0.709          0.754
Graduate (x = 1)          0.005            0.241          0.246
Tot                       0.050            0.950          1


And I am asked for the E(Y), which is the expected value of Y. Is it merely 0.95?

Yes, for the total population. In general, for a discrete random variable X, the expected value E(X) = Σ (v × Pr(X=v)), summing over all possible values v for X. So for a r.v. Y that has a range {0,1}, we find an expected value E(Y) = (0 × Pr(Y=0)) + (1 × Pr(Y=1)) = Pr(Y=1).  --LambiamTalk 11:19, 20 September 2006 (UTC)

Could use some tips for modelling a hot air balloon

I'm trying to make a simple hot air balloon simulator of sorts (Actually, a Morrowind mod). I'd like to make it reasonably realistic. So, I need to find how hot the inside of the balloon is as a function of how long it has been being heated by a flame. I've already found an equation online for calculating the bouyant force, but that requires the temperature. If any physicsy types could help out I'd really appreciate it; even a rough approximation would be great. BungaDunga 00:31, 20 September 2006 (UTC)

Well, as a rough model I suppose you could work with the following ideas: depending on how strong you've got the fire going, say heat is going into the balloon at a constant rate ${\displaystyle k_{1}}$. Heat is also going out of the balloon at a rate proportional to the temperature difference between the air in the balloon and the surrounding air (Newton's law of cooling), and the temperature is roughly a linear function of the temperature, so essentially we have ${\displaystyle {\frac {dT}{dt}}=k_{1}+k_{2}(T-T_{surr})}$. You may have to vary ${\displaystyle T_{surr}}$ depending on the height, though. Confusing Manifestation 00:58, 20 September 2006 (UTC)
Here k2 would be negative. --LambiamTalk 04:22, 20 September 2006 (UTC)
Suppose you meant 'and the temperature is roughly a linear function of the heat', ConMan. --CiaPan 05:45, 20 September 2006 (UTC)
Is that function the derivative of f(T) or something similar? I've yet to learn calculus in school, so I don't have much knowledge of it (though I shall next year). I'm not sure how I would apply that equation. Something like this, maybe?
temp = 5
k1 = 5
k2 = .1
tSurr = 3
xV = 0
onEnterFrame = function(){
temp += k1-k2*( temp - tSurr )
trace(temp)
}