Wikipedia:Reference desk/Archives/Mathematics/2010 April 12



April 12

Simple way to weight average ratings by total ratings?

I've got a website where users can rate items from 1 to 5. The site won't get a lot of traffic, and I expect the "total ratings" range for the items to be as high as 100 and as low as 4 or 5. Each item can be rated on an integer scale of 1 to 5. I'd like to be able to show a top 10 list of best-rated items, but because of the wide variance in total ratings I feel this needs to be weighted in some fashion to favor items with more votes. Unfortunately, I've never taken a stats class and really don't know what to do. Could some of you kind folks please suggest some relatively simple approaches to solving this problem? Thank you! (Calculations will be done in PHP/MySQL if that's relevant, though I doubt it'd matter) 59.46.38.107 (talk) 00:52, 12 April 2010 (UTC)

You should just use a simple cut-off. For example, list the top rated products of those with 10 or more ratings. StuRat (talk) 00:54, 12 April 2010 (UTC)
I agree, a simple cut-off is best. The number of ratings doesn't say anything about the quality of the product, just the confidence you can have in the average rating, so you don't want to take the number of ratings into account for a top 10 list. The products with only a handful of ratings are just as likely (given no additional information) to be underrated as overrated. If you want to do something more complicated, you could do a variable cut-off and choose the cut-off for each product based on the variance of the ratings for that product. If you have had 5 ratings and they have all rated a product 3 then you can probably have more confidence in that rating than the rating of a product with 20 ratings of 1 and 20 ratings of 5. I'm not sure what function of variance would be best. To be honest, I think a simple cut-off is better; a variable one would be complicated for the sake of it rather than giving significantly better results. --Tango (talk) 01:17, 12 April 2010 (UTC)
To me, a small number of identical ratings would indicate the likely use of sock-puppets, by one individual. StuRat (talk) 02:10, 12 April 2010 (UTC)
A product with only a handful of ratings, given what those ratings are, is not as likely to be underrated as overrated. A product with a few high ratings is more likely to be overrated than underrated. This is basically regression towards the mean. In other words, if your prior is that the product is average, the more evidence you have that the product is good, the higher your assessment of its expected quality. Thus a product with many good ratings should be placed higher than a product with a few excellent ratings (depending on the exact numbers, of course).
Personally I would use a parametric model with Bayesian updating for each product, but I guess that falls outside what the OP can easily work with. -- Meni Rosenfeld (talk) 08:34, 12 April 2010 (UTC)

Thank you so far. If I go with a simple cutoff, should the threshold be a function of the total userbase, so that more votes are required as the userbase grows? Or should it be a fixed value? 59.46.38.107 (talk) 01:27, 12 April 2010 (UTC)

I think you might want to increase the cutoff number once you have more reviews. You don't necessarily need to code in a formula for this, though, just have a fixed number which you edit and change from time to time. Using a formula is more likely to introduce problems. If you're determined to use a formula, though, you could only compare those products with more than the median number of reviews. StuRat (talk) 02:08, 12 April 2010 (UTC)
As a rule of thumb, if you have about 30 values (from a normal distribution), average and standard deviation become somewhat reliable measures. So you can probably stop increasing the cut-off once it reaches 30. This will also give great new products a chance to show up eventually. --Stephan Schulz (talk) 08:40, 12 April 2010 (UTC)
You might want to look into the IMDB system since they have some of the same issues. As I recall, they use the straight average when the number of votes is high enough and otherwise they use a weighted average. The exact formula is on their website somewhere.--RDBury (talk) 10:12, 12 April 2010 (UTC)
There's a precise answer to your original question here: [1] I don't know enough statistics myself to check it, though. Paul (Stansifer) 21:38, 14 April 2010 (UTC)
That is an answer to a different question, where ratings are positive/negative rather than 1-5. Also, it gives one particular frequentist way to think about the problem, which is neither the only nor necessarily the best way. I think using the expectation of the posterior distribution will give better results. -- Meni Rosenfeld (talk) 09:38, 15 April 2010 (UTC)
The code example is for a two-point rating scale, but I believe the technique can work for 1-5. Paul (Stansifer) 14:13, 15 April 2010 (UTC)
I don't see any obvious way to extend it, and by the time we get to non-obvious ways we may as well start from scratch. -- Meni Rosenfeld (talk) 18:35, 15 April 2010 (UTC)
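
The "prior that the product is average" idea raised in this thread can be sketched as a shrunken (Bayesian-style) average: each item's mean is pulled toward a prior mean, and the pull weakens as votes accumulate. This is only an illustration under stated assumptions. The function name `weighted_rating`, the prior mean of 3.0 and the prior weight of 10 are made up for the example and are not taken from any poster or site.

```python
def weighted_rating(ratings, prior_mean=3.0, prior_weight=10):
    """Shrink an item's mean rating toward prior_mean.

    prior_weight behaves like that many imaginary votes cast at prior_mean.
    Both defaults are illustrative assumptions, not values from the thread.
    """
    n = len(ratings)
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + n)


items = {
    "few_excellent": [5, 5, 5, 5],        # 4 votes, raw mean 5.0
    "many_good": [5] * 60 + [4] * 40,     # 100 votes, raw mean 4.6
}

scores = {name: weighted_rating(votes) for name, votes in items.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

# few_excellent: (10*3 + 20) / 14   = 50/14   (about 3.57)
# many_good:     (10*3 + 460) / 110 = 490/110 (about 4.45)
print(ranking)
```

With these assumed settings the item with many good ratings ranks above the item with a few perfect ratings, which is the ordering argued for above. A larger `prior_weight` favours heavily rated items more strongly, and a hard cut-off can still be applied before ranking.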

Equations on cartesian planes with interesting properties

Hi. The following functions can be graphed using a graphing calculator (or is there some kind of online/HTML program that can do that?), but have interesting properties, such as having Y values with no X values, looking like a seismograph, etc. My question is what are some equations with similar properties, and what causes them? This is not homework.

Thanks. ~AH1(TCU) 01:14, 12 April 2010 (UTC)

Y-values without x-values are very common. (The graph of the constant function y=0 has no x-values when y is not zero.) X-values without y-values mean that the function is not defined for some x-values. For example, y=1/x is not defined when x=0. This function has a pole at x=0. The functions in your examples have poles at the x-values where y is not defined. Bo Jacoby (talk) 13:08, 12 April 2010 (UTC).
For plotting online, try FooPlot and Wolfram Alpha. (I prefer the former for most straightforward plotting, although Alpha will do, e.g., implicit functions.) 94.168.184.16 (talk) 23:28, 12 April 2010 (UTC)

Set theory

If S = {i | 0<i<=50}, |S| = 10, A and B are subsets of S, |A|=|B|=5, and the sum of all integers in A is equal to that of all integers in B, then prove that there exists at least one such pair A and B. Rajeev 26987 (talk) 07:22, 12 April 2010 (UTC)

If S = {i | 0<i<=50} then |S| = 50. Did you mean S ⊆ {i | 0<i<=50}? -- Meni Rosenfeld (talk) 08:09, 12 April 2010 (UTC)
I guess A and B have to be disjoint? If not, let A be any subset of S, with B=A. Staecker (talk) 10:56, 12 April 2010 (UTC)
Think of the elements of A and B as pairs. For each pair Aj and Bj for j=1 to 10, if j is even, ensure that Aj=Bj+1. If j is odd, ensure that Aj=Bj-1. So, there are 5 cases where Bj is one more than Aj and 5 where it is one less. They cancel each other out, making A and B sum up to the same value. -- kainaw 12:29, 12 April 2010 (UTC)
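
One way to read the question, assuming S is meant to be any 10-element subset of {1, ..., 50} and that A and B only need to be distinct, is as a pigeonhole problem: S has C(10,5) = 252 five-element subsets, while a sum of five distinct integers from 1 to 50 lies between 15 and 240, giving at most 226 possible values. The sketch below checks that count and searches one randomly chosen S by brute force. The helper name `equal_sum_pair` and the random seed are illustrative choices.

```python
from itertools import combinations
from math import comb
import random


def equal_sum_pair(S):
    """Return two distinct 5-element subsets of S with equal sums, or None."""
    seen = {}  # maps a sum to the first subset found with that sum
    for subset in combinations(sorted(S), 5):
        total = sum(subset)
        if total in seen:
            return seen[total], subset
        seen[total] = subset
    return None


num_subsets = comb(10, 5)             # number of 5-element subsets of S
num_possible_sums = 240 - 15 + 1      # sums range from 1+2+3+4+5 to 46+...+50

random.seed(0)                        # arbitrary seed, for repeatability only
S = random.sample(range(1, 51), 10)   # a 10-element subset of {1, ..., 50}
A, B = equal_sum_pair(S)
print(S, A, B, sum(A), sum(B))
```

Because 252 exceeds 226, the search cannot come back empty for any such S. If A and B must also be disjoint, this argument alone is not enough, since removing shared elements leaves equal-sum sets with fewer than five elements.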

Subconical shapes?

What exactly is a subconical shape? I've read many descriptions of individual Native American mounds that speak of the mound as being subconical; however, as you can see from the picture of the subconical Dunns Pond Mound, this apparently can mean "vaguely round and slightly raised in the middle", and definitely not very much like a geometric cone. Nyttend (talk) 11:33, 12 April 2010 (UTC)

Let me rephrase the question: is there some official definition of "subconical" or "subcone"? Thanks. Nyttend (talk) 11:36, 12 April 2010 (UTC)
Well, Wiktionary has a definition, but it just says "somewhat cone shaped", so not very helpful; Webster's says "slightly conical", which is just as useless. Googling random examples, "subconical" seems to be often used to describe the shape of a flattened or truncated cone - possibly what we maths types would call a conical frustum. Gandalf61 (talk) 13:27, 12 April 2010 (UTC)
I was thinking of the shape a powder takes when dumped at a central point onto a plane, like the sand in the bottom of an hour-glass. This yields a slightly bowed-out cone. Salt storage facilities shape their containers to match this form. Here's an example: [2]. StuRat (talk) 01:55, 13 April 2010 (UTC)

Lowering Summed Indices for Tensors

Let's say we have a tensor . The tensor is defined as where and are tensors with the inverse metric, and I want to lower indices of the inverse metric. How do I achieve that, since the inverse metric is summed over? The Successor of Physics 13:29, 12 April 2010 (UTC)

You can move those indices down and move the other indices they are contracted with up. Count Iblis (talk) 17:01, 12 April 2010 (UTC)
. Bo Jacoby (talk) 20:56, 12 April 2010 (UTC).
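
As a generic illustration of moving contracted indices, here is a worked case with hypothetical covectors A and B and a scalar T standing in for the tensors of the question, which are not known. The only facts used are the standard definition of raising an index and the identity that the metric contracted with its inverse gives the Kronecker delta.

```latex
% Hypothetical example: T, A and B are illustrative stand-ins.
\begin{align}
T &= g^{\mu\nu} A_\mu B_\nu , \qquad
A^\alpha \equiv g^{\alpha\mu} A_\mu , \quad
B^\beta \equiv g^{\beta\nu} B_\nu , \\
g_{\alpha\beta} A^\alpha B^\beta
  &= g_{\alpha\beta}\, g^{\alpha\mu} A_\mu \, g^{\beta\nu} B_\nu
   = \delta_\beta{}^{\mu} A_\mu \, g^{\beta\nu} B_\nu
   = g^{\mu\nu} A_\mu B_\nu = T .
\end{align}
```

So the pair of upper indices on the inverse metric can be traded for lower indices on the metric, provided the indices they were contracted with are raised at the same time.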

Probability

I have b buckets that each can hold n items, and I have i kinds of items. I fill every bucket with unique (different kinds) of items. Items need not be unique across buckets. What's the probability that there are at least l of each kind of item in the buckets? I want to get an idea of the plausibility of untrusted distributed backups but it appears I fail at mathematics. --194.197.235.240 (talk) 19:17, 12 April 2010 (UTC)

You mean you're going to scatter these items around the buckets randomly? If yes, how do you know a given kind of item doesn't occur in a bucket more than once? Really aren't you better off populating the buckets deterministically so you can control where and how often each item is backed up? Maybe you also want to read about erasure codes rather than relying on pure replication. 66.127.52.47 (talk) 23:42, 12 April 2010 (UTC)
A "bucket" is one computer, the computer downloads random unique blocks of the backup. I can't populate deterministically because I assume most of the world hates my backups and a large fraction of the participating computers are cheating. This scheme likely isn't any good, but it hurts my feelings I can't solve it. --194.197.235.240 (talk) 14:51, 13 April 2010 (UTC)
That doesn't sound all that easy to do exactly, but I'll see if I can figure it out if I get some time later. The case where the number of machines is large might be easier: let's see. The probability that a given bucket has a given block is n/i. So the probability p that at least l buckets have that block is given by the upper tail of the binomial distribution on b buckets (one minus its cdf at l-1). The events for the different blocks being replicated l times are not independent since replicating one such block decreases the amount of space left for the others. (There may be some fancy distribution related to urn problems that exactly accounts for this--I'm not very knowledgeable about such things). But if n*b is much larger than i, maybe you can approximate as p**i. The other thing you can easily do is run some random simulations instead of trying to calculate the probability precisely. 66.127.52.47 (talk) 21:54, 13 April 2010 (UTC)
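
Both suggestions in the reply above can be sketched in a few lines, assuming each bucket independently draws n distinct kinds uniformly at random out of i. The function names `binomial_tail`, `approx_probability` and `simulate`, the trial count and the seed are illustrative, not taken from the thread.

```python
import random
from math import comb


def binomial_tail(b, q, l):
    """P(X >= l) for X ~ Binomial(b, q)."""
    return sum(comb(b, k) * q**k * (1 - q) ** (b - k) for k in range(l, b + 1))


def approx_probability(b, n, i, l):
    """The p**i approximation, which treats the i kinds as independent."""
    p = binomial_tail(b, n / i, l)
    return p**i


def simulate(b, n, i, l, trials=20000, seed=0):
    """Monte Carlo estimate of P(every kind appears in at least l buckets)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        counts = [0] * i
        for _ in range(b):
            # one bucket: n distinct kinds chosen uniformly at random
            for kind in rng.sample(range(i), n):
                counts[kind] += 1
        if min(counts) >= l:
            hits += 1
    return hits / trials


# Tiny case that can be checked by hand: 2 buckets, 1 item each, 2 kinds.
# Exact answer is 1/2; the independence approximation gives 0.75**2 = 0.5625.
print(simulate(2, 1, 2, 1), approx_probability(2, 1, 2, 1))
```

The hand-checkable case shows how far the independence approximation can drift when n*b is not large compared with i, so comparing it against the simulation for realistic parameters is a reasonable sanity check.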

Taylor-like Series

I was thinking about smooth functions , and I defined an operator ; given by

This operator has a nice property. It can be shown that

Working out some explicit examples on the sine, cosine and exponential functions gives

I used Matlab to get these last identities. It doesn't seem obvious to me, for example:

I have a couple of questions:

  1. How do we prove the expression for, say, the sine function?
  2. What are the domain and range of this operator? (I think that if has a convergent power series in a neighbourhood of then converges in a neighbourhood of ).
  3. Is this operator already known and studied? If so, does it have any uses?

•• Fly by Night (talk) 19:39, 12 April 2010 (UTC)

First question: if you plug f(x)=e^(ax) into your formula, you see that . Now note that your operator is linear in f, and sin x is a linear combination of exponentials. 129.67.37.143 (talk) 21:48, 12 April 2010 (UTC)
OK, so it's quite straightforward for the exponential case:
but since I'm only considering real functions and you need complex functions to make sine from exponentials: , I don't see how this can prove it. •• Fly by Night (talk) 22:11, 12 April 2010 (UTC)
For analytic functions you can sum the Taylor expansion and show that
Count Iblis (talk) 23:18, 12 April 2010 (UTC)
Just to give a bit more of a hint: show that when f is x^n, then extend to power series by linearity. 129.67.37.143 (talk) 08:29, 13 April 2010 (UTC)
While you may be working with real functions, it is often possible (and in many cases easier) to work in the realm of complex functions, and as long as your functions are analytic in a region including the real axis, your proofs are valid for the real function as well. The identities between exponential and trigonometric functions are a good example. (Analogously, you might have a function acting on the integers, but to prove some property it's easier to extend the function into the reals.) Confusing Manifestation(Say hi!) 00:58, 13 April 2010 (UTC)

Thanks, now what about questions 2 and 3? •• Fly by Night (talk) 19:02, 13 April 2010 (UTC)

Adomian decomposition method

Hello Math Reference Desk. I have recently come across the Adomian decomposition method in a paper I am reading. I have never heard of this technique, and our article is very vague and stubby. Can anyone enlighten me on the purpose, utility, and general procedure for Adomian decomposition? I understand it is a "semi-analytic" method to solve PDEs, but I'm not clear whether that means it's "analytic" or "numerical." Probably the most important part of my question: is this method common/widespread, or is it an esoteric technique? Thanks, Nimur (talk) 23:22, 12 April 2010 (UTC)

The Wikipedia article doesn't say much, but that other site has some links about it. 66.127.52.47 (talk) 23:50, 12 April 2010 (UTC)