Talk:Standard deviation
This is the talk page for discussing improvements to the Standard deviation article.
Archives: 1
WikiProject Mathematics (Rated C-Class, Top Priority; Field: Probability and statistics). One of the 500 most frequently viewed mathematics articles.
WikiProject Statistics (Rated C-class, High-importance)
[edit] RMS
This is misleading, since the average deviation from the mean must always be zero. I would expect even a beginner in statistics or arithmetic to be aware of that!
"It may be thought of as the average difference of the scores from the mean of distribution, how far they are away from the mean."
The correct comparison is "...RMS difference of the scores from the mean of distribution". The RMS (root mean square) function is familiar to many people from its use in electrical engineering, so is an appropriate, accurate and well known concept.
Andrew Smith —Preceding unsigned comment added by 82.32.50.77 (talk) 08:39, 18 October 2009 (UTC)
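A quick numerical check of the point above, as a minimal Python sketch. The scores are made-up illustrative values, not taken from the article. The plain average of the deviations from the mean vanishes, while their root mean square equals the population standard deviation.

```python
import math
import statistics

# Hypothetical scores, chosen only for illustration.
scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.fmean(scores)
deviations = [x - mean for x in scores]

# Plain average of the signed deviations: always zero (up to rounding).
mean_deviation = sum(deviations) / len(deviations)

# Root mean square of the deviations: the population standard deviation.
rms_deviation = math.sqrt(sum(d * d for d in deviations) / len(deviations))

print(mean_deviation)
print(rms_deviation)
print(statistics.pstdev(scores))
```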
[edit] Error
How come both 73 and 76 inches are 190 centimeters?
73 inches (190 cm) inches), while almost all men (about 95%) have a height within 6 inches of the mean (64 inches (160 cm) – 76 inches (190 cm) —Preceding unsigned comment added by 60.240.91.73 (talk) 13:52, 20 June 2009 (UTC)
- That was done by the "convert" template. I've gotten rid of that template and corrected the arithmetic. Michael Hardy (talk) 16:57, 20 June 2009 (UTC)
[edit] Estimating population standard deviation with interquartile range
The short subsection Standard deviation#With interquartile range was added by an IP editor on 25 May 2009. I'm not sure how useful it is, or whether its final claim is correct. The factor of 1.35 surely assumes a normal distribution. The stated (but unsourced) asymptotic relative efficiency of 0.37 appears to be assuming a normal distribution too. I can't spot any justification in the reference given (doi:10.1016/j.jspi.2005.08.028; requires subscription) for the last claim that IQR/1.35 can be more efficient than the sample SD as an estimator of the population SD when the data has thick tails. Maybe an estimator based on some multiple of the IQR could be more efficient than the sample SD for a given known distribution with thick tails, but the factor of 1.35 must surely depend on the particular distribution. And if you know the distribution it would be more efficient to fit its parameters by maximum likelihood and use them to calculate the population SD. Or am I missing something? If not, I'm tempted to remove this section completely as it seems of no practical value. Qwfp (talk) 13:18, 21 June 2009 (UTC)
- I remember using this once when I was working by hand, with no computer at hand and a calculator capable only of the simplest arithmetic, and I needed a quick result. (I don't remember if 1.35 is the right number, but that was easy enough to work out by hand.) So yes: maximum likelihood is more accurate, but not always easy to do fast. Michael Hardy (talk) 05:15, 3 August 2009 (UTC)
The actual number should be 1.349, from a table of the Standard Normal, no? -- Avi (talk) 06:16, 3 August 2009 (UTC) (1.34897997243323 from Excel solver). -- Avi (talk) 06:21, 3 August 2009 (UTC)
- The more I look at this the more I'm convinced Qwfp's solution is the right one. The 1.35/1.349 estimate for the normal distribution (AKA qnorm(0.75)-qnorm(0.25)) is, as the writer has admitted, less efficient than the usual sample standard deviation estimate, and Qwfp's argument for why the IQR is rarely best seems 100% right. If anyone wants to improve this, then please do, but if nothing has changed in a couple weeks I'll delete the section, and I hope no one is offended. --Eb Oesch (talk) 21:10, 11 January 2010 (UTC)
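For reference, the normal-distribution factor discussed in this thread can be reproduced with Python's standard library. This is only a sketch: the helper name `sd_from_iqr` is hypothetical, and the conversion is only meaningful if the data are assumed to be normally distributed, as noted above.

```python
from statistics import NormalDist

# Interquartile range of the standard normal distribution,
# i.e. qnorm(0.75) - qnorm(0.25) in R notation.
std_normal = NormalDist(mu=0.0, sigma=1.0)
iqr_factor = std_normal.inv_cdf(0.75) - std_normal.inv_cdf(0.25)
print(iqr_factor)  # approximately 1.349


def sd_from_iqr(q1: float, q3: float) -> float:
    """Estimate sigma from the quartiles, ASSUMING normally distributed data."""
    return (q3 - q1) / iqr_factor
```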
[edit] Unbiased SD estimate
The following comment was added to the article by 149.169.176.38 (talk · contribs). --MarkSweep✍ 13:18, 14 September 2005 (UTC)
Comment: I think the formula for unbiased estimate of the standard deviation is:
$s = \sqrt{\frac{1}{n-1.5}\sum_{i=1}^{n}(x_i-\bar{x})^2}$
the (n-1) is unbiased for variance, I don't know how to derive this, but a statistician should be able to do so.
This is an OK web site.
http://www.statistics-help-online.com/node55.html
Basically small samples tend to be distributed closer to the mean. To get a better estimate of the population std dev the sample std dev is scaled down.
- Where does that 1.5 in the denominator come from? I found http://www.math.niu.edu/~rusin/known-math/01_incoming/update_stat using it without an explanation. Is it only an educated guess? (I hope it isn't a statistician's joke on unsuspecting newbies) I came here in search for an answer why a paper even used 2 in the denominator! Stevemiller (talk) 02:50, 26 February 2008 (UTC)
- If an unbiased estimate of SD is required, see Unbiased estimation of standard deviation but note that it only applies for the normal distribution. The formulae disagree with the "1.5" correction. Melcombe (talk) 15:22, 14 October 2008 (UTC)
- The denominator is certainly 1... no idea where the 1.5 comes from but as mentioned before me... the standard form is for a normal distribution (which is accurate with large n, see central limit theorem), and the denominator should be n-1... —Preceding unsigned comment added by 67.159.74.87 (talk) 19:55, 3 March 2009 (UTC)
- The comment above is confused. Certainly n − 1 gives an unbiased estimator of the variance. Not of the standard deviation. The square root of the variance is the SD, but the square root of an unbiased estimator of the variance is NOT an unbiased estimator of the SD. Michael Hardy (talk) 20:40, 3 March 2009 (UTC)
- Well, if you consider an estimator of the form
- $\hat\sigma = \sqrt{\frac{1}{n-\kappa}\sum_{i=1}^{n}(x_i-\bar{x})^2}$
- with unknown κ, then by the reasoning from χ distribution, expected value of this estimator is
- $\operatorname{E}[\hat\sigma] = \sigma\sqrt{\frac{2}{n-\kappa}}\,\frac{\Gamma(n/2)}{\Gamma((n-1)/2)}$
- Thus the estimator is unbiased when
- $\kappa = n - 2\left(\frac{\Gamma(n/2)}{\Gamma((n-1)/2)}\right)^2$
- an expression which quickly converges to a constant κ = 1.5. For example already at n=5 the κ is ≈ 1.47. In general the “optimal” value of κ is equal to ... stpasha » talk » 14:18, 2 August 2009 (UTC)
- If there is a good citation for this result, it would be good to include it in the Unbiased estimation of standard deviation article. Melcombe (talk) 09:21, 22 October 2009 (UTC)
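The convergence of κ towards 1.5 can be checked numerically. The sketch below assumes normally distributed data and uses the normal-theory expression κ(n) = n − 2(Γ(n/2)/Γ((n−1)/2))², which follows from the mean of the χ distribution with n − 1 degrees of freedom. The function name `kappa` is just an illustrative choice.

```python
import math


def kappa(n: int) -> float:
    """Denominator correction that makes sqrt(SS / (n - kappa)) unbiased
    for sigma, ASSUMING a normal population (requires n >= 2)."""
    # Work with log-gamma so that large n does not overflow.
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return n - 2.0 * math.exp(2.0 * log_ratio)


for n in (2, 5, 10, 100, 1000):
    print(n, kappa(n))
```

For n = 5 this gives roughly 1.47, in line with the figure quoted above, and the values approach 1.5 from below as n grows.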
[edit] Error in article
I think that the equation in this line is incorrect. The equation for a mean standard deviation from the above is s = Sum(x − m)^2 / (n − 1), where s = mean standard deviation, x is a number in the set, m = mean of the number set, and n = the number of numbers in the set. 128.84.217.56 (talk) 03:02, 19 October 2009 (UTC)
[edit] Usage of the word sample
At some points the article uses the word sample instead of observations, e.g.: "where N is the number of samples used to sample the mean" from the section "Relationship between standard deviation and mean". I would suggest changing this to "where N is the sample size", or "where N is the size of the sample used to estimate the mean", or "the number of observations in the sample used to ...". This can be very confusing, since the idea is to estimate the mean over several samples from just one observed sample. Ppardal (talk) 15:20, 21 October 2009 (UTC)
[edit] Limitations vs Variance article
In the limitations section there is a claim that it is impossible to compute the standard deviation of a whole population given only the standard deviations of its subgroups.
In the Variance article, though, in section 3 (Properties), there is this property of variance:
Suppose that the observations can be partitioned into equal-sized subgroups according to some second variable. Then the variance of the total group is equal to the mean of the variances of the subgroups plus the variance of the means of the subgroups
And later:
In a more general case, if the subgroups have unequal sizes, then they must be weighted proportionally to their size in the computations of the means and variances. The formula is also valid with more than two groups, and even if the grouping variable is continuous.
And if standard deviation equals sqrt(variance), then it is possible to compute standard deviation based on subgroups. Unless Variance article has some errors there.
212.160.172.70 (talk) 15:42, 19 January 2010 (UTC)
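The property quoted from the Variance article can be sketched in Python as follows. Note that the computation needs each subgroup's size and mean as well as its standard deviation, which may be what reconciles the two statements: the subgroup standard deviations alone are not enough. The function name `combined_sd` and the sample data are hypothetical, and population (divide-by-N) standard deviations are assumed throughout.

```python
import math
import statistics


def combined_sd(sizes, means, sds):
    """Population SD of the pooled data from per-group sizes, means and
    population SDs: size-weighted mean of the variances plus the
    size-weighted variance of the means."""
    total = sum(sizes)
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / total
    within = sum(n * s * s for n, s in zip(sizes, sds)) / total
    between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means)) / total
    return math.sqrt(within + between)


# Illustrative data: two subgroups of unequal size.
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0, 6.0, 7.0, 8.0]
groups = [group_a, group_b]

pooled = combined_sd(
    [len(g) for g in groups],
    [statistics.fmean(g) for g in groups],
    [statistics.pstdev(g) for g in groups],
)
print(pooled, statistics.pstdev(group_a + group_b))
```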
[edit] difference between 'mean difference from the mean' and standard deviation
can this be covered by the article? —Preceding unsigned comment added by 212.54.222.127 (talk) 14:55, 3 May 2010 (UTC)
- The mean difference from the mean is zero. Perhaps you're thinking of mean absolute deviation? That is mentioned at the end of the 'Worked example' section. Qwfp (talk) 15:32, 3 May 2010 (UTC)
[edit] Tampering
I have removed the claim 'It helps detect tampering of data.' If it's true it needs justification. If anyone reinstates it can they please write 'tampering with' not 'tampering of'. 81.131.57.101 (talk) 10:07, 24 May 2010 (UTC)
[edit] Height as an example
The observed distribution of heights seems like a good example to illustrate standard deviation. However it suffers because of the apparent need to use two sets of units. I think it would work better without this constant need to skip over parenthetical additions. Obviously, the next point of discussion is which set of units to employ. Presumably this has been and is being discussed elsewhere within Wikipedia with no obvious resolution. My opinion is that since most of the world uses SI and metric prefixes and the few countries stubbornly clinging to other systems officially use SI and metric prefixes in their scientific communities it should be centimetres. Obviously there is going to be strong disagreement from a part of the community on this issue. Without looking at the history of this page I would guess that there have been battles over the units in the past. Still, I think the article would be better with an example using one unit or no units. —Preceding unsigned comment added by 142.104.154.108 (talk) 21:32, 18 August 2010 (UTC)
[edit] Chebyshev's inequality
The sentence in this section on the Standard deviation page makes no grammatical sense. It appears a line was edited out or something of that nature. Given that I don't know much about this topic (hence, my reviewing the Wiki), I'm requesting the correction here.
From the article:
- "Chebyshev's inequality ensures, for all distributions for which the standard deviation is defined, the within a number of standard deviations is at least that as follows." —Preceding unsigned comment added by 65.241.18.25 (talk) 13:57, 28 August 2010 (UTC)
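For readers trying to parse the garbled sentence: Chebyshev's inequality states that, for any distribution with a defined standard deviation, the proportion of values within k standard deviations of the mean is at least 1 − 1/k². A minimal Python sketch of that bound (the function name is an illustrative choice, not taken from the article):

```python
def chebyshev_min_fraction(k: float) -> float:
    """Lower bound on the fraction of a distribution lying within
    k standard deviations of its mean (Chebyshev's inequality)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # For k <= 1 the bound is not informative, so clamp it at zero.
    return max(0.0, 1.0 - 1.0 / (k * k))


for k in (1.5, 2, 3, 4):
    print(k, chebyshev_min_fraction(k))
```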
[edit] Terminology
The article uses the expression "standard deviation of the sample" versus "sample standard deviation" and implies that this is standard terminology. I am a professional statistician and have never seen this. Can a citation be found for this usage? From my point of view I believe it is not common, but, of course, I cannot prove a negative. I just am worried that the term is not as unambiguous as the author may think that it is. Doctorambient (talk) 20:43, 11 September 2010 (UTC)
[edit] standard error (deviation) of the standard deviation
Can someone include a section on estimating the standard error of the standard deviation estimate? I found an approximate formula for large N (>~16): STDDEV(sigma) = sigma / sqrt(2N) at http://davidmlane.com/hyperstat/A19196.html, but no explanation or derivation. Someone with more knowledge than I should put this section in, and with more detail. A citation of an appropriate textbook is needed. Milliemchi (talk) 01:31, 30 December 2010 (UTC)
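As a rough check of the quoted approximation, the sketch below compares it with the normal-theory value. It assumes independent normal observations and the n − 1 sample standard deviation s, for which E[s²] = σ² and E[s] = c4(n)·σ, so that SD(s) = σ·sqrt(1 − c4(n)²). The function names are illustrative only.

```python
import math


def c4(n: int) -> float:
    """E[s] / sigma for a normal sample of size n (n >= 2)."""
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def exact_sd_of_s(sigma: float, n: int) -> float:
    """Standard deviation of s under normality: sigma * sqrt(1 - c4^2)."""
    return sigma * math.sqrt(1.0 - c4(n) ** 2)


def approx_sd_of_s(sigma: float, n: int) -> float:
    """Large-sample approximation quoted above: sigma / sqrt(2N)."""
    return sigma / math.sqrt(2.0 * n)


for n in (5, 16, 100, 1000):
    print(n, exact_sd_of_s(1.0, n), approx_sd_of_s(1.0, n))
```

Under these assumptions the two values differ by a few percent around N = 16 and the gap shrinks as N grows.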
[edit] Means of Two Sample Populations Shown in Graphic
Sorry; I'm confused re the mean of the red population shown. Surely it is less than 100? Stickeebeek (talk) 02:21, 3 February 2011 (UTC)
[edit] very confusing
Using "standard deviation of the sample" vs. "sample standard deviation" to refer to different quantities is painfully confusing. Are those names correct? Can we change the names to something less confusing, or strongly emphasize the subtle difference? What is being referred to here is the standard error of the mean (SEM), which is the sample SD divided by the square root of the sample size. It is the standard deviation of the distribution of sample means. —Preceding unsigned comment added by 152.3.182.116 (talk) 18:46, 11 February 2011 (UTC)
[edit] Weighted calculation
n' is defined wrongly. It should be the square of the sum of the weights, divided by the sum of the squares of the weights; i.e., V1^2/V2 from http://en.wikipedia.org/wiki/Weighted_variance#Weighted_sample_variance —Preceding unsigned comment added by 174.1.36.106 (talk) 23:30, 1 March 2011 (UTC)
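The definition proposed in this comment, the square of the sum of the weights divided by the sum of the squared weights, can be written out as a short Python sketch. The function name `effective_sample_size` is a hypothetical label for n'.

```python
def effective_sample_size(weights):
    """n' = (sum of weights)^2 / (sum of squared weights), i.e. V1^2 / V2."""
    v1 = sum(weights)
    v2 = sum(w * w for w in weights)
    if v2 == 0:
        raise ValueError("at least one weight must be non-zero")
    return v1 * v1 / v2


print(effective_sample_size([1, 1, 1, 1]))  # equal weights: n' equals n
print(effective_sample_size([1, 0, 0, 0]))  # one dominant weight: n' is 1
```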
[edit] Rapid calculation
The correct calculation is here, and I don't think it's equivalent:
SQRT((s0*s2-s1^2)/(s0*(s0-1)))
http://syberad.com/calculator/WebHelp/charts/statistics/definitions/standard_deviation.htm
Anyone who can double-check this and fix the formula, that would be cool.
Simul (talk) 20:33, 18 March 2011 (UTC)
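One way to double-check the expression is to compare it numerically with a library routine. The sketch below assumes that s0, s1 and s2 denote the count, the sum and the sum of squares of the data (the comment does not define them), and the data values are made up for illustration.

```python
import math
import statistics


def sd_from_power_sums(s0: float, s1: float, s2: float) -> float:
    """Sample SD from s0 = N, s1 = sum(x), s2 = sum(x^2) (assumed meanings)."""
    return math.sqrt((s0 * s2 - s1 ** 2) / (s0 * (s0 - 1)))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values
s0 = len(data)
s1 = sum(data)
s2 = sum(x * x for x in data)

print(sd_from_power_sums(s0, s1, s2))
print(statistics.stdev(data))  # n - 1 sample standard deviation
```

Under that reading, the expression agrees with the usual n − 1 sample standard deviation for this data set.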
[edit] Combining Standard Deviation - a reference for citation
Found a source to cite for the discrete sampling portion of Combining Standard Deviations: http://www.burtonsys.com/climate/composite_standard_deviations.html I'm rusty on Wiki editing and don't know the right style for citing these things, so I'll leave it to someone else. Feel free to remove this comment when that is done.
Risce (talk) 14:47, 25 July 2011 (UTC)
[edit] With Sample Standard Deviation - please rephrase for clarity
"The reason for this correction is that s² is an unbiased estimator for the variance σ² of the underlying population, if that variance exists and the sample values are drawn independently with replacement. However, s is not an unbiased estimator for the standard deviation σ; it tends to overestimate the population standard deviation."
Can someone please clarify these two sentences - they read paradoxically. If s^2 is an unbiased estimate of σ^2, then how can s be a biased estimator of σ??
Jwsadler (talk) 15:50, 4 August 2011 (UTC)
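A tiny exhaustive example may help with the apparent paradox. The sketch below uses a hypothetical two-value population and enumerates every equally likely sample of size 2 drawn with replacement: the average of s² matches σ² exactly, yet the average of s does not match σ, because the square root is a nonlinear function.

```python
import itertools
import math
import statistics

population = [0.0, 2.0]                      # hypothetical population
sigma2 = statistics.pvariance(population)    # population variance
sigma = math.sqrt(sigma2)

# Every equally likely ordered sample of size 2, drawn with replacement.
samples = list(itertools.product(population, repeat=2))

mean_s2 = sum(statistics.variance(s) for s in samples) / len(samples)
mean_s = sum(statistics.stdev(s) for s in samples) / len(samples)

print(sigma2, mean_s2)  # the two variances agree
print(sigma, mean_s)    # the average of s differs from sigma
```

In this toy case the average of s falls below σ. More generally, Jensen's inequality gives E[s] ≤ σ whenever s² is unbiased for σ².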
[edit] Yet Another Error
For the basic example given in the first section, shouldn't the average of the values be divided by 7 instead of 8? I have seen many times that the number of samples is reduced by one to get the variance. Shouldn't this be the case?
WeirdnSmart0309 (talk) 00:40, 14 October 2011 (UTC)
[edit] Adding link to example in "Maximum likelihood estimator" wikipedia-entry
Under the section "Continuous distribution, continuous parameter space" in the "Maximum likelihood" Wikipedia entry, the calculation of the expectation of the maximum likelihood estimate of the standard deviation (for a normal distribution) is exemplified. This illustrates nicely why one calls the "With sample standard deviation" a biased estimator. So maybe it would be a point to add in this "With sample standard deviation" hence clarifying why it is exactly N-1 that is chosen as the adjustment. At least, that's how I understood it. — Preceding unsigned comment added by 158.64.77.254 (talk) 15:47, 20 December 2011 (UTC)
[edit] Cauchy distribution
The remark on the Cauchy distribution, "the standard deviation of a random variable that follows a Cauchy distribution is undefined because its expected value μ is undefined" is wrong. In the Cauchy distribution, the expected value μ is easily estimated from the sample mean. The larger the sample, the more accurate the μ estimate. Not so for the sample variance! It is the estimate of the variance, σ^2=E[(x-μ)^2], that does not converge, and consequently the standard deviation (square root of variance) does not converge either.
You can calculate the sample variance, but it does not represent the distribution because the Cauchy distribution does not have a standard deviation (the defining integral for variance does not converge). With a Cauchy distribution, the larger your sample, the larger the calculated variance, without limit! With distributions that do have a variance, the variances estimated from ever larger samples converge to that of the underlying distribution. Cauchy is a classic case of a "tail heavy" distribution.
Also, the article propagates a major computing error in statistical software:
"Thus, the standard deviation is equal to the square root of (the average of the squares less the square of the average). See computational formula for the variance for a proof of this fact, and for an analogous result for the sample standard deviation."
The above is called the "computer's formula", and is in common use because it eliminates the need to make a pass through the data to compute the mean and the need to subtract the mean from each sample in a second pass through the data. However, a calculation problem arises when the mean is large compared to the standard deviation. The problem is that you are subtracting two numbers that are very nearly the same, a major no-no in numerical methods! On your calculator, take the standard deviation of 1111, 1111.1111, and 1111.2222. The exact answer is 0.1111. Some calculators are better than others. There is a simple and accurate algorithm for managing this problem that requires two memory registers, just like the grossly defective computer's formula. Details upon request to
richard1941@gmail.com. — Preceding unsigned comment added by 98.151.182.233 (talk) 04:45, 2 January 2012 (UTC)
- The contribution "In the Cauchy distribution, the expected value μ is easily estimated from the sample mean. The larger the sample, the more accurate the μ estimate" in the above is wrong. It is well-known that if the distribution of a single sample value is Cauchy, the distribution of the sample mean of any number of such (independent) sample values has exactly the same Cauchy distribution ... the distribution of the sample mean does not become more concentrated about the population centre and hence saying "The larger the sample, the more accurate the μ estimate" is incorrect. Melcombe (talk) 23:03, 6 January 2012 (UTC)
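On the numerical point raised earlier in this thread (cancellation in the "average of the squares less the square of the average" formula), one well-known single-pass alternative is Welford's algorithm, sketched below in Python. It is not necessarily the algorithm the commenter had in mind: it keeps three running values (count, mean, and a sum of squared deviations) rather than two, and the function name is an illustrative choice.

```python
import math


def welford_sample_sd(values):
    """Single-pass sample standard deviation (n - 1 denominator) using
    Welford's update, which avoids subtracting two nearly equal sums."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # uses the old and the updated mean
    if count < 2:
        raise ValueError("need at least two values")
    return math.sqrt(m2 / (count - 1))


# The calculator example from the comment above.
print(welford_sample_sd([1111, 1111.1111, 1111.2222]))

# A harder case: a large offset relative to the spread.
print(welford_sample_sd([1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]))
```

The second data set has a sample variance of 30, which the update recovers even though the values themselves are around 10^9.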
[edit] Link to Mean
How about linking to an article about the "mean", or at least explaining it? The article assumes the reader knows about it.
156.34.68.113 (talk) 17:30, 20 January 2012 (UTC)
- There's a link to 'mean' in the second sentence of the article. Qwfp (talk) 17:33, 20 January 2012 (UTC)
[edit] Standard deviation of the mean should be Standard error
I have never seen the terminology "standard deviation of the mean" used in statistics literature. "Standard deviation of the mean" describes the standard error of a measurement and I think that the section talking about the standard deviation of the mean should be restructured to reflect that. This would avoid confusion for readers unfamiliar with the subject and would redirect them to the standard error page where they could get more information.
In Cntrl (talk) 04:19, 9 February 2012 (UTC)
- The "Standard deviation of the mean" is a well-defined population-based quantity, while "Standard error of the mean" is a sample-based estimate of the "Standard deviation of the mean". See Standard error (statistics)#Standard error of the mean. Melcombe (talk) 01:29, 14 February 2012 (UTC)



