Talk:Median

WikiProject Mathematics (Rated B-class, High-importance)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Field: Probability and statistics
One of the 500 most frequently viewed mathematics articles.

WikiProject Statistics (Rated B-class, Top-importance)
This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

Optimality property

Does someone know how to demonstrate this optimality property?

REPLY: This is a typical homework problem, whose solution can be found by visiting standard intermediate textbooks (only some of which have the solution). Kiefer.Wolfowitz (talk) 18:17, 3 June 2009 (UTC)

The posted definition is partly wrong!!

Part 2 about the nonuniqueness is directly contradicted by the first external link at the bottom. Unless someone can find some documentation supporting not interpolating the median then this section needs to be removed. —Preceding unsigned comment added by Jdparker28 (talk • contribs)

Sorry, you fail to make your case. That particular external link is not cogent unless perhaps it's an instructor telling students how he wished exercises done, or otherwise there for some extraneous reason. The basic definition of "median" given in that same external link logically entails that what this article says is right. So do innumerable books. Certainly in some situations it makes sense to interpolate like that, but that doesn't mean the article is wrong to point out non-uniqueness of the median in certain cases. Michael Hardy 19:43, 17 October 2006 (UTC)

Okay, I will show you mine, and you show me yours for better understanding.

Biometry by Sokal and Rohlf, 3rd edition, pages 44 to 46, specifically box 4.1

Statistics Explained by McKillup 2006, pages 74 and 75.

You are correct that I am an instructor telling students how to do exercises, and it is very confusing when the posting emphasizes this nonstandard and, as far as I can tell, unaccepted interpretation of the definition. Are there two definitions of median? The middle variable(s) and the middle of the distribution? Even if section 2 of the article does follow logically, I have never seen anyone defy convention and state that there are two medians and to not interpolate. Can you please provide some sort of citation where this nonuniqueness of the median idea is presented? If not, it might be good to pull it in favour of the interpolating definition to avoid confusing students. If you are correct and there are solid sources, then we should add a bit on the interpolating convention in section 2 as a warning.


The best precise definition of "the median" of an even-cardinality multiset is not easy to discern. I have seen several good discussions and I'll try to find some references. The bottom line is that it seems not many practitioners care; but it can matter in automated software -- see the penultimate edition of Numerical Recipes versus the latest edition. I have a quibble about the definition in the Preamble where it says "If there are an even number of observations, one often takes the mean of the two middle values." – but this last is ambiguous: Is the median of {0,1,1,2,2,2} 1.5, the average of numbers 1 & 2, or, rather and perhaps better, 1.6, the weighted mean of the sub-multiset formed of the two middle values {1,1,2,2,2}? The Preamble perhaps means (sorry!) to say "one often takes the average of the values of the two middle observations". But I like the subtlety of the other definition a little better, somehow; and it ought not to be too hard to construct an example of a discrete distribution limiting to a continuous one, in which the first definition behaves, in the limit, worse than the second. 75.36.232.78 11:19, 11 December 2006 (UTC)

"or, rather and perhaps better, 1.6, the weighted mean of the sub-multiset formed of the two middle values {1,1,2,2,2}"

This is not better, and has weird consequences. For example, the median of {0,1,1,2,2,2} (1.6) is then larger than the median of {0,1,1,2,3,3} (1.333...). 132.230.10.6 (talk) 11:37, 27 August 2012 (UTC)
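The two readings contrasted above can be compared with a minimal Python sketch. The function names `plain_median` and `tie_weighted_median` are hypothetical labels for the two readings and do not come from any source cited in this thread.

```python
from statistics import mean


def middle_values(data):
    """Return the two middle observations of an even-sized sample."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0 or n % 2:
        raise ValueError("expected a non-empty sample of even size")
    return ordered[n // 2 - 1], ordered[n // 2]


def plain_median(data):
    """Average of the two middle observations."""
    low, high = middle_values(data)
    return (low + high) / 2


def tie_weighted_median(data):
    """Mean of every observation equal to either middle value (hypothetical name)."""
    low, high = middle_values(data)
    return mean([x for x in data if x in (low, high)])


for sample in ([0, 1, 1, 2, 2, 2], [0, 1, 1, 2, 3, 3]):
    print(sample, plain_median(sample), tie_weighted_median(sample))
```

Under the second reading the first sample gets the larger value even though the second sample has larger upper observations, which is the oddity pointed out in the reply.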

For some discussions of this issue you might want to read: Hyndman, R. J. and Fan, Y. (1996) Sample quantiles in statistical packages, _American Statistician_, *50*, 361-365. Hadleywickham 03:30, 6 March 2007 (UTC)

The sample median

Completed the example for an even number of observations as defined for a sample median. If there is a question regarding the uniqueness of a sample median for an even number of observations, please discuss first before changing the information. JackOL31 (talk) 01:55, 22 November 2009 (UTC)

Does anyone know what a "weighted median" is?

I've heard of a "smoothed weighted median", defined something like this. Given a list of numbers xi for i = 1, ..., n, first consider a bell-shaped curve centered at each one. Take a weighted average of those bell-shaped functions, getting a probability density. Finally, take the median of that distribution. The bell curves are the "smoothing".
But here's my guess as to the general meaning of "weighted median". Assign weights, i.e., non-negative numbers whose sum is 1, to all the numbers in your list; different numbers may carry different weights. Take the median of the resulting discrete probability distribution. Michael Hardy 21:52, 5 Jan 2005 (UTC)
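A rough Python sketch of the second guess above (the median of the discrete distribution defined by the weights) might look like the following. The function name `weighted_median` is illustrative, and returning the smallest qualifying value is an assumption made here to pick one answer when the median is not unique.

```python
def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half of the total weight.

    Assumption: when several values qualify as a median, the lowest is returned.
    """
    if not values or len(values) != len(weights):
        raise ValueError("need equally long, non-empty values and weights")
    total = sum(weights)
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= total / 2:
            return value
    raise ValueError("weights must be non-negative with a positive sum")


print(weighted_median([1, 2, 3], [0.1, 0.2, 0.7]))
print(weighted_median([1, 2, 3], [1, 1, 1]))
```

Dividing by the total weight inside the comparison means the weights do not have to be normalised to sum to 1 beforehand.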

Median = 50th percentile

I just wanted to check, the median's equal to the 50th percentile, right? I think this would be helpful to have in the definition of median (assuming it's right).

Correct. Michael Hardy 23:04, 20 Apr 2005 (UTC)
I think the page currently incorrectly says that median is 50% percentile, rather than 50th percentile. I'm not an expert in statistics, so this is really just a question: Isn't this equivalent to 0.5-quantile (50%-quantile)? -- JKľ 2006-04-24
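As a small check of the equivalence discussed above (median, 50th percentile, 0.5-quantile), here is a sketch using Python's standard `statistics` module (Python 3.8 or later). It relies on the module's default interpolation method, which is only one of several quantile conventions in use, so it illustrates the idea rather than settling the definition.

```python
from statistics import median, quantiles

even_sample = [1, 3, 5, 7]
odd_sample = [1, 3, 5, 7, 9]

for sample in (even_sample, odd_sample):
    half_quantile = quantiles(sample, n=2)[0]          # the single cut point of the 2-quantiles
    fiftieth_percentile = quantiles(sample, n=100)[49]  # the 50th of 99 percentile cut points
    print(sample, median(sample), half_quantile, fiftieth_percentile)
```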

Efficient Computation

Do you mean that although sorting time is O(n log n) the median can be found in O(1) time if the list is sorted? Or does it mean that for an unsorted list the median can be found in O(n) time? I feel this needs clarification in the article. 203.33.164.42 02:26, 2 April 2006 (UTC)

Why hasn't this issue been addressed yet? Something must be very wrong if the introduction of the section promises an O(n) algorithm but the following algorithm is O(n log n). --Roerd (talk) 19:19, 20 January 2011 (UTC)

This is Wikipedia. The selection algorithm is described in a linked, separate article. Note that Wikipedia does not mean to include pseudocode for all possible algorithms; this is still a dictionary, not a programming book. --87.174.120.8 (talk) 08:22, 21 January 2011 (UTC)
Thanks, I understand now. I think it might be preferable to move the "Efficient computation" part below the "Easy explanation" part. The way it is now is confusing. Roerd (talk) 23:50, 22 January 2011 (UTC)
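To make the distinction in this thread concrete, here is a minimal sketch of a selection-based median for an unsorted list. It uses randomized quickselect, which runs in expected O(n) time; it is not the worst-case linear algorithm, and it is not claimed to be the algorithm described in the article. The names `quickselect` and `median_unsorted` are illustrative.

```python
import random


def quickselect(items, k):
    """Return the k-th smallest element (0-based) in expected linear time."""
    pool = list(items)
    if not 0 <= k < len(pool):
        raise IndexError("k out of range")
    while True:
        pivot = random.choice(pool)
        smaller = [x for x in pool if x < pivot]
        equal_count = sum(1 for x in pool if x == pivot)
        if k < len(smaller):
            pool = smaller                      # answer lies below the pivot
        elif k < len(smaller) + equal_count:
            return pivot                        # answer is the pivot itself
        else:
            k -= len(smaller) + equal_count     # answer lies above the pivot
            pool = [x for x in pool if x > pivot]


def median_unsorted(items):
    """Median without fully sorting; averages the two middle values when n is even."""
    n = len(items)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2:
        return quickselect(items, n // 2)
    return (quickselect(items, n // 2 - 1) + quickselect(items, n // 2)) / 2


print(median_unsorted([5, 1, 4, 2, 3]), median_unsorted([4, 1, 3, 2]))
```

By contrast, once a list is already sorted, reading off the middle element (or two) takes constant time.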

Mode

I think the relation between the median and mode is missing. The 'popular explanation' might actually be misunderstood to describe the mode, and the difference between the two concepts is not given. Junuxx 03:35, 1 November 2006 (UTC)


Does somebody have a picture? Andries (talk) 19:13, 9 December 2007 (UTC)

Can you please include an intermediate definition of this? From an intermediate student!

Note, however, that c is not always unique, and therefore not well defined in general.

http://en.wikipedia.org/wiki/Median#Theoretical_properties says "Note, however, that c is not always unique, and therefore not well defined in general." Can someone give me an example where c is not unique? -72.221.120.191 (talk) 10:04, 15 June 2008 (UTC)
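One possible example, assuming that c in the quoted sentence denotes a value minimizing the total (or mean) absolute deviation from the data, which is the optimality property asked about at the top of this page. For the sample {1, 2, 3, 4} every c between 2 and 3 gives the same total deviation, so the minimizer is not unique. The helper name is hypothetical.

```python
def total_abs_deviation(data, c):
    """Sum of |x - c| over the sample."""
    return sum(abs(x - c) for x in data)


data = [1, 2, 3, 4]
for c in (1.5, 2, 2.25, 2.5, 3, 3.5):
    print(c, total_abs_deviation(data, c))
```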

Medians in Computer Science

I recently rewrote the Computer Science median section. The original contribution strayed from the referenced article. The referenced article did not specifically deal with the "even case," and the original editor incorrectly used the term underflow.

The overall implementation of a median in Computer Science still needs a good source. Much appreciation to anyone who can find a good link.

Also, the even/odd cases are not currently dealt with (they weren't explained properly in the original section either).

JoshuaSchaeffer (talk) 23:24, 28 November 2008 (UTC)

I should clarify that the original section was technically correct, except for the use of "underflow." However, the editor took an article about index size overflow when finding the "median" index during a binary search, and applied the same concept to his own median calculation using actual values (not indexes). An additional section about calculating an actual median needs to be added (hopefully with a proper citation :) 68.0.255.35 (talk) 02:47, 29 November 2008 (UTC)

I was somehow logged out, the above IP is mine. My last edit to the Median page was also under this IP. Sorry for the mess. JoshuaSchaeffer (talk) 02:59, 29 November 2008 (UTC)

Wouldn't the proposed alternative median calculation of using A + ((B − A)/2) also overflow, if for instance A is MAXINT? In any case I don't see why the implementation or the possible overflows should be mentioned in this article. 190.245.13.228 (talk) 10:25, 13 April 2009 (UTC)

I agree. This section of the article is so trivial, and the result it gives so unimportant and obvious, that I think it should be cut entirely. Ben Finn (talk) 13:14, 12 May 2009 (UTC)

SUGGESTION: K. C. Kiwiel has written several articles in recent years on median selection (and then with reference to resource-allocation problems, etc.), which are extremely detailed and point out errors in the literature (even by outstanding computer scientists). He is a meticulous programmer and mathematician, and I would recommend his articles to interested readers. Kiefer.Wolfowitz (talk) 18:27, 3 June 2009 (UTC)
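On the overflow question raised earlier in this section, a small simulation may help. Python integers do not overflow, so the sketch below emulates signed 32-bit wrap-around with a hypothetical helper `wrap32`; real languages differ (in C, for instance, signed overflow is undefined rather than wrapping), so this is an illustration under that assumption only. It suggests that A + ((B − A)/2) is safe when A and B have the same sign, including A = MAXINT, but can overflow when B − A itself exceeds the range.

```python
INT_MAX = 2**31 - 1
INT_MIN = -2**31


def wrap32(x):
    """Emulate signed 32-bit two's-complement wrap-around (an assumption)."""
    return (x - INT_MIN) % 2**32 + INT_MIN


def mid_sum(a, b):
    """(A + B) / 2 with every intermediate result wrapped to 32 bits."""
    return wrap32(a + b) // 2


def mid_offset(a, b):
    """A + ((B - A) / 2) with every intermediate result wrapped to 32 bits."""
    return wrap32(a + wrap32(b - a) // 2)


print(mid_sum(INT_MAX, INT_MAX), mid_offset(INT_MAX, INT_MAX))
print(mid_sum(INT_MIN, INT_MAX), mid_offset(INT_MIN, INT_MAX))
```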

An inequality relating means and medians

The article says "For continuous probability distributions, the difference between the median and the mean is less than or equal to one standard deviation." However, I think this holds for any distribution, not just continuous ones. —Preceding unsigned comment added by Matumba (talk • contribs) 14:22, 19 January 2009 (UTC)

REPLY: I would suggest looking at the textbook by Casella and Berger, for some discussion of results on the mean-median-mode inequality and the 6 sigma inequality, with references to the literature (e.g. the article by Pukelsheim et alia); see also Joag-Dev and Dharmadhikari's Unimodality, Convexity and Applications. Kiefer.Wolfowitz (talk) 18:21, 3 June 2009 (UTC)
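The inequality can at least be spot-checked on discrete data. The sketch below treats a finite sample as a discrete distribution with equal mass on each observation, so it uses the population standard deviation (`statistics.pstdev`); the function name and the sample values are made up for illustration, and a few passing examples are of course not a proof.

```python
from statistics import mean, median, pstdev


def gap_within_one_sd(data):
    """True if |mean - median| <= population standard deviation of the data."""
    gap = abs(mean(data) - median(data))
    return gap <= pstdev(data) + 1e-12  # small tolerance for rounding error


for sample in ([0, 0, 0, 0, 10], [1, 2, 3, 4], [1, 1, 1, 50, 1000]):
    print(sample, gap_within_one_sd(sample))
```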

History?? Laplace versus Fechner

I quote

History

Gustav Fechner introduced the median into the formal analysis of data.[1]

Laplace was older than Fechner and used the median frequently. Therefore, I suggest using the alternative wording

Laplace used the sample median (to estimate the population median). Gustav Fechner further popularized the median for the statistical analysis of data.[1]

or deleting this section, since I added a short discussion of median-unbiased estimators and the absolute-value loss function (again following the ideas of Laplace). Kiefer.Wolfowitz (talk) 18:22, 3 June 2009 (UTC)

Citation Needed for obvious fact

"Median is the middle value after arranging data by any order[citation needed]." This obviously follows from the definition, does a definitive source have to be cited in this case? --99.39.111.144 (talk) 23:07, 25 October 2009 (UTC)

Hmm... I'd say "no". To me, the concept of the median is an axiom, so a source is not needed. You don't see citations in the "addition" article either. But if a source could be provided, it wouldn't hurt.--Nwinther (talk) 09:37, 14 January 2010 (UTC)

Find Mean, Median, Mode in Grouped data

Who can tell me? I do not know how to find it. —Preceding unsigned comment added by 219.76.99.194 (talk) 09:15, 3 November 2009 (UTC)

Definition (Nov 2009)

The first and second paragraphs are mostly incorrect. First, one must understand the difference between the median of a population and the median of a set of observations (aka sample median). This is similar to knowing the difference between the mean of a population [aka Expected value or µ = E(X)] and the mean of a sample [x-bar = (Σ x_i)/n]. Median of a population is not the same as the median of a sample. The median of a population is given in the section "Medians of probability distributions" and the median need not be unique. However, given an ordered sample of n observations where n is odd, the sample median is defined as the [(n+1)/2]th observation. If n is even, then the sample median is the arithmetic mean of the (n/2)th and [(n/2)+1]th observations. The median of a sample is unique (by definition). Also, the statement, "...The median may not be unique, as there may be a number of observations with the same value occupying the middle range of a distribution" mixes terminology of observations used in samples with members used in distributions. In other words, observations is to sample as members is to distribution. A sample has observations and a population has members. Starting at the second sentence through the end of the second paragraph, the article needs to have the word distribution changed to sample and population changed to observations. None of it is referring to a distribution (population). The above definitions need to be included in the "Medians in descriptive statistics" with the correct definition of a unique median. I will make the appropriate changes after discussion and concurrence. JackOL31 (talk) 21:24, 24 November 2009 (UTC)

So much wrong, I got distracted. The statement reads, "...The median may not be unique, as there may be a number of observations with the same value occupying the middle range of a distribution" (sic). First, "distribution" should be written as "sample" and the sample median is unique by definition. Secondly, the median will be unique when there are observations of the same value, for example {1, 4, 4, 5} yields a sample median of 4. Maybe I don't know what is being said here. JackOL31 (talk) 11:42, 22 November 2009 (UTC)
Corrected terminology in the "Easy explanation of the sample median" section. There is no "distribution" given, nor are we working with numbers. We are working with (sample) observations with measured values, i.e. length of growth, scores for a quiz, etc. —Preceding unsigned comment added by JackOL31 (talk • contribs) 00:35, 22 November 2009 (UTC)
Melcombe did a fantastic job of cleaning this up. JackOL31 (talk) 21:37, 24 November 2009 (UTC)
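The sample-median definition stated in this section translates directly into a few lines of Python. This is a sketch of that definition only; the function name `sample_median` is illustrative.

```python
def sample_median(observations):
    """Sample median as defined above: the ((n+1)/2)-th ordered observation
    for odd n, the mean of the (n/2)-th and (n/2 + 1)-th for even n."""
    ordered = sorted(observations)
    n = len(ordered)
    if n == 0:
        raise ValueError("no observations")
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]   # convert 1-based position to 0-based index
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(sample_median([1, 4, 4, 5]), sample_median([3, 1, 2]))
```

With this definition the result is a single number for any non-empty sample, including samples with repeated values such as {1, 4, 4, 5}.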

Computer median, revisited.

There are a few citation needed markers there, where I'm not sure if they are really needed. I hadn't read the article before, but I find these parts rather obvious. Albeit they could probably be rewritten to be easier to understand.

Let's establish a few things first, to make sure we are at least agreeing on these:

  • in a population of integers, the median may not be an integer:
    For example, the median of {0, 1, 2, 3} on real numbers is by definition 1.5.
    However, if you restrict this to the integer domain, there is no 1.5
  • computer integers are in a finite range, but the range is complete without gaps
  • computer floats are usually just a subset of real numbers (well, usually plus infinity, plus NaN, and with two zeros), with tons of gaps

Are we good on these problems?

The paragraph packed with "citation needed" markers probably tries to provide solutions for all these.

  • Since in the example above 1.5 is not integer, it's not useful. But you can use a "lower median" or "upper median" then, by rounding appropriately (i.e. the "lower integer median" would be 1, the "upper integer median" would be 2).
  • Since computer integers are complete within their range, and the median by definition is neither larger than the largest element nor smaller than the smallest element, an "integer median" of computer integers is a computer integer. (e.g. in a byte context, the integer median will never be outside the byte value range)
  • computer floating point numbers have precision issues. Depending on where the floating point values lie, the mathematical median may not be representable in computer floating point due to precision limitations. Again, some rounding will probably happen, and you could probably again define a lower and upper median (although I bet most people will just use whatever the FPU computes for them).

The fourth "citation needed" probably is a reminder that you can do this on an arbitrary ordering of the values. I could order numbers by their prime divisor representation, resulting in an order such as, {1, 2, 4, 8, 3, 6, 12, 9, 5, 10, 7, 11, 13} (based on {1, 2, 2*2, 2*2*2, 3, 3*2, 3*2*2, 3*3, 5, 5*2, 7, 11, 13}, which is in fact a well-defined order of the positive integers derived from their ordered prime divisor sequences). This ordered sequence obviously has the median 12, while in the natural order it would be 7.

Anyone up for rewriting this section to make it more understandable? --Chire2 (talk) 17:02, 17 May 2010 (UTC)
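The ordering example above can be reproduced with a short sketch. Sorting by the list of prime factors in descending order (an interpretation of "ordered prime divisor sequences" that happens to reproduce the sequence given) yields the order quoted in the comment. The function names are illustrative.

```python
def prime_factors_desc(n):
    """Prime factors of n with multiplicity, largest first (empty list for 1)."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return sorted(factors, reverse=True)


def middle_element(values, key=None):
    """Middle element under the ordering given by key; assumes an odd count."""
    ordered = sorted(values, key=key)
    return ordered[len(ordered) // 2]


numbers = range(1, 14)
print(sorted(numbers, key=prime_factors_desc))
print(middle_element(numbers, key=prime_factors_desc), middle_element(numbers))
```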

A couple of thoughts:
  • In the article for quantiles it is noted that there are a number of ways in existing software packages (R, SAS) for computing quantiles. (Disclaimer: I have contributed to that page.) These apply for the first 2-quantile, which we call the median. In particular, the median in your first example need not be 1.5, but could be 1 or 2. Also, I don't think that this is an integer vs. real number issue; For example, if the set were {1.1, 2.2, 3.3, 4.4} then the median could be 2.75, 2.2, or 3.3.
  • Although there are indubitably circumstances in which the median of some integers must be an integer, I suspect that in scientific practice in general, with an even number of values, the mean of the two middle values is used whether or not it is an integer. (But I have no citations to back me up!)
  • While it is true that floating point numbers are not as precise as the mathematical real numbers, I offer that this distinction is not sufficiently relevant for the present article.
  • I like your clarification that the median depends upon the ordering, not only the elements. It especially well highlights the fact that taking the average of the two middle elements is not always sensible, even when the numbers are real numbers.
Just my 2¢. Quantling (talk) 19:30, 17 May 2010 (UTC)
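The three candidate values mentioned for {1.1, 2.2, 3.3, 4.4} correspond to three functions in Python's standard `statistics` module. This only shows that the conventions coexist in one library; it is not a survey of the methods used by R or SAS.

```python
from statistics import median, median_high, median_low

data = [1.1, 2.2, 3.3, 4.4]

print(median_low(data))   # lower of the two middle values
print(median(data))       # mean of the two middle values
print(median_high(data))  # higher of the two middle values
```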
Well, the mathematical (or let's say numerical; see the Mathworld reference in the main article) median is defined with the average on a tie situation. I believe that there is a weaker definition of the median, which is about the following:
A median of a set S is an element m such that $|\{s \in S \mid s > m\}| \leq \frac{|S|}{2}$ and $|\{s \in S \mid s < m\}| \leq \frac{|S|}{2}$.
or in words: A median (note that there could be more than one) is an element such that at most half of the elements are larger and at most half of the elements are smaller.
I'd call this a "set theoretic median definition", but I currently do not have a citation for it, it's solely based on my intuition and probably on things I've learned some years ago.
Note that the mathematical median should satisfy this condition. However there can be more than one "median" according to this definition (indeed, in the {1,2,3,4} case any of 2, 2.5, 2.75, 3 should satisfy this condition), while in the mathematical definition it is uniquely determined.
One more thing: no, having an integer median for a set of integers CAN be quite important. But in just as many cases, you will require that the median actually is an element of the set itself. In this setting, the set {1, 2, 4, 5} would have two medians, 2 and 4. This happens a lot when the objects are complex and you cannot "interpolate" between objects. Say, we have four players, each with scores and points. We can sort them by their score, then determine the "median" player's points. Sounds too constructed? Well, k-means clustering can be generalized to non-metric spaces by using this kind of median instead of the mean. This is then called "k-median clustering"[2]
P.S. it also works for booleans. The boolean median is basically 0 when there are more 0s and 1 when there are more 1s in the set. Also relevant for computing the mean over computer data to stay in the right domain. Yet another use is data anonymization. When the values don't belong to the original domain, they can easily be identified as anonymized/fake/modified/whatever. --Chire2 (talk) 20:48, 17 May 2010 (UTC)
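The weaker, non-unique definition proposed in this comment can be checked mechanically. The sketch below reads the condition as "at most half of the elements are larger and at most half are smaller", the reading under which the values 2, 2.5, 2.75 and 3 named in the comment all qualify for {1, 2, 3, 4}. The function name `is_weak_median` is hypothetical.

```python
def is_weak_median(m, values):
    """True if at most half of the values exceed m and at most half fall below m."""
    n = len(values)
    larger = sum(1 for s in values if s > m)
    smaller = sum(1 for s in values if s < m)
    return 2 * larger <= n and 2 * smaller <= n


print([m for m in (1, 2, 2.5, 2.75, 3, 4) if is_weak_median(m, [1, 2, 3, 4])])

# Restricting candidates to members of the set gives the "element" medians.
members = [1, 2, 4, 5]
print([m for m in members if is_weak_median(m, members)])
```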
While closely related to the concept of the medoid - which basically is the set member closest to the median - this median definition suggested here is not the same. For example, given the vector set {(2,3),(3,1),(1,2)} the x-median is 2, the y-median is 2, resulting in a combined median of (2,2); this is not the medoid (either (1,2) or (2,3) is). --87.174.82.90 (talk) 21:14, 17 May 2010 (UTC)
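The vector example above can be verified with a short sketch (Python 3.8 or later for `math.dist`). Note one assumption: `medoids` below uses the common definition of a medoid as a set member minimizing the total Euclidean distance to all members, which for this data selects the same two points as the "member closest to the median" description in the comment. The function names are illustrative.

```python
from math import dist, isclose
from statistics import median


def marginal_median(points):
    """Median taken separately in each coordinate."""
    return tuple(median(coordinate) for coordinate in zip(*points))


def medoids(points):
    """Members with minimal total Euclidean distance to all members (ties kept)."""
    cost = {p: sum(dist(p, q) for q in points) for p in points}
    best = min(cost.values())
    return [p for p in points if isclose(cost[p], best)]


points = [(2, 3), (3, 1), (1, 2)]
print(marginal_median(points), medoids(points))
```

The marginal median (2, 2) is not one of the three points, which is the difference from the medoid noted above.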

U.S. Census use

Should mention that U.S. census reports make very extensive use of medians where people might expect averages to be used... AnonMoos (talk) 20:18, 1 November 2010 (UTC)

This is a nice illustration of the 2-dimensional marginal median, in which the median is taken by components. (The marginal median should generally be avoided when the number of observations exceeds the dimension squared, as a rule of thumb). This application is discussed in recent papers on the multivariate median. Just search for "spatial median" and "United States" and "population center".  Kiefer.Wolfowitz 21:48, 7 May 2012 (UTC)

Running median or median filter

I vaguely recall that a running median or median filter was included by Tukey et al. as part of "exploratory data analysis". Can anyone contribute text and reference for this? The present content covers image processing only, but this would have been for a time-series context. I see that there is an article on median polish, which is somewhat different it seems, and I suppose something on this might also be included here. Melcombe (talk) 13:03, 7 May 2012 (UTC)

Look in 2004(?) Statistical Science, which had a special issue on nonparametric statistics. The last article was on nonparametric multivariate time series. There are some nice articles using spatial medians in chemometrics and image processing, for example.  Kiefer.Wolfowitz 21:52, 7 May 2012 (UTC)
I have found (via google) Evans pdf (1981) on seismology which at least confirms Tukey as a basic origin for running medians and it has a range of interesting results. But I expect there are better sources. Melcombe (talk) 23:22, 7 May 2012 (UTC)
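For readers unfamiliar with the idea, a running median over a time series can be sketched as below. This is a plain sliding-window median with an odd window that simply drops the end points; it is not a reconstruction of Tukey's smoothers, whose end-point rules and compound variants are not covered here. The function name and the sample series are made up for illustration.

```python
from statistics import median


def running_median(series, window=3):
    """Median of each full window of odd length; end points are dropped."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    return [
        median(series[i - half:i + half + 1])
        for i in range(half, len(series) - half)
    ]


print(running_median([1, 2, 100, 3, 4], window=3))
```

The single outlier 100 does not appear in the smoothed output, which is the usual motivation for preferring a running median over a running mean.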

Multivariate median

A discussion of multivariate extensions of the univariate median is needed. Since this is a statistics article (rather than a computer-science algorithmics article), this should have priority over the extensive discussion of sorting. (I agree that it should come after a discussion of a simple example). I would suggest that a discussion of multivariate medians should discuss the following concepts:

  • Marginal median (Puri and Sen)
  • Spatial median (emphasizing the Euclidean norm: other norms could be mentioned). The proper definition of the spatial median should be given for the population distribution. The current heuristic definition in terms of absolute loss is limited to L1, as noted.
  • Perhaps the Oja median could be discussed. A discussion of data depth functionals, etc., would probably exceed the depth desirable for this article!

I would guess that User:David Eppstein could donate lecture notes or text from an article.

Thanks!  Kiefer.Wolfowitz 16:50, 7 May 2012 (UTC)

Multivariate median should be a separate article. Just the Oja et al. literature alone would fill more than one wikipedia article. In fact, I almost started one but did not have the time to work on it. User:Mathstat/MVMedian If anyone wants to work on it ... Mathstat (talk) 17:30, 7 May 2012 (UTC)
Thanks for your helpful edits, here, as elsewhere, Mathstat. I shall have to look at your draft. (Eppstein also has an interest in zonotopes, I'd bet! Cf. Shapley-Folkman lemma).
I think that the marginal median and the spatial median should be mentioned, and their properties briefly noted, in a WP:Summary fashion. I trust that the arithmetic mean notes that it is defined and used for multivariate populations.
The other medians should probably go in another article. I am glad that we agree about Oja, etc. Oja's median is too complicated to sketch here. You should look at Oja's homepage, which used to have a hilarious, self-deprecating quotation of a slam from a referee! ;)
Best regards,  Kiefer.Wolfowitz 19:09, 7 May 2012 (UTC)
On "spatial median", in any changes/splitting please recall that Spatial median is presently a redirect, somehow using the anchor presently within this article. On more general "multivariate median" changes, do note that there is an early mention of the multivariate case in the section on "An inequality relating means and medians" that may need to be put in a more logical place, or given a forward-reference. Melcombe (talk) 21:32, 7 May 2012 (UTC)
Thank you for the helpful comment, and for catching my stray anchor.  Kiefer.Wolfowitz 21:45, 7 May 2012 (UTC)

Median-unbiased

Until recent edits, the text in this article was very similar to that in Bias_of_an_estimator#Median_unbiased_estimators, and that here has now been improved. I haven't looked to see what was copied where, but I would suggest that that other article is the better place to have a longer outline of median-unbiasedness, with perhaps a reduction in the present article concentrating on whether/why the sample median is median-unbiased rather than median-unbiasedness itself. Melcombe (talk) 09:06, 9 May 2012 (UTC)

Calculating by hand etc.

I wonder if it would be appropriate to add a couple of ways of determining the median (in nonproblematical cases). For example, in an unordered and uncounted list, start with the largest and smallest numbers, and then the next pairs etc. OR, in a spreadsheet, take the number of entries, divide by two.... 211.225.33.104 (talk) 01:42, 16 January 2014 (UTC)
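The first by-hand method suggested above (crossing off the largest and smallest values in pairs) can be sketched as follows. It mirrors the manual procedure rather than aiming for efficiency, and averaging the last two survivors for an even count is an assumption based on the usual convention. The function name is illustrative.

```python
def median_by_pairing(values):
    """Cross off the largest and smallest values in pairs until one or two remain."""
    remaining = list(values)
    if not remaining:
        raise ValueError("no values given")
    while len(remaining) > 2:
        remaining.remove(max(remaining))
        remaining.remove(min(remaining))
    return sum(remaining) / len(remaining)  # one survivor, or the mean of two


print(median_by_pairing([7, 1, 5, 3, 9]), median_by_pairing([4, 1, 3, 2]))
```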

Creating a section that is more readable for the mathematically challenged

Just looked at the entry to learn more but found the piece a bit hard going with a liberal arts education.

Isn't it part of Wikipedia's mission to make information more accessible? An easier to read introduction would help a lot here.


Petercascio (talk) 20:45, 5 April 2014 (UTC)

  1. Keynes, John Maynard; A Treatise on Probability (1921), Pt II Ch XVII §5 (p 201).
  2. http://scholar.google.de/scholar?q=k-median+clustering