# Talk:Outlier

WikiProject Statistics (Rated B-class, Mid-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as B-Class on the quality scale.
This article has been rated as Mid-importance on the importance scale.

## Renee Masse

What's a Renee Masse? (In the first sentence).

—Preceding unsigned comment added by 212.183.70.147 (talk) 15:04, 15 October 2007 (UTC)

Adopted orphan redirects for Google: inner fence, outer fence, mild outlier, Extreme outlier

In sans-serif font, 1.5 IQR looks like a division. Patrick 11:15 Dec 23, 2002 (UTC)

I did it that way so I wouldn't have to use * or x or &times; for multiplication. What would you prefer? dcljr 13:53 Dec 23, 2002 (UTC)

When using multi-letter variables a multiplication sign avoids ambiguity, and in this case coincidentally there was even a little more ambiguity.

Any of the three is fine with me, &times; is neatest, but more cumbersome to write. - Patrick 14:26 Dec 23, 2002 (UTC)

## Outliers as 2 s.d.s away from mean

Is there not a second definition of outliers, as lying more than two standard deviations away from the mean? Or am I mixing other things up? I am a physicist, and it is a long time since I did "real" statistics... Batmanand | Talk 09:48, 28 September 2006 (UTC)

This is a poor definition of outliers because it changes upon recursion: the standard deviation is itself highly dependent on the outliers, so once the flagged points are removed and the rule is applied again, different points get flagged. Check boxplot for a simple, easy-to-understand definition that is not distribution dependent.
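To illustrate the recursion point, here is a minimal Python sketch. The function names, the sample data, and the choice of the sample standard deviation are my own assumptions for illustration, not anything taken from the article. It applies the two-standard-deviation rule, removes whatever was flagged, and repeats until nothing more is flagged.

```python
from statistics import mean, stdev


def two_sd_outliers(data, k=2.0):
    """Return the points lying more than k sample standard deviations from the mean."""
    m = mean(data)
    s = stdev(data)
    return [x for x in data if abs(x - m) > k * s]


def repeated_two_sd(data, k=2.0):
    """Apply the rule repeatedly, dropping the flagged points after each pass.

    Returns the list of points flagged in each pass and the points left at the end.
    """
    remaining = list(data)
    passes = []
    while len(remaining) > 2:
        flagged = two_sd_outliers(remaining, k)
        if not flagged:
            break
        passes.append(flagged)
        remaining = [x for x in remaining if x not in flagged]
    return passes, remaining


# Hypothetical sample: seven ordinary values plus two unusual ones.
data = [10, 11, 12, 13, 14, 15, 16, 30, 100]
passes, remaining = repeated_two_sd(data)
print(passes)     # points flagged in each successive pass
print(remaining)  # points never flagged
```

On this sample the first pass flags only 100, because that value inflates the standard deviation enough to hide 30. Once 100 is removed, the second pass flags 30 as well, so the set of "outliers" depends on how many times the rule is applied.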

## Boats

Isn't an outlier (German: Ausleger, Swedish: utliggare) also a supporting 2nd keel for a canoe or sailing boat that makes it almost a catamaran? Hmm... apparently this is called an outrigger on an outrigger canoe in English. Other languages would use "rig" for things that have sails. --LA2 23:04, 1 August 2007 (UTC)

-No, Ausleger is not outlier, that's a false friend. Outrigger, as you say, is the right term. —Preceding unsigned comment added by 212.183.70.147 (talk) 15:07, 15 October 2007 (UTC)

## Definition

I'm opposed to the "Mathematical definition" section in the article. Determining whether an observation is an outlier is ultimately a subjective decision, and any definition based on measures such as standard deviation or interquartile range is completely arbitrary.

If this section needs to be kept, perhaps it could be renamed? The term "mathematical" implies a logical certainty, which doesn't apply in this case. -3mta3 (talk) 11:49, 14 April 2008 (UTC)

## Methods for identifying outliers

I want to know the source of the method of using the Interquantile range mentioned in the text. Is it just a rule of thumb, or does it have a more objective explanation? --Forich (talk) 21:33, 9 June 2008 (UTC)

It is a popular method, known as "Interquartile range" (IQR), where k = 3 is supposed to identify extreme outliers, and k = 1.5 mild outliers. This method has no scientific basis; it belongs to the category of Mumbo Jumbo methods in statistics. --Lambiam 04:51, 10 June 2008 (UTC)
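For readers unfamiliar with the rule being discussed, a rough Python sketch follows. It assumes the usual fence form Q1 − k·IQR and Q3 + k·IQR, and it uses the "inclusive" quartile method of `statistics.quantiles` (Python 3.8+). Quartile conventions differ between textbooks and software, so that choice, the function names, and the sample data are all assumptions made for illustration.

```python
from statistics import quantiles


def iqr_fences(data, k):
    """Return (lower, upper) fences at Q1 - k*IQR and Q3 + k*IQR."""
    # Quartile convention is an assumption; other methods give slightly different fences.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def classify(data):
    """Split flagged points into mild (beyond k = 1.5) and extreme (beyond k = 3)."""
    inner_lo, inner_hi = iqr_fences(data, 1.5)
    outer_lo, outer_hi = iqr_fences(data, 3.0)
    mild, extreme = [], []
    for x in data:
        if x < outer_lo or x > outer_hi:
            extreme.append(x)
        elif x < inner_lo or x > inner_hi:
            mild.append(x)
    return mild, extreme


# Hypothetical sample: Q1 = 12 and Q3 = 16 under the inclusive method, so IQR = 4.
sample = [10, 11, 12, 13, 14, 15, 16, 24, 100]
print(iqr_fences(sample, 1.5))  # inner fences
print(iqr_fences(sample, 3.0))  # outer fences
print(classify(sample))         # (mild, extreme)
```

With this sample the inner fences are 6 and 22 and the outer fences are 0 and 28, so 24 comes out as a mild outlier and 100 as an extreme one. The sketch only shows how the rule is computed; it says nothing about whether k = 1.5 or k = 3 is a justified choice, which is the point under dispute here.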

i hate wiki —Preceding unsigned comment added by 71.250.133.163 (talk) 21:54, 9 September 2008 (UTC)

## Pointing a citation to its source

The second citation

"2. ^ Grubbs, F. E.: 1969, Procedures for detecting outlying observations in samples. Technometrics 11, 1–21."

I googled it and found it on JSTOR at http://www.jstor.org/pss/1266761

Would it be a good idea to link directly to the article in the reference section? I didn't know how and I didn't know if that was appropriate or not. It seems appropriate though.

--Ted Wheeland (talk) 21:38, 3 January 2010 (UTC)

## Pronunciation?

How is it pronounced?

Like "Out Lier" or "ootlee-er"? 109.150.237.200 (talk) 09:46, 6 December 2012 (UTC)

Former. Hear it at http://www.merriam-webster.com/dictionary/outlier Glrx (talk) 22:21, 7 December 2012 (UTC)

## Identifying Outliers

Much of this section is directly plagiarized from A Survey of Outlier Detection Methodologies (2004) by Hodge & Austin. For example,

"Type 1 - Determine the outliers with no prior knowledge of the data. This is essentially a learning approach analogous to unsupervised clustering. The approach processes the data as a static distribution, pinpoints the most remote points, and flags them as potential outliers." is word-for-word identical, as are the definitions of the following 2 types. At the absolute minimum their work should be cited. — Preceding unsigned comment added by 208.105.82.93 (talk) 21:48, 7 March 2013 (UTC)

## Outliers exclusion citation

I think the phrase "Deletion of outlier data is a controversial practice frowned on by many scientists and science instructors" needs some kind of citation or should be reformulated / removed. I mean - why is it controversial? What can happen? Are there examples of bad things that happened because of outlier exclusion? — Preceding unsigned comment added by 89.120.104.106 (talk) 10:26, 18 July 2013 (UTC)

## Outlier as measurement error

The whole article refers to outliers as possible errors (e.g. "An outlier may be due to variability in the measurement or it may indicate experimental error"). This is not right, or at least incomplete. An outlier can also point to something real that is unusual. As a colleague pointed out, it could be your next Nobel prize. If I remember correctly, NASA had detected the ozone hole in its data before Joe Farman published it, but had excluded those measurements. So I would propose changing the tone of this article to emphasize that statistics helps to identify outliers, which are especially interesting points that may indicate measurement errors, unusual distributions (the tails), or possibly new phenomena. — Preceding unsigned comment added by Pjtverheijen (talk • contribs) 05:58, 12 March 2014 (UTC)