This is a toy example
Why does it say "This is a toy example" in the article? There must be some kind of mistake or vandalism. But I can't quite make sense of it enough to fix it. --Vajrapoppy (talk) 20:24, 26 February 2015 (UTC)
- Why don't you ask User talk:Visnut who added that text in this edit. -- DanielPenfield (talk) 01:13, 27 February 2015 (UTC)
Error: "Doane's formula"
This little part was introduced in [this_edit]
... and it's been wrong ever since. Doane's formula doesn't refer to kurtosis at all (if you're going to link a reference, read it, for goodness' sake!).
Indeed, clearly the formula has been taken from some other source, because it's been changed in several ways from Doane's paper.
Doane discusses skewness. Since I don't know what reference the given formula actually came from, I am going to change it to Doane's actual formula, and that way formula and reference will match.
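For reference, a minimal sketch of the skewness-based rule in the form commonly attributed to Doane, k = 1 + log2(n) + log2(1 + |g1| / sigma_g1), with sigma_g1 = sqrt(6(n - 2) / ((n + 1)(n + 3))). The function names are hypothetical, the moment-based skewness estimator and the rounding up to a whole number of bins are assumptions made here, and the sketch assumes at least three non-identical values.

```python
import math

def sample_skewness(data):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (an assumed estimator)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

def doane_bins(data):
    """Suggested number of bins; uses skewness only, never kurtosis."""
    n = len(data)
    g1 = sample_skewness(data)
    sigma_g1 = math.sqrt(6 * (n - 2) / ((n + 1) * (n + 3)))
    k = 1 + math.log2(n) + math.log2(1 + abs(g1) / sigma_g1)
    return math.ceil(k)  # rounding up is a choice made for this sketch

symmetric = [1, 2, 3, 4, 5, 6, 7, 8]     # skewness 0, so k = 1 + log2(8)
skewed = [1, 1, 1, 1, 1, 1, 1, 10]       # strongly right-skewed
print(doane_bins(symmetric), doane_bins(skewed))
```

For symmetric data the skewness term vanishes and the rule reduces to 1 + log2(n); skewed data gets extra bins.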
Error: Histogram vs. Bar Chart
Most of the images shown on the page are bar charts, not histograms. This could lead to confusion amongst readers. —Preceding unsigned comment added by 126.96.36.199 (talk) 19:10, 8 January 2010 (UTC)
The main difference between bar charts and histograms is that in a histogram there is no natural separation between the rectangles. The bars in a bar chart are separated to clarify the separation of classes. In fact the graphics in the article are histograms, not bar charts. Phill779 (talk) 09:20, 6 April 2011 (UTC)
- I understand what you are saying, no need to re-explain it to me, and a vast number of people may agree with you, but it makes no particular sense for that to be the definition of histogram. It's just plotting points and then arbitrarily drawing rectangles; alternatively, you could connect the points with line segments and call it a line chart, because your definitions of bar chart and line chart are the same as your definition of histogram. A much more useful definition of histogram (and what I was taught was a histogram at a Major University(TM) a long time ago) is the one with the widths varying such that the area of each "bar" comes out the same. The benefit of this histogram is that it more meaningfully represents a population density function, which is what these charts are trying to represent: it represents equally well the many points sampled near any meaty part of the distribution and, with a wider bar, the same number of sparser points sampled in low-density areas, smoothing out random variation in a natural way. (The distinction in the sets of numbers you apply your histograms vs bar charts to is a valid distinction, but the same distinction applies to my/our definition of histogram also, so it's not a meaningful one, save for the meaning to the datasets.) 188.8.131.52 (talk) 20:25, 20 February 2015 (UTC)
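The equal-area construction described in the comment above can be sketched as follows. This is only an illustration: `equal_count_edges` is a hypothetical helper, the data are made up, and placing each edge at an order statistic is one of several possible quantile conventions. Heavily tied data can produce repeated edges, which the sketch does not handle.

```python
def equal_count_edges(data, k):
    """Bin edges chosen so each of k bins holds about len(data)/k points."""
    xs = sorted(data)
    n = len(xs)
    # Edge i sits at the order statistic for the i/k quantile (an assumption).
    return [xs[min(n - 1, (i * n) // k)] for i in range(k + 1)]

data = [1, 2, 3, 4, 10, 20, 40, 80]   # made-up sample, dense at the low end
edges = equal_count_edges(data, 4)
widths = [b - a for a, b in zip(edges, edges[1:])]
# Every bin holds 2 points, so drawing each bar with height 2 / width
# gives all bars the same area (2), with wide bars where data are sparse.
heights = [2 / w for w in widths]
print(edges, widths, heights)
```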
I agree "histogram" is the common term for these things that have bar heights equal to data counts, as shown in the black cherry tree example. But the description describes something else: "A histogram consists of tabular frequencies, shown as adjacent rectangles, erected over discrete intervals (bins), with an area equal to the frequency of the observations in the interval. The height of a rectangle is also equal to the frequency density of the interval, i.e., the frequency divided by the width of the interval. The total area of the histogram is equal to the number of data." One could define a plot this way, but in typical use and in the cherry tree example, the height is equal to the count, and the area is equal to the count times the bin width. — Preceding unsigned comment added by Tplane (talk • contribs) 15:20, 13 July 2011 (UTC)
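To make the two readings concrete, here is a small sketch with made-up data and unequal bin widths. The `histogram` helper is hypothetical and treats bins as left-closed, with the last bin also closed on the right (an assumption). With frequency-density heights the total area equals the number of data; with count heights the area is the sum of count times bin width.

```python
def histogram(data, edges):
    """Return per-bin counts, widths and frequency densities (count / width)."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i in range(len(counts)):
            last = i == len(counts) - 1
            # Bins are [a, b), except the last, which also includes its right edge.
            if edges[i] <= x < edges[i + 1] or (last and x == edges[-1]):
                counts[i] += 1
                break
    widths = [b - a for a, b in zip(edges, edges[1:])]
    densities = [c / w for c, w in zip(counts, widths)]
    return counts, widths, densities

data = [1, 2, 2, 3, 5, 6, 9, 12, 15, 19]
edges = [0, 5, 10, 20]
counts, widths, densities = histogram(data, edges)

area_density_heights = sum(d * w for d, w in zip(densities, widths))
area_count_heights = sum(c * w for c, w in zip(counts, widths))
print(counts, area_density_heights, area_count_heights)
```

Here the density-height area is 10, the number of observations, while the count-height area is 65, which depends on the bin widths chosen.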
ERROR in description of census histogram
In this Wikipedia article we read: "Table 2 below shows the absolute number of people who responded with travel times 'at least 15 but less than 20 minutes' is higher than the numbers for the categories above and below it. This is likely due to people rounding their reported journey time." Very likely that sentence should be replaced by: "Table 2 below shows the absolute number of people who responded with travel times 'at least 25 but less than 30 minutes' is lower than the numbers for the categories above and below it. This is likely due to people rounding their reported journey time", because if the exact value is 28 minutes, they have in mind something like "about half an hour" and put it in the class beginning at 30. — Preceding unsigned comment added by 184.108.40.206 (talk) 09:15, 3 March 2012 (UTC)
The whole "This is likely due to people rounding their reported journey time. The problem of reporting values as somewhat arbitrarily rounded numbers is a common phenomenon when collecting data from people." blather makes it sound like this is a defect in respondents, when actually it's a flaw in the data-collection methodology. The census form presumably gave people a set of bins to choose from or asked them to report their average (which the census bureau then binned). People whose journey time varies from day to day so as to fall sometimes in one bin, sometimes in another, are then obliged to pick one of the applicable bins, as if their journey time didn't vary that much. If asked for an average, most simply haven't gathered the data from which to compute it. Either way, they're going to give an answer which is quite properly rounded according to the rules for giving only as many significant digits of a value as are not rendered invalid by the error bar, here larger than the bin-size. (In binary, a time varying between a quarter hour and thrice that, for example, is properly rounded to a half; respondents are more or less certainly doing exactly this, without thinking consciously in binary, when they round to half an hour.) A better reporting methodology would be to ask respondents (without using technical jargon) for their quickest, slowest and typical journey-times; you'll still get silly reporting issues but at least the respondent won't feel obliged to give an answer they consider wrong. Those analysing the data then have to actually use their brains somewhat to decide how to combine responses with wildly different error bars, but at least they have a more meaningful data-set. -- Eddy 220.127.116.11 (talk) 12:16, 9 October 2013 (UTC)
Interval explanation not complete
The article does not state the interval conventions for bins. For example, the article includes an example with bins "10.5–20.5" and "20.5–33.5", but it is not clear whether 20.5 would be counted in the 10.5–20.5 bin or the 20.5–33.5 bin. Essentially, I think the article needs clarification about whether the intervals are by convention [X1, X2) or (X1, X2] when interpreting the notation "X1–X2". Thelema418 (talk) 01:09, 15 July 2013 (UTC)
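The ambiguity can be shown with a short sketch using the edges from the example above. The two function names are hypothetical; neither convention is claimed to be the one the article intends. Values outside the overall range are not handled specially: they come back as an index of -1 or an index past the last bin.

```python
from bisect import bisect_left, bisect_right

edges = [10.5, 20.5, 33.5]  # bin 0 is "10.5-20.5", bin 1 is "20.5-33.5"

def bin_left_closed(x, edges):
    """[X1, X2): a value on an edge goes to the bin that starts there."""
    return bisect_right(edges, x) - 1

def bin_right_closed(x, edges):
    """(X1, X2]: a value on an edge goes to the bin that ends there."""
    return bisect_left(edges, x) - 1

# Interior values land in the same bin under both conventions;
# the shared edge 20.5 does not.
print(bin_left_closed(15.0, edges), bin_right_closed(15.0, edges))
print(bin_left_closed(20.5, edges), bin_right_closed(20.5, edges))
```

Under [X1, X2) the value 20.5 falls in the 20.5–33.5 bin; under (X1, X2] it falls in the 10.5–20.5 bin.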
Is the word bin truly an abbreviation for binary, in this context?
At the time of this writing, the article includes the following statement:
- "To construct a histogram, the first step is to "bin" (standing for Binary) the range of values ... ."
Bin is an Old English word. The meaning of the word bin (without reference to binary) aligns precisely with the way in which the word bin is used in describing the creation of a histogram. For example, consider bins containing different sizes of nails in a hardware store, or grain bins. Any number of bins can be used in creating a histogram. Is there support for stating that the word bin in the context of a histogram necessarily comes from the word binary? Mecanoge (talk) 15:18, 23 November 2014 (UTC)