Talk:Histogram

This article is of interest to the following WikiProjects:

WikiProject Business (Rated B-class, Mid-importance)
This article is within the scope of WikiProject Business, a collaborative effort to improve the coverage of business articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as B-Class on the project's quality scale.
This article has been rated as Mid-importance on the project's importance scale.

WikiProject Mathematics (Rated B-class, High-importance)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Field: Probability and statistics
One of the 500 most frequently viewed mathematics articles.

WikiProject Statistics (Rated B-class, Top-importance)
This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.
This article has been rated as B-Class on the quality scale.
This article has been rated as Top-importance on the importance scale.

This is a toy example

Why does it say "This is a toy example" in the article? There must be some kind of mistake or vandalism. But I can't quite make sense of it enough to fix it. --Vajrapoppy (talk) 20:24, 26 February 2015 (UTC)

Why don't you ask at User talk:Visnut, since that user added the text in this edit? -- DanielPenfield (talk) 01:13, 27 February 2015 (UTC)

Error: Histogram vs. Bar Chart

Most of the images shown on the page are bar charts, not histograms. This could lead to confusion amongst readers. —Preceding unsigned comment added by 129.170.241.32 (talk) 19:10, 8 January 2010 (UTC)

The main difference between bar charts and histograms is that in a histogram there is no natural separation between the rectangles. The bars in a bar chart are separated to make the separation of classes clear. In fact, the graphics in the article are histograms, not bar charts. Phill779 (talk) 09:20, 6 April 2011 (UTC)

I understand what you are saying, and there is no need to re-explain it to me; a vast number of people may agree with you. But it makes no particular sense for that to be the definition of a histogram: it is just plotting points and then arbitrarily drawing rectangles. Alternatively, you could connect the points with line segments and call it a line chart, because your definitions of bar chart and line chart are the same as your definition of histogram. A much more useful definition of histogram (and what I was taught was a histogram at a Major University(TM) a long time ago) is the one with the widths varying such that the area of each "bar" comes out the same. The benefit of this histogram is that it more meaningfully represents a population density function, which is what these charts are trying to represent. It represents equally well the many points sampled near any meaty part of the distribution and, with a wider bar encompassing the same number of sparser points, those sampled in low-density areas, smoothing out random variation in a natural way. (The distinction between the sets of numbers you apply your histograms and bar charts to is a valid one, but the same distinction applies to my/our definition of histogram too, so it is not a meaningful one, save for what it says about the datasets.) 68.173.49.156 (talk) 20:25, 20 February 2015 (UTC)
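For readers following the thread, the equal-area construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `equal_frequency_edges` and `bin_counts` are hypothetical, the sample values are made up, and placing each edge midway between neighbouring points is just one of several reasonable conventions. With tied values the bins come out only roughly equal.

```python
def equal_frequency_edges(data, n_bins):
    """Place bin edges so that each bin holds about the same number of points."""
    xs = sorted(data)
    n = len(xs)
    edges = [xs[0]]
    for k in range(1, n_bins):
        i = k * n // n_bins                      # index of the first point in bin k
        edges.append((xs[i - 1] + xs[i]) / 2)    # midpoint between neighbouring points
    edges.append(xs[-1])
    return edges


def bin_counts(data, edges):
    """Count points per bin; bins are [lo, hi), and the last bin also includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for j in range(len(counts)):
            is_last = j == len(counts) - 1
            if edges[j] <= x < edges[j + 1] or (is_last and x == edges[-1]):
                counts[j] += 1
                break
    return counts


# Made-up sample: dense near 10, sparse in the right tail.
data = [8, 9, 9, 10, 10, 10, 11, 11, 12, 15, 20, 40]

edges = equal_frequency_edges(data, 4)
counts = bin_counts(data, edges)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
heights = [c / w for c, w in zip(counts, widths)]   # frequency density per bin

for lo, hi, c, h in zip(edges, edges[1:], counts, heights):
    print(f"[{lo}, {hi}]  count={c}  height={h:.3f}  area={h * (hi - lo):.1f}")
```

Every bar has the same area because every bin holds the same count; the sparse tail simply gets a wide, low bar.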

I agree "histogram" is the common term for these things that have bar heights equal to data counts, as shown in the black cherry tree example. But the article's description defines something else: "A histogram consists of tabular frequencies, shown as adjacent rectangles, erected over discrete intervals (bins), with an area equal to the frequency of the observations in the interval. The height of a rectangle is also equal to the frequency density of the interval, i.e., the frequency divided by the width of the interval. The total area of the histogram is equal to the number of data." One could define a plot this way, but in typical use and in the cherry tree example, the height is equal to the count, and the area is equal to the count times the bin width. -- Preceding unsigned comment added by Tplane (talk • contribs) 15:20, 13 July 2011 (UTC)

It's not a case of "one could define it in this way"; this is the definition of a histogram! Only when the bin width equals 1 does the numerical value of the bar height equal the frequency. Incidentally, this is a serious problem with all the histograms in this article: the vertical axes are labelled "frequency" or "count", even for non-unity widths. The second paragraph of the article even explicitly warns against doing this: "The vertical axis is not frequency but density...". 129.67.149.107 (talk) 13:34, 18 March 2016 (UTC)
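The arithmetic behind the two conventions discussed in this thread can be made concrete with a short sketch. The bin edges and counts below are hypothetical, chosen only so the numbers are easy to follow; they are not taken from any figure in the article.

```python
# Hypothetical bins of unequal width, with made-up counts.
edges = [0, 5, 10, 20, 40]
counts = [10, 30, 40, 20]

widths = [hi - lo for lo, hi in zip(edges, edges[1:])]

# Convention A: bar height = frequency density = count / width.
density = [c / w for c, w in zip(counts, widths)]
total_area_density = sum(d * w for d, w in zip(density, widths))

# Convention B: bar height = raw count, regardless of width.
total_area_count = sum(c * w for c, w in zip(counts, widths))

print("widths:                ", widths)
print("density heights:       ", density)
print("total area (density):  ", total_area_density)
print("total area (raw count):", total_area_count)
print("number of observations:", sum(counts))
```

Under convention A the total area matches the number of observations numerically, provided the height is read as observations per unit of the horizontal axis. Under convention B the bin from 10 to 20 is the tallest bar, although the density is highest in the bin from 5 to 10. The two conventions give the same bar heights only when every bin has width 1.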

"The total area of the histogram is equal to the number of data." does not compute - an area cannot be "equal" to a number. ITOtto (talk) 05:56, 2 April 2013 (UTC)

ERROR in description of census histogram

In this Wikipedia article we read: "Table 2 below shows the absolute number of people who responded with travel times 'at least 15 but less than 20 minutes' is higher than the numbers for the categories above and below it. This is likely due to people rounding their reported journey time." That sentence should very likely be replaced by: "Table 2 below shows the absolute number of people who responded with travel times 'at least 25 but less than 30 minutes' is lower than the numbers for the categories above and below it. This is likely due to people rounding their reported journey time, because if the exact value is 28 minutes, they have in mind something like 'about half an hour' and put it in the class beginning at 30." -- Preceding unsigned comment added by 151.49.204.251 (talk) 09:15, 3 March 2012 (UTC)
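The rounding mechanism proposed in this comment can be checked with a small sketch. The travel times are invented for illustration and are not census data; `round_to_nearest` and `bin_label` are hypothetical helpers, and rounding every answer to the nearest 5 minutes is an assumption about respondent behaviour, not something the source establishes.

```python
from collections import Counter


def round_to_nearest(minutes, step=5):
    """Round a whole number of minutes to the nearest multiple of `step` (halves round up)."""
    return step * ((minutes + step // 2) // step)


def bin_label(minutes, width=5):
    """Label of the half-open bin [lo, lo + width) that contains `minutes`."""
    lo = (minutes // width) * width
    return f"[{lo}, {lo + width})"


# Invented "true" journey times in minutes, clustered just below half an hour.
exact = [26, 27, 28, 28, 29, 30, 31, 33]
reported = [round_to_nearest(t) for t in exact]

exact_counts = dict(Counter(bin_label(t) for t in exact))
reported_counts = dict(Counter(bin_label(t) for t in reported))

print("exact:   ", exact_counts)
print("reported:", reported_counts)
```

In this toy data, binning the exact times puts five people in the 25 to 30 class, while binning the rounded answers leaves only two there and moves the rest up into the class beginning at 30, which is the direction of the effect the comment describes.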

The whole "This is likely due to people rounding their reported journey time. The problem of reporting values as somewhat arbitrarily rounded numbers is a common phenomenon when collecting data from people." blather makes it sound like this is a defect in respondents, when actually it's a flaw in the data-collection methodology. The census form presumably gave people a set of bins to choose from or asked them to report their average (which the census bureau then binned). People whose journey time varies from day to day so as to fall sometimes in one bin, sometimes in another, are then obliged to pick one of the applicable bins, as if their journey time didn't vary that much. If asked for an average, most simply haven't gathered the data from which to compute it. Either way, they're going to give an answer which is quite properly rounded according to the rules for giving only as many significant digits of a value as are not rendered invalid by the error bar, here larger than the bin size. (In binary, a time varying between a quarter hour and thrice that, for example, is properly rounded to a half; respondents are more or less certainly doing exactly this, without thinking consciously in binary, when they round to half an hour.) A better reporting methodology would be to ask respondents (without using technical jargon) for their quickest, slowest and typical journey times; you'll still get silly reporting issues, but at least the respondent won't feel obliged to give an answer they consider wrong. Those analysing the data then have to actually use their brains somewhat to decide how to combine responses with wildly different error bars, but at least they have a more meaningful data set. -- Eddy 84.215.6.238 (talk) 12:16, 9 October 2013 (UTC)

Is the word bin truly an abbreviation for binary, in this context?

At the time of this writing, the article includes the following statement:

"To construct a histogram, the first step is to "bin" (standing for Binary) the range of values ... ."

Bin is an Old English word. The meaning of the word bin (without reference to binary) aligns precisely with the way in which the word bin is used in describing the creation of a histogram. For example, consider bins containing different sizes of nails in a hardware store, or grain bins. Any number of bins can be used in creating a histogram. Is there support for stating that the word bin in the context of a histogram necessarily comes from the word binary? Mecanoge (talk) 15:18, 23 November 2014 (UTC)

I agree; 'bin' here is a mathematical use of the word not that far removed from the English one. I've removed the misleading parenthetical addition. --JohnBlackburne (words, deeds) 15:41, 23 November 2014 (UTC)

Assessment comment

The comment(s) below were originally left at Talk:Histogram/Comments, and are posted here for posterity. Following several discussions in past years, these subpages are now deprecated. The comments may be irrelevant or outdated; if so, please feel free to remove this section.

This is a decent article but to reach "good article" status it could do with some general clean-up and other improvements, e.g.:
  • Make clear that although bin widths can vary, in practice this is unusual.
  • Better figures for the "travel time" example: show rectangles, not just outlines (some stats software can show histograms for varying bin width), and a clearer caption. Done --Qwfp (talk) 13:50, 23 February 2008 (UTC)
  • "See also kernel density estimation" needs to be in the main text, perhaps instead of "kernel" near the start.
  • Better referencing.

This article gets a lot of views so seems worth some effort. I may have an edit myself at some point...

Qwfp (talk) 12:37, 22 February 2008 (UTC)

Last edited at 13:50, 23 February 2008 (UTC). Substituted at 17:58, 29 April 2016 (UTC)