Talk:Histogram

This article is of interest to the following WikiProjects:
WikiProject Business (Rated B-class, Low-importance)
This article is within the scope of WikiProject Business, a collaborative effort to improve the coverage of business articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as B-Class on the project's quality scale.
This article has been rated as Low-importance on the project's importance scale.

WikiProject Statistics (Rated B-class, Top-importance)
This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.
This article has been rated as B-Class on the quality scale.
This article has been rated as Top-importance on the importance scale.

WikiProject Mathematics (Rated B-class, High-importance)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mathematics rating: B-Class, High-importance
Field: Probability and statistics
One of the 500 most frequently viewed mathematics articles.

Error: "Doane's formula"

This little part was introduced in [this_edit]

... and it's been wrong ever since. Doane's formula doesn't refer to kurtosis at all (if you're going to link a reference, read it, for goodness' sake!).

Indeed, clearly the formula has been taken from some other source, because it's been changed in several ways from Doane's paper.

Doane discusses skewness. Since I don't know what reference the given formula actually came from, I am going to change it to Doane's actual formula, and that way formula and reference will match.

Glenbarnett (talk) 05:23, 5 April 2013 (UTC)
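
For anyone checking the article against the reference, here is a minimal Python sketch of Doane's formula as it is commonly stated, k = 1 + log2(n) + log2(1 + |g1| / sigma_g1), where g1 is the sample skewness and sigma_g1 = sqrt(6(n - 2) / ((n + 1)(n + 3))). The function names are made up for illustration, and the skewness here uses the simple moment estimator, which is an assumption; it needs at least three non-identical values.

```python
import math


def sample_skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (assumed estimator)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5


def doane_bins(data):
    """Suggested number of bins: 1 + log2(n) + log2(1 + |g1| / sigma_g1)."""
    n = len(data)
    sigma_g1 = math.sqrt(6 * (n - 2) / ((n + 1) * (n + 3)))
    g1 = sample_skewness(data)
    return 1 + math.log2(n) + math.log2(1 + abs(g1) / sigma_g1)


symmetric = [1, 2, 3, 4, 5, 6, 7, 8]   # made-up data with zero skewness
skewed = [1, 1, 1, 1, 1, 1, 1, 10]     # made-up data with a long right tail
k_symmetric = doane_bins(symmetric)
k_skewed = doane_bins(skewed)
```

With zero skewness the last term vanishes and the result reduces to 1 + log2(n); skewed data only ever adds bins. Kurtosis does not appear anywhere. In practice the result would be rounded to a whole number of bins.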

Error: Histogram vs. Bar Chart

Most of the images shown on the page are bar charts, not histograms. This could lead to confusion amongst readers. —Preceding unsigned comment added by 129.170.241.32 (talk) 19:10, 8 January 2010 (UTC)

The main difference between bar charts and histograms is that in a histogram there is no natural separation between the rectangles. The bars in a bar chart are separated to make the separation of classes clear. In fact the graphics in the article are histograms, not bar charts. Phill779 (talk) 09:20, 6 April 2011 (UTC)

I agree "histogram" is the common term for these things that have bar heights equal to data counts, as shown in the black cherry tree example. But the description describes something else: "A histogram consists of tabular frequencies, shown as adjacent rectangles, erected over discrete intervals (bins), with an area equal to the frequency of the observations in the interval. The height of a rectangle is also equal to the frequency density of the interval, i.e., the frequency divided by the width of the interval. The total area of the histogram is equal to the number of data." One could define a plot this way, but in typical use and in the cherry tree example, the height is equal to the count, and the area is equal to the count times the bin width. — Preceding unsigned comment added by Tplane (talk · contribs) 15:20, 13 July 2011 (UTC)
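
To make the two conventions concrete, here is a small Python sketch with made-up data and unequal bin widths. The function name and the numbers are purely illustrative and are not taken from the article.

```python
def histogram_heights(data, edges):
    """Return (counts, densities) for the bins [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i in range(len(counts)):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    widths = [edges[i + 1] - edges[i] for i in range(len(counts))]
    # Frequency density: count divided by bin width.
    densities = [c / w for c, w in zip(counts, widths)]
    return counts, densities


data = [1, 2, 2, 3, 7, 8, 12, 18]   # hypothetical observations
edges = [0, 5, 10, 20]              # last bin is twice as wide as the others
widths = [5, 5, 10]

counts, densities = histogram_heights(data, edges)
# Total area when bar height is the frequency density.
area_density = sum(d * w for d, w in zip(densities, widths))
# Total area when bar height is the raw count.
area_count = sum(c * w for c, w in zip(counts, widths))
```

With density heights the total area comes out as the number of observations (8 here); with count heights it is the sum of count times width (50 here), which depends on the choice of bins. The two pictures differ only by a constant factor when all bins have the same width.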

"The total area of the histogram is equal to the number of data." does not compute - an area cannot be "equal" to a number. ITOtto (talk) 05:56, 2 April 2013 (UTC)

Algorithm

I'm not convinced the recently added algorithm is appropriate for the article. In the first place, I think an algorithm is not informative; in the second place, it does not help much; and in the third place, this algorithm does not account for different class widths. Nijdam (talk) 22:55, 1 October 2011 (UTC)

I agree with your points. The article describes the components of the histogram well, but these cannot be stuffed in a simple Python example. But an exhaustive routine (e.g. in R: hist.default) would be off-scope for an encyclopedia article. +mt 08:22, 2 October 2011 (UTC)
I agree too. I'm a big proponent of Python examples where they are essentially pseudocode and where they add something. This doesn't add anything -- you'll use a library to make a histogram if you are programming -- and it gets into details of Python syntax, so isn't just pseudocode. —Ben FrantzDale (talk) 12:54, 3 October 2011 (UTC)

ERROR in description of census histogram

In this Wikipedia article we read: 'Table 2 below shows the absolute number of people who responded with travel times "at least 15 but less than 20 minutes" is higher than the numbers for the categories above and below it. This is likely due to people rounding their reported journey time.' Very likely that passage should be replaced by: 'Table 2 below shows the absolute number of people who responded with travel times "at least 25 but less than 30 minutes" is lower than the numbers for the categories above and below it. This is likely due to people rounding their reported journey time, because if the exact value is 28 minutes, they have in mind something like "about half an hour" and put it in the class beginning at 30.' — Preceding unsigned comment added by 151.49.204.251 (talk) 09:15, 3 March 2012 (UTC)

The whole "This is likely due to people rounding their reported journey time. The problem of reporting values as somewhat arbitrarily rounded numbers is a common phenomenon when collecting data from people." blather makes it sound like this is a defect in respondents, when actually it's a flaw in the data-collection methodology. The census form presumably gave people a set of bins to choose from or asked them to report their average (which the census bureau then binned). People whose journey time varies from day to day so as to fall sometimes in one bin, sometimes in another, are then obliged to pick one of the applicable bins, as if their journey time didn't vary that much. If asked for an average, most simply haven't gathered the data from which to compute it. Either way, they're going to give an answer which is quite properly rounded according to the rules for giving only as many significant digits of a value as are not rendered invalid by the error bar, here larger than the bin-size. (In binary, a time varying between a quarter hour and thrice that, for example, is properly rounded to a half; respondents are more or less certainly doing exactly this, without thinking consciously in binary, when they round to half an hour.) A better reporting methodology would be to ask respondents (without using technical jargon) for their quickest, slowest and typical journey-times; you'll still get silly reporting issues but at least the respondent won't feel obliged to give an answer they consider wrong. Those analysing the data then have to actually use their brains somewhat to decide how to combine responses with wildly different error bars, but at least they have a more meaningful data-set. -- Eddy 84.215.6.238 (talk) 12:16, 9 October 2013 (UTC)
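
Whatever the cause of the rounding, its effect on 5-minute bins is easy to reproduce. The Python sketch below uses entirely hypothetical data (one respondent for each true travel time from 20 to 39 minutes) and assumes, purely for illustration, that every respondent rounds to the nearest 10 minutes; neither assumption comes from the census data.

```python
from collections import Counter


def bin_start(minutes, width=5):
    """Lower edge of the bin [start, start + width) containing `minutes`."""
    return (minutes // width) * width


def round_half_up(minutes, step=10):
    """Round to the nearest multiple of `step`, with ties going up."""
    return ((minutes + step // 2) // step) * step


true_times = list(range(20, 40))  # hypothetical true travel times in minutes

# Bin counts if everyone reported the exact time.
exact = Counter(bin_start(t) for t in true_times)
# Bin counts if everyone reported a time rounded to the nearest 10 minutes.
reported = Counter(bin_start(round_half_up(t)) for t in true_times)
```

With exact times each bin from 20 to 39 holds five respondents. After rounding, the "at least 25 but less than 30" bin is empty and the "at least 30 but less than 35" bin holds ten, so a true time of 28 minutes ends up in the class beginning at 30, as described above.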

"Seven Basic Tools of Quality"

Am I the only person who doesn't care for the prominence given to the "Seven Basic Tools of Quality" phrase? Certainly, histograms may be used in quality control, but the phrase "Seven Basic Tools of Quality" has the distinct feel of being somebody's marketing gimmick for a commercial (or semi-commercial) package of consulting services (something like "Total Quality Management", or "Six Standard Deviations"). Or perhaps someone's written a book called "The Seven Basic Tools of Quality" (by analogy with "The Seven Habits of Highly Effective People" or "The Five Disciplines" or "NASA's 101 Rules" or whatever). In any case, I'm nervous about this kind of sloganeering. What do other people think? RomanSpa (talk) 13:11, 15 December 2012 (UTC)

With regards to the "perhaps someone's written a book", you can click through the link to the article, look at the "references" section and see that in fact Kaoru Ishikawa devoted at least part of a book or two to the subject and the American Society for Quality has a web page devoted to it.
Regardless, "Seven Basic Tools of Quality" appears three times within the histogram article: in the infobox, in the lead, and in the "see also" section.
Are you objecting to all three mentions? Just the infobox? Just the lead mention? Just the "see also" mention? -- DanielPenfield (talk) 13:40, 15 December 2012 (UTC)
I agree with RomanSpa. The "Seven Basic Tools of Quality" could be listed once, perhaps, in an applications section. I think the caption on the graph at the top is really out of place. Histograms are, after all, one of the "basic tools" of any type of data analysis. It is one of the basic tools of Exploratory data analysis, density estimation, distribution fitting, etc. but there is not room to provide a slogan for all of these. When did histogram get taken over by quality control? Mathstat (talk) 22:02, 16 December 2012 (UTC)
Agree, and removed from the infobox. However, the template is Template:Infobox quality tool. Can't we use a more generic infobox for statistical charts and diagrams? Is there one? --Cyclopiatalk 17:45, 17 December 2012 (UTC)

Travel times

The table of travel times does not clearly show the intervals. Nijdam (talk) 11:57, 9 January 2013 (UTC)

Interval explanation not complete

The article does not state the interval conventions for bins. For example, the article includes an example with bins "10.5–20.5" and "20.5–33.5", but it is not clear whether 20.5 would be counted in the 10.5–20.5 bin or the 20.5–33.5 bin. Essentially, I think the article needs clarification about whether the intervals are by convention [X1, X2) or (X1, X2] when interpreting the notation "X1–X2". Thelema418 (talk) 01:09, 15 July 2013 (UTC)
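
To show that the choice of convention changes where a boundary value lands, here is a short Python sketch using the standard `bisect` module on the two bins quoted above. The function names are hypothetical, and which convention the article intends is exactly the open question.

```python
from bisect import bisect_left, bisect_right

edges = [10.5, 20.5, 33.5]  # bin 0 is 10.5 to 20.5, bin 1 is 20.5 to 33.5


def bin_index_left_closed(x, edges):
    """Index of the bin [edges[i], edges[i+1]) containing x, or None."""
    i = bisect_right(edges, x) - 1
    return i if 0 <= i < len(edges) - 1 else None


def bin_index_right_closed(x, edges):
    """Index of the bin (edges[i], edges[i+1]] containing x, or None."""
    i = bisect_left(edges, x) - 1
    return i if 0 <= i < len(edges) - 1 else None


boundary = 20.5
left_closed_bin = bin_index_left_closed(boundary, edges)    # [X1, X2) convention
right_closed_bin = bin_index_right_closed(boundary, edges)  # (X1, X2] convention
```

Under [X1, X2) the value 20.5 falls in the second bin, and under (X1, X2] it falls in the first. The outermost edges behave differently too: 10.5 is excluded under (X1, X2] and 33.5 is excluded under [X1, X2), which is why some software treats the first or last bin as closed on both sides.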